Research Agendas

TagLast edit: 16 Sep 2021 15:08 UTC by plex

Research Agendas lay out the areas of research which individuals or groups are working on, or those that they believe would be valuable for others to work on. They help make research more legible and encourage discussion of priorities.

The Learning-Theoretic AI Alignment Research Agenda

Vanessa Kosoy4 Jul 2018 9:53 UTC

92 points

37 comments32 min readLW link

New safety research agenda: scalable agent alignment via reward modeling

Vika20 Nov 2018 17:29 UTC

34 points

12 comments1 min readLW link

(medium.com)

Embedded Agents

abramdemski and Scott Garrabrant

29 Oct 2018 19:53 UTC

222 points

41 comments1 min readLW link 2 reviews

On how various plans miss the hard bits of the alignment challenge

So8res12 Jul 2022 2:49 UTC

304 points

88 comments29 min readLW link 3 reviews

Research Agenda v0.9: Synthesising a human’s preferences into a utility function

Stuart_Armstrong17 Jun 2019 17:46 UTC

70 points

26 comments33 min readLW link

AI Governance: A Research Agenda

habryka5 Sep 2018 18:00 UTC

25 points

3 comments1 min readLW link

(www.fhi.ox.ac.uk)

Paul’s research agenda FAQ

zhukeepa1 Jul 2018 6:25 UTC

126 points

74 comments19 min readLW link 1 review

Our take on CHAI’s research agenda in under 1500 words

Alex Flint17 Jun 2020 12:24 UTC

112 points

18 comments5 min readLW link

the QACI alignment plan: table of contents

Tamsin Leake21 Mar 2023 20:22 UTC

106 points

1 comment1 min readLW link

(carado.moe)

An overview of 11 proposals for building safe advanced AI

evhub29 May 2020 20:38 UTC

211 points

36 comments38 min readLW link 2 reviews

AISC Project: Modelling Trajectories of Language Models

NickyP13 Nov 2023 14:33 UTC

26 points

0 comments12 min readLW link

Trying to isolate objectives: approaches toward high-level interpretability

Jozdien9 Jan 2023 18:33 UTC

48 points

14 comments8 min readLW link

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda

Cameron Berg, Judd Rosenblatt, AE Studio and Marc Carauleanu

18 Dec 2023 20:35 UTC

160 points

20 comments12 min readLW link

Embedded Agency (full-text version)

Scott Garrabrant and abramdemski

15 Nov 2018 19:49 UTC

184 points

17 comments54 min readLW link

Deconfusing Human Values Research Agenda v1

Gordon Seidoh Worley23 Mar 2020 16:25 UTC

28 points

12 comments4 min readLW link

Thoughts on Human Models

Ramana Kumar and Scott Garrabrant

21 Feb 2019 9:10 UTC

126 points

32 comments10 min readLW link 1 review

Some conceptual alignment research projects

Richard_Ngo25 Aug 2022 22:51 UTC

174 points

15 comments3 min readLW link

The Learning-Theoretic Agenda: Status 2023

Vanessa Kosoy19 Apr 2023 5:21 UTC

135 points

13 comments55 min readLW link

Research agenda update

Steven Byrnes6 Aug 2021 19:24 UTC

55 points

40 comments7 min readLW link

Preface to CLR’s Research Agenda on Cooperation, Conflict, and TAI

JesseClifton13 Dec 2019 21:02 UTC

62 points

10 comments2 min readLW link

MIRI’s technical research agenda

So8res23 Dec 2014 18:45 UTC

55 points

52 comments3 min readLW link

Davidad’s Bold Plan for Alignment: An In-Depth Explanation

Charbel-Raphaël and Gabin

19 Apr 2023 16:09 UTC

157 points

33 comments21 min readLW link

Announcing the Alignment of Complex Systems Research Group

Jan_Kulveit and technicalities

4 Jun 2022 4:10 UTC

91 points

20 comments5 min readLW link

New year, new research agenda post

Charlie Steiner12 Jan 2022 17:58 UTC

29 points

4 comments16 min readLW link

Key Questions for Digital Minds

Jacy Reese Anthis22 Mar 2023 17:13 UTC

22 points

0 comments7 min readLW link

(www.sentienceinstitute.org)

Theories of impact for Science of Deep Learning

Marius Hobbhahn1 Dec 2022 14:39 UTC

21 points

0 comments11 min readLW link

Towards Hodge-podge Alignment

Cleo Nardo19 Dec 2022 20:12 UTC

93 points

30 comments9 min readLW link

The space of systems and the space of maps

Jan_Kulveit, rosehadshar, Nora_Ammann and clem_acs

22 Mar 2023 14:59 UTC

39 points

0 comments5 min readLW link

Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda

Logan Riggs and Gurkenglas

3 Sep 2020 18:27 UTC

68 points

11 comments2 min readLW link

Sparsify: A mechanistic interpretability research agenda

Lee Sharkey3 Apr 2024 12:34 UTC

93 points

22 comments22 min readLW link

a narrative explanation of the QACI alignment plan

Tamsin Leake15 Feb 2023 3:28 UTC

56 points

29 comments6 min readLW link

(carado.moe)

Selection Theorems: A Program For Understanding Agents

johnswentworth28 Sep 2021 5:03 UTC

123 points

28 comments6 min readLW link 2 reviews

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper21 May 2024 20:15 UTC

155 points

16 comments3 min readLW link

Gaia Network: a practical, incremental pathway to Open Agency Architecture

Roman Leventov and Rafael Kaufmann Nedal

20 Dec 2023 17:11 UTC

21 points

8 comments16 min readLW link

The Shortest Path Between Scylla and Charybdis

Thane Ruthenis18 Dec 2023 20:08 UTC

50 points

8 comments5 min readLW link

Announcing Human-aligned AI Summer School

Jan_Kulveit and Tomáš Gavenčiak

22 May 2024 8:55 UTC

50 points

0 comments1 min readLW link

(humanaligned.ai)

Assessment of AI safety agendas: think about the downside risk

Roman Leventov19 Dec 2023 9:00 UTC

13 points

1 comment1 min readLW link

The Plan − 2023 Version

johnswentworth29 Dec 2023 23:34 UTC

146 points

39 comments31 min readLW link

Research Jan/Feb 2024

Stephen Fowler1 Jan 2024 6:02 UTC

8 points

0 comments2 min readLW link

Four visions of Transformative AI success

Steven Byrnes17 Jan 2024 20:45 UTC

112 points

22 comments15 min readLW link

Worrisome misunderstanding of the core issues with AI transition

Roman Leventov18 Jan 2024 10:05 UTC

5 points

2 comments4 min readLW link

Gradient Descent on the Human Brain

Jozdien and gaspode

1 Apr 2024 22:39 UTC

46 points

4 comments2 min readLW link

Constructability: Plainly-coded AGIs may be feasible in the near future

Épiphanie Gédéon and Charbel-Raphaël

27 Apr 2024 16:04 UTC

72 points

12 comments13 min readLW link

What and Why: Developmental Interpretability of Reinforcement Learning

Garrett Baker9 Jul 2024 14:09 UTC

66 points

3 comments6 min readLW link

Ultra-simplified research agenda

Stuart_Armstrong22 Nov 2019 14:29 UTC

34 points

4 comments1 min readLW link

Embedded Curiosities

Scott Garrabrant and abramdemski

8 Nov 2018 14:19 UTC

91 points

1 comment2 min readLW link

Subsystem Alignment

abramdemski and Scott Garrabrant

6 Nov 2018 16:16 UTC

99 points

12 comments1 min readLW link

Robust Delegation

abramdemski and Scott Garrabrant

4 Nov 2018 16:38 UTC

116 points

10 comments1 min readLW link

Embedded World-Models

abramdemski and Scott Garrabrant

2 Nov 2018 16:07 UTC

95 points

16 comments1 min readLW link

Decision Theory

abramdemski and Scott Garrabrant

31 Oct 2018 18:41 UTC

119 points

45 comments1 min readLW link

Research agenda: Supervising AIs improving AIs

Quintin Pope, Owen D, Roman Engeler and jacquesthibs

29 Apr 2023 17:09 UTC

76 points

5 comments19 min readLW link

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper5 Dec 2023 16:48 UTC

112 points

29 comments13 min readLW link

Sections 1 & 2: Introduction, Strategy and Governance

JesseClifton17 Dec 2019 21:27 UTC

35 points

8 comments14 min readLW link

Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms

JesseClifton17 Dec 2019 21:46 UTC

20 points

2 comments12 min readLW link

Sections 5 & 6: Contemporary Architectures, Humans in the Loop

JesseClifton20 Dec 2019 3:52 UTC

27 points

4 comments10 min readLW link

Section 7: Foundations of Rational Agency

JesseClifton22 Dec 2019 2:05 UTC

14 points

4 comments8 min readLW link

Acknowledgements & References

JesseClifton14 Dec 2019 7:04 UTC

6 points

0 comments14 min readLW link

Alignment proposals and complexity classes

evhub16 Jul 2020 0:27 UTC

40 points

26 comments13 min readLW link

Orthogonal’s Formal-Goal Alignment theory of change

Tamsin Leake5 May 2023 22:36 UTC

68 points

12 comments4 min readLW link

(carado.moe)

The Goodhart Game

John_Maxwell18 Nov 2019 23:22 UTC

13 points

5 comments5 min readLW link

[Linkpost] Interpretability Dreams

DanielFilan24 May 2023 21:08 UTC

39 points

2 comments2 min readLW link

(transformer-circuits.pub)

My AI Alignment Research Agenda and Threat Model, right now (May 2023)

Nicholas / Heather Kross28 May 2023 3:23 UTC

25 points

0 comments6 min readLW link

(www.thinkingmuchbetter.com)

Abstraction is Bigger than Natural Abstraction

Nicholas / Heather Kross31 May 2023 0:00 UTC

18 points

0 comments5 min readLW link

(www.thinkingmuchbetter.com)

[Question] Does anyone’s full-time job include reading and understanding all the most-promising formal AI alignment work?

Nicholas / Heather Kross16 Jun 2023 2:24 UTC

15 points

2 comments1 min readLW link

My research agenda in agent foundations

Alex_Altair28 Jun 2023 18:00 UTC

70 points

9 comments11 min readLW link

My Alignment Timeline

Nicholas / Heather Kross3 Jul 2023 1:04 UTC

22 points

0 comments2 min readLW link

My Central Alignment Priority (2 July 2023)

Nicholas / Heather Kross3 Jul 2023 1:46 UTC

12 points

1 comment3 min readLW link

Immobile AI makes a move: anti-wireheading, ontology change, and model splintering

Stuart_Armstrong17 Sep 2021 15:24 UTC

32 points

3 comments2 min readLW link

Testing The Natural Abstraction Hypothesis: Project Update

johnswentworth20 Sep 2021 3:44 UTC

87 points

17 comments8 min readLW link 1 review

AI, learn to be conservative, then learn to be less so: reducing side-effects, learning preserved features, and going beyond conservatism

Stuart_Armstrong20 Sep 2021 11:56 UTC

14 points

4 comments3 min readLW link

The Plan

johnswentworth10 Dec 2021 23:41 UTC

254 points

78 comments14 min readLW link 1 review

Paradigm-building: Introduction

Cameron Berg8 Feb 2022 0:06 UTC

28 points

0 comments2 min readLW link

Acceptability Verification: A Research Agenda

David Udell and evhub

12 Jul 2022 20:11 UTC

50 points

0 comments1 min readLW link

(docs.google.com)

(My understanding of) What Everyone in Technical Alignment is Doing and Why

Thomas Larsen and elifland

29 Aug 2022 1:23 UTC

412 points

90 comments38 min readLW link 1 review

Distilled Representations Research Agenda

Hoagy and mishajw

18 Oct 2022 20:59 UTC

15 points

2 comments8 min readLW link

My AGI safety research—2022 review, ’23 plans

Steven Byrnes14 Dec 2022 15:15 UTC

51 points

10 comments7 min readLW link

An overview of some promising work by junior alignment researchers

Akash26 Dec 2022 17:23 UTC

34 points

0 comments4 min readLW link

World-Model Interpretability Is All We Need

Thane Ruthenis14 Jan 2023 19:37 UTC

35 points

22 comments21 min readLW link

Why I’m not working on {debate, RRM, ELK, natural abstractions}

Steven Byrnes10 Feb 2023 19:22 UTC

71 points

19 comments9 min readLW link

Remarks 1–18 on GPT (compressed)

Cleo Nardo20 Mar 2023 22:27 UTC

146 points

35 comments31 min readLW link

Research Agenda in reverse: what would a solution look like?

Stuart_Armstrong25 Jun 2019 13:52 UTC

35 points

25 comments1 min readLW link

Forecasting AI Progress: A Research Agenda

rossg and axioman

10 Aug 2020 1:04 UTC

39 points

4 comments1 min readLW link

Technical AGI safety research outside AI

Richard_Ngo18 Oct 2019 15:00 UTC

43 points

3 comments3 min readLW link

Why I am not currently working on the AAMLS agenda

jessicata1 Jun 2017 17:57 UTC

28 points

3 comments5 min readLW link

Inference from a Mathematical Description of an Existing Alignment Research: a proposal for an outer alignment research program

Christopher King2 Jun 2023 21:54 UTC

7 points

4 comments16 min readLW link

EIS XII: Summary

scasper23 Feb 2023 17:45 UTC

17 points

0 comments6 min readLW link

The AI Control Problem in a wider intellectual context

philosophybear13 Jan 2023 0:28 UTC

11 points

3 comments12 min readLW link

A Multidisciplinary Approach to Alignment (MATA) and Archetypal Transfer Learning (ATL)

MiguelDev19 Jun 2023 2:32 UTC

4 points

2 comments7 min readLW link

Gaia Network: An Illustrated Primer

Rafael Kaufmann Nedal and Roman Leventov

18 Jan 2024 18:23 UTC

3 points

2 comments15 min readLW link

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators

lukemarks17 Jun 2023 13:55 UTC

16 points

0 comments10 min readLW link

[UPDATE: deadline extended to July 24!] New wind in rationality’s sails: Applications for Epistea Residency 2023 are now open

Jana Meixnerová and Irena Kotíková

11 Jul 2023 11:02 UTC

80 points

7 comments3 min readLW link

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy12 May 2022 20:01 UTC

53 points

0 comments59 min readLW link

Research agenda: Formalizing abstractions of computations

Erik Jenner2 Feb 2023 4:29 UTC

92 points

10 comments31 min readLW link

Introducing Leap Labs, an AI interpretability startup

Jessica Rumbelow6 Mar 2023 16:16 UTC

100 points

12 comments1 min readLW link

Which of these five AI alignment research projects ideas are no good?

rmoehn8 Aug 2019 7:17 UTC

25 points

13 comments1 min readLW link

Funding Good Research

lukeprog27 May 2012 6:41 UTC

38 points

44 comments2 min readLW link

The Löbian Obstacle, And Why You Should Care

lukemarks7 Sep 2023 23:59 UTC

18 points

6 comments2 min readLW link

Please voice your support for stem cell research

zaph22 May 2009 18:45 UTC

−5 points

4 comments1 min readLW link

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty

3 Oct 2023 7:45 UTC

16 points

0 comments5 min readLW link

Thoughts On (Solving) Deep Deception

Jozdien21 Oct 2023 22:40 UTC

66 points

2 comments6 min readLW link

Notes on effective-altruism-related research, writing, testing fit, learning, and the EA Forum

MichaelA28 Mar 2021 23:43 UTC

14 points

0 comments4 min readLW link

Labor Participation is a High-Priority AI Alignment Risk

alex17 Jun 2024 18:09 UTC

4 points

0 comments17 min readLW link

The Metaethics and Normative Ethics of AGI Value Alignment: Many Questions, Some Implications

Eleos Arete Citrini16 Sep 2021 16:13 UTC

6 points

0 comments8 min readLW link

A multi-disciplinary view on AI safety research

Roman Leventov8 Feb 2023 16:50 UTC

43 points

4 comments26 min readLW link

AI Safety in a World of Vulnerable Machine Learning Systems

AdamGleave and EuanMcLean

8 Mar 2023 2:40 UTC

70 points

28 comments29 min readLW link

(far.ai)

EIS IV: A Spotlight on Feature Attribution/Saliency

scasper15 Feb 2023 18:46 UTC

19 points

1 comment4 min readLW link

EIS II: What is “Interpretability”?

scasper9 Feb 2023 16:48 UTC

28 points

6 comments4 min readLW link

AI learns betrayal and how to avoid it

Stuart_Armstrong30 Sep 2021 9:39 UTC

30 points

4 comments2 min readLW link

A FLI postdoctoral grant application: AI alignment via causal analysis and design of agents

PabloAMC13 Nov 2021 1:44 UTC

4 points

0 comments7 min readLW link

Framing approaches to alignment and the hard problem of AI cognition

ryan_greenblatt15 Dec 2021 19:06 UTC

16 points

15 comments27 min readLW link

EIS III: Broad Critiques of Interpretability Research

scasper14 Feb 2023 18:24 UTC

20 points

2 comments11 min readLW link

An Open Philanthropy grant proposal: Causal representation learning of human preferences

PabloAMC11 Jan 2022 11:28 UTC

19 points

6 comments8 min readLW link

[Question] What should I do? (long term plan about starting an AI lab)

not_a_cat9 Jun 2024 0:45 UTC

2 points

1 comment2 min readLW link

Paradigm-building: The hierarchical question framework

Cameron Berg9 Feb 2022 16:47 UTC

11 points

15 comments3 min readLW link

Question 1: Predicted architecture of AGI learning algorithm(s)

Cameron Berg10 Feb 2022 17:22 UTC

13 points

1 comment7 min readLW link

Question 2: Predicted bad outcomes of AGI learning architecture

Cameron Berg11 Feb 2022 22:23 UTC

5 points

1 comment10 min readLW link

Question 3: Control proposals for minimizing bad outcomes

Cameron Berg12 Feb 2022 19:13 UTC

5 points

1 comment7 min readLW link

Question 5: The timeline hyperparameter

Cameron Berg14 Feb 2022 16:38 UTC

8 points

3 comments7 min readLW link

Paradigm-building: Conclusion and practical takeaways

Cameron Berg15 Feb 2022 16:11 UTC

5 points

1 comment2 min readLW link

What should AI safety be trying to achieve?

EuanMcLean23 May 2024 11:17 UTC

16 points

0 comments13 min readLW link

Elicit: Language Models as Research Assistants

stuhlmueller and jungofthewon

9 Apr 2022 14:56 UTC

71 points

6 comments13 min readLW link

The Prop-room and Stage Cognitive Architecture

Robert Kralisch29 Apr 2024 0:48 UTC

13 points

4 comments14 min readLW link

EIS V: Blind Spots In AI Safety Interpretability Research

scasper16 Feb 2023 19:09 UTC

54 points

23 comments10 min readLW link

Conditioning Generative Models for Alignment

Jozdien18 Jul 2022 7:11 UTC

59 points

8 comments20 min readLW link

How I think about alignment

Linda Linsefors13 Aug 2022 10:01 UTC

31 points

11 comments5 min readLW link

Towards White Box Deep Learning

Maciej Satkiewicz27 Mar 2024 18:20 UTC

17 points

5 comments1 min readLW link

(arxiv.org)

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper17 Feb 2023 20:48 UTC

49 points

9 comments12 min readLW link

Shard Theory: An Overview

David Udell11 Aug 2022 5:44 UTC

162 points

34 comments10 min readLW link

Eliciting Latent Knowledge (ELK) - Distillation/Summary

Marius Hobbhahn8 Jun 2022 13:18 UTC

69 points

2 comments21 min readLW link

[Question] How can we secure more research positions at our universities for x-risk researchers?

Neil Crawford6 Sep 2022 17:17 UTC

11 points

0 comments1 min readLW link

Alignment Org Cheat Sheet

Akash and Thomas Larsen

20 Sep 2022 17:36 UTC

70 points

8 comments4 min readLW link

Generative, Episodic Objectives for Safe AI

Michael Glass5 Oct 2022 23:18 UTC

11 points

3 comments8 min readLW link

Science of Deep Learning—a technical agenda

Marius Hobbhahn18 Oct 2022 14:54 UTC

36 points

7 comments4 min readLW link

EIS VII: A Challenge for Mechanists

scasper18 Feb 2023 18:27 UTC

35 points

4 comments3 min readLW link

AI researchers announce NeuroAI agenda

Cameron Berg24 Oct 2022 0:14 UTC

37 points

12 comments6 min readLW link

(arxiv.org)

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau, Xander Davies, Buck and Nate Thomas

27 Oct 2022 1:32 UTC

135 points

14 comments12 min readLW link

For alignment, we should simultaneously use multiple theories of cognition and value

Roman Leventov24 Apr 2023 10:37 UTC

23 points

5 comments5 min readLW link

AI Existential Safety Fellowships

mmfli28 Oct 2023 18:07 UTC

5 points

0 comments1 min readLW link

Trying to understand John Wentworth’s research agenda

johnswentworth, habryka and David Lorell

20 Oct 2023 0:05 UTC

92 points

12 comments12 min readLW link

Towards empathy in RL agents and beyond: Insights from cognitive science for AI Alignment

Marc Carauleanu3 Apr 2023 19:59 UTC

15 points

6 comments1 min readLW link

(clipchamp.com)

AISC project: TinyEvals

Jett22 Nov 2023 20:47 UTC

22 points

0 comments4 min readLW link

All life’s helpers’ beliefs

Tehdastehdas28 Oct 2022 5:47 UTC

−12 points

1 comment5 min readLW link

AISC 2024 - Project Summaries

NickyP27 Nov 2023 22:32 UTC

48 points

3 comments18 min readLW link

Reinforcement Learning using Layered Morphology (RLLM)

MiguelDev1 Dec 2023 5:18 UTC

7 points

0 comments29 min readLW link

A call for a quantitative report card for AI bioterrorism threat models

Juno4 Dec 2023 6:35 UTC

12 points

0 comments10 min readLW link

What’s new at FAR AI

AdamGleave and EuanMcLean

4 Dec 2023 21:18 UTC

41 points

0 comments5 min readLW link

(far.ai)

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI

WillPetillo4 Dec 2023 22:58 UTC

36 points

0 comments35 min readLW link

My summary of “Pragmatic AI Safety”

Eleni Angelou5 Nov 2022 12:54 UTC

3 points

0 comments5 min readLW link

Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research

evhub, Nicholas Schiefer, Carson Denison and Ethan Perez

8 Aug 2023 1:30 UTC

308 points

26 comments18 min readLW link

EIS VIII: An Engineer’s Understanding of Deceptive Alignment

scasper19 Feb 2023 15:25 UTC

30 points

5 comments4 min readLW link

EIS IX: Interpretability and Adversaries

scasper20 Feb 2023 18:25 UTC

30 points

7 comments8 min readLW link

Natural abstractions are observer-dependent: a conversation with John Wentworth

Martín Soto12 Feb 2024 17:28 UTC

38 points

13 comments7 min readLW link

EIS X: Continual Learning, Modularity, Compression, and Biological Brains

scasper21 Feb 2023 16:59 UTC

14 points

4 comments3 min readLW link

Announcing: The Independent AI Safety Registry

Shoshannah Tekofsky26 Dec 2022 21:22 UTC

53 points

9 comments1 min readLW link

Resources for AI Alignment Cartography

Gyrodiot4 Apr 2020 14:20 UTC

45 points

8 comments9 min readLW link

Introducing the Longevity Research Institute

sarahconstantin8 May 2018 3:30 UTC

54 points

20 comments1 min readLW link

(srconstantin.wordpress.com)

Announcement: AI alignment prize round 3 winners and next round

cousin_it15 Jul 2018 7:40 UTC

93 points

7 comments1 min readLW link

Machine Learning Projects on IDA

Owain_Evans, William_S and stuhlmueller

24 Jun 2019 18:38 UTC

49 points

3 comments2 min readLW link

AI Alignment Research Overview (by Jacob Steinhardt)

Ben Pace6 Nov 2019 19:24 UTC

44 points

0 comments7 min readLW link

(docs.google.com)

Creating Welfare Biology: A Research Proposal

ozymandias16 Nov 2017 19:06 UTC

20 points

5 comments4 min readLW link

[Question] Research ideas (AI Interpretability & Neurosciences) for a 2-months project

flux8 Jan 2023 15:36 UTC

3 points

1 comment1 min readLW link

Annotated reply to Bengio’s “AI Scientists: Safe and Useful AI?”

Roman Leventov8 May 2023 21:26 UTC

18 points

2 comments7 min readLW link

(yoshuabengio.org)

H-JEPA might be technically alignable in a modified form

Roman Leventov8 May 2023 23:04 UTC

12 points

2 comments7 min readLW link

Roadmap for a collaborative prototype of an Open Agency Architecture

Deger Turan10 May 2023 17:41 UTC

30 points

0 comments12 min readLW link

Notes on the importance and implementation of safety-first cognitive architectures for AI

Brendon_Wong11 May 2023 10:03 UTC

3 points

0 comments3 min readLW link

EIS XI: Moving Forward

scasper22 Feb 2023 19:05 UTC

19 points

2 comments9 min readLW link

No comments.

Re­search Agendas

Research Agendas