All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 202220232024 2025

All Jan Feb Mar AprMayJun Jul Aug Sep Oct Nov Dec

All 1 2 3 4 5 6 7 8 9 10 11 121314 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

Value drift threat models

Garrett BakerMay 12, 2023, 11:03 PM

27 points

7 votes

Overall karma indicates overall quality.

4 comments5 min readLW link

Aggregating Utilities for Corrigible AI [Feedback Draft]

Dan H and Simon Goldstein

May 12, 2023, 8:57 PM

28 points

12 votes

Overall karma indicates overall quality.

7 comments22 min readLW link

Turning off lights with model editing

Sam MarksMay 12, 2023, 8:25 PM

68 points

34 votes

Overall karma indicates overall quality.

5 comments2 min readLW link

(arxiv.org)

Dark Forest Theories

RaemonMay 12, 2023, 8:21 PM

148 points

95 votes

Overall karma indicates overall quality.

54 comments2 min readLW link 2 reviews

DELBERTing as an Adversarial Strategy

Matthew_OpitzMay 12, 2023, 8:09 PM

8 points

4 votes

Overall karma indicates overall quality.

3 comments5 min readLW link

Microsoft/GitHub Copilot Chat’s confidential system Prompt: “You must refuse to discuss life, existence or sentience.”

Marvin von HagenMay 12, 2023, 7:46 PM

13 points

10 votes

Overall karma indicates overall quality.

2 comments1 min readLW link

(twitter.com)

Retrospective: Lessons from the Failed Alignment Startup AISafety.com

Søren ElverlinMay 12, 2023, 6:07 PM

105 points

56 votes

Overall karma indicates overall quality.

9 comments3 min readLW link

The way AGI wins could look very stupid

Christopher KingMay 12, 2023, 4:34 PM

56 points

54 votes

Overall karma indicates overall quality.

22 comments1 min readLW link

Towards Measures of Optimisation

mattmacdermott and Alexander Gietelink Oldenziel

May 12, 2023, 3:29 PM

53 points

28 votes

Overall karma indicates overall quality.

37 comments4 min readLW link

The Eden Project

rogersbaconMay 12, 2023, 2:58 PM

−1 points

9 votes

Overall karma indicates overall quality.

1 comment2 min readLW link

(www.secretorum.life)

Another formalization attempt: Central Argument That AGI Presents a Global Catastrophic Risk

avturchinMay 12, 2023, 1:22 PM

16 points

7 votes

Overall karma indicates overall quality.

4 comments2 min readLW link

Infinite-width MLPs as an “ensemble prior”

Vivek HebbarMay 12, 2023, 11:45 AM

46 points

15 votes

Overall karma indicates overall quality.

0 comments5 min readLW link

Input Swap Graphs: Discovering the role of neural network components at scale

Alexandre VariengienMay 12, 2023, 9:41 AM

92 points

33 votes

Overall karma indicates overall quality.

0 comments33 min readLW link

Uploads are Impossible

PashaKamyshevMay 12, 2023, 8:03 AM

−5 points

31 votes

Overall karma indicates overall quality.

37 comments8 min readLW link

Formulating the AI Doom Argument for Analytic Philosophers

JonathanErhardtMay 12, 2023, 7:54 AM

13 points

9 votes

Overall karma indicates overall quality.

0 comments2 min readLW link

Three Iterative Processes

LoganStrohlMay 12, 2023, 2:50 AM

49 points

12 votes

Overall karma indicates overall quality.

0 comments3 min readLW link

Zuzalu LW Sequences Discussion

veronicaMay 12, 2023, 12:14 AM

1 point

3 votes

Overall karma indicates overall quality.

0 comments1 min readLW link

[Question] Term/Category for AI with Neutral Impact?

isomicMay 11, 2023, 10:00 PM

6 points

2 votes

Overall karma indicates overall quality.

1 comment1 min readLW link

Thoughts on LessWrong norms, the Art of Discourse, and moderator mandate

RubyMay 11, 2023, 9:20 PM

37 points

16 votes

Overall karma indicates overall quality.

20 comments5 min readLW link

Alignment, Goals, and The Gut-Head Gap: A Review of Ngo. et al.

Violet HourMay 11, 2023, 6:06 PM

20 points

9 votes

Overall karma indicates overall quality.

2 comments13 min readLW link

Sequence opener: Jordan Harbinger’s 6 minute networking

Severin T. SeehrichMay 11, 2023, 5:06 PM

4 points

1 vote

Overall karma indicates overall quality.

0 comments1 min readLW link

Advice for newly busy people

Severin T. SeehrichMay 11, 2023, 4:46 PM

150 points

66 votes

Overall karma indicates overall quality.

3 comments5 min readLW link

AI #11: In Search of a Moat

ZviMay 11, 2023, 3:40 PM

67 points

32 votes

Overall karma indicates overall quality.

28 comments81 min readLW link

(thezvi.wordpress.com)

[Question] Bayesian update from sensationalistic sources

houkimeMay 11, 2023, 3:26 PM

1 point

1 vote

Overall karma indicates overall quality.

0 comments1 min readLW link

I bet $500 on AI winning the IMO gold medal by 2026

azsantoskMay 11, 2023, 2:46 PM

37 points

21 votes

Overall karma indicates overall quality.

31 comments1 min readLW link

Fatebook for Slack: Track your forecasts, right where your team works

Sage Future and Adam B

May 11, 2023, 2:11 PM

24 points

10 votes

Overall karma indicates overall quality.

3 comments1 min readLW link

Contra Caller Signs

jefftkMay 11, 2023, 1:10 PM

10 points

2 votes

Overall karma indicates overall quality.

0 comments1 min readLW link

(www.jefftk.com)

Notes on the importance and implementation of safety-first cognitive architectures for AI

Brendon_WongMay 11, 2023, 10:03 AM

3 points

2 votes

Overall karma indicates overall quality.

0 comments3 min readLW link

A more grounded idea of AI risk

IknownothingMay 11, 2023, 9:48 AM

3 points

8 votes

Overall karma indicates overall quality.

4 comments1 min readLW link

Separating the “control problem” from the “alignment problem”

Yi-YangMay 11, 2023, 9:41 AM

12 points

8 votes

Overall karma indicates overall quality.

1 comment4 min readLW link

[Question] Is Infra-Bayesianism Applicable to Value Learning?

RogerDearnaleyMay 11, 2023, 8:17 AM

5 points

1 vote

Overall karma indicates overall quality.

4 comments1 min readLW link

[Question] How should we think about the decision relevance of models estimating p(doom)?

Mo PuteraMay 11, 2023, 4:16 AM

12 points

9 votes

Overall karma indicates overall quality.

1 comment3 min readLW link

The Academic Field Pyramid—any point to encouraging broad but shallow AI risk engagement?

Matthew_OpitzMay 11, 2023, 1:32 AM

20 points

12 votes

Overall karma indicates overall quality.

1 comment6 min readLW link

[Question] How should one feel morally about using chatbots?

Adam ZernerMay 11, 2023, 1:01 AM

18 points

7 votes

Overall karma indicates overall quality.

4 comments1 min readLW link

[Question] AI interpretability could be harmful?

Roman LeventovMay 10, 2023, 8:43 PM

13 points

8 votes

Overall karma indicates overall quality.

2 comments1 min readLW link

Athens, Greece – ACX Meetups Everywhere Spring 2023

Spyros DovasMay 10, 2023, 7:45 PM

1 point

1 vote

Overall karma indicates overall quality.

0 comments1 min readLW link

Better debates

TsviBTMay 10, 2023, 7:34 PM

78 points

27 votes

Overall karma indicates overall quality.

7 comments3 min readLW link

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

Chris Scammell and DivineMango

May 10, 2023, 7:04 PM

265 points

134 votes

Overall karma indicates overall quality.

54 comments21 min readLW link

A Corrigibility Metaphore—Big Gambles

WCargoMay 10, 2023, 6:13 PM

16 points

11 votes

Overall karma indicates overall quality.

0 comments4 min readLW link

Roadmap for a collaborative prototype of an Open Agency Architecture

Deger TuranMay 10, 2023, 5:41 PM

31 points

12 votes

Overall karma indicates overall quality.

0 comments12 min readLW link

AGI-Automated Interpretability is Suicide

__RicG__May 10, 2023, 2:20 PM

25 points

64 votes

Overall karma indicates overall quality.

33 comments7 min readLW link

Class-Based Addressing

jefftkMay 10, 2023, 1:40 PM

22 points

17 votes

Overall karma indicates overall quality.

6 comments1 min readLW link

(www.jefftk.com)

In defence of epistemic modesty [distillation]

LuiseMay 10, 2023, 9:44 AM

17 points

8 votes

Overall karma indicates overall quality.

2 comments9 min readLW link

[Question] How much of a concern are open-source LLMs in the short, medium and long terms?

JavierCCMay 10, 2023, 9:14 AM

5 points

3 votes

Overall karma indicates overall quality.

0 comments1 min readLW link

10 great reasons why Lex Fridman should invite Eliezer and Robin to re-do the FOOM debate on his podcast

chaosmageMay 10, 2023, 8:27 AM

−7 points

5 votes

Overall karma indicates overall quality.

1 comment1 min readLW link

(www.reddit.com)

New OpenAI Paper—Language models can explain neurons in language models

MrThinkMay 10, 2023, 7:46 AM

47 points

19 votes

Overall karma indicates overall quality.

14 comments1 min readLW link

Naturalist Experimentation

LoganStrohlMay 10, 2023, 4:28 AM

62 points

18 votes

Overall karma indicates overall quality.

14 comments10 min readLW link

[Question] Could A Superintelligence Out-Argue A Doomer?

tjaffeeMay 10, 2023, 2:40 AM

−16 points

9 votes

Overall karma indicates overall quality.

6 comments1 min readLW link

Gradient hacking via actual hacking

Max HMay 10, 2023, 1:57 AM

12 points

4 votes

Overall karma indicates overall quality.

7 comments3 min readLW link

Red teaming: challenges and research directions

joshcMay 10, 2023, 1:40 AM

31 points

10 votes

Overall karma indicates overall quality.

1 comment10 min readLW link

Keyboard shortcuts

Keys shown in yellow (e.g., ]) are accesskeys, and require a browser-specific modifier key (or keys).

Keys shown in grey (e.g., ?) do not require any modifier keys.

General
? Show keyboard shortcuts
Esc Hide keyboard shortcuts

Site navigation
h Go to Home (a.k.a. “Frontpage”) view
f Go to Featured (a.k.a. “Curated”) view
a Go to All (a.k.a. “Community”) view
m Go to Meta view
v Go to Tags view
c Go to Recent Comments view
r Go to Archive view
q Go to Sequences view
t Go to About page
u Go to User or Login page
o Go to Inbox page

Page navigation
, Jump up to top of page
. Jump down to bottom of page
/ Jump to top of comments section
s Search

Page actions
n New post or comment
e Edit current post

Post/comment list views
. Focus next entry in list
, Focus previous entry in list
; Cycle between links in focused entry
Enter Go to currently focused entry
Esc Unfocus currently focused entry
] Go to next page
[ Go to previous page
\ Go to first page
e Edit currently focused post

Editor
k Bold text
i Italic text
l Insert hyperlink
q Blockquote text

Appearance
= Increase text size
- Decrease text size
0 Reset to default text size
′ Cycle through content width settings
1 Switch to default theme [A]
2 Switch to dark theme [B]
3 Switch to grey theme [C]
4 Switch to ultramodern theme [D]
5 Switch to simple theme [E]
6 Switch to brutalist theme [F]
7 Switch to ReadTheSequences theme [G]
8 Switch to classic Less Wrong theme [H]
9 Switch to modern Less Wrong theme [I]
; Open theme tweaker
Enter Save changes and close theme tweaker
Esc Close theme tweaker (without saving)

Slide shows
l Start/resume slideshow
Esc Exit slideshow
→↓ Next slide
←↑ Previous slide
Space Reset slide zoom

Miscellaneous
x Switch to next view on user page
z Switch to previous view on user page
` Toggle compact comment list view
g Toggle anti-kibitzer