Three reasons to cooperate

paulfchristiano · Dec 24, 2022, 5:40 PM
86 points
14 comments · 10 min read · LW link
(sideways-view.com)

A hundredth of a bit of extra entropy

Adam Scherlis · Dec 24, 2022, 9:12 PM
83 points
4 comments · 3 min read · LW link

Reflections on my 5-month alignment upskilling grant

Jay Bailey · Dec 27, 2022, 10:51 AM
82 points
4 comments · 8 min read · LW link

An Open Agency Architecture for Safe Transformative AI

davidad · Dec 20, 2022, 1:04 PM
80 points
22 comments · 4 min read · LW link

Proper scoring rules don’t guarantee predicting fixed points

Dec 16, 2022, 6:22 PM
79 points
8 comments · 21 min read · LW link

Results from a survey on tool use and workflows in alignment research

Dec 19, 2022, 3:19 PM
79 points
2 comments · 19 min read · LW link

Probably good projects for the AI safety ecosystem

Ryan Kidd · Dec 5, 2022, 2:26 AM
78 points
40 comments · 2 min read · LW link

On sincerity

Joe Carlsmith · Dec 23, 2022, 5:13 PM
76 points
6 comments · 42 min read · LW link

MrBeast’s Squid Game Tricked Me

lsusr · Dec 3, 2022, 5:50 AM
75 points
1 comment · 2 min read · LW link

10 Years of LessWrong

SebastianG · Dec 30, 2022, 5:15 PM
73 points
2 comments · 4 min read · LW link

Verification Is Not Easier Than Generation In General

johnswentworth · Dec 6, 2022, 5:20 AM
73 points
27 comments · 1 min read · LW link

«Boundaries», Part 3b: Alignment problems in terms of boundaries

Andrew_Critch · Dec 14, 2022, 10:34 PM
72 points
7 comments · 13 min read · LW link

[Question] Who are some prominent reasonable people who are confident that AI won’t kill everyone?

Optimization Process · Dec 5, 2022, 9:12 AM
72 points
54 comments · 1 min read · LW link

AI Safety Seems Hard to Measure

HoldenKarnofsky · Dec 8, 2022, 7:50 PM
71 points
6 comments · 14 min read · LW link
(www.cold-takes.com)

It’s time to worry about online privacy again

Malmesbury · Dec 25, 2022, 9:05 PM
69 points
23 comments · 6 min read · LW link

The True Spirit of Solstice?

Raemon · Dec 19, 2022, 8:00 AM
69 points
31 comments · 9 min read · LW link

Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic)

LawrenceC · Dec 16, 2022, 10:12 PM
68 points
11 comments · 1 min read · LW link
(www.anthropic.com)

AI Neorealism: a threat model & success criterion for existential safety

davidad · Dec 15, 2022, 1:42 PM
67 points
1 comment · 3 min read · LW link

AGI Timelines in Governance: Different Strategies for Different Timeframes

Dec 19, 2022, 9:31 PM
65 points
28 comments · 10 min read · LW link

Can we efficiently explain model behaviors?

paulfchristiano · Dec 16, 2022, 7:40 PM
64 points
3 comments · 9 min read · LW link
(ai-alignment.com)

Systems of Survival

Vaniver · Dec 9, 2022, 5:13 AM
63 points
5 comments · 5 min read · LW link

Key Mostly Outward-Facing Facts From the Story of VaccinateCA

Zvi · Dec 14, 2022, 1:30 PM
61 points
2 comments · 23 min read · LW link
(thezvi.wordpress.com)

Notice when you stop reading right before you understand

just_browsing · Dec 20, 2022, 5:09 AM
61 points
6 comments · 1 min read · LW link

Summary of a new study on out-group hate (and how to fix it)

DirectedEvolution · Dec 4, 2022, 1:53 AM
60 points
30 comments · 3 min read · LW link
(www.pnas.org)

Predicting GPU performance

Dec 14, 2022, 4:27 PM
60 points
26 comments · 1 min read · LW link
(epochai.org)

Update on Harvard AI Safety Team and MIT AI Alignment

Dec 2, 2022, 12:56 AM
60 points
4 comments · 8 min read · LW link

The Meditation on Winter

Raemon · Dec 25, 2022, 4:12 PM
59 points
3 comments · 3 min read · LW link

MIRI’s “Death with Dignity” in 60 seconds.

Cleo Nardo · Dec 6, 2022, 5:18 PM
59 points
4 comments · 1 min read · LW link

CIRL Corrigibility is Fragile

Dec 21, 2022, 1:40 AM
58 points
8 comments · 12 min read · LW link

High-level hopes for AI alignment

HoldenKarnofsky · Dec 15, 2022, 6:00 PM
58 points
3 comments · 19 min read · LW link
(www.cold-takes.com)

Concrete Steps to Get Started in Transformer Mechanistic Interpretability

Neel Nanda · Dec 25, 2022, 10:21 PM
57 points
7 comments · 12 min read · LW link
(www.neelnanda.io)

YCombinator fraud rates

Xodarap · Dec 25, 2022, 7:21 PM
56 points
3 comments · LW link

In defense of probably wrong mechanistic models

evhub · Dec 6, 2022, 11:24 PM
55 points
10 comments · 2 min read · LW link

My thoughts on OpenAI’s alignment plan

Orpheus16 · Dec 30, 2022, 7:33 PM
55 points
3 comments · 20 min read · LW link

Formalization as suspension of intuition

adamShimi · Dec 11, 2022, 3:16 PM
54 points
18 comments · 1 min read · LW link
(epistemologicalvigilance.substack.com)

Take 13: RLHF bad, conditioning good.

Charlie Steiner · Dec 22, 2022, 10:44 AM
54 points
4 comments · 2 min read · LW link

Nook Nature

Duncan Sabien (Inactive) · Dec 5, 2022, 4:10 AM
54 points
18 comments · 10 min read · LW link

Reframing inner alignment

davidad · Dec 11, 2022, 1:53 PM
53 points
13 comments · 4 min read · LW link

The “Minimal Latents” Approach to Natural Abstractions

johnswentworth · Dec 20, 2022, 1:22 AM
53 points
24 comments · 12 min read · LW link

Announcing: The Independent AI Safety Registry

Shoshannah Tekofsky · Dec 26, 2022, 9:22 PM
53 points
9 comments · 1 min read · LW link

Air-gapping evaluation and support

Ryan Kidd · Dec 26, 2022, 10:52 PM
53 points
1 comment · 2 min read · LW link

Positive values seem more robust and lasting than prohibitions

TurnTrout · Dec 17, 2022, 9:43 PM
52 points
13 comments · 2 min read · LW link

My AGI safety research—2022 review, ’23 plans

Steven Byrnes · Dec 14, 2022, 3:15 PM
51 points
10 comments · 7 min read · LW link

Looking Back on Posts From 2022

Zvi · Dec 26, 2022, 1:20 PM
50 points
8 comments · 17 min read · LW link
(thezvi.wordpress.com)

China Covid #4

Zvi · Dec 22, 2022, 4:30 PM
50 points
2 comments · 11 min read · LW link
(thezvi.wordpress.com)

Take 7: You should talk about “the human’s utility function” less.

Charlie Steiner · Dec 8, 2022, 8:14 AM
50 points
22 comments · 2 min read · LW link

Next Level Seinfeld

Zvi · Dec 19, 2022, 1:30 PM
50 points
8 comments · 1 min read · LW link
(thezvi.wordpress.com)

My Reservations about Discovering Latent Knowledge (Burns, Ye, et al)

Robert_AIZI · Dec 27, 2022, 5:27 PM
50 points
0 comments · 4 min read · LW link
(aizi.substack.com)

Applications open for AGI Safety Fundamentals: Alignment Course

Richard_Ngo · Dec 13, 2022, 6:31 PM
49 points
0 comments · 2 min read · LW link

Basic building blocks of dependent type theory

Thomas Kehrenberg · Dec 15, 2022, 2:54 PM
49 points
9 comments · 13 min read · LW link