re: Yudkowsky on biological materials

bhauth · Dec 11, 2023, 1:28 PM
182 points
30 comments · 5 min read · LW link

A report about LessWrong karma volatility from a different universe

Ben Pace · Apr 1, 2023, 9:48 PM
181 points
7 comments · 1 min read · LW link

There should be more AI safety orgs

Marius Hobbhahn · Sep 21, 2023, 2:53 PM
181 points
25 comments · 17 min read · LW link

Neural networks generalize because of this one weird trick

Jesse Hoogland · Jan 18, 2023, 12:10 AM
181 points
34 comments · 15 min read · LW link · 1 review
(www.jessehoogland.com)

ChatGPT (and now GPT4) is very easily distracted from its rules

dmcs · Mar 15, 2023, 5:55 PM
180 points
42 comments · 1 min read · LW link

[Link] A community alert about Ziz

DanielFilan · Feb 24, 2023, 12:06 AM
180 points
166 comments · 2 min read · LW link · 4 reviews
(medium.com)

Talking publicly about AI risk

Jan_Kulveit · Apr 21, 2023, 11:28 AM
180 points
9 comments · 6 min read · LW link

I still think it’s very unlikely we’re observing alien aircraft

dynomight · Jun 15, 2023, 1:01 PM
180 points
70 comments · 5 min read · LW link
(dynomight.net)

When is Goodhart catastrophic?

May 9, 2023, 3:59 AM
180 points
29 comments · 8 min read · LW link · 1 review

Alexander and Yudkowsky on AGI goals

Jan 24, 2023, 9:09 PM
178 points
53 comments · 26 min read · LW link · 1 review

LLMs Sometimes Generate Purely Negatively-Reinforced Text

Fabien Roger · Jun 16, 2023, 4:31 PM
177 points
11 comments · 7 min read · LW link

The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda

Dec 18, 2023, 8:35 PM
177 points
23 comments · 12 min read · LW link · 1 review

Critical review of Christiano’s disagreements with Yudkowsky

Vanessa Kosoy · Dec 27, 2023, 4:02 PM
176 points
40 comments · 15 min read · LW link

A rough and incomplete review of some of John Wentworth’s research

So8res · Mar 28, 2023, 6:52 PM
175 points
18 comments · 18 min read · LW link

[Linkpost] Introducing Superalignment

beren · Jul 5, 2023, 6:23 PM
175 points
69 comments · 1 min read · LW link
(openai.com)

Decision Theory with the Magic Parts Highlighted

moridinamael · May 16, 2023, 5:39 PM
175 points
24 comments · 5 min read · LW link

Defunding My Mistake

ymeskhout · Sep 4, 2023, 2:43 PM
175 points
41 comments · 6 min read · LW link

Thomas Kwa’s MIRI research experience

Oct 2, 2023, 4:42 PM
173 points
53 comments · 1 min read · LW link

Tuning your Cognitive Strategies

Apr 27, 2023, 8:32 PM
173 points
59 comments · 9 min read · LW link · 1 review
(bewelltuned.com)

Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds · Mar 9, 2023, 4:55 PM
172 points
39 comments · 2 min read · LW link
(www.anthropic.com)

Why Are Bacteria So Simple?

aysja · Feb 6, 2023, 3:00 AM
172 points
33 comments · 10 min read · LW link

Parametrically retargetable decision-makers tend to seek power

TurnTrout · Feb 18, 2023, 6:41 PM
172 points
10 comments · 2 min read · LW link
(arxiv.org)

AI #1: Sydney and Bing

Zvi · Feb 21, 2023, 2:00 PM
171 points
45 comments · 61 min read · LW link · 1 review
(thezvi.wordpress.com)

What I mean by “alignment is in large part about making cognition aimable at all”

So8res · Jan 30, 2023, 3:22 PM
171 points
25 comments · 2 min read · LW link

President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence

Tristan Williams · Oct 30, 2023, 11:15 AM
171 points
39 comments · LW link
(www.whitehouse.gov)

How to (hopefully ethically) make money off of AGI

Nov 6, 2023, 11:35 PM
171 points
95 comments · 32 min read · LW link · 1 review

[April Fools’] Definitive confirmation of shard theory

TurnTrout · Apr 1, 2023, 7:27 AM
170 points
8 comments · 2 min read · LW link

Will the growing deer prion epidemic spread to humans? Why not?

eukaryote · Jun 25, 2023, 4:31 AM
170 points
33 comments · 13 min read · LW link
(eukaryotewritesblog.com)

Architects of Our Own Demise: We Should Stop Developing AI Carelessly

Roko · Oct 26, 2023, 12:36 AM
170 points
75 comments · 3 min read · LW link

Rationality !== Winning

Raemon · Jul 24, 2023, 2:53 AM
170 points
51 comments · 4 min read · LW link

Thoughts on the AI Safety Summit company policy requests and responses

So8res · Oct 31, 2023, 11:54 PM
169 points
14 comments · 10 min read · LW link

2023 Unofficial LessWrong Census/Survey

Screwtape · Dec 2, 2023, 4:41 AM
169 points
81 comments · 1 min read · LW link

A stylized dialogue on John Wentworth’s claims about markets and optimization

So8res · Mar 25, 2023, 10:32 PM
169 points
22 comments · 8 min read · LW link

Davidad’s Bold Plan for Alignment: An In-Depth Explanation

Apr 19, 2023, 4:09 PM
168 points
40 comments · 21 min read · LW link · 2 reviews

How useful is mechanistic interpretability?

Dec 1, 2023, 2:54 AM
167 points
54 comments · 25 min read · LW link

Why it’s so hard to talk about Consciousness

Rafael Harth · Jul 2, 2023, 3:56 PM
167 points
215 comments · 9 min read · LW link · 3 reviews

The Brain is Not Close to Thermodynamic Limits on Computation

DaemonicSigil · Apr 24, 2023, 8:21 AM
167 points
58 comments · 5 min read · LW link

You can just spontaneously call people you haven’t met in years

lc · Nov 13, 2023, 5:21 AM
167 points
21 comments · 1 min read · LW link

My understanding of Anthropic strategy

Swimmer963 (Miranda Dixon-Luinenburg) · Feb 15, 2023, 1:56 AM
166 points
31 comments · 4 min read · LW link

When can we trust model evaluations?

evhub · Jul 28, 2023, 7:42 PM
166 points
10 comments · 10 min read · LW link · 1 review

What Discovering Latent Knowledge Did and Did Not Find

Fabien Roger · Mar 13, 2023, 7:29 PM
166 points
17 comments · 11 min read · LW link

A list of core AI safety problems and how I hope to solve them

davidad · Aug 26, 2023, 3:12 PM
165 points
29 comments · 5 min read · LW link

$20 Million in NSF Grants for Safety Research

Dan H · Feb 28, 2023, 4:44 AM
165 points
12 comments · 1 min read · LW link

Loudly Give Up, Don’t Quietly Fade

Screwtape · Nov 13, 2023, 11:30 PM
165 points
12 comments · 6 min read · LW link · 1 review

Gradient hacking is extremely difficult

beren · Jan 24, 2023, 3:45 PM
164 points
22 comments · 5 min read · LW link

Towards understanding-based safety evaluations

evhub · Mar 15, 2023, 6:18 PM
164 points
16 comments · 5 min read · LW link

Prizes for matrix completion problems

paulfchristiano · May 3, 2023, 11:30 PM
164 points
52 comments · 1 min read · LW link
(www.alignment.org)

RSPs are pauses done right

evhub · Oct 14, 2023, 4:06 AM
164 points
73 comments · 7 min read · LW link · 1 review

Holly Elmore and Rob Miles dialogue on AI Safety Advocacy

Oct 20, 2023, 9:04 PM
162 points
30 comments · 27 min read · LW link

The Dial of Progress

Zvi · Jun 13, 2023, 1:40 PM
161 points
119 comments · 11 min read · LW link
(thezvi.wordpress.com)