Pausing AI Developments Isn’t Enough. We Need to Shut it All Down by Eliezer Yudkowsky

jacquesthibs
Mar 29, 2023, 11:16 PM
292 points
297 comments
3 min read
LW link
(time.com)

Othello-GPT: Reflections on the Research Process

Neel Nanda
Mar 29, 2023, 10:13 PM
38 points
0 comments
15 min read
LW link
(neelnanda.io)

Othello-GPT: Future Work I Am Excited About

Neel Nanda
Mar 29, 2023, 10:13 PM
48 points
2 comments
33 min read
LW link
(neelnanda.io)

Actually, Othello-GPT Has A Linear Emergent World Representation

Neel Nanda
Mar 29, 2023, 10:13 PM
211 points
26 comments
19 min read
LW link
(neelnanda.io)

Draft: Detecting optimization

Alex_Altair
Mar 29, 2023, 8:17 PM
23 points
2 comments
6 min read
LW link

“Sorcerer’s Apprentice” from Fantasia as an analogy for alignment

awg
Mar 29, 2023, 6:21 PM
9 points
4 comments
1 min read
LW link
(video.disney.com)

The Changing Face of Twitter

Zvi
Mar 29, 2023, 5:50 PM
23 points
8 comments
26 min read
LW link
(thezvi.wordpress.com)

Nobody’s on the ball on AGI alignment

leopold
Mar 29, 2023, 5:40 PM
94 points
38 comments
9 min read
LW link
(www.forourposterity.com)

Want to win the AGI race? Solve alignment.

leopold
Mar 29, 2023, 5:40 PM
21 points
3 comments
5 min read
LW link
(www.forourposterity.com)

ChatGPT and Bing Chat can’t play Botticelli

Asha Saavoss
Mar 29, 2023, 5:39 PM
11 points
0 comments
6 min read
LW link

The Rationalist Guide to Hinduism

Harsha G.
Mar 29, 2023, 5:03 PM
25 points
12 comments
9 min read
LW link
(somestrangeloops.substack.com)

“Unintentional AI safety research”: Why not systematically mine AI technical research for safety purposes?

Jemal Young
Mar 29, 2023, 3:56 PM
27 points
3 comments
6 min read
LW link

The open letter

kornai
Mar 29, 2023, 3:09 PM
−21 points
2 comments
1 min read
LW link

I made AI Risk Propaganda

monkymind
Mar 29, 2023, 2:26 PM
−3 points
0 comments
1 min read
LW link

Strong Cheap Signals

trevor
Mar 29, 2023, 2:18 PM
29 points
3 comments
2 min read
LW link
(betonit.substack.com)

Missing forecasting tools: from catalogs to a new kind of prediction market

MichaelLatowicki
Mar 29, 2023, 9:55 AM
14 points
3 comments
5 min read
LW link

Spreadsheet for 200 Concrete Problems In Interpretability

Jay Bailey
Mar 29, 2023, 6:51 AM
13 points
0 comments
1 min read
LW link

[Question] Which parts of the existing internet are already likely to be in (GPT-5/other soon-to-be-trained LLMs)’s training corpus?

AnnaSalamon
Mar 29, 2023, 5:17 AM
49 points
2 comments
1 min read
LW link

[Question] Are there specific books that it might slightly help alignment to have on the internet?

AnnaSalamon
Mar 29, 2023, 5:08 AM
77 points
25 comments
1 min read
LW link

FLI open letter: Pause giant AI experiments

Zach Stein-Perlman
Mar 29, 2023, 4:04 AM
126 points
123 comments
2 min read
LW link
(futureoflife.org)

Run Posts By Orgs

jefftk
Mar 29, 2023, 2:40 AM
16 points
74 comments
3 min read
LW link
(www.jefftk.com)

Desensitizing Deepfakes

worse
Mar 29, 2023, 1:20 AM
1 point
0 comments
1 min read
LW link

Large language models aren’t trained enough

sanxiyn
Mar 29, 2023, 12:56 AM
5 points
4 comments
1 min read
LW link
(finbarr.ca)

Job Board (28 March 2033)

dr_s
Mar 28, 2023, 10:44 PM
20 points
1 comment
3 min read
LW link

Four lenses on AI risks

jasoncrawford
Mar 28, 2023, 9:52 PM
23 points
5 comments
3 min read
LW link
(rootsofprogress.org)

Some common confusion about induction heads

Alexandre Variengien
Mar 28, 2023, 9:51 PM
64 points
4 comments
5 min read
LW link

Draft: The optimization toolbox

Alex_Altair
Mar 28, 2023, 8:40 PM
20 points
1 comment
7 min read
LW link

Inching “Kubla Khan” and GPT into the same intellectual framework @ 3 Quarks Daily

Bill Benzon
Mar 28, 2023, 7:50 PM
5 points
0 comments
3 min read
LW link

A rough and incomplete review of some of John Wentworth’s research

So8res
Mar 28, 2023, 6:52 PM
175 points
18 comments
18 min read
LW link

[Question] How do you manage your inputs?

Mateusz Bagiński
Mar 28, 2023, 6:26 PM
15 points
2 comments
1 min read
LW link

Chatbot convinces Belgian to commit suicide

Jeroen De Ryck
Mar 28, 2023, 6:14 PM
60 points
18 comments
3 min read
LW link
(www.standaard.be)

A Primer On Chaos

johnswentworth
Mar 28, 2023, 6:01 PM
53 points
9 comments
9 min read
LW link

[Question] How likely are scenarios where AGI ends up overtly or de facto torturing us? How likely are scenarios where AGI prevents us from committing suicide or dying?

JohnGreer
Mar 28, 2023, 6:00 PM
11 points
4 comments
1 min read
LW link

How do we align humans and what does it mean for the new Conjecture’s strategy

Igor Ivanov
Mar 28, 2023, 5:54 PM
7 points
4 comments
7 min read
LW link

Governing High-Impact AI Systems: Understanding Canada’s Proposed AI Bill. April 15, Carleton University, Ottawa

Liav Koren
Mar 28, 2023, 5:48 PM
11 points
1 comment
1 min read
LW link
(forum.effectivealtruism.org)

I had a chat with GPT-4 on the future of AI and AI safety

Kristian Freed
Mar 28, 2023, 5:47 PM
1 point
0 comments
8 min read
LW link

LessWrong Hangout

Raymond Koopmanschap
Mar 28, 2023, 5:47 PM
0 points
0 comments
1 min read
LW link

Half-baked alignment idea

ozb
Mar 28, 2023, 5:47 PM
6 points
27 comments
1 min read
LW link

Some of My Current Impressions Entering AI Safety

worse
Mar 28, 2023, 5:46 PM
2 points
0 comments
2 min read
LW link

[Question] Why do the Sequences say that “Löb’s Theorem shows that a mathematical system cannot assert its own soundness without becoming inconsistent.”?

Thoth Hermes
Mar 28, 2023, 5:19 PM
12 points
30 comments
1 min read
LW link

Corrigibility, Self-Deletion, and Identical Strawberries

Robert_AIZI
Mar 28, 2023, 4:54 PM
9 points
2 comments
6 min read
LW link
(aizi.substack.com)

[Question] Why no major LLMs with memory?

Kaj_Sotala
Mar 28, 2023, 4:34 PM
42 points
15 comments
1 min read
LW link

Response to Tyler Cowen’s Existential risk, AI, and the inevitable turn in human history

Zvi
Mar 28, 2023, 4:00 PM
72 points
27 comments
20 min read
LW link
(thezvi.wordpress.com)

Adapting to Change: Overcoming Chronostasis in AI Language Models

RationalMindset
Mar 28, 2023, 2:32 PM
−1 points
0 comments
6 min read
LW link

Feeling Progress as Motivation

Sable
Mar 28, 2023, 9:11 AM
4 points
1 comment
3 min read
LW link
(affablyevil.substack.com)

Creating a family with GPT-4

Kaj_Sotala
Mar 28, 2023, 6:40 AM
23 points
3 comments
10 min read
LW link
(kajsotala.fi)

Some 2-4-6 problems

abstractapplic
Mar 28, 2023, 6:32 AM
28 points
9 comments
1 min read
LW link
(h-b-p.github.io)

[Question] Deep folding docs site?

mcint
Mar 28, 2023, 6:01 AM
−1 points
2 comments
1 min read
LW link

[Question] Why does advanced AI want not to be shut down?

RedFishBlueFish
Mar 28, 2023, 4:26 AM
2 points
19 comments
1 min read
LW link

100 Dinners And A Workshop: Information Preservation And Goals

Stephen Fowler
Mar 28, 2023, 3:13 AM
8 points
0 comments
7 min read
LW link