Three pillars for avoiding AGI catastrophe: Technical alignment, deployment decisions, and coordination

Alex Lintz · 3 Aug 2022 23:15 UTC
22 points
0 comments · 12 min read · LW link

Precursor checking for deceptive alignment

evhub · 3 Aug 2022 22:56 UTC
24 points
0 comments · 14 min read · LW link

Transformer language models are doing something more general

Numendil · 3 Aug 2022 21:13 UTC
53 points
6 comments · 2 min read · LW link

[Question] Some doubts about Non Superintelligent AIs

aditya malik · 3 Aug 2022 19:55 UTC
0 points
4 comments · 1 min read · LW link

Announcing Squiggle: Early Access

ozziegooen · 3 Aug 2022 19:48 UTC
51 points
7 comments · 7 min read · LW link
(forum.effectivealtruism.org)

Survey: What (de)motivates you about AI risk?

Daniel_Friedrich · 3 Aug 2022 19:17 UTC
1 point
0 comments · 1 min read · LW link
(forms.gle)

Externalized reasoning oversight: a research direction for language model alignment

tamera · 3 Aug 2022 12:03 UTC
130 points
23 comments · 6 min read · LW link

Open & Welcome Thread—Aug/Sep 2022

Thomas · 3 Aug 2022 10:22 UTC
9 points
32 comments · 1 min read · LW link

[Question] How does one recognize information and differentiate it from noise?

M. Y. Zuo · 3 Aug 2022 3:57 UTC
4 points
29 comments · 1 min read · LW link

Law-Following AI 4: Don’t Rely on Vicarious Liability

Cullen · 2 Aug 2022 23:26 UTC
5 points
2 comments · 3 min read · LW link

Two-year update on my personal AI timelines

Ajeya Cotra · 2 Aug 2022 23:07 UTC
288 points
60 comments · 16 min read · LW link

What are the Red Flags for Neural Network Suffering? - Seeds of Science call for reviewers

rogersbacon · 2 Aug 2022 22:37 UTC
24 points
6 comments · 1 min read · LW link

Againstness

CFAR!Duncan · 2 Aug 2022 19:29 UTC
47 points
7 comments · 9 min read · LW link

(Summary) Sequence Highlights—Thinking Better on Purpose

qazzquimby · 2 Aug 2022 17:45 UTC
33 points
3 comments · 11 min read · LW link

Progress links and tweets, 2022-08-02

jasoncrawford · 2 Aug 2022 17:03 UTC
9 points
0 comments · 1 min read · LW link
(rootsofprogress.org)

[Question] I want to donate some money (not much, just what I can afford) to AGI Alignment research, to whatever organization has the best chance of making sure that AGI goes well and doesn’t kill us all. What are my best options, where can I make the most difference per dollar?

lumenwrites · 2 Aug 2022 12:08 UTC
15 points
9 comments · 1 min read · LW link

Thinking without priors?

Q Home · 2 Aug 2022 9:17 UTC
7 points
0 comments · 9 min read · LW link

[Question] Would quantum immortality mean subjective immortality?

n0ah · 2 Aug 2022 4:54 UTC
2 points
10 comments · 1 min read · LW link

Turbocharging

CFAR!Duncan · 2 Aug 2022 0:01 UTC
50 points
3 comments · 9 min read · LW link

Letter from leading Soviet Academicians to party and government leaders of the Soviet Union regarding signs of decline and structural problems of the economic-political system (1970)

M. Y. Zuo · 1 Aug 2022 22:35 UTC
20 points
10 comments · 16 min read · LW link

Technical AI Alignment Study Group

Eric K · 1 Aug 2022 18:33 UTC
5 points
0 comments · 1 min read · LW link

[Question] Is there any writing about prompt engineering for humans?

Alex Hollow · 1 Aug 2022 12:52 UTC
18 points
8 comments · 1 min read · LW link

Meditation course claims 65% enlightenment rate: my review

KatWoods · 1 Aug 2022 11:25 UTC
111 points
33 comments · 14 min read · LW link

[Question] Which intro-to-AI-risk text would you recommend to...

Sherrinford · 1 Aug 2022 9:36 UTC
12 points
1 comment · 1 min read · LW link

Polaris, Five-Second Versions, and Thought Lengths

CFAR!Duncan · 1 Aug 2022 7:14 UTC
46 points
12 comments · 8 min read · LW link

A Word is Worth 1,000 Pictures

Kully · 1 Aug 2022 4:08 UTC
1 point
0 comments · 2 min read · LW link

On akrasia: starting at the bottom

seecrow · 1 Aug 2022 4:08 UTC
33 points
2 comments · 3 min read · LW link

[Question] How likely do you think worse-than-extinction type fates to be?

span1 · 1 Aug 2022 4:08 UTC
3 points
3 comments · 1 min read · LW link

Don’t be a Maxi

Cole Killian · 31 Jul 2022 23:59 UTC
15 points
7 comments · 2 min read · LW link
(colekillian.com)

Abstraction sacrifices causal clarity

Marv K · 31 Jul 2022 19:24 UTC
2 points
0 comments · 3 min read · LW link

Time-logging programs and/or spreadsheets (2022)

mikbp · 31 Jul 2022 18:18 UTC
3 points
3 comments · 1 min read · LW link

Conservatism is a rational response to epistemic uncertainty

contrarianbrit · 31 Jul 2022 18:04 UTC
2 points
11 comments · 9 min read · LW link
(thomasprosser.substack.com)

South Bay ACX/LW Meetup

IS · 31 Jul 2022 15:30 UTC
2 points
0 comments · 1 min read · LW link

Perverse Independence Incentives

jefftk · 31 Jul 2022 14:40 UTC
58 points
3 comments · 1 min read · LW link
(www.jefftk.com)

Wolfram Research v Cook

Kenny · 31 Jul 2022 13:35 UTC
7 points
2 comments · 8 min read · LW link

Wanted: Notation for credal resilience

PeterH · 31 Jul 2022 7:35 UTC
21 points
12 comments · 1 min read · LW link

Anatomy of a Dating Document

squidious · 31 Jul 2022 2:40 UTC
26 points
24 comments · 4 min read · LW link
(opalsandbonobos.blogspot.com)

chinchilla’s wild implications

nostalgebraist · 31 Jul 2022 1:18 UTC
415 points
128 comments · 10 min read · LW link · 1 review

AGI-level reasoner will appear sooner than an agent; what the humanity will do with this reasoner is critical

Roman Leventov · 30 Jul 2022 20:56 UTC
24 points
10 comments · 1 min read · LW link

[Question] What job should I do?

Tom Paine · 30 Jul 2022 9:15 UTC
2 points
8 comments · 1 min read · LW link

How transparency changed over time

ViktoriaMalyasova · 30 Jul 2022 4:36 UTC
21 points
0 comments · 6 min read · LW link

Translating between Latent Spaces

30 Jul 2022 3:25 UTC
27 points
2 comments · 8 min read · LW link

Drexler’s Nanotech Forecast

PeterMcCluskey · 30 Jul 2022 0:45 UTC
25 points
28 comments · 3 min read · LW link
(www.bayesianinvestor.com)

Humans Reflecting on HRH

leogao · 29 Jul 2022 21:56 UTC
26 points
4 comments · 2 min read · LW link

Comparing Four Approaches to Inner Alignment

Lucas Teixeira · 29 Jul 2022 21:06 UTC
35 points
1 comment · 9 min read · LW link

Questions for a Theory of Narratives

Marv K · 29 Jul 2022 19:31 UTC
5 points
4 comments · 4 min read · LW link

Focusing

CFAR!Duncan · 29 Jul 2022 19:15 UTC
107 points
23 comments · 14 min read · LW link

Conjecture: Internal Infohazard Policy

29 Jul 2022 19:07 UTC
131 points
6 comments · 19 min read · LW link

Abstracting The Hardness of Alignment: Unbounded Atomic Optimization

adamShimi · 29 Jul 2022 18:59 UTC
66 points
3 comments · 16 min read · LW link

Bucket Errors

CFAR!Duncan · 29 Jul 2022 18:50 UTC
40 points
7 comments · 11 min read · LW link