[Link] Why I’m optimistic about OpenAI’s alignment approach

janleike · 5 Dec 2022 22:51 UTC
98 points
15 comments · 1 min read · LW link
(aligned.substack.com)

The No Free Lunch theorem for dummies

Steven Byrnes · 5 Dec 2022 21:46 UTC
37 points
16 comments · 3 min read · LW link

ChatGPT and Ideological Turing Test

Viliam · 5 Dec 2022 21:45 UTC
42 points
1 comment · 1 min read · LW link

ChatGPT on Spielberg’s A.I. and AI Alignment

Bill Benzon · 5 Dec 2022 21:10 UTC
5 points
0 comments · 4 min read · LW link

Updating my AI timelines

Matthew Barnett · 5 Dec 2022 20:46 UTC
143 points
50 comments · 2 min read · LW link

Steering Behaviour: Testing for (Non-)Myopia in Language Models

5 Dec 2022 20:28 UTC
40 points
19 comments · 10 min read · LW link

College Admissions as a Brutal One-Shot Game

devansh · 5 Dec 2022 20:05 UTC
8 points
26 comments · 2 min read · LW link

Analysis of AI Safety surveys for field-building insights

Ash Jafari · 5 Dec 2022 19:21 UTC
11 points
2 comments · 5 min read · LW link

Testing Ways to Bypass ChatGPT’s Safety Features

Robert_AIZI · 5 Dec 2022 18:50 UTC
7 points
4 comments · 5 min read · LW link
(aizi.substack.com)

Foresight for AGI Safety Strategy: Mitigating Risks and Identifying Golden Opportunities

jacquesthibs · 5 Dec 2022 16:09 UTC
28 points
6 comments · 8 min read · LW link

Aligned Behavior is not Evidence of Alignment Past a Certain Level of Intelligence

Ronny Fernandez · 5 Dec 2022 15:19 UTC
19 points
5 comments · 7 min read · LW link

[Question] How should I judge the impact of giving $5k to a family of three kids and two mentally ill parents?

Blake · 5 Dec 2022 13:42 UTC
10 points
10 comments · 1 min read · LW link

Is the “Valley of Confused Abstractions” real?

jacquesthibs · 5 Dec 2022 13:36 UTC
19 points
11 comments · 2 min read · LW link

Take 4: One problem with natural abstractions is there’s too many of them.

Charlie Steiner · 5 Dec 2022 10:39 UTC
36 points
4 comments · 1 min read · LW link

[Question] What are some good Lesswrong-related accounts or hashtags on Mastodon that I should follow?

SpectrumDT · 5 Dec 2022 9:42 UTC
2 points
0 comments · 1 min read · LW link

[Question] Who are some prominent reasonable people who are confident that AI won’t kill everyone?

Optimization Process · 5 Dec 2022 9:12 UTC
72 points
54 comments · 1 min read · LW link

Monthly Shorts 11/22

Celer · 5 Dec 2022 7:30 UTC
8 points
0 comments · 3 min read · LW link
(keller.substack.com)

A ChatGPT story about ChatGPT doom

SurfingOrca · 5 Dec 2022 5:40 UTC
6 points
2 comments · 4 min read · LW link

A Tentative Timeline of The Near Future (2022-2025) for Self-Accountability

Yitz · 5 Dec 2022 5:33 UTC
26 points
0 comments · 4 min read · LW link

Nook Nature

[DEACTIVATED] Duncan Sabien · 5 Dec 2022 4:10 UTC
52 points
18 comments · 10 min read · LW link

Probably good projects for the AI safety ecosystem

Ryan Kidd · 5 Dec 2022 2:26 UTC
77 points
31 comments · 2 min read · LW link

Historical Notes on Charitable Funds

jefftk · 4 Dec 2022 23:30 UTC
28 points
0 comments · 3 min read · LW link
(www.jefftk.com)

AGI as a Black Swan Event

Stephen McAleese · 4 Dec 2022 23:00 UTC
8 points
8 comments · 7 min read · LW link

South Bay ACX/LW Pre-Holiday Get-Together

IS · 4 Dec 2022 22:57 UTC
10 points
0 comments · 1 min read · LW link

ChatGPT is settling the Chinese Room argument

averros · 4 Dec 2022 20:25 UTC
−7 points
7 comments · 1 min read · LW link

Race to the Top: Benchmarks for AI Safety

Isabella Duan · 4 Dec 2022 18:48 UTC
28 points
6 comments · 1 min read · LW link

Open & Welcome Thread - December 2022

niplav · 4 Dec 2022 15:06 UTC
8 points
22 comments · 1 min read · LW link

AI can exploit safety plans posted on the Internet

Peter S. Park · 4 Dec 2022 12:17 UTC
−15 points
4 comments · 1 min read · LW link

ChatGPT seems overconfident to me

qbolec · 4 Dec 2022 8:03 UTC
19 points
3 comments · 16 min read · LW link

Could an AI be Religious?

mk54 · 4 Dec 2022 5:00 UTC
−12 points
14 comments · 1 min read · LW link

Can GPT-3 Write Contra Dances?

jefftk · 4 Dec 2022 3:00 UTC
6 points
4 comments · 10 min read · LW link
(www.jefftk.com)

Take 3: No indescribable heavenworlds.

Charlie Steiner · 4 Dec 2022 2:48 UTC
23 points
12 comments · 2 min read · LW link

Summary of a new study on out-group hate (and how to fix it)

DirectedEvolution · 4 Dec 2022 1:53 UTC
60 points
30 comments · 3 min read · LW link
(www.pnas.org)

[Question] Will the first AGI agent have been designed as an agent (in addition to an AGI)?

nahoj · 3 Dec 2022 20:32 UTC
1 point
8 comments · 1 min read · LW link

Logical induction for software engineers

Alex Flint · 3 Dec 2022 19:55 UTC
160 points
8 comments · 27 min read · LW link · 1 review

Chat GPT’s views on Metaphysics and Ethics

Cole Killian · 3 Dec 2022 18:12 UTC
5 points
3 comments · 1 min read · LW link
(twitter.com)

Utilitarianism is the only option

aelwood · 3 Dec 2022 17:14 UTC
−13 points
7 comments · 1 min read · LW link

Our 2022 Giving

jefftk · 3 Dec 2022 15:40 UTC
33 points
0 comments · 1 min read · LW link
(www.jefftk.com)

[Question] Is school good or bad?

tailcalled · 3 Dec 2022 13:14 UTC
10 points
76 comments · 1 min read · LW link

MrBeast’s Squid Game Tricked Me

lsusr · 3 Dec 2022 5:50 UTC
73 points
1 comment · 2 min read · LW link

Great Cryonics Survey of 2022

Mati_Roy · 3 Dec 2022 5:10 UTC
16 points
0 comments · 1 min read · LW link

Causal scrubbing: results on induction heads

3 Dec 2022 0:59 UTC
34 points
1 comment · 17 min read · LW link

Causal scrubbing: results on a paren balance checker

3 Dec 2022 0:59 UTC
34 points
2 comments · 30 min read · LW link

Causal scrubbing: Appendix

3 Dec 2022 0:58 UTC
17 points
4 comments · 20 min read · LW link

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

3 Dec 2022 0:58 UTC
195 points
35 comments · 20 min read · LW link · 1 review

Take 2: Building tools to help build FAI is a legitimate strategy, but it’s dual-use.

Charlie Steiner · 3 Dec 2022 0:54 UTC
17 points
1 comment · 2 min read · LW link

D&D.Sci December 2022: The Boojumologist

abstractapplic · 2 Dec 2022 23:39 UTC
29 points
9 comments · 2 min read · LW link

Subsets and quotients in interpretability

Erik Jenner · 2 Dec 2022 23:13 UTC
26 points
1 comment · 7 min read · LW link

Research Principles for 6 Months of AI Alignment Studies

Shoshannah Tekofsky · 2 Dec 2022 22:55 UTC
23 points
3 comments · 6 min read · LW link

Three Fables of Magical Girls and Longtermism

Ulisse Mini · 2 Dec 2022 22:01 UTC
31 points
11 comments · 2 min read · LW link