AI Safety - 7 months of discussion in 17 minutes

Zoe Williams 15 Mar 2023 23:41 UTC
25 points
0 comments 1 min read LW link

How well did Manifold predict GPT-4?

David Chee 15 Mar 2023 23:19 UTC
48 points
5 comments 2 min read LW link

Overton’s Basilisk

Alex Beyman 15 Mar 2023 21:54 UTC
−20 points
0 comments 5 min read LW link

80k podcast episode on sentience in AI systems

Robbo 15 Mar 2023 20:19 UTC
15 points
0 comments 13 min read LW link
(80000hours.org)

GPT-4: What we (I) know about it

Robert_AIZI 15 Mar 2023 20:12 UTC
40 points
29 comments 12 min read LW link
(aizi.substack.com)

Grading on Word Count

niederman 15 Mar 2023 19:17 UTC
13 points
6 comments 1 min read LW link
(maxniederman.com)

How to Escape From the Simulation (Seeds of Science)

rogersbacon 15 Mar 2023 18:46 UTC
1 point
1 comment 1 min read LW link

Towards understanding-based safety evaluations

evhub 15 Mar 2023 18:18 UTC
152 points
16 comments 5 min read LW link

Newcomb’s paradox complete solution.

Augs SMSHacks 15 Mar 2023 17:56 UTC
−12 points
13 comments 3 min read LW link

Why not just boycott LLMs?

lmbp 15 Mar 2023 17:55 UTC
11 points
5 comments 3 min read LW link

The Ethics of Eating Seafood: A Rational Discussion

Jonathan Grant 15 Mar 2023 17:55 UTC
1 point
2 comments 2 min read LW link

ChatGPT (and now GPT4) is very easily distracted from its rules

dmcs 15 Mar 2023 17:55 UTC
178 points
41 comments 1 min read LW link

[Question] What happened to the OpenPhil OpenAI board seat?

ChristianKl 15 Mar 2023 16:59 UTC
65 points
2 comments 1 min read LW link

Nokens: A potential method of investigating glitch tokens

Hoagy 15 Mar 2023 16:23 UTC
20 points
0 comments 4 min read LW link

The epistemic virtue of scope matching

jasoncrawford 15 Mar 2023 13:31 UTC
85 points
15 comments 5 min read LW link
(rootsofprogress.org)

POC || GTFO culture as partial antidote to alignment wordcelism

lc 15 Mar 2023 10:21 UTC
144 points
10 comments 7 min read LW link

Just Pivot to AI: The secret is out

sapphire 15 Mar 2023 6:26 UTC
16 points
1 comment 2 min read LW link

Bushels Are Commodity-Specific

jefftk 15 Mar 2023 2:00 UTC
29 points
0 comments 2 min read LW link
(www.jefftk.com)

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King 15 Mar 2023 0:29 UTC
116 points
22 comments 2 min read LW link

Shutting Down the Lightcone Offices

14 Mar 2023 22:47 UTC
337 points
93 comments 17 min read LW link

[Question] What are some ideas that LessWrong has reinvented?

RomanHauksson 14 Mar 2023 22:27 UTC
4 points
13 comments 1 min read LW link

Human preferences as RL critic values - implications for alignment

Seth Herd 14 Mar 2023 22:10 UTC
21 points
6 comments 6 min read LW link

PaperclipGPT(-4)

Michael Tontchev 14 Mar 2023 22:03 UTC
7 points
0 comments 11 min read LW link

GPT-4 developer livestream

Gerald Monroe 14 Mar 2023 20:55 UTC
9 points
0 comments 1 min read LW link
(www.youtube.com)

[Question] Main actors in the AI race

Marta 14 Mar 2023 20:50 UTC
3 points
1 comment 1 min read LW link

Success without dignity: a nearcasting story of avoiding catastrophe by luck

HoldenKarnofsky 14 Mar 2023 19:23 UTC
74 points
8 comments 15 min read LW link

GPT can write Quines now (GPT-4)

Andrew_Critch 14 Mar 2023 19:18 UTC
111 points
30 comments 1 min read LW link

Vector semantics and the (in-context) construction of meaning in Coleridge’s “Kubla Khan”

Bill Benzon 14 Mar 2023 19:16 UTC
4 points
0 comments 7 min read LW link

A better analogy and example for teaching AI takeover: the ML Inferno

Christopher King 14 Mar 2023 19:14 UTC
18 points
0 comments 5 min read LW link

PaLM API & MakerSuite

Gabriel Mukobi 14 Mar 2023 19:08 UTC
20 points
1 comment 1 min read LW link
(developers.googleblog.com)

What is a definition, how can it be extrapolated?

Stuart_Armstrong 14 Mar 2023 18:08 UTC
34 points
5 comments 7 min read LW link

Cambridge LW: Rationality Practice: The Map is Not the Territory

Darmani 14 Mar 2023 17:56 UTC
6 points
0 comments 1 min read LW link

[Question] Beneficial initial conditions for AGI

mikbp 14 Mar 2023 17:41 UTC
1 point
3 comments 1 min read LW link

[Question] “The elephant in the room: the biggest risk of artificial intelligence may not be what we think” What to say about that?

Obladi Oblada 14 Mar 2023 17:37 UTC
−5 points
0 comments 3 min read LW link

GPT-4

nz 14 Mar 2023 17:02 UTC
150 points
149 comments 1 min read LW link
(openai.com)

Storytelling Makes GPT-3.5 Deontologist: Unexpected Effects of Context on LLM Behavior

14 Mar 2023 8:44 UTC
17 points
0 comments 12 min read LW link

Forecasting Authoritarian and Sovereign Power uses of Large Language Models

K. Liam Smith 14 Mar 2023 8:44 UTC
7 points
0 comments 8 min read LW link
(taboo.substack.com)

Fixed points in mortal population games

ViktoriaMalyasova 14 Mar 2023 7:10 UTC
24 points
0 comments 12 min read LW link
(www.lesswrong.com)

To determine alignment difficulty, we need to know the absolute difficulty of alignment generalization

Jeffrey Ladish 14 Mar 2023 3:52 UTC
12 points
3 comments 2 min read LW link

EA & LW Forum Weekly Summary (6th - 12th March 2023)

Zoe Williams 14 Mar 2023 3:01 UTC
7 points
0 comments 1 min read LW link

Alpaca: A Strong Open-Source Instruction-Following Model

sanxiyn 14 Mar 2023 2:41 UTC
26 points
2 comments 1 min read LW link
(crfm.stanford.edu)

Discussion with Nate Soares on a key alignment difficulty

HoldenKarnofsky 13 Mar 2023 21:20 UTC
250 points
38 comments 22 min read LW link

What Discovering Latent Knowledge Did and Did Not Find

Fabien Roger 13 Mar 2023 19:29 UTC
164 points
16 comments 11 min read LW link

South Bay ACX/LW Meetup

IS 13 Mar 2023 18:25 UTC
2 points
0 comments 1 min read LW link

Could Roko’s basilisk acausally bargain with a paperclip maximizer?

Christopher King 13 Mar 2023 18:21 UTC
1 point
8 comments 1 min read LW link

Bayesian optimization to find molecules that bind to proteins

rotatingpaguro 13 Mar 2023 18:17 UTC
1 point
0 comments 1 min read LW link
(www.youtube.com)

Linkpost: ‘Dissolving’ AI Risk – Parameter Uncertainty in AI Future Forecasting

DavidW 13 Mar 2023 16:52 UTC
6 points
0 comments 1 min read LW link
(forum.effectivealtruism.org)

Decentralized Exclusion

jefftk 13 Mar 2023 15:50 UTC
23 points
19 comments 2 min read LW link
(www.jefftk.com)

Linkpost: A Contra AI FOOM Reading List

DavidW 13 Mar 2023 14:45 UTC
25 points
4 comments 1 min read LW link
(magnusvinding.com)

Linkpost: A tale of 2.5 orthogonality theses

DavidW 13 Mar 2023 14:19 UTC
9 points
3 comments 1 min read LW link
(forum.effectivealtruism.org)