Simple alignment plan that maybe works

Iknownothing Jul 18, 2023, 10:48 PM
4 points
8 comments 1 min read LW link

Prospera-dump

tailcalled Jul 18, 2023, 9:36 PM
11 points
16 comments 1 min read LW link

Tiny Mech Interp Projects: Emergent Positional Embeddings of Words

Neel Nanda Jul 18, 2023, 9:24 PM
51 points
1 comment 9 min read LW link

Quick Thoughts on Language Models

RohanS Jul 18, 2023, 8:38 PM
6 points
0 comments 4 min read LW link

Still no Lie Detector for LLMs

Jul 18, 2023, 7:56 PM
50 points
2 comments 21 min read LW link

Meta announces Llama 2; “open sources” it for commercial use

LawrenceC Jul 18, 2023, 7:28 PM
46 points
12 comments 1 min read LW link
(about.fb.com)

The Rope Management Theory: A Comprehensive Approach to Modulating Reward Perception and Mitigating Hedonic Adaptation

Eris Discordia Jul 18, 2023, 5:45 PM
−23 points
2 comments 3 min read LW link

AI Impacts Quarterly Newsletter, Apr-Jun 2023

Jul 18, 2023, 5:14 PM
6 points
0 comments 3 min read LW link
(blog.aiimpacts.org)

Clever arguers give weak evidence, not zero

dkl9 Jul 18, 2023, 5:07 PM
7 points
2 comments 1 min read LW link
(dkl9.net)

Measuring and Improving the Faithfulness of Model-Generated Reasoning

Jul 18, 2023, 4:36 PM
111 points
15 comments 6 min read LW link 1 review

[Question] Least-problematic Resource for learning RL?

Dalcy Jul 18, 2023, 4:30 PM
9 points
7 comments 1 min read LW link

Charter Cities: why they’re exciting & how they might work

Jackson Wagner Jul 18, 2023, 1:57 PM
21 points
7 comments LW link

Narrative Theory. Part 6. Artificial Neural Networks

Eris Jul 18, 2023, 9:22 AM
3 points
0 comments 2 min read LW link

Train for incorrigibility, then reverse it (Shutdown Problem Contest Submission)

Daniel_Eth Jul 18, 2023, 8:26 AM
9 points
1 comment LW link

The shape of AGI: Cartoons and back of envelope

boazbarak Jul 17, 2023, 8:57 PM
33 points
19 comments 6 min read LW link 1 review

Predictive history classes

dkl9 Jul 17, 2023, 8:48 PM
68 points
17 comments 2 min read LW link
(dkl9.net)

Highlights from The Industrial Revolution, by T. S. Ashton

jasoncrawford Jul 17, 2023, 7:02 PM
17 points
0 comments 10 min read LW link
(rootsofprogress.org)

Existential Risk Persuasion Tournament

PeterMcCluskey Jul 17, 2023, 6:04 PM
73 points
1 comment 8 min read LW link
(bayesianinvestor.com)

[Interview w/ Rob Miles] The case for taking AI Safety seriously

fowlertm Jul 17, 2023, 5:08 PM
17 points
1 comment 1 min read LW link

Announcing the Existential InfoSec Forum

calebp99 Jul 17, 2023, 5:05 PM
10 points
0 comments 2 min read LW link

Narrative Theory. Part 4. Neural Darwinism

Eris Jul 17, 2023, 4:45 PM
3 points
0 comments 2 min read LW link

Sapient Algorithms

Valentine Jul 17, 2023, 4:30 PM
83 points
15 comments 5 min read LW link

AI safety technical research—Career review

Benjamin Hilton Jul 17, 2023, 3:34 PM
14 points
0 comments LW link

[Question] Conditional on living in a AI safety/alignment by default universe, what are the implications of this assumption being true?

Noosphere89 Jul 17, 2023, 2:44 PM
26 points
10 comments 1 min read LW link

Thoughts on “Process-Based Supervision”

Steven Byrnes Jul 17, 2023, 2:08 PM
74 points
4 comments 23 min read LW link

Proof of posteriority: a defense against AI-generated misinformation

jchan Jul 17, 2023, 12:04 PM
33 points
3 comments 5 min read LW link

An Overview of AI risks—the Flyer

Jul 17, 2023, 12:03 PM
20 points
0 comments 1 min read LW link
(docs.google.com)

[Question] Build knowledge base first, or backchain?

Nicholas / Heather Kross Jul 17, 2023, 3:44 AM
11 points
5 comments 1 min read LW link

A fictional AI law laced w/ alignment theory

MiguelDev Jul 17, 2023, 1:42 AM
6 points
0 comments 2 min read LW link

AutoInterpretation Finds Sparse Coding Beats Alternatives

Hoagy Jul 17, 2023, 1:41 AM
57 points
1 comment 7 min read LW link

An upcoming US Supreme Court case may impede AI governance efforts

NickGabs Jul 16, 2023, 11:51 PM
57 points
17 comments 2 min read LW link

Weak Evidence is Common

dkl9 Jul 16, 2023, 11:37 PM
7 points
5 comments 1 min read LW link
(dkl9.net)

Even briefer summary of ai-plans.com

Iknownothing Jul 16, 2023, 11:25 PM
10 points
6 comments 2 min read LW link
(www.ai-plans.com)

Mech Interp Puzzle 1: Suspiciously Similar Embeddings in GPT-Neo

Neel Nanda Jul 16, 2023, 10:02 PM
67 points
15 comments 1 min read LW link

A Technology of Everything – Part 1: A Magical Science Experiment

aiuisensei Jul 16, 2023, 10:01 PM
−3 points
0 comments 7 min read LW link
(www.aiui.cloud)

AI, Consciousness, and the problem of Moral Considerability

stultus Jul 16, 2023, 7:56 PM
1 point
0 comments 2 min read LW link

Narrative Theory. Part 3. Simplest to succeed

Eris Jul 16, 2023, 2:41 PM
4 points
0 comments 1 min read LW link

Runaway Optimizers in Mind Space

silentbob Jul 16, 2023, 2:26 PM
16 points
0 comments 12 min read LW link

[Question] Is Adam Elga’s proof for thirdism in Sleeping Beauty still considered to be sound?

Ape in the coat Jul 16, 2023, 2:11 PM
8 points
25 comments 1 min read LW link

A simple way of exploiting AI’s coming economic impact may be highly-impactful

kuira Jul 16, 2023, 9:33 AM
11 points
2 comments 2 min read LW link

Activation adding experiments with llama-7b

Nina Panickssery Jul 16, 2023, 4:17 AM
51 points
1 comment 3 min read LW link

Introducción al Riesgo Existencial de Inteligencia Artificial

david.friva Jul 15, 2023, 8:37 PM
4 points
2 comments 4 min read LW link
(youtu.be)

The housing crisis, explained using game theory

Johnstone Jul 15, 2023, 8:27 PM
4 points
2 comments 8 min read LW link

Only a hack can solve the shutdown problem

dp Jul 15, 2023, 8:26 PM
5 points
0 comments 8 min read LW link

Robustness of Model-Graded Evaluations and Automated Interpretability

Jul 15, 2023, 7:12 PM
47 points
5 comments 9 min read LW link

[Question] How to deal with fear of failure?

TeaTieAndHat Jul 15, 2023, 6:57 PM
1 point
2 comments 1 min read LW link

Simplified bio-anchors for upper bounds on AI timelines

Fabien Roger Jul 15, 2023, 6:15 PM
21 points
4 comments 5 min read LW link

A Hill of Validity in Defense of Meaning

Zack_M_Davis Jul 15, 2023, 5:57 PM
25 points
120 comments 73 min read LW link 1 review
(unremediatedgender.space)

What is a cognitive bias?

Lionel Jul 15, 2023, 1:01 PM
1 point
0 comments 2 min read LW link
(lionelpage.substack.com)

[Question] When people say robots will steal jobs, what kinds of jobs are never implied?

Mary Chernyshenko Jul 15, 2023, 10:50 AM
5 points
12 comments 1 min read LW link