The Over­ton Win­dow widens: Ex­am­ples of AI risk in the media

Orpheus16 · Mar 23, 2023, 5:10 PM
107 points
24 comments · 6 min read · LW link

Want to pre­dict/​ex­plain/​con­trol the out­put of GPT-4? Then learn about the world, not about trans­form­ers.

Cleo Nardo · Mar 16, 2023, 3:08 AM
107 points
26 comments · 5 min read · LW link

Pre­dic­tions for shard the­ory mechanis­tic in­ter­pretabil­ity results

Mar 1, 2023, 5:16 AM
105 points
10 comments · 5 min read · LW link

In­tro­duc­ing Leap Labs, an AI in­ter­pretabil­ity startup

Jessica Rumbelow · Mar 6, 2023, 4:16 PM
103 points
12 comments · 1 min read · LW link

On the FLI Open Letter

Zvi · Mar 30, 2023, 4:00 PM
102 points
11 comments · 22 min read · LW link
(thezvi.wordpress.com)

AI #4: In­tro­duc­ing GPT-4

Zvi · Mar 21, 2023, 2:00 PM
101 points
32 comments · 103 min read · LW link
(thezvi.wordpress.com)

Selec­tive, Cor­rec­tive, Struc­tural: Three Ways of Mak­ing So­cial Sys­tems Work

Said Achmiz · Mar 5, 2023, 8:45 AM
100 points
13 comments · 2 min read · LW link

LLM Mo­du­lar­ity: The Separa­bil­ity of Ca­pa­bil­ities in Large Lan­guage Models

NickyP · Mar 26, 2023, 9:57 PM
99 points
3 comments · 41 min read · LW link

Truth and Ad­van­tage: Re­sponse to a draft of “AI safety seems hard to mea­sure”

So8res · Mar 22, 2023, 3:36 AM
98 points
10 comments · 5 min read · LW link · 1 review

Learn the math­e­mat­i­cal struc­ture, not the con­cep­tual structure

Adam Shai · Mar 1, 2023, 10:24 PM
98 points
35 comments · 2 min read · LW link

New blog: Planned Obsolescence

Ajeya Cotra · Mar 27, 2023, 7:46 PM
96 points
7 comments · 1 min read · LW link
(www.planned-obsolescence.org)

RLHF does not ap­pear to differ­en­tially cause mode-collapse

Mar 20, 2023, 3:39 PM
95 points
9 comments · 3 min read · LW link

AI #5: Level One Bard

Zvi · Mar 30, 2023, 11:00 PM
95 points
9 comments · 47 min read · LW link
(thezvi.wordpress.com)

No­body’s on the ball on AGI alignment

leopold · Mar 29, 2023, 5:40 PM
94 points
38 comments · 9 min read · LW link
(www.forourposterity.com)

Shell games

TsviBT · Mar 19, 2023, 10:43 AM
93 points
9 comments · 4 min read · LW link · 1 review

Ab­stracts should be ei­ther Ac­tu­ally Short™, or bro­ken into paragraphs

Raemon · Mar 24, 2023, 12:51 AM
93 points
27 comments · 5 min read · LW link

Google’s PaLM-E: An Em­bod­ied Mul­ti­modal Lan­guage Model

SandXbox · Mar 7, 2023, 4:11 AM
87 points
7 comments · 1 min read · LW link
(palm-e.github.io)

Prac­ti­cal Pit­falls of Causal Scrubbing

Mar 27, 2023, 7:47 AM
87 points
17 comments · 13 min read · LW link

re­flec­tions on lock­down, two years out

mingyuan · Mar 1, 2023, 6:58 AM
86 points
9 comments · 3 min read · LW link

Con­tract Fraud

jefftk · Mar 1, 2023, 3:10 AM
86 points
10 comments · 1 min read · LW link
(www.jefftk.com)

The epistemic virtue of scope matching

jasoncrawford · Mar 15, 2023, 1:31 PM
85 points
15 comments · 5 min read · LW link
(rootsofprogress.org)

The Kids are Not Okay

Zvi · Mar 8, 2023, 1:30 PM
85 points
43 comments · 32 min read · LW link
(thezvi.wordpress.com)

The 0.2 OOMs/​year target

Cleo Nardo · Mar 30, 2023, 6:15 PM
84 points
24 comments · 5 min read · LW link

$500 Bounty/​Con­test: Ex­plain In­fra-Bayes In The Lan­guage Of Game Theory

johnswentworth · Mar 25, 2023, 5:29 PM
83 points
7 comments · 2 min read · LW link

Yud­kowsky on AGI risk on the Ban­kless podcast

Rob Bensinger · Mar 13, 2023, 12:42 AM
83 points
5 comments · LW link

[Question] Are there spe­cific books that it might slightly help al­ign­ment to have on the in­ter­net?

AnnaSalamon · Mar 29, 2023, 5:08 AM
77 points
25 comments · 1 min read · LW link

Sun­light is yel­low par­allel rays plus blue isotropic light

Thomas Kehrenberg · Mar 1, 2023, 5:58 PM
77 points
5 comments · 2 min read · LW link

How to Sup­port Some­one Who is Struggling

David Zeller · Mar 11, 2023, 6:52 PM
76 points
13 comments · 5 min read · LW link

Suc­cess with­out dig­nity: a nearcast­ing story of avoid­ing catas­tro­phe by luck

HoldenKarnofsky · Mar 14, 2023, 7:23 PM
76 points
17 comments · 15 min read · LW link

Re­sponse to Tyler Cowen’s Ex­is­ten­tial risk, AI, and the in­evitable turn in hu­man history

Zvi · Mar 28, 2023, 4:00 PM
72 points
27 comments · 20 min read · LW link
(thezvi.wordpress.com)

A bunch of videos for in­tu­ition build­ing (2x speed, skip ones that bore you)

the gears to ascension · Mar 12, 2023, 12:51 AM
72 points
5 comments · 4 min read · LW link

Microsoft Re­search Paper Claims Sparks of Ar­tifi­cial In­tel­li­gence in GPT-4

Zvi · Mar 24, 2023, 1:20 PM
72 points
14 comments · 6 min read · LW link
(thezvi.wordpress.com)

Imi­ta­tion Learn­ing from Lan­guage Feedback

Mar 30, 2023, 2:11 PM
71 points
3 comments · 10 min read · LW link

Deal­ing with in­finite entropy

Alex_Altair · Mar 1, 2023, 3:01 PM
70 points
9 comments · 11 min read · LW link

AI Safety in a World of Vuln­er­a­ble Ma­chine Learn­ing Systems

Mar 8, 2023, 2:40 AM
70 points
29 comments · 29 min read · LW link
(far.ai)

Prob­a­bil­is­tic Payor Lemma?

abramdemski · Mar 19, 2023, 5:57 PM
69 points
7 comments · 4 min read · LW link

Sparks of Ar­tifi­cial Gen­eral In­tel­li­gence: Early ex­per­i­ments with GPT-4 | Microsoft Research

DragonGod · Mar 23, 2023, 5:45 AM
68 points
23 comments · 1 min read · LW link
(arxiv.org)

Plan for mediocre al­ign­ment of brain-like [model-based RL] AGI

Steven Byrnes · Mar 13, 2023, 2:11 PM
68 points
25 comments · 12 min read · LW link

AI #2

Zvi · Mar 2, 2023, 2:50 PM
66 points
18 comments · 55 min read · LW link
(thezvi.wordpress.com)

Ta­boo­ing “Frame Con­trol”

Raemon · Mar 19, 2023, 11:33 PM
66 points
41 comments · 10 min read · LW link

[Question] What hap­pened to the OpenPhil OpenAI board seat?

ChristianKl · Mar 15, 2023, 4:59 PM
65 points
2 comments · 1 min read · LW link

Ja­pan AI Align­ment Conference

Mar 10, 2023, 6:56 AM
64 points
7 comments · 1 min read · LW link
(www.conjecture.dev)

Syd­ney can play chess and kind of keep track of the board state

Erik Jenner · Mar 3, 2023, 9:39 AM
64 points
19 comments · 6 min read · LW link

Some com­mon con­fu­sion about in­duc­tion heads

Alexandre Variengien · Mar 28, 2023, 9:51 PM
64 points
4 comments · 5 min read · LW link

Tran­script: NBC Nightly News: AI ‘race to reck­less­ness’ w/​ Tris­tan Har­ris, Aza Raskin

WilliamKiely · Mar 23, 2023, 1:04 AM
63 points
4 comments · 3 min read · LW link

Why do we as­sume there is a “real” shog­goth be­hind the LLM? Why not masks all the way down?

Robert_AIZI · Mar 9, 2023, 5:28 PM
63 points
48 comments · 2 min read · LW link

Sam Alt­man on GPT-4, ChatGPT, and the Fu­ture of AI | Lex Frid­man Pod­cast #367

Gabe M · Mar 25, 2023, 7:08 PM
63 points
4 comments · 2 min read · LW link
(www.youtube.com)

Payor’s Lemma in Nat­u­ral Language

Andrew_Critch · Mar 2, 2023, 12:22 PM
62 points
0 comments · 2 min read · LW link

The Prospect of an AI Winter

Erich_Grunewald · Mar 27, 2023, 8:55 PM
62 points
24 comments · 15 min read · LW link
(www.erichgrunewald.com)

You Can’t Pre­dict a Game of Pinball

Jeffrey Heninger · Mar 30, 2023, 12:40 AM
61 points
13 comments · 6 min read · LW link · 1 review
(aiimpacts.org)