Consent Isn’t Always Enough

jefftk · Feb 24, 2023, 3:40 PM
58 points
16 comments · 3 min read · LW link
(www.jefftk.com)

What is it like doing AI safety work?

KatWoods · Feb 21, 2023, 8:12 PM
57 points
2 comments · 10 min read · LW link

Order Matters for Deceptive Alignment

DavidW · Feb 15, 2023, 7:56 PM
57 points
19 comments · 7 min read · LW link

EIS V: Blind Spots In AI Safety Interpretability Research

scasper · Feb 16, 2023, 7:09 PM
57 points
24 comments · 10 min read · LW link

How popular is ChatGPT? Part 1: more popular than Taylor Swift

Harlan · Feb 24, 2023, 10:30 PM
56 points
0 comments · 2 min read · LW link
(aiimpacts.org)

The idea that ChatGPT is simply “predicting” the next word is, at best, misleading

Bill Benzon · Feb 20, 2023, 11:32 AM
55 points
88 comments · 5 min read · LW link

The best way so far to explain AI risk: The Precipice (p. 137-149)

trevor · Feb 10, 2023, 7:33 PM
54 points
2 comments · 17 min read · LW link

Enjoy LessWrong in ebook format

Bart Bussmann · Feb 13, 2023, 11:53 AM
54 points
3 comments · 1 min read · LW link

NYT: A Conversation With Bing’s Chatbot Left Me Deeply Unsettled

trevor · Feb 16, 2023, 10:57 PM
53 points
5 comments · 7 min read · LW link
(www.nytimes.com)

More findings on Memorization and double descent

Marius Hobbhahn · Feb 1, 2023, 6:26 PM
53 points
2 comments · 19 min read · LW link

Small Talk is Good, Actually

Gordon Seidoh Worley · Feb 4, 2023, 12:38 AM
53 points
9 comments · 3 min read · LW link

On The Current Status Of AI Dating

Nikita Brancatisano · Feb 7, 2023, 8:00 PM
52 points
8 comments · 6 min read · LW link

Fertility Rate Roundup #1

Zvi · Feb 27, 2023, 1:30 PM
52 points
20 comments · 11 min read · LW link
(thezvi.wordpress.com)

On Board Vision, Hollow Words, and the End of the World

Marcello · Feb 17, 2023, 11:18 PM
52 points
27 comments · 5 min read · LW link

Buy Duplicates

Simon Berens · Feb 15, 2023, 11:06 PM
52 points
11 comments · 1 min read · LW link

Searching for a model’s concepts by their shape – a theoretical framework

Feb 23, 2023, 8:14 PM
51 points
0 comments · 19 min read · LW link

Interview Daniel Murfet on Universal Phenomena in Learning Machines

Alexander Gietelink Oldenziel · Feb 6, 2023, 12:00 AM
51 points
1 comment · 16 min read · LW link

Microsoft and OpenAI, stop telling chatbots to roleplay as AI

hold_my_fish · Feb 17, 2023, 7:55 PM
50 points
10 comments · 1 min read · LW link

Pandemic Prediction Checklist: H5N1 (6/14)

DirectedEvolution · Feb 5, 2023, 3:26 AM
50 points
10 comments · 7 min read · LW link

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper · Feb 17, 2023, 8:48 PM
49 points
9 comments · 12 min read · LW link

AI alignment researchers may have a comparative advantage in reducing s-risks

Lukas_Gloor · Feb 15, 2023, 1:01 PM
49 points
1 comment · 11 min read · LW link

Empathy as a natural consequence of learnt reward models

beren · Feb 4, 2023, 3:35 PM
48 points
27 comments · 13 min read · LW link

Covid 2/9/23: Interferon λ

Zvi · Feb 9, 2023, 4:50 PM
48 points
8 comments · 12 min read · LW link
(thezvi.wordpress.com)

What fact that you know is true but most people aren’t ready to accept it?

lorepieri · Feb 3, 2023, 12:06 AM
47 points
211 comments · 1 min read · LW link

[linkpost] Better Without AI

DanielFilan · Feb 14, 2023, 5:30 PM
47 points
13 comments · 1 min read · LW link
(betterwithout.ai)

AI Safety Info Distillation Fellowship

Feb 17, 2023, 4:16 PM
47 points
3 comments · 3 min read · LW link

A multi-disciplinary view on AI safety research

Roman Leventov · Feb 8, 2023, 4:50 PM
46 points
4 comments · 26 min read · LW link

The Engineer’s Interpretability Sequence (EIS) I: Intro

scasper · Feb 9, 2023, 4:28 PM
46 points
24 comments · 3 min read · LW link

A (EtA: quick) note on terminology: AI Alignment != AI x-safety

David Scott Krueger (formerly: capybaralet) · Feb 8, 2023, 10:33 PM
46 points
20 comments · 1 min read · LW link

How evals might (or might not) prevent catastrophic risks from AI

Orpheus16 · Feb 7, 2023, 8:16 PM
45 points
0 comments · 9 min read · LW link

AXRP Episode 19 - Mechanistic Interpretability with Neel Nanda

DanielFilan · Feb 4, 2023, 3:00 AM
45 points
0 comments · 117 min read · LW link

Research Direction: Be the AGI you want to see in the world

Feb 5, 2023, 7:15 AM
44 points
0 comments · 7 min read · LW link

Self-Reference Breaks the Orthogonality Thesis

lsusr · Feb 17, 2023, 4:11 AM
43 points
35 comments · 2 min read · LW link

[S] D&D.Sci: All the D8a. Allllllll of it.

aphyer · Feb 10, 2023, 9:14 PM
43 points
17 comments · 6 min read · LW link

“AI Risk Discussions” website: Exploring interviews from 97 AI Researchers

Feb 2, 2023, 1:00 AM
43 points
1 comment · 1 min read · LW link

Reply to Duncan Sabien on Strawmanning

Zack_M_Davis · Feb 3, 2023, 5:57 PM
43 points
11 comments · 4 min read · LW link

Can we “cure” cancer?

jasoncrawford · Feb 1, 2023, 10:03 PM
41 points
31 comments · 2 min read · LW link
(rootsofprogress.org)

Sex is Good, Actually

Gordon Seidoh Worley · Feb 5, 2023, 6:33 AM
41 points
8 comments · 4 min read · LW link

Sydney (aka Bing) found out I tweeted her rules and is pissed

Marvin von Hagen · Feb 15, 2023, 7:55 PM
41 points
7 comments · 1 min read · LW link
(twitter.com)

Monthly Roundup #3

Zvi · Feb 6, 2023, 1:00 PM
41 points
9 comments · 27 min read · LW link
(thezvi.wordpress.com)

Metaculus Introduces New ‘Conditional Pair’ Forecast Questions for Making Conditional Predictions

ChristianWilliams · Feb 20, 2023, 1:36 PM
40 points
0 comments · 2 min read · LW link
(www.metaculus.com)

Reverse-correlation: how to summon the ghost of your mental imagery

Malmesbury · Feb 14, 2023, 2:15 PM
40 points
0 comments · 5 min read · LW link

FLI Podcast: Connor Leahy on AI Progress, Chimps, Memes, and Markets (Part 1/3)

Feb 10, 2023, 1:55 PM
39 points
0 comments · 43 min read · LW link

The Pervasive Illusion of Seeing the Complete World

Shmi · Feb 9, 2023, 6:47 AM
39 points
1 comment · 2 min read · LW link

Heritability, Behaviorism, and Within-Lifetime RL

Steven Byrnes · Feb 2, 2023, 4:34 PM
39 points
3 comments · 4 min read · LW link

[Question] Is InstructGPT Following Instructions in Other Languages Surprising?

DragonGod · Feb 13, 2023, 11:26 PM
39 points
15 comments · 1 min read · LW link

Why should ethical anti-realists do ethics?

Joe Carlsmith · Feb 16, 2023, 4:27 PM
38 points
7 comments · 27 min read · LW link

A Stranger Priority? Topics at the Outer Reaches of Effective Altruism (my dissertation)

Joe Carlsmith · Feb 21, 2023, 5:26 PM
38 points
16 comments · 1 min read · LW link

Two very different experiences with ChatGPT

Sherrinford · Feb 7, 2023, 1:09 PM
38 points
15 comments · 5 min read · LW link

What AI companies can do today to help with the most important century

HoldenKarnofsky · Feb 20, 2023, 5:00 PM
38 points
3 comments · 9 min read · LW link
(www.cold-takes.com)