Announcing Athena—Women in AI Alignment Research

Claire Short, Nov 7, 2023, 9:46 PM
80 points
2 comments, 3 min read

Thomas Kwa’s research journal

Nov 23, 2023, 5:11 AM
79 points
1 comment, 6 min read

Spaciousness In Partner Dance: A Naturalism Demo

LoganStrohl, Nov 19, 2023, 7:00 AM
78 points
6 comments, 19 min read, 1 review

Reactions to the Executive Order

Zvi, Nov 1, 2023, 8:40 PM
77 points
4 comments, 29 min read
(thezvi.wordpress.com)

Lying Alignment Chart

Zack_M_Davis, Nov 29, 2023, 4:15 PM
77 points
17 comments, 1 min read

Anthropic Fall 2023 Debate Progress Update

Ansh Radhakrishnan, Nov 28, 2023, 5:37 AM
76 points
9 comments, 12 min read

Interpretability with Sparse Autoencoders (Colab exercises)

CallumMcDougall, Nov 29, 2023, 12:56 PM
76 points
9 comments, 4 min read

Are language models good at making predictions?

dynomight, Nov 6, 2023, 1:10 PM
76 points
14 comments, 4 min read
(dynomight.net)

On the UK Summit

Zvi, Nov 7, 2023, 1:10 PM
74 points
6 comments, 30 min read
(thezvi.wordpress.com)

Announcing New Beginner-friendly Book on AI Safety and Risk

Darren McKee, Nov 25, 2023, 3:57 PM
74 points
3 comments

Dialogue on the Claim: “OpenAI’s Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI”

Nov 21, 2023, 5:39 PM
73 points
84 comments, 11 min read

Testbed evals: evaluating AI safety even when it can’t be directly measured

joshc, Nov 15, 2023, 7:00 PM
71 points
2 comments, 4 min read

A to Z of things

KatjaGrace, Nov 17, 2023, 5:20 AM
71 points
8 comments, 1 min read, 1 review
(worldspiritsockpuppet.com)

Reinforcement Via Giving People Cookies

Screwtape, Nov 15, 2023, 4:34 AM
70 points
9 comments, 6 min read

Game Theory without Argmax [Part 1]

Cleo Nardo, Nov 11, 2023, 3:59 PM
70 points
18 comments, 19 min read

Why not electric trains and excavators?

bhauth, Nov 21, 2023, 12:07 AM
68 points
39 comments, 5 min read
(www.bhauth.com)

Alignment can improve generalisation through more robustly doing what a human wants—CoinRun example

Stuart_Armstrong, Nov 21, 2023, 11:41 AM
67 points
9 comments, 3 min read

AI #39: The Week of OpenAI

Zvi, Nov 23, 2023, 3:10 PM
67 points
8 comments, 28 min read
(thezvi.wordpress.com)

Black Box Biology

GeneSmith, Nov 29, 2023, 2:27 AM
65 points
30 comments, 2 min read

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley, Nov 28, 2023, 7:56 PM
65 points
30 comments, 11 min read

“Epistemic range of motion” and LessWrong moderation

Nov 27, 2023, 9:58 PM
65 points
3 comments, 12 min read

A free to enter, 240 character, open-source iterated prisoner’s dilemma tournament

Isaac King, Nov 9, 2023, 8:24 AM
64 points
19 comments, 1 min read
(manifold.markets)

Thoughts on open source AI

Sam Marks, Nov 3, 2023, 3:35 PM
62 points
17 comments, 10 min read

Paper out now on creatine and cognitive performance

Fabienne, Nov 26, 2023, 10:58 AM
61 points
2 comments, 1 min read

Raemon’s Deliberate (“Purposeful?”) Practice Club

Nov 14, 2023, 6:24 PM
61 points
11 comments, 22 min read

Vote on worthwhile OpenAI topics to discuss

Nov 21, 2023, 12:03 AM
61 points
55 comments, 1 min read

New paper shows truthfulness & instruction-following don’t generalize by default

joshc, Nov 19, 2023, 7:27 PM
60 points
0 comments, 4 min read

On OpenAI Dev Day

Zvi, Nov 9, 2023, 4:10 PM
60 points
0 comments, 15 min read
(thezvi.wordpress.com)

Sam Altman, Greg Brockman and others from OpenAI join Microsoft

Ozyrus, Nov 20, 2023, 8:23 AM
58 points
15 comments, 1 min read
(twitter.com)

Genetic fitness is a measure of selection strength, not the selection target

Kaj_Sotala, Nov 4, 2023, 7:02 PM
58 points
44 comments, 18 min read

AI Alignment Research Engineer Accelerator (ARENA): call for applicants

CallumMcDougall, Nov 7, 2023, 9:43 AM
56 points
0 comments

It’s OK to be biased towards humans

dr_s, Nov 11, 2023, 11:59 AM
54 points
69 comments, 6 min read

Theories of Change for AI Auditing

Nov 13, 2023, 7:33 PM
54 points
0 comments, 18 min read
(www.apolloresearch.ai)

They are made of repeating patterns

quetzal_rainbow, Nov 13, 2023, 6:17 PM
53 points
4 comments, 2 min read

AMA: Earning to Give

jefftk, Nov 7, 2023, 4:20 PM
53 points
8 comments, 1 min read
(www.jefftk.com)

Zvi’s Manifold Markets House Rules

Zvi, Nov 13, 2023, 12:28 AM
53 points
6 comments, 3 min read

Open Phil releases RFPs on LLM Benchmarks and Forecasting

LawrenceC, Nov 11, 2023, 3:01 AM
53 points
0 comments, 2 min read
(www.openphilanthropy.org)

AI #37: Moving Too Fast

Zvi, Nov 9, 2023, 5:50 PM
53 points
5 comments, 76 min read
(thezvi.wordpress.com)

OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns

Seth Herd, Nov 20, 2023, 2:20 PM
52 points
28 comments, 1 min read
(www.wired.com)

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs

Nov 7, 2023, 4:12 PM
52 points
21 comments, 6 min read

In Defense of Parselmouths

Screwtape, Nov 15, 2023, 11:02 PM
51 points
11 comments, 10 min read, 1 review

Polysemantic Attention Head in a 4-Layer Transformer

Nov 9, 2023, 4:16 PM
51 points
0 comments, 6 min read

On Tapping Out

Screwtape, Nov 17, 2023, 3:23 AM
51 points
14 comments, 8 min read, 1 review

The Assumed Intent Bias

silentbob, Nov 5, 2023, 4:28 PM
51 points
13 comments, 6 min read

Altman firing retaliation incoming?

trevor, Nov 19, 2023, 12:10 AM
50 points
23 comments, 5 min read

Apply to the Conceptual Boundaries Workshop for AI Safety

Chipmonk, Nov 27, 2023, 9:04 PM
50 points
0 comments, 3 min read

On Overhangs and Technological Change

Roko, Nov 5, 2023, 10:58 PM
50 points
19 comments, 2 min read

GPT-2030 and Catastrophic Drives: Four Vignettes

jsteinhardt, Nov 10, 2023, 7:30 AM
50 points
5 comments, 10 min read
(bounded-regret.ghost.io)

Job listing: Communications Generalist / Project Manager

Gretta Duleba, Nov 6, 2023, 8:21 PM
49 points
7 comments, 1 min read

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Nov 8, 2023, 11:37 AM
49 points
0 comments, 18 min read