In favour of exploring nagging doubts about x-risk

owencb, Jun 25, 2024, 11:52 PM
105 points
2 comments

What is a Tool?

Jun 25, 2024, 11:40 PM
62 points
4 comments, 6 min read

[Question] When do alignment researchers retire?

Jordan Taylor, Jun 25, 2024, 11:30 PM
4 points
2 comments, 1 min read

Compute Governance Literature Review

sijarvis, Jun 25, 2024, 10:41 PM
11 points
0 comments, 13 min read

Computational Complexity as an Intuition Pump for LLM Generality

aribrill, Jun 25, 2024, 8:25 PM
18 points
6 comments, 3 min read

Failure Modes of Teaching AI Safety

Eleni Angelou, Jun 25, 2024, 7:07 PM
20 points
0 comments, 1 min read

Kingfisher Summer Tour 2024

jefftk, Jun 25, 2024, 6:50 PM
9 points
0 comments, 1 min read
(www.jefftk.com)

Incentive Learning vs Dead Sea Salt Experiment

Steven Byrnes, Jun 25, 2024, 5:49 PM
30 points
1 comment, 28 min read

An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs

Adam Karvonen, Jun 25, 2024, 3:57 PM
27 points
0 comments, 9 min read
(adamkarvonen.github.io)

Formal verification, heuristic explanations and surprise accounting

Jacob_Hilton, Jun 25, 2024, 3:40 PM
156 points
11 comments, 9 min read
(www.alignment.org)

Metastrategy get-started guide

Tahp, Jun 25, 2024, 3:04 PM
6 points
1 comment, 8 min read

Labor Participation is an Alignment Risk

alex, Jun 25, 2024, 2:15 PM
−5 points
2 comments, 17 min read

Monthly Roundup #19: June 2024

Zvi, Jun 25, 2024, 12:00 PM
28 points
9 comments, 54 min read
(thezvi.wordpress.com)

Regularly meta-optimization

Crazy philosopher, Jun 25, 2024, 6:12 AM
−4 points
6 comments, 1 min read

Memetics as an analogy and its implicit connotations

Rachel Shu, Jun 25, 2024, 5:13 AM
3 points
0 comments, 3 min read

Mistakes people make when thinking about units

Isaac King, Jun 25, 2024, 3:39 AM
74 points
14 comments, 7 min read

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?

Rachel Shu, Jun 25, 2024, 1:35 AM
46 points
9 comments, 3 min read

I’m a bit skeptical of AlphaFold 3

Oleg Trott, Jun 25, 2024, 12:04 AM
87 points
14 comments, 2 min read

Being hella lost as rationality practice

Rachel Shu, Jun 24, 2024, 11:50 PM
14 points
0 comments, 2 min read

A Basic Economics-Style Model of AI Existential Risk

Rubi J. Hudson, Jun 24, 2024, 8:26 PM
24 points
3 comments, 7 min read

The Minority Coalition

Richard_Ngo, Jun 24, 2024, 8:01 PM
103 points
9 comments, 5 min read
(www.narrativeark.xyz)

Compact Proofs of Model Performance via Mechanistic Interpretability

Jun 24, 2024, 7:27 PM
97 points
4 comments, 8 min read
(arxiv.org)

Contrapositive Natural Abstraction—Project Intro

Elliot Callender, Jun 24, 2024, 6:37 PM
4 points
5 comments, 2 min read

Sparse Features Through Time

Rogan Inglis, Jun 24, 2024, 6:06 PM
12 points
1 comment, 1 min read
(roganinglis.io)

PSA: Consider alternatives to AUROC when reporting classifier metrics for alignment

rpglover64, Jun 24, 2024, 5:53 PM
18 points
1 comment, 3 min read

Paying Russians to not invade Ukraine

djColliderBias, Jun 24, 2024, 5:46 PM
9 points
7 comments, 3 min read

SAE feature geometry is outside the superposition hypothesis

jake_mendel, Jun 24, 2024, 4:07 PM
228 points
17 comments, 11 min read

So you want to work on technical AI safety

gw, Jun 24, 2024, 2:29 PM
51 points
3 comments, 14 min read

The Future of Work: How Can Policymakers Prepare for AI’s Impact on Labor Markets?

Jun 24, 2024, 2:18 PM
5 points
0 comments, 3 min read

LLM Generality is a Timeline Crux

eggsyntax, Jun 24, 2024, 12:52 PM
218 points
119 comments, 7 min read

On Claude 3.5 Sonnet

Zvi, Jun 24, 2024, 12:00 PM
95 points
14 comments, 13 min read
(thezvi.wordpress.com)

Book Review: Righteous Victims—A History of the Zionist-Arab Conflict

Yair Halberstadt, Jun 24, 2024, 11:02 AM
53 points
8 comments, 34 min read

The Living Planet Index: A Case Study in Statistical Pitfalls

Jan_Kulveit, Jun 24, 2024, 10:05 AM
24 points
0 comments, 4 min read
(www.nature.com)

Sci-Fi books micro-reviews

Yair Halberstadt, Jun 24, 2024, 9:49 AM
44 points
27 comments, 4 min read

A Step Against Land Value Tax

Blog Alt, Jun 24, 2024, 5:13 AM
9 points
23 comments, 6 min read
(antematters.substack.com)

Different senses in which two AIs can be “the same”

Jun 24, 2024, 3:16 AM
69 points
2 comments, 4 min read

Talk: AI safety fieldbuilding at MATS

Ryan Kidd, Jun 23, 2024, 11:06 PM
26 points
2 comments, 10 min read

AI Labs Wouldn’t be Convicted of Treason or Sedition

Matthew Khoriaty, Jun 23, 2024, 9:34 PM
9 points
2 comments, 3 min read

Control Vectors as Dispositional Traits

Gianluca Calcagni, Jun 23, 2024, 9:34 PM
10 points
0 comments, 11 min read

“On the Impossibility of Superintelligent Rubik’s Cube Solvers”, Claude 2024 [humor]

gwern, Jun 23, 2024, 9:18 PM
22 points
6 comments, 1 min read
(gwern.net)

[Question] How are you preparing for the possibility of an AI bust?

Nate Showell, Jun 23, 2024, 7:13 PM
26 points
16 comments, 1 min read

A simple text status can change something

nextcaller, Jun 23, 2024, 6:48 PM
5 points
0 comments, 2 min read

35 Interactive Learning Modules Relevant to EAs / Effective Altruism (that are all free)

spencerg, Jun 23, 2024, 5:57 PM
5 points
0 comments

Podcasts: AGI Show, Consistently Candid, London Futurists

KatjaGrace, Jun 23, 2024, 1:50 PM
16 points
0 comments, 1 min read
(worldspiritsockpuppet.com)

Text Posts from the Kids Group: 2019

jefftk, Jun 23, 2024, 1:20 PM
23 points
0 comments, 18 min read
(www.jefftk.com)

Population ethics and the value of variety

cousin_it, Jun 23, 2024, 10:42 AM
24 points
11 comments, 2 min read

[Question] Karma votes: blind to or accounting for score?

cata, 22 Jun 2024 21:40 UTC
19 points
4 comments, 1 min read

[Question] Should effective altruism be more “cool”?

jaredmantell, 22 Jun 2024 20:42 UTC
3 points
3 comments, 1 min read

Meta Alignment: Communication Wack-a-Mole

Bridgett Kay, 22 Jun 2024 20:12 UTC
16 points
2 comments, 5 min read
(dxmrevealed.wordpress.com)

AI as a computing platform: what to expect

Jonasb, 22 Jun 2024 19:55 UTC
−3 points
0 comments, 7 min read
(www.denominations.io)