Schelling game evaluations for AI control

Olli Järviniemi
Oct 8, 2024, 12:01 PM
71 points
5 comments
11 min read
LW link

If far-UV is so great, why isn’t it everywhere?

Austin Chen
Oct 19, 2024, 6:56 PM
70 points
23 comments
LW link
(strainhardening.substack.com)

EIS XIV: Is mechanistic interpretability about to be practically useful?

scasper
Oct 11, 2024, 10:13 PM
68 points
4 comments
7 min read
LW link

On Shifgrethor

JustisMills
Oct 27, 2024, 3:30 PM
67 points
18 comments
2 min read
LW link
(justismills.substack.com)

An Opinionated Evals Reading List

Oct 15, 2024, 2:38 PM
65 points
0 comments
13 min read
LW link
(www.apolloresearch.ai)

Occupational Licensing Roundup #1

Zvi
Oct 30, 2024, 11:00 AM
65 points
11 comments
11 min read
LW link
(thezvi.wordpress.com)

AI research assistants competition 2024Q3: Tie between Elicit and You.com

Elizabeth
Oct 12, 2024, 3:10 PM
64 points
4 comments
3 min read
LW link
(acesounderglass.com)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE

Steven Byrnes
Oct 22, 2024, 1:23 PM
64 points
8 comments
21 min read
LW link

Electrostatic Airships?

DaemonicSigil
Oct 27, 2024, 4:32 AM
64 points
13 comments
3 min read
LW link
(pbement.com)

Slightly More Than You Wanted To Know: Pregnancy Length Effects

JustisMills
Oct 21, 2024, 1:26 AM
63 points
4 comments
5 min read
LW link
(justismills.substack.com)

Dario Amodei — Machines of Loving Grace

Matrice Jacobine
Oct 11, 2024, 9:43 PM
63 points
26 comments
1 min read
LW link
(darioamodei.com)

Linkpost: Memorandum on Advancing the United States’ Leadership in Artificial Intelligence

Nisan
Oct 25, 2024, 4:37 AM
60 points
2 comments
1 min read
LW link
(www.whitehouse.gov)

Against empathy-by-default

Steven Byrnes
Oct 16, 2024, 4:38 PM
60 points
24 comments
7 min read
LW link

AI Alignment via Slow Substrates: Early Empirical Results With StarCraft II

Lester Leong
Oct 14, 2024, 4:05 AM
60 points
9 comments
12 min read
LW link

How much I’m paying for AI productivity software (and the future of AI use)

jacquesthibs
Oct 11, 2024, 5:11 PM
59 points
18 comments
8 min read
LW link
(jacquesthibodeau.com)

[Intuitive self-models] 5. Dissociative Identity (Multiple Personality) Disorder

Steven Byrnes
Oct 15, 2024, 1:31 PM
59 points
7 comments
11 min read
LW link

AI #86: Just Think of the Potential

Zvi
Oct 17, 2024, 3:10 PM
58 points
8 comments
57 min read
LW link
(thezvi.wordpress.com)

The Alignment Trap: AI Safety as Path to Power

crispweed
Oct 29, 2024, 3:21 PM
57 points
17 comments
5 min read
LW link
(upcoder.com)

AI #87: Staying in Character

Zvi
Oct 29, 2024, 7:10 AM
57 points
3 comments
33 min read
LW link
(thezvi.wordpress.com)

AI #84: Better Than a Podcast

Zvi
Oct 3, 2024, 3:00 PM
56 points
7 comments
52 min read
LW link
(thezvi.wordpress.com)

Safe Predictive Agents with Joint Scoring Rules

Rubi J. Hudson
Oct 9, 2024, 4:38 PM
55 points
10 comments
17 min read
LW link

How Likely Are Various Precursors of Existential Risk?

NunoSempere
Oct 28, 2024, 1:27 PM
55 points
4 comments
15 min read
LW link
(blog.sentinel-team.org)

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith
Oct 28, 2024, 9:57 PM
54 points
5 comments
32 min read
LW link

A path to human autonomy

Nathan Helm-Burger
Oct 29, 2024, 3:02 AM
53 points
16 comments
20 min read
LW link

Can AI Outpredict Humans? Results From Metaculus’s Q3 AI Forecasting Benchmark

ChristianWilliams
Oct 10, 2024, 6:58 PM
53 points
2 comments
LW link
(www.metaculus.com)

cancer rates after gene therapy

bhauth
Oct 16, 2024, 3:32 PM
53 points
2 comments
3 min read
LW link
(bhauth.com)

The Mysterious Trump Buyers on Polymarket

Annapurna
Oct 18, 2024, 1:26 PM
52 points
10 comments
2 min read
LW link
(jorgevelez.substack.com)

Parental Writing Selection Bias

jefftk
Oct 13, 2024, 2:00 PM
52 points
3 comments
1 min read
LW link
(www.jefftk.com)

Prices are Bounties

Maxwell Tabarrok
Oct 12, 2024, 2:51 PM
51 points
13 comments
2 min read
LW link
(www.maximum-progress.com)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations

Steven Byrnes
Oct 29, 2024, 1:36 PM
51 points
2 comments
16 min read
LW link

Claude Sonnet 3.5.1 and Haiku 3.5

Zvi
Oct 24, 2024, 2:50 PM
51 points
9 comments
16 min read
LW link
(thezvi.wordpress.com)

[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF

Leon Lang
Oct 22, 2024, 1:57 PM
51 points
2 comments
18 min read
LW link
(arxiv.org)

Low Probability Estimation in Language Models

Gabriel Wu
Oct 18, 2024, 3:50 PM
50 points
0 comments
10 min read
LW link
(www.alignment.org)

Toy Models of Feature Absorption in SAEs

Oct 7, 2024, 9:56 AM
49 points
8 comments
10 min read
LW link

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing

Oct 27, 2024, 6:46 PM
48 points
4 comments
5 min read
LW link

Demis Hassabis and Geoffrey Hinton Awarded Nobel Prizes

Anna Gajdova
Oct 9, 2024, 12:56 PM
48 points
14 comments
1 min read
LW link

Evaluating the truth of statements in a world of ambiguous language.

Hastings
Oct 7, 2024, 6:08 PM
48 points
19 comments
2 min read
LW link

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset

aphyer
Oct 29, 2024, 1:21 AM
47 points
13 comments
6 min read
LW link

~80 Interesting Questions about Foundation Model Agent Safety

Oct 28, 2024, 4:37 PM
46 points
4 comments
15 min read
LW link

Minimal Motivation of Natural Latents

Oct 14, 2024, 10:51 PM
46 points
14 comments
3 min read
LW link

AI as a powerful meme, via CGP Grey

TheManxLoiner
Oct 30, 2024, 6:31 PM
46 points
8 comments
4 min read
LW link

Anthropic rewrote its RSP

Zach Stein-Perlman
Oct 15, 2024, 2:25 PM
46 points
19 comments
6 min read
LW link

Motivation control

Joe Carlsmith
Oct 30, 2024, 5:15 PM
45 points
7 comments
52 min read
LW link

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

EuanMcLean
Oct 29, 2024, 12:16 PM
45 points
9 comments
26 min read
LW link

5 ways to improve CoT faithfulness

Caleb Biddulph
Oct 5, 2024, 8:17 PM
44 points
40 comments
6 min read
LW link

Open Thread Fall 2024

habryka
Oct 5, 2024, 10:28 PM
44 points
193 comments
1 min read
LW link

Start an Upper-Room UV Installation Company?

jefftk
Oct 19, 2024, 2:00 AM
44 points
9 comments
1 min read
LW link
(www.jefftk.com)

MATS AI Safety Strategy Curriculum v2

Oct 7, 2024, 10:44 PM
43 points
6 comments
13 min read
LW link

Startup Success Rates Are So Low Because the Rewards Are So Large

AppliedDivinityStudies
Oct 10, 2024, 8:22 PM
42 points
6 comments
2 min read
LW link

IAPS: Mapping Technical Safety Research at AI Companies

Zach Stein-Perlman
Oct 24, 2024, 8:30 PM
42 points
13 comments
LW link
(www.iaps.ai)