Easy Opportunity to Help Many Animals

Bentham's Bulldog, 21 Nov 2025 23:03 UTC
10 points
0 comments, 1 min read, LW link

Why Not Just Train For Interpretability?

johnswentworth, 21 Nov 2025 22:08 UTC
56 points
12 comments, 4 min read, LW link

Complaining about my inability to focus on uninteresting things

Dentosal, 21 Nov 2025 20:34 UTC
5 points
3 comments, 2 min read, LW link

Models not making it clear when they’re roleplaying seems like a fairly big issue

williawa, 21 Nov 2025 20:23 UTC
16 points
3 comments, 6 min read, LW link

Natural Emergent Misalignment from Reward Hacking

Algon, 21 Nov 2025 20:20 UTC
12 points
0 comments, 3 min read, LW link
(www.anthropic.com)

Natural emergent misalignment from reward hacking in production RL

21 Nov 2025 20:00 UTC
258 points
32 comments, 9 min read, LW link

Eight Heuristics of Anti-Epistemology

Ben Pace, 21 Nov 2025 19:54 UTC
44 points
2 comments, 6 min read, LW link

We won’t solve post-alignment problems by doing research

MichaelDickens, 21 Nov 2025 18:03 UTC
24 points
11 comments, 4 min read, LW link

Can Artificial Intelligence Be Conscious?

Bentham's Bulldog, 21 Nov 2025 16:43 UTC
15 points
5 comments, 7 min read, LW link

Gemini 3: Model Card and Safety Framework Report

Zvi, 21 Nov 2025 16:40 UTC
33 points
0 comments, 11 min read, LW link
(thezvi.wordpress.com)

Lorxus Does Halfhaven: 11/15~11/21

Lorxus, 21 Nov 2025 16:07 UTC
7 points
0 comments, 1 min read, LW link
(tiled-with-pentagons.blogspot.com)

EA Hotel Solstice

plex, 21 Nov 2025 15:13 UTC
8 points
0 comments, 1 min read, LW link

Why Does Empathy Have an Off-Switch?

J Bostock, 21 Nov 2025 14:56 UTC
9 points
1 comment, 7 min read, LW link

What Do We Tell the Humans? Errors, Hallucinations, and Lies in the AI Village

Shoshannah Tekofsky, 21 Nov 2025 14:19 UTC
56 points
0 comments, 9 min read, LW link

URGENT @everyone—help us kill AI preemption (again) before this Friday

21 Nov 2025 12:51 UTC
−1 points
0 comments, 1 min read, LW link

Should I Apply to a 3.5% Acceptance-Rate Fellowship? A Simple EV Calculator

Tobias H, 21 Nov 2025 10:59 UTC
16 points
0 comments, 5 min read, LW link

Towards Humanist Superintelligence

Chris_Leong, 21 Nov 2025 10:22 UTC
17 points
3 comments, 1 min read, LW link
(microsoft.ai)

16 Writing Tips from Inkhaven

dreeves, 21 Nov 2025 7:49 UTC
13 points
1 comment, 2 min read, LW link

Reading My Diary: 10 Years Since CFAR

Ben Pace, 21 Nov 2025 7:27 UTC
71 points
1 comment, 6 min read, LW link

The Worrying Nature of Akrasia

Notelrac, 21 Nov 2025 7:00 UTC
2 points
0 comments, 4 min read, LW link

10 Key Insights from the “Frontier AI Risk Monitoring Platform”

Weibing Wang, 21 Nov 2025 6:07 UTC
3 points
0 comments, 2 min read, LW link

Contra Collisteru: You Get About One Carthage

Screwtape, 21 Nov 2025 5:33 UTC
36 points
2 comments, 5 min read, LW link

Infinitesimally False

21 Nov 2025 4:57 UTC
55 points
16 comments, 12 min read, LW link

Preferences are confusing

RobertM, 21 Nov 2025 3:07 UTC
28 points
1 comment, 2 min read, LW link

Can questions rigidly designate intentions?

Mason Broxham, 21 Nov 2025 2:00 UTC
1 point
0 comments, 5 min read, LW link

Week 3: Adversarial Robustness

Ely Hahami, 21 Nov 2025 1:43 UTC
1 point
0 comments, 3 min read, LW link

Informed Consent as the Sole Criterion for Medical Treatment

Character#2736, 21 Nov 2025 1:39 UTC
7 points
2 comments, 4 min read, LW link

Suicide Prevention Ought To Be Illegal

Character#2736, 21 Nov 2025 1:39 UTC
−17 points
17 comments, 6 min read, LW link

How you got RL’d into your idiosyncratic cognition

Ruby, 21 Nov 2025 1:06 UTC
16 points
6 comments, 6 min read, LW link

PSA: For Chronic Infections, Check Teeth

Algon, 20 Nov 2025 23:14 UTC
15 points
2 comments, 1 min read, LW link

[Paper] Output Supervision Can Obfuscate the CoT

20 Nov 2025 22:41 UTC
92 points
3 comments, 5 min read, LW link
(arxiv.org)

The Boring Part of Bell Labs

Elizabeth, 20 Nov 2025 22:40 UTC
133 points
0 comments, 15 min read, LW link
(acesounderglass.com)

What the term “Mass Communication” gestures at

TristanTrim, 20 Nov 2025 22:34 UTC
3 points
0 comments, 7 min read, LW link

Dominance: The Standard Everyday Solution To Akrasia

johnswentworth, 20 Nov 2025 21:42 UTC
50 points
22 comments, 2 min read, LW link

Do One Neat Thing vs. Get Work Done

Kaj_Sotala, 20 Nov 2025 21:33 UTC
23 points
0 comments, 7 min read, LW link

Gemini 3 is Evaluation-Paranoid and Contaminated

Alice Blair, 20 Nov 2025 21:02 UTC
180 points
42 comments, 7 min read, LW link

Current LLM agents need strong pressure to engage in scheming behavior

20 Nov 2025 20:45 UTC
23 points
0 comments, 11 min read, LW link

Try seeing art

foodforthought, 20 Nov 2025 19:25 UTC
10 points
1 comment, 5 min read, LW link

AI #143: Everything, Everywhere, All At Once

Zvi, 20 Nov 2025 18:22 UTC
37 points
2 comments, 65 min read, LW link
(thezvi.wordpress.com)

Thinking about reasoning models made me less worried about scheming

Fabien Roger, 20 Nov 2025 18:20 UTC
89 points
7 comments, 12 min read, LW link

Defining AI Truth-Seeking by What It Is Not

Tianyi (Alex) Qiu, 20 Nov 2025 16:45 UTC
21 points
1 comment, 10 min read, LW link

Restricting Dangerous Research: Has It Worked Before, and Could It Work for AI?

jleibowich, 20 Nov 2025 16:45 UTC
12 points
1 comment, 16 min read, LW link
(samotsvety.com)

Persistence Ethics

Suspended Reason, 20 Nov 2025 16:27 UTC
7 points
2 comments, 5 min read, LW link

Should we shun the legibly evil?

Dentosal, 20 Nov 2025 13:22 UTC
5 points
2 comments, 2 min read, LW link

Rumored Trump EO

Stephen Martin, 20 Nov 2025 13:07 UTC
10 points
0 comments, 4 min read, LW link

The Moss Fractal: How Care Regulates Functional Awareness from Microbes to AI

Lcofa, 20 Nov 2025 11:33 UTC
1 point
0 comments, 14 min read, LW link

What would adults in the room know about AI risk?

rosehadshar, 20 Nov 2025 9:11 UTC
18 points
2 comments, 3 min read, LW link

10 Wrong and Dumb Grammar Rules

dreeves, 20 Nov 2025 7:56 UTC
15 points
3 comments, 3 min read, LW link

My burnout journey

Aprillion, 20 Nov 2025 6:58 UTC
4 points
0 comments, 1 min read, LW link
(peter.hozak.info)

One King Upon The Chessboard

Screwtape, 20 Nov 2025 6:06 UTC
49 points
7 comments, 6 min read, LW link