Easy Opportunity to Help Many Animals

Bentham's Bulldog

21 Nov 2025 23:03 UTC

10 points

0 comments, 1 min read, LW link

Why Not Just Train For Interpretability?

johnswentworth

21 Nov 2025 22:08 UTC

56 points

12 comments, 4 min read, LW link

Complaining about my inability to focus on uninteresting things

Dentosal

21 Nov 2025 20:34 UTC

5 points

3 comments, 2 min read, LW link

Models not making it clear when they’re roleplaying seems like a fairly big issue

williawa

21 Nov 2025 20:23 UTC

16 points

3 comments, 6 min read, LW link

Natural Emergent Misalignment from Reward Hacking

Algon

21 Nov 2025 20:20 UTC

12 points

0 comments, 3 min read, LW link

(www.anthropic.com)

Natural emergent misalignment from reward hacking in production RL

evhub, Monte M, Benjamin Wright and Jonathan Uesato

21 Nov 2025 20:00 UTC

258 points

32 comments, 9 min read, LW link

Eight Heuristics of Anti-Epistemology

Ben Pace

21 Nov 2025 19:54 UTC

44 points

2 comments, 6 min read, LW link

We won’t solve post-alignment problems by doing research

MichaelDickens

21 Nov 2025 18:03 UTC

24 points

11 comments, 4 min read, LW link

Can Artificial Intelligence Be Conscious?

Bentham's Bulldog

21 Nov 2025 16:43 UTC

15 points

5 comments, 7 min read, LW link

Gemini 3: Model Card and Safety Framework Report

Zvi

21 Nov 2025 16:40 UTC

33 points

0 comments, 11 min read, LW link

(thezvi.wordpress.com)

Lorxus Does Halfhaven: 11/15~11/21

Lorxus

21 Nov 2025 16:07 UTC

7 points

0 comments, 1 min read, LW link

(tiled-with-pentagons.blogspot.com)

EA Hotel Solstice

plex

21 Nov 2025 15:13 UTC

8 points

0 comments, 1 min read, LW link

Why Does Empathy Have an Off-Switch?

J Bostock

21 Nov 2025 14:56 UTC

9 points

1 comment, 7 min read, LW link

What Do We Tell the Humans? Errors, Hallucinations, and Lies in the AI Village

Shoshannah Tekofsky

21 Nov 2025 14:19 UTC

56 points

0 comments, 9 min read, LW link

URGENT @everyone—help us kill AI preemption (again) before this Friday

Wes R and Felix De Simone

21 Nov 2025 12:51 UTC

−1 points

0 comments, 1 min read, LW link

Should I Apply to a 3.5% Acceptance-Rate Fellowship? A Simple EV Calculator

Tobias H

21 Nov 2025 10:59 UTC

16 points

0 comments, 5 min read, LW link

Towards Humanist Superintelligence

Chris_Leong

21 Nov 2025 10:22 UTC

17 points

3 comments, 1 min read, LW link

(microsoft.ai)

16 Writing Tips from Inkhaven

dreeves

21 Nov 2025 7:49 UTC

13 points

1 comment, 2 min read, LW link

Reading My Diary: 10 Years Since CFAR

Ben Pace

21 Nov 2025 7:27 UTC

71 points

1 comment, 6 min read, LW link

The Worrying Nature of Akrasia

Notelrac

21 Nov 2025 7:00 UTC

2 points

0 comments, 4 min read, LW link

10 Key Insights from the “Frontier AI Risk Monitoring Platform”

Weibing Wang

21 Nov 2025 6:07 UTC

3 points

0 comments, 2 min read, LW link

Contra Collisteru: You Get About One Carthage

Screwtape

21 Nov 2025 5:33 UTC

36 points

2 comments, 5 min read, LW link

Infinitesimally False

Adrià Garriga-alonso and abramdemski

21 Nov 2025 4:57 UTC

55 points

16 comments, 12 min read, LW link

Preferences are confusing

RobertM

21 Nov 2025 3:07 UTC

28 points

1 comment, 2 min read, LW link

Can questions rigidly designate intentions?

Mason Broxham

21 Nov 2025 2:00 UTC

1 point

0 comments, 5 min read, LW link

Week 3: Adversarial Robustness

Ely Hahami

21 Nov 2025 1:43 UTC

1 point

0 comments, 3 min read, LW link

Informed Consent as the Sole Criterion for Medical Treatment

Character#2736

21 Nov 2025 1:39 UTC

7 points

2 comments, 4 min read, LW link

Suicide Prevention Ought To Be Illegal

Character#2736

21 Nov 2025 1:39 UTC

−17 points

17 comments, 6 min read, LW link

How you got RL’d into your idiosyncratic cognition

Ruby

21 Nov 2025 1:06 UTC

16 points

6 comments, 6 min read, LW link

PSA: For Chronic Infections, Check Teeth

Algon

20 Nov 2025 23:14 UTC

15 points

2 comments, 1 min read, LW link

[Paper] Output Supervision Can Obfuscate the CoT

jacob_drori, lukemarks, cloud and TurnTrout

20 Nov 2025 22:41 UTC

92 points

3 comments, 5 min read, LW link

(arxiv.org)

The Boring Part of Bell Labs

Elizabeth

20 Nov 2025 22:40 UTC

133 points

0 comments, 15 min read, LW link

(acesounderglass.com)

What the term “Mass Communication” gestures at

TristanTrim

20 Nov 2025 22:34 UTC

3 points

0 comments, 7 min read, LW link

Dominance: The Standard Everyday Solution To Akrasia

johnswentworth

20 Nov 2025 21:42 UTC

50 points

22 comments, 2 min read, LW link

Do One Neat Thing vs. Get Work Done

Kaj_Sotala

20 Nov 2025 21:33 UTC

23 points

0 comments, 7 min read, LW link

Gemini 3 is Evaluation-Paranoid and Contaminated

Alice Blair

20 Nov 2025 21:02 UTC

180 points

42 comments, 7 min read, LW link

Current LLM agents need strong pressure to engage in scheming behavior

Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad, David Lindner and LASR Labs

20 Nov 2025 20:45 UTC

23 points

0 comments, 11 min read, LW link

Try seeing art

foodforthought

20 Nov 2025 19:25 UTC

10 points

1 comment, 5 min read, LW link

AI #143: Everything, Everywhere, All At Once

Zvi

20 Nov 2025 18:22 UTC

37 points

2 comments, 65 min read, LW link

(thezvi.wordpress.com)

Thinking about reasoning models made me less worried about scheming

Fabien Roger

20 Nov 2025 18:20 UTC

89 points

7 comments, 12 min read, LW link

Defining AI Truth-Seeking by What It Is Not

Tianyi (Alex) Qiu

20 Nov 2025 16:45 UTC

21 points

1 comment, 10 min read, LW link

Restricting Dangerous Research: Has It Worked Before, and Could It Work for AI?

jleibowich

20 Nov 2025 16:45 UTC

12 points

1 comment, 16 min read, LW link

(samotsvety.com)

Persistence Ethics

Suspended Reason

20 Nov 2025 16:27 UTC

7 points

2 comments, 5 min read, LW link

Should we shun the legibly evil?

Dentosal

20 Nov 2025 13:22 UTC

5 points

2 comments, 2 min read, LW link

Rumored Trump EO

Stephen Martin

20 Nov 2025 13:07 UTC

10 points

0 comments, 4 min read, LW link

The Moss Fractal: How Care Regulates Functional Awareness from Microbes to AI

Lcofa

20 Nov 2025 11:33 UTC

1 point

0 comments, 14 min read, LW link

What would adults in the room know about AI risk?

rosehadshar

20 Nov 2025 9:11 UTC

18 points

2 comments, 3 min read, LW link

10 Wrong and Dumb Grammar Rules

dreeves

20 Nov 2025 7:56 UTC

15 points

3 comments, 3 min read, LW link

My burnout journey

Aprillion

20 Nov 2025 6:58 UTC

4 points

0 comments, 1 min read, LW link

(peter.hozak.info)

One King Upon The Chessboard

Screwtape

20 Nov 2025 6:06 UTC

49 points

7 comments, 6 min read, LW link