19 Dec 2025 22:48 UTC

46 points

7 comments

11 min read

(www.oliversourbut.net)

Opinion Fuzzing: A Proposal for Reducing & Exploring Variance in LLM Judgments Via Sampling

ozziegooen

19 Dec 2025 21:41 UTC

11 points

0 comments

5 min read

Progress links and short notes, 2025-12-19

jasoncrawford

19 Dec 2025 19:44 UTC

8 points

0 comments

6 min read

(newsletter.rootsofprogress.org)

Linch’s Top Inkhaven Posts and Reflections

Linch

19 Dec 2025 19:40 UTC

38 points

0 comments

9 min read

(linch.substack.com)

When Were Things The Best?

Zvi

19 Dec 2025 18:00 UTC

62 points

16 comments

15 min read

(thezvi.wordpress.com)

Response to Introspective Awareness research

maddi

19 Dec 2025 17:23 UTC

6 points

0 comments

9 min read

SPAR Spring 2026: 130+ research projects now accepting applications

agucova

19 Dec 2025 14:23 UTC

22 points

0 comments

2 min read

Space view

kapedalex

19 Dec 2025 14:20 UTC

5 points

0 comments

6 min read

Digital Minds in 2025: A Year in Review

tbs and lucius

19 Dec 2025 14:18 UTC

16 points

0 comments

21 min read

(digitalminds.substack.com)

Scratchpad

Karthik Tadepalli

19 Dec 2025 14:15 UTC

12 points

0 comments

4 min read

AI Safety has a scaling problem

beyarkay (Boyd Kane)

19 Dec 2025 13:58 UTC

34 points

10 comments

4 min read

When Are Concealment Features Learned? And Does the Model Know Who’s Watching?

James Hoffend

19 Dec 2025 8:19 UTC

13 points

1 comment

6 min read

2025-Era “Reward Hacking” Does Not Show that Reward Is the Optimization Target

TurnTrout

19 Dec 2025 6:09 UTC

49 points

9 comments

7 min read

(turntrout.com)

Wuckles!

Raemon

19 Dec 2025 3:08 UTC

64 points

15 comments

2 min read

Evaluation Awareness Scales Predictably in Open-Weights Large Language Models

Maheep Chaudhary

19 Dec 2025 2:47 UTC

21 points

0 comments

6 min read

A name for the things that AI companies are building

DirectedEvolution

19 Dec 2025 2:07 UTC

28 points

9 comments

4 min read

I made Geneguessr

Brinedew

19 Dec 2025 1:55 UTC

35 points

2 comments

1 min read

In defence of the human agency: “Curing Cancer” is the new “Think of the Children”

Rajmohan H

19 Dec 2025 0:03 UTC

27 points

9 comments

3 min read

Help keep AI under human control: Palisade Research 2026 fundraiser

Jeffrey Ladish, benwr, Eli Tyre and John Steidley

18 Dec 2025 23:41 UTC

105 points

66 comments

6 min read

OpenAI: Sidestepping Evaluation Awareness and Anticipating Misalignment with Production Evaluations

Marcus Williams and micahcarroll

18 Dec 2025 22:55 UTC

25 points

1 comment

1 min read

(alignment.openai.com)

Scalable End-to-End Interpretability

jsteinhardt

18 Dec 2025 22:37 UTC

120 points

3 comments

3 min read

My Trip to NeurIPS 2025

Adam Newgas

18 Dec 2025 22:31 UTC

15 points

0 comments

4 min read

(www.boristhebrave.com)

Leading by example

martinkunev

18 Dec 2025 20:30 UTC

3 points

2 comments

3 min read

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

Sam Marks, Adam Karvonen, James Chua, Subhash Kantamneni, Euan Ong, Julian Minder, Clément Dumas and Owain_Evans

18 Dec 2025 20:21 UTC

154 points

11 comments

8 min read

(arxiv.org)

A Study Of Instinct

LoganStrohl

18 Dec 2025 20:19 UTC

30 points

0 comments

4 min read

Estimating The Portion of Income Consumed By Essentials Between 1985 and 2025

Mars_Will_Be_Ours

18 Dec 2025 19:19 UTC

2 points

2 comments

3 min read

(shoutinginthedarkforest.substack.com)

Chemical (hunger) argument paraphrased

lemonhope

18 Dec 2025 18:58 UTC

10 points

7 comments

1 min read

BashArena: A Control Setting for Highly Privileged AI Agents

james.lucassen and Adam Kaufman

18 Dec 2025 18:19 UTC

58 points

0 comments

15 min read

(blog.redwoodresearch.org)

AI Safety Orgs Should Apply for Government Grants

DusanDNesic

18 Dec 2025 18:01 UTC

25 points

0 comments

5 min read

Good if make prior after data instead of before

dynomight

18 Dec 2025 17:53 UTC

117 points

18 comments

9 min read

(dynomight.net)

AI #147: Flash Forward

Zvi

18 Dec 2025 16:50 UTC

31 points

2 comments

58 min read

(thezvi.wordpress.com)

50 Things I Know

Rebecca Dai

18 Dec 2025 16:32 UTC

6 points

8 comments

7 min read

(rebeccadai.substack.com)

Announcing Spring 2026 AI Forecasting Benchmark

Ben Wilson

18 Dec 2025 15:43 UTC

2 points

0 comments

4 min read

(www.metaculus.com)

Deep Learning and Precipitation Reactions: A Tale of Universality

Max Hennick

18 Dec 2025 14:34 UTC

57 points

4 comments

18 min read

A Functional Typology of Cognitive Capabilities (Interactive Visualization)

Anurag

18 Dec 2025 14:06 UTC

2 points

0 comments

4 min read

The Undervalued Kleene Hierarchy

milanrosko

18 Dec 2025 11:57 UTC

10 points

2 comments

6 min read

[Paper] Self-Transparency Failures in Expert-Persona LLMs

Alex Diep

18 Dec 2025 9:09 UTC

8 points

0 comments

6 min read

Solstice Sundowners

teegs

18 Dec 2025 8:12 UTC

1 point

0 comments

1 min read

A basic case for donating to the Berkeley Genomics Project

TsviBT

18 Dec 2025 1:55 UTC

85 points

5 comments

4 min read

Apply to MATS Summer 2026!

Raj Thimmiah, Ryan Kidd and Elise Racine

18 Dec 2025 1:51 UTC

31 points

0 comments

1 min read

Making Linear Probes Interpretable

ZuiderveldTimJ

18 Dec 2025 1:48 UTC

17 points

0 comments

10 min read

A browser game about AI safety

NickSharp

17 Dec 2025 22:36 UTC

18 points

4 comments

1 min read

What if we could grow human tissue by recapitulating embryogenesis?

Abhishaike Mahajan

17 Dec 2025 21:53 UTC

23 points

0 comments

1 min read

(www.owlposting.com)

Transmitting Misalignment with Subliminal Learning via Paraphrasing

Matthew Bozoukov, Taywon Min, CallumMcDougall and J Rosser

17 Dec 2025 19:34 UTC

39 points

0 comments

10 min read

Shallow review of technical AI safety, 2025

technicalities, Tomáš Gavenčiak, Stephen McAleese, peligrietzer, Stag, jordinne, ozziegooen, Violet Hour and lenz

17 Dec 2025 18:18 UTC

191 points

9 comments

47 min read

Announcing RoastMyPost: LLMs Eval Blog Posts and More

ozziegooen

17 Dec 2025 18:10 UTC

110 points

17 comments

5 min read

Alignment Fine-Tuning: Lessons from Operant Conditioning

foodforthought

17 Dec 2025 16:57 UTC

5 points

4 comments

10 min read

Bryan Caplan on Ethical Intuitionism

vatsal_newsletter

17 Dec 2025 16:48 UTC

−5 points

0 comments

1 min read

(www.readvatsal.com)

The Bleeding Mind

Adele Lopez

17 Dec 2025 16:27 UTC

68 points

9 comments

6 min read

Could space debris block access to outer space?

fin

17 Dec 2025 15:59 UTC

12 points

5 comments

3 min read

(www.forethought.org)