Some evidence against the idea strange CoT stems from incentives to compress language

williawa 10 Dec 2025 22:43 UTC

17 points

0 comments 2 min read LW link

Follow-through on Bay Solstice

Raemon 10 Dec 2025 22:07 UTC

106 points

22 comments 6 min read LW link

Rock Paper Scissors is Not Solved, In Practice

Linch 10 Dec 2025 21:37 UTC

59 points

13 comments 9 min read LW link

(inchpin.substack.com)

Childhood and Education #15: Got To Get Out

Zvi 10 Dec 2025 21:31 UTC

49 points

3 comments 26 min read LW link

(thezvi.wordpress.com)

Apply to ESPR & PAIR 2026, Rationality and AI Camps for Ages 16-21

Stag 10 Dec 2025 19:39 UTC

25 points

0 comments 1 min read LW link

Evaluation as a (Cooperation-Enabling?) Tool

VojtaKovarik 10 Dec 2025 18:54 UTC

18 points

0 comments 28 min read LW link

Consider calling the NY governor about the RAISE Act

thenoviceoof 10 Dec 2025 18:47 UTC

15 points

0 comments 11 min read LW link

No ghost in the machine

fin 10 Dec 2025 18:35 UTC

10 points

5 comments 45 min read LW link

(finmoorhouse.com)

Most Algorithmic Progress is Data Progress [Linkpost]

Noosphere89 10 Dec 2025 17:48 UTC

36 points

9 comments 5 min read LW link

(www.beren.io)

Fibonacci Holds Information

milanrosko 10 Dec 2025 17:16 UTC

11 points

2 comments 2 min read LW link

Register for SPAR Demo Day on Saturday, Dec 13

Topaz and agucova

10 Dec 2025 16:58 UTC

7 points

0 comments 1 min read LW link

We don’t know what most microbial genes do. Can genomic language models help?

Abhishaike Mahajan 10 Dec 2025 16:04 UTC

19 points

0 comments 1 min read LW link

Artifacts I’d like to try

Alexandre Variengien 10 Dec 2025 14:16 UTC

15 points

5 comments 6 min read LW link

(alexandrevariengien.com)

AI Safety – Analyse Affordances

atharva 10 Dec 2025 14:09 UTC

3 points

0 comments 2 min read LW link

An Approach for Evaluating Self-Boundary Consistency in AI Systems

Anurag 10 Dec 2025 13:57 UTC

3 points

0 comments 6 min read LW link

Caesar Derangement Syndrome

GenericModel 10 Dec 2025 13:04 UTC

−6 points

3 comments 6 min read LW link

(enrichedjamsham.substack.com)

Living on a ball of hair

Alexandre Variengien 10 Dec 2025 7:38 UTC

4 points

0 comments 1 min read LW link

(alexandrevariengien.com)

The funding conversation we left unfinished

jenn 10 Dec 2025 2:17 UTC

151 points

3 comments 3 min read LW link

[Question] Do you expect the first AI to cross NY’s RAISE Act’s “Critical Harm” threshold to be contained?

Josh Snider 10 Dec 2025 1:04 UTC

4 points

0 comments 1 min read LW link

TT Self Study Journal # 5

TristanTrim 9 Dec 2025 22:16 UTC

4 points

2 comments 5 min read LW link

Lorxus Does Halfhaven: 11/29, 11/30, Highlights, Postmortem

Lorxus 9 Dec 2025 21:00 UTC

6 points

0 comments 3 min read LW link

(tiled-with-pentagons.blogspot.com)

Tristan’s list of things to write

TristanTrim 9 Dec 2025 20:28 UTC

5 points

21 comments 1 min read LW link

Tate Modern 2150

GenericModel 9 Dec 2025 19:15 UTC

15 points

2 comments 9 min read LW link

(enrichedjamsham.substack.com)

Selling H200s to China Is Unwise and Unpopular

Zvi 9 Dec 2025 19:11 UTC

47 points

3 comments 13 min read LW link

(thezvi.wordpress.com)

Non-optimized beauty

Alexandre Variengien 9 Dec 2025 19:04 UTC

7 points

0 comments 3 min read LW link

(alexandrevariengien.com)

Auditing Games for Sandbagging [paper]

Jordan Taylor and Joseph Bloom

9 Dec 2025 18:37 UTC

103 points

4 comments 10 min read LW link

A Catalog of AI Evaluations

Anurag 9 Dec 2025 17:05 UTC

2 points

0 comments 1 min read LW link

Insights into Claude Opus 4.5 from Pokémon

Julian Bradshaw 9 Dec 2025 16:57 UTC

222 points

24 comments 10 min read LW link

Localizing Finetuned Information in Transformers with Dynamic Weight Grafting

toddknife 9 Dec 2025 16:20 UTC

6 points

0 comments 5 min read LW link

Gradual Disempowerment Monthly Roundup #3

Raymond Douglas 9 Dec 2025 16:02 UTC

49 points

0 comments 4 min read LW link

Every house has a chemistry lab

Alexandre Variengien 9 Dec 2025 14:17 UTC

5 points

0 comments 1 min read LW link

(alexandrevariengien.com)

Ways we can fail to answer

technicalities 9 Dec 2025 13:10 UTC

13 points

0 comments 5 min read LW link

[Question] Do you take joy in effective altruism?

SpectrumDT 9 Dec 2025 10:52 UTC

12 points

1 comment 1 min read LW link

My experience running a 100k

Alexandre Variengien 9 Dec 2025 8:30 UTC

52 points

0 comments 6 min read LW link

(alexandrevariengien.com)

Seriously, use text expansions

Parv Mahajan 9 Dec 2025 5:08 UTC

12 points

0 comments 1 min read LW link

(parvmahajan.com)

The reverse sear as a worthwhile life skill

Adam Zerner 9 Dec 2025 2:47 UTC

29 points

11 comments 8 min read LW link

Every point of intervention

TsviBT 9 Dec 2025 2:14 UTC

92 points

2 comments 8 min read LW link

D&D Sci Thanksgiving: the Festival Feast Evaluation & Ruleset

aphyer 9 Dec 2025 1:38 UTC

30 points

8 comments 3 min read LW link

Towards a Categorization of Adlerian Excuses

romeostevensit 8 Dec 2025 23:22 UTC

90 points

12 comments 6 min read LW link

A Falsifiable Causal Argument for Substrate Independence

rife 8 Dec 2025 22:47 UTC

10 points

0 comments 5 min read LW link

Prompting Models to Obfuscate Their CoT

Josh Engels and Felix Tudose

8 Dec 2025 21:00 UTC

16 points

4 comments 7 min read LW link

Gödel’s Ontological Proof

GenericModel 8 Dec 2025 20:49 UTC

19 points

74 comments 13 min read LW link

(enrichedjamsham.substack.com)

High-level approaches to rigor in interpretability

David Scott Krueger 8 Dec 2025 20:46 UTC

24 points

0 comments 1 min read LW link

If It Can Learn It, It Can Unlearn It: AI Safety as Architecture, Not Training

Timothy Danforth 8 Dec 2025 20:38 UTC

1 point

0 comments 4 min read LW link

Human Dignity: a review

owencb 8 Dec 2025 20:37 UTC

32 points

0 comments 7 min read LW link

(strangecities.substack.com)

A few quick thoughts on measuring disempowerment

David Scott Krueger 8 Dec 2025 20:03 UTC

30 points

3 comments 1 min read LW link

How Stealth Works

Linch 8 Dec 2025 19:46 UTC

48 points

5 comments 3 min read LW link

(linch.substack.com)

Reward Function Design: a starter pack

Steven Byrnes 8 Dec 2025 19:15 UTC

82 points

13 comments 3 min read LW link

We need a field of Reward Function Design

Steven Byrnes 8 Dec 2025 19:15 UTC

118 points

12 comments 5 min read LW link

I have hope

TristanTrim 8 Dec 2025 18:20 UTC

12 points

0 comments 2 min read LW link