Some evidence against the idea strange CoT stems from incentives to compress language

williawa 10 Dec 2025 22:43 UTC
17 points
0 comments · 2 min read · LW link

Follow-through on Bay Solstice

Raemon 10 Dec 2025 22:07 UTC
106 points
22 comments · 6 min read · LW link

Rock Paper Scissors is Not Solved, In Practice

Linch 10 Dec 2025 21:37 UTC
59 points
13 comments · 9 min read · LW link
(inchpin.substack.com)

Childhood and Education #15: Got To Get Out

Zvi 10 Dec 2025 21:31 UTC
49 points
3 comments · 26 min read · LW link
(thezvi.wordpress.com)

Apply to ESPR & PAIR 2026, Rationality and AI Camps for Ages 16-21

Stag 10 Dec 2025 19:39 UTC
25 points
0 comments · 1 min read · LW link

Evaluation as a (Cooperation-Enabling?) Tool

VojtaKovarik 10 Dec 2025 18:54 UTC
18 points
0 comments · 28 min read · LW link

Consider calling the NY governor about the RAISE Act

thenoviceoof 10 Dec 2025 18:47 UTC
15 points
0 comments · 11 min read · LW link

No ghost in the machine

fin 10 Dec 2025 18:35 UTC
10 points
5 comments · 45 min read · LW link
(finmoorhouse.com)

Most Algorithmic Progress is Data Progress [Linkpost]

Noosphere89 10 Dec 2025 17:48 UTC
36 points
9 comments · 5 min read · LW link
(www.beren.io)

Fibonacci Holds Information

milanrosko 10 Dec 2025 17:16 UTC
11 points
2 comments · 2 min read · LW link

Register for SPAR Demo Day on Saturday, Dec 13

10 Dec 2025 16:58 UTC
7 points
0 comments · 1 min read · LW link

We don’t know what most microbial genes do. Can genomic language models help?

Abhishaike Mahajan 10 Dec 2025 16:04 UTC
19 points
0 comments · 1 min read · LW link

Artifacts I’d like to try

Alexandre Variengien 10 Dec 2025 14:16 UTC
15 points
5 comments · 6 min read · LW link
(alexandrevariengien.com)

AI Safety – Analyse Affordances

atharva 10 Dec 2025 14:09 UTC
3 points
0 comments · 2 min read · LW link

An Approach for Evaluating Self-Boundary Consistency in AI Systems

Anurag 10 Dec 2025 13:57 UTC
3 points
0 comments · 6 min read · LW link

Caesar Derangement Syndrome

GenericModel 10 Dec 2025 13:04 UTC
−6 points
3 comments · 6 min read · LW link
(enrichedjamsham.substack.com)

Living on a ball of hair

Alexandre Variengien 10 Dec 2025 7:38 UTC
4 points
0 comments · 1 min read · LW link
(alexandrevariengien.com)

The funding conversation we left unfinished

jenn 10 Dec 2025 2:17 UTC
151 points
3 comments · 3 min read · LW link

[Question] Do you expect the first AI to cross NY’s RAISE Act’s “Critical Harm” threshold to be contained?

Josh Snider 10 Dec 2025 1:04 UTC
4 points
0 comments · 1 min read · LW link

TT Self Study Journal # 5

TristanTrim 9 Dec 2025 22:16 UTC
4 points
2 comments · 5 min read · LW link

Lorxus Does Halfhaven: 11/29, 11/30, Highlights, Postmortem

Lorxus 9 Dec 2025 21:00 UTC
6 points
0 comments · 3 min read · LW link
(tiled-with-pentagons.blogspot.com)

Tristan’s list of things to write

TristanTrim 9 Dec 2025 20:28 UTC
5 points
21 comments · 1 min read · LW link

Tate Modern 2150

GenericModel 9 Dec 2025 19:15 UTC
15 points
2 comments · 9 min read · LW link
(enrichedjamsham.substack.com)

Selling H200s to China Is Unwise and Unpopular

Zvi 9 Dec 2025 19:11 UTC
47 points
3 comments · 13 min read · LW link
(thezvi.wordpress.com)

Non-optimized beauty

Alexandre Variengien 9 Dec 2025 19:04 UTC
7 points
0 comments · 3 min read · LW link
(alexandrevariengien.com)

Auditing Games for Sandbagging [paper]

9 Dec 2025 18:37 UTC
103 points
4 comments · 10 min read · LW link

A Catalog of AI Evaluations

Anurag 9 Dec 2025 17:05 UTC
2 points
0 comments · 1 min read · LW link

Insights into Claude Opus 4.5 from Pokémon

Julian Bradshaw 9 Dec 2025 16:57 UTC
222 points
24 comments · 10 min read · LW link

Localizing Finetuned Information in Transformers with Dynamic Weight Grafting

toddknife 9 Dec 2025 16:20 UTC
6 points
0 comments · 5 min read · LW link

Gradual Disempowerment Monthly Roundup #3

Raymond Douglas 9 Dec 2025 16:02 UTC
49 points
0 comments · 4 min read · LW link

Every house has a chemistry lab

Alexandre Variengien 9 Dec 2025 14:17 UTC
5 points
0 comments · 1 min read · LW link
(alexandrevariengien.com)

Ways we can fail to answer

technicalities 9 Dec 2025 13:10 UTC
13 points
0 comments · 5 min read · LW link

[Question] Do you take joy in effective altruism?

SpectrumDT 9 Dec 2025 10:52 UTC
12 points
1 comment · 1 min read · LW link

My experience running a 100k

Alexandre Variengien 9 Dec 2025 8:30 UTC
52 points
0 comments · 6 min read · LW link
(alexandrevariengien.com)

Seriously, use text expansions

Parv Mahajan 9 Dec 2025 5:08 UTC
12 points
0 comments · 1 min read · LW link
(parvmahajan.com)

The reverse sear as a worthwhile life skill

Adam Zerner 9 Dec 2025 2:47 UTC
29 points
11 comments · 8 min read · LW link

Every point of intervention

TsviBT 9 Dec 2025 2:14 UTC
92 points
2 comments · 8 min read · LW link

D&D Sci Thanksgiving: the Festival Feast Evaluation & Ruleset

aphyer 9 Dec 2025 1:38 UTC
30 points
8 comments · 3 min read · LW link

Towards a Categorization of Adlerian Excuses

romeostevensit 8 Dec 2025 23:22 UTC
90 points
12 comments · 6 min read · LW link

A Falsifiable Causal Argument for Substrate Independence

rife 8 Dec 2025 22:47 UTC
10 points
0 comments · 5 min read · LW link

Prompting Models to Obfuscate Their CoT

8 Dec 2025 21:00 UTC
16 points
4 comments · 7 min read · LW link

Gödel’s Ontological Proof

GenericModel 8 Dec 2025 20:49 UTC
19 points
74 comments · 13 min read · LW link
(enrichedjamsham.substack.com)

High-level approaches to rigor in interpretability

David Scott Krueger 8 Dec 2025 20:46 UTC
24 points
0 comments · 1 min read · LW link

If It Can Learn It, It Can Unlearn It: AI Safety as Architecture, Not Training

Timothy Danforth 8 Dec 2025 20:38 UTC
1 point
0 comments · 4 min read · LW link

Human Dignity: a review

owencb 8 Dec 2025 20:37 UTC
32 points
0 comments · 7 min read · LW link
(strangecities.substack.com)

A few quick thoughts on measuring disempowerment

David Scott Krueger 8 Dec 2025 20:03 UTC
30 points
3 comments · 1 min read · LW link

How Stealth Works

Linch 8 Dec 2025 19:46 UTC
48 points
5 comments · 3 min read · LW link
(linch.substack.com)

Reward Function Design: a starter pack

Steven Byrnes 8 Dec 2025 19:15 UTC
82 points
13 comments · 3 min read · LW link

We need a field of Reward Function Design

Steven Byrnes 8 Dec 2025 19:15 UTC
118 points
12 comments · 5 min read · LW link

I have hope

TristanTrim 8 Dec 2025 18:20 UTC
12 points
0 comments · 2 min read · LW link