When does competition lead to recognisable values?

12 Jan 2026 23:13 UTC
66 points
18 comments · 25 min read
(postagi.org)

Lies, Damned Lies, and Proofs: Formal Methods are not Slopless

12 Jan 2026 22:32 UTC
102 points
10 comments · 7 min read

Pro or Average Joe? Do models infer our technical ability and can we control this judgement?

tobypullan · 12 Jan 2026 20:52 UTC
12 points
0 comments · 9 min read

Dating Roundup #10: Gendered Expectations

Zvi · 12 Jan 2026 20:30 UTC
28 points
4 comments · 16 min read
(thezvi.wordpress.com)

Automated Interpretability-Driven Model Auditing and Control: A Research Agenda

fbarez · 12 Jan 2026 19:55 UTC
9 points
0 comments · 1 min read

Tensor-Transformer Variants are Surprisingly Performant

Logan Riggs · 12 Jan 2026 19:43 UTC
87 points
16 comments · 4 min read

The Algorithm Rewards Engagement

Wes F · 12 Jan 2026 19:38 UTC
14 points
0 comments · 1 min read

BlackBoxQuery [BBQ]-Bench: Measuring Hypothesis Formation and Experimentation Capabilities in LLMs

Daniel Wu · 12 Jan 2026 19:36 UTC
10 points
0 comments · 12 min read

Understanding Agency through Markov Blankets

Ashe Vazquez Nuñez · 12 Jan 2026 19:32 UTC
25 points
2 comments · 3 min read

Model Reduction as Interpretability: What Neuroscience Could Teach Us About Understanding Complex Systems

RiekeFruengel · 12 Jan 2026 19:31 UTC
13 points
0 comments · 6 min read

Futarchy (and Tyranny of The Minority)

maxwickham · 12 Jan 2026 19:27 UTC
4 points
1 comment · 8 min read

What Happens When Superhuman AIs Compete for Control?

steveld · 12 Jan 2026 19:26 UTC
44 points
3 comments · 30 min read
(blog.ai-futures.org)

Brief Explorations in LLM Value Rankings

12 Jan 2026 18:16 UTC
39 points
1 comment · 11 min read

Practical challenges of control monitoring in frontier AI deployments

12 Jan 2026 16:45 UTC
19 points
0 comments · 1 min read
(arxiv.org)

Thinking vs Unfolding

Chris Scammell · 12 Jan 2026 15:26 UTC
67 points
5 comments · 13 min read

Split Personality Training: Revealing Latent Knowledge Through Alternate Personalities (Research Report)

Florian_Dietz · 12 Jan 2026 12:29 UTC
87 points
41 comments · 26 min read

Inter-branch communication in the multiverse via trapped ions

avturchin · 12 Jan 2026 12:16 UTC
7 points
32 comments · 4 min read

--dangerously-skip-permissions

OhadA · 12 Jan 2026 7:37 UTC
16 points
6 comments · 3 min read

Closing the loop

Screwtape · 12 Jan 2026 6:37 UTC
30 points
1 comment · 2 min read

Announcing Inkhaven 2: April 2026

Ben Pace · 12 Jan 2026 4:25 UTC
70 points
7 comments · 4 min read

[Question] What potent consumer technologies have long remained inaccessible?

TsviBT · 12 Jan 2026 3:13 UTC
32 points
11 comments · 4 min read

Digital intentionality is not about productivity

mingyuan · 12 Jan 2026 3:09 UTC
65 points
1 comment · 3 min read
(mingyuan.substack.com)

De pluribus non est disputandum

Jacob Goldsmith · 12 Jan 2026 0:07 UTC
11 points
0 comments · 3 min read

Strong, bipartisan leadership for resistance to Trump.

Raemon · 11 Jan 2026 23:08 UTC
82 points
85 comments · 2 min read

A Couple Useful LessWrong Userstyles

Alex Vermillion · 11 Jan 2026 21:26 UTC
39 points
0 comments · 2 min read

Stretch Hatchback

jefftk · 11 Jan 2026 16:40 UTC
12 points
8 comments · 2 min read
(www.jefftk.com)

We need a better way to evaluate emergent misalignment

11 Jan 2026 16:21 UTC
86 points
9 comments · 6 min read

Should the AI Safety Community Prioritize Safety Cases?

Jan Wehner · 11 Jan 2026 11:56 UTC
4 points
0 comments · 13 min read

Coding Agents As An Interface To The Codebase

omegastick · 11 Jan 2026 10:31 UTC
16 points
5 comments · 3 min read
(dumbideas.xyz)

Why AIs aren’t power-seeking yet

Eli Tyre · 11 Jan 2026 7:07 UTC
105 points
16 comments · 7 min read

Theoretical predictions on the sample efficiency of training policies and activation monitors

10 Jan 2026 23:50 UTC
18 points
2 comments · 7 min read

If AI alignment is only as hard as building the steam engine, then we likely still die

MichaelDickens · 10 Jan 2026 23:10 UTC
35 points
8 comments · 4 min read

How Humanity Wins

Wes R · 10 Jan 2026 21:55 UTC
−20 points
10 comments · 4 min read

Possible Principles of Superagency

Mariven · 10 Jan 2026 21:00 UTC
14 points
0 comments · 12 min read
(mariven.substack.com)

The Case Against Continuous Chain-of-Thought (Neuralese)

RobinHa · 10 Jan 2026 20:32 UTC
11 points
8 comments · 5 min read

The false confidence theorem and Bayesian reasoning

viking_math · 10 Jan 2026 17:14 UTC
24 points
19 comments · 6 min read

A Proposal for a Better ARENA: Shifting from Teaching to Research Sprints

TheManxLoiner · 10 Jan 2026 16:56 UTC
28 points
15 comments · 6 min read

Moral-Epistemic Scrupulosity: A Cross-Framework Failure Mode of Truth-Seeking

Tamara Sofía Falcone · 10 Jan 2026 2:24 UTC
17 points
2 comments · 8 min read

Finding high signal people—applying PageRank to Twitter

jfguan · 10 Jan 2026 2:21 UTC
27 points
0 comments · 3 min read
(thefourierproject.org)

AI Incident Forecasting

cluebbers · 10 Jan 2026 2:17 UTC
8 points
0 comments · 1 min read
(cluebbers.github.io)

6’7” Is Not Random

Martin Lichstam · 10 Jan 2026 2:13 UTC
−10 points
2 comments · 2 min read

What do we mean by “impossible”?

Sniffnoy · 10 Jan 2026 0:01 UTC
23 points
3 comments · 2 min read

Where’s the $100k iPhone?

beyarkay (Boyd Kane) · 9 Jan 2026 23:48 UTC
33 points
32 comments · 4 min read
(boydkane.com)

Taking LLMs Seriously (As Language Models)

abramdemski · 9 Jan 2026 23:23 UTC
57 points
9 comments · 17 min read

FirstPrinciples Talks: Science in the Age of AI

Carly Turini · 9 Jan 2026 21:18 UTC
1 point
0 comments · 1 min read

Objective Questions

JenniferRM · 9 Jan 2026 21:09 UTC
23 points
6 comments · 8 min read

FirstPrinciples Talks: Shallow Recurrent Decoders for the Automated Discovery of Physical Models

Carly Turini · 9 Jan 2026 21:06 UTC
1 point
0 comments · 1 min read

Cancer-Selective, Pan-Essential Targets from DepMap

sarahconstantin · 9 Jan 2026 20:50 UTC
21 points
0 comments · 11 min read
(sarahconstantin.substack.com)

Understanding complex conjugates in quantum mechanics

jessicata · 9 Jan 2026 20:45 UTC
17 points
8 comments · 12 min read
(unstableontology.com)

[Linkpost] On the Origins of Algorithmic Progress in AI

alex_fogelson · 9 Jan 2026 18:41 UTC
47 points
6 comments · 1 min read
(open.substack.com)