Changing the world for the worse

mingyuan · 22 Feb 2026 23:55 UTC
129 points
17 comments · 3 min read
(mingyuan.substack.com)

The Scalable Formal Oversight Research Program

Max von Hippel · 22 Feb 2026 22:40 UTC
34 points
4 comments · 9 min read

Adapters as Representational Hypotheses: What Adapter Methods Tell Us About Transformer Geometry

wassname · 22 Feb 2026 22:12 UTC
18 points
0 comments · 5 min read

A Dialectic on Classical Utilitarianism

James Brobin · 22 Feb 2026 19:32 UTC
1 point
1 comment · 2 min read

My RSS Reader is Done

Brendan Long · 22 Feb 2026 19:06 UTC
36 points
2 comments · 1 min read
(www.brendanlong.com)

What to Do About AGI

Gordon Seidoh Worley · 22 Feb 2026 19:00 UTC
18 points
1 comment · 2 min read

Mapping LLM attractor states

Adam Bricknell · 22 Feb 2026 18:10 UTC
18 points
8 comments · 3 min read

InsanityBench: Cryptic Puzzles as a Probe for Lateral Thinking

RobinHa · 22 Feb 2026 14:20 UTC
48 points
1 comment · 4 min read
(www.robinhaselhorst.com)

The world won’t end, but we should be ashamed for trying

George3d6 · 22 Feb 2026 13:01 UTC
−20 points
0 comments · 12 min read
(cerebralab.com)

First Forecasting Dojo Group Meetup

Vojtech Brynych · 22 Feb 2026 7:19 UTC
3 points
2 comments · 1 min read

Life’s paradox and AI’s accentuation of it

geyab46617 · 22 Feb 2026 4:50 UTC
−1 point
0 comments · 3 min read

Multiple Independent Semantic Axes in Gemma 3 270M

CharlesL · 22 Feb 2026 1:55 UTC
15 points
2 comments · 3 min read

A Taxonomy of Traces

aleph_four · 22 Feb 2026 1:28 UTC
0 points
0 comments · 10 min read

Hierarchical Goal Induction With Ethics

aleph_four · 22 Feb 2026 0:53 UTC
3 points
0 comments · 4 min read

Did Claude 3 Opus align itself via gradient hacking?

Fiora Starlight · 21 Feb 2026 22:24 UTC
391 points
49 comments · 20 min read

If you don’t feel deeply confused about AGI risk, something’s wrong

Dave Banerjee · 21 Feb 2026 15:34 UTC
95 points
18 comments · 5 min read
(open.substack.com)

Ponzi schemes as a demonstration of out-of-distribution generalization

TFD · 21 Feb 2026 13:19 UTC
9 points
2 comments · 6 min read
(www.thefloatingdroid.com)

LLMs and Literature: Where Value Actually Comes From

derelict5432 · 21 Feb 2026 13:16 UTC
13 points
13 comments · 4 min read

The Spectre haunting the “AI Safety” Community

Gabriel Alfour · 21 Feb 2026 11:14 UTC
233 points
28 comments · 6 min read
(cognition.cafe)

LessWrong’s goals overlap HowTruthful’s

Bruce Lewis · 21 Feb 2026 4:19 UTC
7 points
4 comments · 2 min read

Alignment to Evil

Matrice Jacobine · 21 Feb 2026 3:29 UTC
61 points
12 comments · 1 min read
(tetraspace.substack.com)

Reporting Tasks as Reward-Hackable: Better Than Inoculation Prompting?

RogerDearnaley · 21 Feb 2026 1:59 UTC
40 points
4 comments · 5 min read

Robert Sapolsky Is Simply Not Talking About Compatibilism

Julius · 21 Feb 2026 1:27 UTC
26 points
4 comments · 8 min read
(thegreymatter.substack.com)

TT Self Study Journal # 7

TristanTrim · 21 Feb 2026 1:22 UTC
13 points
2 comments · 4 min read

How will we do SFT on models with opaque reasoning?

21 Feb 2026 0:00 UTC
32 points
17 comments · 7 min read

Agent-first context menus

Surya Kasturi · 20 Feb 2026 23:45 UTC
3 points
1 comment · 2 min read

Human perception of relational knowledge on graphical interfaces

Surya Kasturi · 20 Feb 2026 23:45 UTC
3 points
1 comment · 1 min read

Hodoscope: Visualization for Efficient Human Supervision

20 Feb 2026 23:41 UTC
9 points
0 comments · 2 min read
(hodoscope.dev)

Carrot-Parsnip: A Social Deduction Game for LLM Evals

Bicuspid Valve · 20 Feb 2026 23:06 UTC
11 points
0 comments · 7 min read

Can Current AI Match (or Outmatch) Professionals in Economically Valuable Tasks?

saahir.vazirani · 20 Feb 2026 21:38 UTC
6 points
0 comments · 5 min read

METR’s 14h 50% Horizon Impacts The Economy More Than ASI Timelines

Michaël Trazzi · 20 Feb 2026 21:08 UTC
45 points
11 comments · 2 min read

New video from Palisade Research: No One Understands Why AI Works

peterbarnett · 20 Feb 2026 20:29 UTC
62 points
2 comments · 1 min read
(www.youtube.com)

Announcing: Iliad Intensive + Iliad Fellowship

20 Feb 2026 20:13 UTC
82 points
16 comments · 1 min read

ARENA 8.0 - Call for Applicants

20 Feb 2026 18:28 UTC
31 points
1 comment · 6 min read

Militaries are going autonomous. But will AI lead to new wars? A tour of recent research

Mordechai Rorvig · 20 Feb 2026 18:26 UTC
1 point
0 comments · 2 min read
(www.foommagazine.org)

Unprecedented Catastrophes Have Non-Canonical Probabilities

E.G. Blee-Goldman · 20 Feb 2026 18:23 UTC
6 points
2 comments · 14 min read

Mechanistic Interpretability of Biological Foundation Models

Ihor Kendiukhov · 20 Feb 2026 18:01 UTC
34 points
1 comment · 26 min read

On Steven Byrnes’ ruthless ASI, (dis)analogies with humans and alignment proposals

StanislavKrym · 20 Feb 2026 15:32 UTC
9 points
2 comments · 2 min read

Some Questions For Democrats About Epstein

Alexander Turok · 20 Feb 2026 15:24 UTC
−28 points
3 comments · 4 min read

AGI is Here

Gordon Seidoh Worley · 20 Feb 2026 15:21 UTC
68 points
39 comments · 2 min read

Mind the Gap

Bridgett Kay · 20 Feb 2026 14:35 UTC
6 points
0 comments · 5 min read
(dxmrevealed.wordpress.com)

AI #156 Part 2: Errors in Rhetoric

Zvi · 20 Feb 2026 14:31 UTC
45 points
0 comments · 32 min read
(thezvi.wordpress.com)

AI for societal decision making—How promising is the space? 80,000 Hours profile

Zershaaneh Qureshi · 20 Feb 2026 13:28 UTC
3 points
0 comments · 2 min read

How To Escape Super Mario Bros

omegastick · 20 Feb 2026 11:54 UTC
70 points
8 comments · 9 min read
(dumbideas.xyz)

Human Fine-Tuning

20 Feb 2026 10:20 UTC
3 points
0 comments · 16 min read
(cognition.cafe)

The Problem of Counterevidence and the Futility of Theodicy

Ape in the coat · 20 Feb 2026 7:36 UTC
2 points
6 comments · 4 min read
(substack.com)

A Claude Skill To Comment On Docs

Tim Hua · 20 Feb 2026 2:28 UTC
26 points
1 comment · 2 min read

Cooperationism: first draft for a moral framework that does not require consciousness

Épiphanie Gédéon · 19 Feb 2026 21:07 UTC
26 points
5 comments · 8 min read

Flamingos (among other things) reduce emergent misalignment

eekay · 19 Feb 2026 19:17 UTC
13 points
3 comments · 7 min read

Funkering!

flying buttress · 19 Feb 2026 18:14 UTC
13 points
0 comments · 1 min read