The Frictionless Double

zw5 7 May 2026 23:11 UTC
10 points
4 comments · 8 min read · LW link

The AI industry is where banking was in 2006. (We’re hiring)

felixgaston 7 May 2026 21:52 UTC
53 points
1 comment · 2 min read · LW link
(forum.effectivealtruism.org)

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

7 May 2026 20:21 UTC
213 points
35 comments · 8 min read · LW link

Axes of Planning in LLMs + Partial Lit Review

NickyP 7 May 2026 19:53 UTC
12 points
0 comments · 9 min read · LW link
(blog.sus.cat)

A review of “Investigating the consequences of accidentally grading CoT during RL”

Buck 7 May 2026 18:06 UTC
76 points
1 comment · 8 min read · LW link

Try, even if they have you cold

WalterL 7 May 2026 17:19 UTC
102 points
14 comments · 2 min read · LW link

Mechanistic estimation for wide random MLPs

Jacob_Hilton 7 May 2026 16:20 UTC
85 points
5 comments · 5 min read · LW link
(www.alignment.org)

Over Eight Months of Progress in Two: Analyzing the Mythos Preview Capability Jump

Alvin Ånestrand 7 May 2026 16:19 UTC
10 points
8 comments · 17 min read · LW link
(forecastingaifutures.substack.com)

AI #167: The Prior Restraint Era Begins

Zvi 7 May 2026 13:50 UTC
39 points
7 comments · 45 min read · LW link
(thezvi.wordpress.com)

How to get better at chess (and everything else)

Sean Herrington 7 May 2026 11:17 UTC
11 points
0 comments · 3 min read · LW link
(www.chess.com)

Multipolar Civilisation Depends on Maintaining an Attacker’s Dilemma

Naci Cankaya 7 May 2026 11:13 UTC
27 points
1 comment · 5 min read · LW link
(nacicankaya.substack.com)

Sculpted Interaction: a Design-First Approach to AI Alignment

magfrump 6 May 2026 23:47 UTC
15 points
0 comments · 7 min read · LW link

Psychopathy: The Choice

Dawn Drescher 6 May 2026 22:23 UTC
22 points
0 comments · 17 min read · LW link
(impartial-priorities.org)

Many individual CEVs are probably quite bad

Viliam 6 May 2026 20:18 UTC
109 points
32 comments · 3 min read · LW link

Blind deep-deployment evals for control & sabotage

Dylan Bowman 6 May 2026 19:54 UTC
27 points
0 comments · 2 min read · LW link

Using Base-LCM to Monitor LLMs

6 May 2026 19:28 UTC
−1 points
0 comments · 4 min read · LW link

Agent Ontology: A Constraint-Based Approach

tamas.bartha 6 May 2026 19:26 UTC
−9 points
0 comments · 9 min read · LW link

Will Claude cause the next Covid?

Kate Delbeke 6 May 2026 19:26 UTC
3 points
0 comments · 4 min read · LW link

SVD on Weight Differences for Model Auditing

Mukesh R 6 May 2026 19:26 UTC
14 points
0 comments · 7 min read · LW link

Half an argument against the (rationalist’s) many worlds interpretation

Bill Jackson 6 May 2026 19:22 UTC
1 point
0 comments · 3 min read · LW link
(billjackson7.substack.com)

AI Safety HK: Social #1 + Reading Group #1

Schizoid Rentoid 6 May 2026 19:21 UTC
2 points
0 comments · 1 min read · LW link

AI Safety Hong Kong: Social #1 + Reading group #1

Schizoid Rentoid 6 May 2026 19:21 UTC
2 points
0 comments · 1 min read · LW link

Preliminary Evidence for Value Convergence in AI models

John Matrix 6 May 2026 19:15 UTC
1 point
1 comment · 7 min read · LW link

Drifting

Priyanka Bharadwaj 6 May 2026 19:14 UTC
6 points
0 comments · 2 min read · LW link

A draft honesty policy for credible communication with AI systems

6 May 2026 18:50 UTC
3 points
0 comments · 13 min read · LW link
(www.forethought.org)

x-risk-themed

kave 6 May 2026 15:16 UTC
235 points
23 comments · 3 min read · LW link
(kaverennedy.substack.com)

Monday AI Radar #24

Against Moloch 6 May 2026 15:05 UTC
10 points
3 comments · 8 min read · LW link
(againstmoloch.substack.com)

AI Safety at the Frontier: Paper Highlights of April 2026

gasteigerjo 6 May 2026 13:58 UTC
18 points
1 comment · 10 min read · LW link

What is Anthropic?

Zvi 6 May 2026 13:30 UTC
65 points
4 comments · 10 min read · LW link
(thezvi.wordpress.com)

There is no evidence you should reapply sunscreen every 2 hours.

Hide 6 May 2026 9:19 UTC
85 points
14 comments · 9 min read · LW link
(hidefromit.substack.com)

Building An Ancestor Simulation #2

Mira Kennard 6 May 2026 8:21 UTC
5 points
0 comments · 5 min read · LW link

Psychopathy: The Types

Dawn Drescher 6 May 2026 7:35 UTC
1 point
0 comments · 10 min read · LW link
(impartial-priorities.org)

Toward a Better Evaluations Ecosystem

Benjamin Arnav 5 May 2026 22:29 UTC
24 points
0 comments · 5 min read · LW link

Model Spec Midtraining: Improving How Alignment Training Generalizes

5 May 2026 21:55 UTC
71 points
7 comments · 7 min read · LW link
(alignment.anthropic.com)

Positive Feedback Only

Florian_Dietz 5 May 2026 21:28 UTC
18 points
0 comments · 8 min read · LW link

What if LLMs are mostly crystallized intelligence?

deep 5 May 2026 20:50 UTC
45 points
10 comments · 9 min read · LW link
(expectedsurprise.substack.com)

Decision theory doesn’t prove that useful strong AIs will doom us all

deep 5 May 2026 20:47 UTC
8 points
0 comments · 9 min read · LW link
(expectedsurprise.substack.com)

Psychopathy: The Mechanics

Dawn Drescher 5 May 2026 20:26 UTC
2 points
0 comments · 10 min read · LW link
(impartial-priorities.org)

A Federal Inmate Asks: Was My Prosecution Rational?

seth_tins 5 May 2026 19:56 UTC
11 points
2 comments · 5 min read · LW link

The AI Ad-Hoc Prior Restraint Era Begins

Zvi 5 May 2026 19:30 UTC
63 points
5 comments · 10 min read · LW link
(thezvi.wordpress.com)

Your rights when flying to Europe

Yair Halberstadt 5 May 2026 19:17 UTC
92 points
14 comments · 5 min read · LW link

[Linkpost] Interpreting Language Model Parameters

5 May 2026 17:37 UTC
162 points
2 comments · 2 min read · LW link
(www.goodfire.ai)

Motivated reasoning, confirmation bias, and AI risk theory

Seth Herd 5 May 2026 15:56 UTC
66 points
18 comments · 41 min read · LW link

The Best Argument Against Deontology Is About Suitcases

Bentham's Bulldog 5 May 2026 15:24 UTC
−1 points
11 comments · 19 min read · LW link

Codesign for Legibility (to AI and Everyone Else)

Adam Chlipala 5 May 2026 13:46 UTC
1 point
0 comments · 7 min read · LW link

Dawn of the “national security” tier of AI

Mitchell_Porter 5 May 2026 9:40 UTC
16 points
3 comments · 1 min read · LW link

Forbidden Backrooms: Self-Chat with a Refusal-Abliterated LLM

AlliedToasters 5 May 2026 7:55 UTC
9 points
0 comments · 5 min read · LW link

Training Model to Predict Its Own Generalization: A Preliminary Study

Tianyi (Alex) Qiu 5 May 2026 5:50 UTC
17 points
0 comments · 7 min read · LW link

Are you looking up?

Craig Green 5 May 2026 3:03 UTC
42 points
2 comments · 8 min read · LW link
(open.substack.com)

Alarming Scheduling

jefftk 5 May 2026 2:40 UTC
26 points
9 comments · 1 min read · LW link
(www.jefftk.com)