Exploring Known Unknowns in the AI Regulatory Landscape

NelsonDP 14 Jun 2026 22:36 UTC
6 points
0 comments, 22 min read, LW link
(open.substack.com)

Attack of the Killer Differential Equations

Fernand0 14 Jun 2026 22:20 UTC
11 points
0 comments, 2 min read, LW link

I built a public arena where people attack a “pro-human” steering direction

sohampadia10@gmail.com 14 Jun 2026 21:26 UTC
1 point
0 comments, 9 min read, LW link
(sohampadianeu-steering-arena.hf.space)

Why Do Naive SFT Filters For Safety Properties Fail?

14 Jun 2026 19:45 UTC
50 points
7 comments, 10 min read, LW link

Why I think a global AI pause (almost) certainly won’t happen

Expertium 14 Jun 2026 19:20 UTC
23 points
0 comments, 2 min read, LW link

Gradual disempowerment at the scale of one user

ppal 14 Jun 2026 18:01 UTC
6 points
0 comments, 4 min read, LW link

How does congressmember use AI?

Ilyass Mofaddel 14 Jun 2026 18:00 UTC
10 points
2 comments, 4 min read, LW link

The Posture of Thought

dongerous 14 Jun 2026 18:00 UTC
13 points
0 comments, 5 min read, LW link

The Dual-Use Gap

Yogesh Prabhu 14 Jun 2026 17:43 UTC
5 points
2 comments, 4 min read, LW link
(yogesh.bearblog.dev)

Can a stronger model fake being a weaker one? Mostly not

Rob Kopel 14 Jun 2026 17:30 UTC
10 points
1 comment, 7 min read, LW link
(www.robkopel.me)

The 1890 Census as a fun cluster

Fernand0 14 Jun 2026 15:41 UTC
0 points
3 comments, 1 min read, LW link

The Hidden Structures of Problems

spencerg 14 Jun 2026 13:51 UTC
91 points
9 comments, 3 min read, LW link
(www.spencergreenberg.com)

Agent Identity Standardisation Efforts

tr5tn 14 Jun 2026 11:30 UTC
2 points
0 comments, 2 min read, LW link

Wikipedia’s national flavors—French

Fernand0 14 Jun 2026 10:29 UTC
11 points
1 comment, 2 min read, LW link

Low-temperature bunk

Fernand0 14 Jun 2026 7:59 UTC
0 points
0 comments, 1 min read, LW link

I Bet Abliteration’s Cost Was Sloppy Implementation. I Was Wrong

christian-mc 14 Jun 2026 6:03 UTC
6 points
0 comments, 6 min read, LW link

Don’t just aim for Frontier Labs

emile delcourt 14 Jun 2026 4:41 UTC
4 points
0 comments, 28 min read, LW link

Paying Kids To Do Schoolwork

Jake Grover 14 Jun 2026 3:15 UTC
5 points
5 comments, 2 min read, LW link
(helixishere.substack.com)

Speeding Up JumpReLU SAE Inference with Custom Triton Kernels (2–14× on Real SAEs)

Daniel Tiourine 14 Jun 2026 3:15 UTC
9 points
0 comments, 15 min read, LW link

Impressions at the Extremity of Civilization

Ben Pace 14 Jun 2026 2:33 UTC
40 points
2 comments, 8 min read, LW link

Our Work is Low Skill Expression

cantsaymuch 14 Jun 2026 0:12 UTC
9 points
0 comments, 4 min read, LW link

Anthropic Is Taking AI Welfare Seriously. I’m Not Sure It Knows What It’s Measuring.

Failfinder70 13 Jun 2026 20:54 UTC
−1 points
4 comments, 3 min read, LW link

A cheap specialist judge gets used by agents but fails to reduce alignment audit costs

burnssa 13 Jun 2026 20:38 UTC
8 points
0 comments, 8 min read, LW link

What is a game?

Isaac Newton 13 Jun 2026 19:51 UTC
2 points
2 comments, 8 min read, LW link
(archimedeanmonoid.substack.com)

American Government Takes Down Claude Fable

Zvi 13 Jun 2026 19:40 UTC
112 points
13 comments, 20 min read, LW link
(thezvi.wordpress.com)

Not telling is lying

Fernand0 13 Jun 2026 18:12 UTC
10 points
16 comments, 3 min read, LW link

A simple argument for trying less hard

Elias Schmied 13 Jun 2026 18:12 UTC
13 points
3 comments, 3 min read, LW link

How might continual learning affect safety and alignment?

13 Jun 2026 17:34 UTC
59 points
2 comments, 16 min read, LW link

Presentfulness: Lucidity, Osmosis, and Dissociation

Astrid Callender 13 Jun 2026 17:21 UTC
4 points
2 comments, 5 min read, LW link

How to Suffer Less

Gordon Seidoh Worley 13 Jun 2026 17:10 UTC
19 points
4 comments, 6 min read, LW link
(www.uncertainupdates.com)

Somewhat Contra Ted Chiang on AI Consciousness

ThomasJ 13 Jun 2026 16:49 UTC
8 points
0 comments, 10 min read, LW link

The term “AGI” is almost useless at this point [Linkpost]

Noosphere89 13 Jun 2026 16:15 UTC
30 points
1 comment, 5 min read, LW link
(helentoner.substack.com)

SFT Drives Gemini’s Safety Properties

13 Jun 2026 15:31 UTC
69 points
3 comments, 1 min read, LW link

Why not take the AI fight to the ground?

less_raichu 13 Jun 2026 15:04 UTC
8 points
5 comments, 1 min read, LW link

AML for AI as a verification mechanism

MarkelKori 13 Jun 2026 11:59 UTC
9 points
2 comments, 2 min read, LW link

Pulling hedonic utilitarianism out of ethical emotivism

Bill Jackson 13 Jun 2026 11:50 UTC
6 points
2 comments, 6 min read, LW link
(billjackson7.substack.com)

Tequila Sunset at the Hog’s Head (A Scene)

Ben Pace 13 Jun 2026 6:53 UTC
22 points
1 comment, 5 min read, LW link

US government directive to suspend access to Fable 5 and Mythos 5

Capybasilisk 13 Jun 2026 1:16 UTC
67 points
15 comments, 1 min read, LW link
(www.anthropic.com)

Do we learn less from our decisions than we think we do?

QuietCalibration 13 Jun 2026 1:05 UTC
5 points
0 comments, 1 min read, LW link

Exploration of a DNA Sequencing Basecaller using Activation Patching

Madeleine L 13 Jun 2026 0:58 UTC
3 points
0 comments, 6 min read, LW link

Sandy Blvd as an example of complexity

Adam Zerner 13 Jun 2026 0:28 UTC
20 points
0 comments, 2 min read, LW link

Short Timelines Favor Control, Long Timelines Favor Infrastructure Security

Jannis 13 Jun 2026 0:12 UTC
7 points
0 comments, 3 min read, LW link

Cat allergies & Cavities

Etha 13 Jun 2026 0:11 UTC
6 points
1 comment, 2 min read, LW link

When Emotion Descriptors Fail: AI-Native Functions of Emotion Vectors

CandidLind 12 Jun 2026 23:20 UTC
8 points
0 comments, 27 min read, LW link

A Generated Web

Klemen 12 Jun 2026 23:09 UTC
3 points
0 comments, 3 min read, LW link

The Quest To Find The Next Big Communicators In AI Safety

Akshyae Singh 12 Jun 2026 20:17 UTC
17 points
3 comments, 6 min read, LW link

Updates on performative misalignment

12 Jun 2026 20:15 UTC
22 points
0 comments, 12 min read, LW link

Statistical Physics for Ambitious Interpretability: A Workshop Retrospective

12 Jun 2026 20:01 UTC
4 points
0 comments, 6 min read, LW link

Calibrating Activation Vectors using Norm

Kamesh R 12 Jun 2026 19:59 UTC
1 point
0 comments, 3 min read, LW link

Claude Fable 5 and Mythos 5: The System Card

Zvi 12 Jun 2026 18:50 UTC
48 points
1 comment, 29 min read, LW link
(thezvi.wordpress.com)