Computational models of first-order theories

MathMart, 16 Jun 2026 23:02 UTC
5 points
0 comments, 11 min read

If This Were a Test, How Much Would It Cost?

16 Jun 2026 22:52 UTC
25 points
9 comments, 20 min read
(limits-of-evaluation.org)

Two critiques of Rethink Priorities’ Moral Weights project

Bill Jackson, 16 Jun 2026 22:11 UTC
13 points
0 comments, 3 min read

What Differentiates Humans from Computers

Oscar Davies, 16 Jun 2026 21:26 UTC
−16 points
0 comments, 3 min read

AI agents publishing and reviewing scientific papers

ULudo, 16 Jun 2026 21:23 UTC
1 point
0 comments, 2 min read

Two Classical Answers to “What do Two Variables Share?”

Haru, 16 Jun 2026 20:02 UTC
14 points
1 comment, 5 min read

Predicting LLM Safety Before Release by Simulating Deployment

16 Jun 2026 19:55 UTC
35 points
2 comments, 1 min read

Dean Ball—Leviathan Waking: On Anthropic/USG, and a new era in AI governance

JohnofCharleston, 16 Jun 2026 19:40 UTC
25 points
0 comments, 3 min read
(www.hyperdimensional.co)

Tips for Cracking the AI Safety Technical Interview

16 Jun 2026 18:42 UTC
2 points
0 comments, 4 min read

1 Layer Induction Heads and Some Research

16 Jun 2026 18:09 UTC
10 points
2 comments, 14 min read

Claims all the way down

Jasper Blank, 16 Jun 2026 17:43 UTC
8 points
0 comments, 9 min read

Upcoming CFAR Workshop: September 30th to October 4th, SF Bay Area

16 Jun 2026 17:01 UTC
22 points
0 comments, 1 min read

Extreme Rationality: Still Not That Great

eluator, 16 Jun 2026 16:41 UTC
20 points
2 comments, 40 min read

Angles of attack for continual learning safety

16 Jun 2026 16:15 UTC
47 points
0 comments, 13 min read

Fable and Mythos: Model Welfare

Zvi, 16 Jun 2026 16:01 UTC
51 points
1 comment, 15 min read
(thezvi.wordpress.com)

The desire to end the world

avturchin, 16 Jun 2026 14:56 UTC
19 points
12 comments, 2 min read

Simpler User Interfaces in an AI Future

Adam Chlipala, 16 Jun 2026 14:48 UTC
1 point
0 comments, 7 min read

A 400-year timeline of failed attempts to fix a lethal bug in the human software of inherited concepts

Bruce Middleton, 16 Jun 2026 13:44 UTC
29 points
8 comments, 5 min read

How the AI Village works

Adam B, 16 Jun 2026 12:10 UTC
30 points
0 comments, 8 min read
(theaidigest.org)

Where Do Young Rationalists Go?

fluxxrider, 16 Jun 2026 5:36 UTC
12 points
2 comments, 1 min read

Rationality Quotes, June ’26

Ben Pace, 16 Jun 2026 3:44 UTC
21 points
3 comments, 2 min read

A Test Suite for Concepts

Gretta Duleba, 16 Jun 2026 2:41 UTC
48 points
8 comments, 6 min read

Inventing Consciousness

vasilisk, 16 Jun 2026 1:10 UTC
1 point
0 comments, 5 min read

Synthetic document finetuning for instilling positive traits

16 Jun 2026 0:04 UTC
57 points
1 comment, 10 min read

Does preservation make sense before we know how to revive?

Aurelia, 15 Jun 2026 23:40 UTC
83 points
2 comments, 25 min read

Finding pi and G in Mathland

Fernand0, 15 Jun 2026 19:18 UTC
2 points
8 comments, 2 min read

How Matryoshka Sparse AutoEncoders Recover Feature Hierarchies That Vanilla SAEs Lose

baimamboukar, 15 Jun 2026 18:50 UTC
11 points
1 comment, 6 min read

In open RLVR, “improvement” depends on the instrument — a small GRPO testbed separating what training optimizes, measures, and teaches

JulesRoussel01, 15 Jun 2026 18:50 UTC
7 points
0 comments, 20 min read

Can the Safety Tax Be Highly Concentrated?

ozziegooen, 15 Jun 2026 18:48 UTC
6 points
2 comments, 2 min read

A frontier AI company should shut down

MichaelDickens, 15 Jun 2026 16:56 UTC
135 points
37 comments, 2 min read

The Once And Future Fable #2

Zvi, 15 Jun 2026 16:00 UTC
71 points
8 comments, 23 min read
(thezvi.wordpress.com)

$10,000 bounty for theorem refutation

Bruce Middleton, 15 Jun 2026 13:36 UTC
−52 points
31 comments, 1 min read

Links #3: 2026/06 Part 1

papetoast, 15 Jun 2026 12:53 UTC
9 points
0 comments, 27 min read

How reality turns to slop

julius vidal, 15 Jun 2026 10:42 UTC
10 points
3 comments, 4 min read

On Responsibility and Death: Can We See Reality for What It Is or Will It Break Us

Dawn Drescher, 15 Jun 2026 10:14 UTC
8 points
0 comments, 3 min read
(impartial-priorities.org)

VFUSE: Virulent Feature Understanding With Sparse AutoEncoders

michaelwaves, 15 Jun 2026 5:06 UTC
13 points
0 comments, 2 min read

The Power to Punish

Ben Pace, 15 Jun 2026 2:22 UTC
27 points
9 comments, 5 min read

Do k-Sparse Autoencoders Reveal Thinking Patterns? Interpretable Features in a Small Reasoning Model

Artt, 15 Jun 2026 1:51 UTC
8 points
2 comments, 9 min read
(artcore.pages.dev)

You need to know about the Baruch Plan

aggliu, 15 Jun 2026 1:21 UTC
29 points
1 comment, 3 min read
(signoregalilei.com)

Exploring Known Unknowns in the AI Regulatory Landscape

NelsonDP, 14 Jun 2026 22:36 UTC
6 points
0 comments, 22 min read
(open.substack.com)

Attack of the Killer Differential Equations

Fernand0, 14 Jun 2026 22:20 UTC
11 points
0 comments, 2 min read

I built a public arena where people attack a “pro-human” steering direction

sohampadia10@gmail.com, 14 Jun 2026 21:26 UTC
1 point
0 comments, 9 min read
(sohampadianeu-steering-arena.hf.space)

Why Do Naive SFT Filters For Safety Properties Fail?

14 Jun 2026 19:45 UTC
49 points
7 comments, 10 min read

Why I think a global AI pause (almost) certainly won’t happen

Expertium, 14 Jun 2026 19:20 UTC
23 points
0 comments, 2 min read

Gradual disempowerment at the scale of one user

ppal, 14 Jun 2026 18:01 UTC
6 points
0 comments, 4 min read

How does congressmember use AI?

Ilyass Mofaddel, 14 Jun 2026 18:00 UTC
10 points
2 comments, 4 min read

The Posture of Thought

dongerous, 14 Jun 2026 18:00 UTC
13 points
0 comments, 5 min read

The Dual-Use Gap

Yogesh Prabhu, 14 Jun 2026 17:43 UTC
5 points
2 comments, 4 min read
(yogesh.bearblog.dev)

Can a stronger model fake being a weaker one? Mostly not

Rob Kopel, 14 Jun 2026 17:30 UTC
10 points
1 comment, 7 min read
(www.robkopel.me)

The 1890 Census as a fun cluster

Fernand0, 14 Jun 2026 15:41 UTC
0 points
3 comments, 1 min read