How to build a cancer vaccine, and whether they will work this time

Abhishaike Mahajan · 8 Jun 2026 20:45 UTC
58 points
9 comments · 25 min read · LW link
(www.owlposting.com)

Efficient tradeoffs and the safety-usefulness tradeoff model

Buck · 8 Jun 2026 20:28 UTC
42 points
1 comment · 8 min read · LW link

Accelerated Skill Learning via Dream Engineering and Biofeedback

Elliot Callender · 8 Jun 2026 20:08 UTC
5 points
2 comments · 3 min read · LW link

How valuable are weak AI safety regulations?

MichaelDickens · 8 Jun 2026 18:24 UTC
28 points
0 comments · 6 min read · LW link

How to reduce capability degradation from off-model SFT

8 Jun 2026 16:24 UTC
21 points
0 comments · 3 min read · LW link

The Next Swan: Frank Ramsey, Variable Hypotheticals, and the Bet on Induction

Ramseyian · 8 Jun 2026 12:01 UTC
4 points
0 comments · 18 min read · LW link

Coverage-driven alignment—What ‘Teaching Claude Why’ can borrow from AV verification

Yoav Hollander · 8 Jun 2026 11:42 UTC
16 points
4 comments · 14 min read · LW link
(blog.foretellix.com)

Bun’s Migration from Zig to Rust as a Potential Case Study for Gradual Disempowerment

Sayhan Yalvaçer · 8 Jun 2026 7:06 UTC
96 points
8 comments · 3 min read · LW link

Contra Dance at LessOnline

jefftk · 8 Jun 2026 5:50 UTC
23 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Honking is good

PossiblyElaine · 8 Jun 2026 4:36 UTC
9 points
7 comments · 4 min read · LW link
(open.substack.com)

The CIA believes everything

volpe · 8 Jun 2026 0:43 UTC
22 points
10 comments · 2 min read · LW link
(volpe.envs.net)

How do people stop spiraling about Roko’s Basilisk & acausal extortion?

anon202 · 8 Jun 2026 0:39 UTC
9 points
6 comments · 1 min read · LW link

Contextual Identity Laundering: How Claude’s Image Refusal Can Be Routed Through Web Search

Failfinder70 · 8 Jun 2026 0:39 UTC
7 points
2 comments · 9 min read · LW link

Mental causation is not load-bearing

jessicata · 7 Jun 2026 20:43 UTC
38 points
4 comments · 10 min read · LW link

How Far Apart Does a Model Think Its Tokens Are?

Brendan Long · 7 Jun 2026 20:20 UTC
47 points
9 comments · 10 min read · LW link
(www.brendanlong.com)

Autopilot Thinking

XelaP · 7 Jun 2026 20:20 UTC
10 points
4 comments · 6 min read · LW link

Secret Loyalties Likely Raise Remote-Influenceability

Kaustubh Kislay · 7 Jun 2026 17:51 UTC
13 points
0 comments · 6 min read · LW link

From One Piece to One Pace - Vision and mission in coordination of agents

a unemployed pastor- de S Brito · 7 Jun 2026 17:07 UTC
2 points
0 comments · 4 min read · LW link

Neglected Basics of AI Alignment

Quirinus_Quirrell · 7 Jun 2026 9:02 UTC
28 points
2 comments · 6 min read · LW link

The Hats of LessOnline

AprilSR · 7 Jun 2026 8:57 UTC
15 points
2 comments · 3 min read · LW link
(aprilsr.substack.com)

Can activation verbalizers surface an internal chain of thought?

7 Jun 2026 4:24 UTC
122 points
0 comments · 16 min read · LW link

Frontier Models Still Lag Behind Humans at Robust Belief-State Tracking

Lukas Frei · 6 Jun 2026 23:54 UTC
13 points
6 comments · 5 min read · LW link

Coming Around To Political Donations

jefftk · 6 Jun 2026 21:30 UTC
59 points
8 comments · 2 min read · LW link
(www.jefftk.com)

Analysis of Metastable States in the Transformer Activation Space

Zach Baker · 6 Jun 2026 21:30 UTC
10 points
0 comments · 20 min read · LW link

The Diamond Lemma

Isaac Newton · 6 Jun 2026 21:15 UTC
21 points
0 comments · 7 min read · LW link
(archimedeanmonoid.substack.com)

Iliad is Hiring

Peter Jean · 6 Jun 2026 21:08 UTC
13 points
0 comments · 1 min read · LW link

Against Corrigibility

peralice · 6 Jun 2026 20:28 UTC
66 points
17 comments · 12 min read · LW link

The Residual Stream Has a Geometry of Time

Fodenthal · 6 Jun 2026 19:57 UTC
23 points
0 comments · 8 min read · LW link

Exponential Solitude

PeterMaui · 6 Jun 2026 19:49 UTC
5 points
1 comment · 9 min read · LW link

Freud heard a rumor that Science existed, and had a wonderful dream

Bruce Middleton · 6 Jun 2026 14:47 UTC
8 points
8 comments · 6 min read · LW link

Coalitional Darwinism and the Instrumental Utility of Individuality

CarolusRenniusVitellius · 6 Jun 2026 12:53 UTC
25 points
5 comments · 17 min read · LW link
(charlesr-w.github.io)

Why Software Automation Is Hard

silentbob · 6 Jun 2026 8:56 UTC
114 points
20 comments · 12 min read · LW link

What if Anthropic unilaterally paused capabilities development right now?

Karl von Wendt · 6 Jun 2026 7:39 UTC
61 points
15 comments · 3 min read · LW link

Optimisation over non-stationary distributions creates weirder minds

6 Jun 2026 0:05 UTC
36 points
8 comments · 4 min read · LW link

[Question] Does robotics capabilities research accelerate AGI timelines?

Master Chief · 5 Jun 2026 23:32 UTC
4 points
3 comments · 1 min read · LW link

Evaluating using Mock Tool Calls to Quarantine Untrusted Prompt Inputs

dgros · 5 Jun 2026 22:43 UTC
15 points
0 comments · 11 min read · LW link

Two More Methods for Consistency Training and Some New Ways to Apply It

5 Jun 2026 21:06 UTC
18 points
0 comments · 7 min read · LW link

Revisiting GSM-Symbolic: models seem to reason okay, actually

Sturb · 5 Jun 2026 20:54 UTC
24 points
0 comments · 5 min read · LW link

Accepting Death & Adult Responsibility

Unreal · 5 Jun 2026 19:23 UTC
−19 points
10 comments · 4 min read · LW link

The Masochistic Prior

Modulo.Roland · 5 Jun 2026 19:05 UTC
12 points
2 comments · 2 min read · LW link
(substack.com)

Beyond the lexical personality traits: What is the structure of personality?

tailcalled · 5 Jun 2026 19:05 UTC
60 points
1 comment · 5 min read · LW link

Do not try to write your first research publication as a single author

Mikhail Mironov · 5 Jun 2026 18:31 UTC
12 points
0 comments · 5 min read · LW link

Do We Want a Superintelligent People-Pleaser?

GenericHousewife_B · 5 Jun 2026 18:07 UTC
1 point
0 comments · 6 min read · LW link

Explaining SAE Features With Foreign Natural Language Autoencoders

fzaffino · 5 Jun 2026 17:51 UTC
17 points
1 comment · 8 min read · LW link

SecureBio Detection is Hiring Software Engineers

jefftk · 5 Jun 2026 16:50 UTC
33 points
2 comments · 1 min read · LW link
(www.jefftk.com)

One Year of PauseAI UK

5 Jun 2026 16:41 UTC
94 points
7 comments · 11 min read · LW link
(pauseai.uk)

Learnings from starting an AI safety research team

5 Jun 2026 16:27 UTC
101 points
7 comments · 6 min read · LW link

Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks

5 Jun 2026 15:49 UTC
40 points
1 comment · 5 min read · LW link

My research: a computational cognitive neuroscience perspective on alignment

Seth Herd · 5 Jun 2026 14:19 UTC
52 points
0 comments · 18 min read · LW link

Editing is Easy, but Revision is Hard

IanWS · 5 Jun 2026 11:58 UTC
5 points
0 comments · 3 min read · LW link
(write.ianwsperber.com)