Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior

8 Oct 2025 22:02 UTC
156 points
37 comments, 2 min read

NEPA, Permitting and Energy Roundup #2

Zvi, 8 Oct 2025 20:20 UTC
27 points
1 comment, 28 min read
(thezvi.wordpress.com)

What shapes does reasoning take but circular?

Algon, 8 Oct 2025 20:18 UTC
9 points
2 comments, 2 min read

The Oracle’s Gift

Karthik Tadepalli, 8 Oct 2025 20:13 UTC
5 points
1 comment, 3 min read

Thinking Mathematically—Convergent Sequences

Yair Halberstadt, 8 Oct 2025 19:44 UTC
18 points
5 comments, 4 min read

The Relationship Between Social Punishment and Shared Maps

Zack_M_Davis, 8 Oct 2025 19:38 UTC
64 points
14 comments, 4 min read
(zackmdavis.net)

IABIED: Paradigm Confusion and Overconfidence

PeterMcCluskey, 8 Oct 2025 19:19 UTC
12 points
14 comments, 11 min read
(bayesianinvestor.com)

The Wise Baboon of Loyalty

Zander_Drax, 8 Oct 2025 18:48 UTC
13 points
0 comments, 4 min read

Spooky Collusion at a Distance with Superrational AI

bira, 8 Oct 2025 18:13 UTC
75 points
9 comments, 6 min read

The Architecture of the Narcissistic False Self

Dawn Drescher, 8 Oct 2025 17:39 UTC
4 points
0 comments, 12 min read
(impartial-priorities.org)

Reflections on The Curve 2025

Gordon Seidoh Worley, 8 Oct 2025 17:20 UTC
18 points
0 comments, 2 min read
(www.uncertainupdates.com)

Plans A, B, C, and D for misalignment risk

ryan_greenblatt, 8 Oct 2025 17:18 UTC
131 points
75 comments, 6 min read

Halfhaven Digest #1

Taylor G. Lunt, 8 Oct 2025 14:24 UTC
15 points
0 comments, 3 min read

Three Paths Through Manifold

8 Oct 2025 13:48 UTC
8 points
1 comment, 17 min read
(open.substack.com)

The “cool idea” bias

James Diacoumis, 8 Oct 2025 12:29 UTC
17 points
2 comments, 3 min read
(jamesdiacoumis.substack.com)

Irresponsible Companies Can Be Made of Responsible Employees

VojtaKovarik, 8 Oct 2025 11:47 UTC
80 points
16 comments, 5 min read

Heaven, Hell, and Mechanics

Chris Scammell, 8 Oct 2025 11:05 UTC
39 points
5 comments, 3 min read

10 Ways to Waste a Decade

Taylor G. Lunt, 8 Oct 2025 2:51 UTC
13 points
4 comments, 5 min read

You Should Get a Reusable Mask

jefftk, 8 Oct 2025 2:40 UTC
96 points
28 comments, 1 min read
(www.jefftk.com)

Replacing RL w/ Parameter-based Evolutionary Strategies

Logan Riggs, 8 Oct 2025 1:02 UTC
63 points
5 comments, 3 min read

Intent alignment seems incoherent

Joe Rogero, 7 Oct 2025 23:01 UTC
22 points
2 comments, 6 min read

Petri: An open-source auditing tool to accelerate AI safety research

Sam Marks, 7 Oct 2025 20:39 UTC
77 points
0 comments, 1 min read
(alignment.anthropic.com)

Bending The Curve

Zvi, 7 Oct 2025 20:00 UTC
91 points
12 comments, 21 min read
(thezvi.wordpress.com)

Kairos is hiring: Founding Generalist & SPAR Contractor

agucova, 7 Oct 2025 18:43 UTC
8 points
0 comments, 4 min read

Messy on Purpose: Part 2 of A Conservative Vision for the Future

7 Oct 2025 17:00 UTC
16 points
3 comments, 12 min read

Going Phoneless

robotelvis, 7 Oct 2025 16:40 UTC
18 points
5 comments, 5 min read
(messyprogress.substack.com)

The Tower of Babel in Reverse

Nostradamus_2, 7 Oct 2025 16:27 UTC
18 points
0 comments, 7 min read
(terminalvel0city.substack.com)

The Alignment Paradox: Why Transparency Can Breed Deception

Joseph Banks, 7 Oct 2025 13:28 UTC
4 points
0 comments, 7 min read

Notes on “Homology, Genes and Evolutionary Innovation”

Morpheus, 7 Oct 2025 12:45 UTC
9 points
1 comment, 2 min read

Research Robots: When AIs Experiment on Us

Shoshannah Tekofsky, 7 Oct 2025 12:10 UTC
18 points
0 comments, 7 min read
(theaidigest.org)

Top Warning Signs Your Friends are Being Oneshotted By AI

Charlie Edwards, 7 Oct 2025 11:56 UTC
−19 points
4 comments, 6 min read

LLMs as a limiter of social intercourse

Adam Zerner, 7 Oct 2025 6:38 UTC
17 points
4 comments, 2 min read

[Question] Generalization and the Multiple Stage Fallacy?

Zack_M_Davis, 7 Oct 2025 6:20 UTC
41 points
9 comments, 3 min read

Telling the Difference Between Memories & Logical Guesses

Logan Riggs, 7 Oct 2025 5:46 UTC
29 points
3 comments, 4 min read

Notes from European Progress Conference

Martin Sustrik, 7 Oct 2025 3:50 UTC
11 points
2 comments, 4 min read
(www.250bpm.com)

“Intelligence” → “Relentless, Creative Resourcefulness”

Raemon, 7 Oct 2025 0:28 UTC
78 points
28 comments, 17 min read

Chaos Alone is No Bar to Superintelligence

Algon, 6 Oct 2025 22:45 UTC
12 points
0 comments, 2 min read
(aisafety.info)

We won’t get AIs smart enough to solve alignment but too dumb to rebel

Joe Rogero, 6 Oct 2025 21:49 UTC
28 points
16 comments, 5 min read

Notes on the need to lose

Algon, 6 Oct 2025 21:27 UTC
2 points
11 comments, 2 min read

Excerpts from my neuroscience to-do list

Steven Byrnes, 6 Oct 2025 21:05 UTC
28 points
2 comments, 4 min read

Experience Report—ML4Good Bootcamp Singapore, Sep′25

NurAlam, 6 Oct 2025 18:49 UTC
5 points
0 comments, 4 min read

Which differences between sandbagging evaluations and sandbagging safety research are important for control?

lennie, 6 Oct 2025 18:20 UTC
6 points
0 comments, 11 min read

Gradual Disempowerment Monthly Roundup

Raymond Douglas, 6 Oct 2025 15:36 UTC
119 points
9 comments, 6 min read

Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity

David Africa, 6 Oct 2025 15:26 UTC
23 points
6 comments, 7 min read

The Origami Men

Tomás B., 6 Oct 2025 15:25 UTC
189 points
14 comments, 16 min read

Medical Roundup #5

Zvi, 6 Oct 2025 15:10 UTC
39 points
3 comments, 26 min read
(thezvi.wordpress.com)

Sandbagging: distinguishing detection of underperformance from incrimination, and the implications for downstream interventions.

lennie, 6 Oct 2025 14:00 UTC
8 points
0 comments, 8 min read

Why I think ECL shouldn’t make you update your cause prio

Jim Buhler, 6 Oct 2025 13:01 UTC
1 point
0 comments, 11 min read

[Question] Did Tyler Robinson carry his rifle as claimed by the government?

ChristianKl, 6 Oct 2025 12:46 UTC
2 points
15 comments, 1 min read

AI Science Companies: Evidence AGI Is Near

Josh Snider, 6 Oct 2025 10:13 UTC
6 points
3 comments, 1 min read
(www.joshuasnider.com)