All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 20242025

All Jan Feb Mar Apr May Jun Jul Aug SepOctNov Dec

All 1 2 3 4 5 6 789 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

Intent alignment seems incoherent

Joe Rogero7 Oct 2025 23:01 UTC

22 points

2 comments6 min readLW link

Petri: An open-source auditing tool to accelerate AI safety research

Sam Marks7 Oct 2025 20:39 UTC

77 points

0 comments1 min readLW link

(alignment.anthropic.com)

Bending The Curve

Zvi7 Oct 2025 20:00 UTC

91 points

12 comments21 min readLW link

(thezvi.wordpress.com)

Kairos is hiring: Founding Generalist & SPAR Contractor

agucova7 Oct 2025 18:43 UTC

8 points

0 comments4 min readLW link

Messy on Purpose: Part 2 of A Conservative Vision for the Future

Davidmanheim and Ram Rachum

7 Oct 2025 17:00 UTC

16 points

3 comments12 min readLW link

Going Phoneless

robotelvis7 Oct 2025 16:40 UTC

18 points

5 comments5 min readLW link

(messyprogress.substack.com)

The Tower of Babel in Reverse

Nostradamus_27 Oct 2025 16:27 UTC

18 points

0 comments7 min readLW link

(terminalvel0city.substack.com)

The Alignment Paradox: Why Transparency Can Breed Deception

Joseph Banks7 Oct 2025 13:28 UTC

4 points

0 comments7 min readLW link

Notes on “Homology, Genes and Evolutionary Innovation”

Morpheus7 Oct 2025 12:45 UTC

9 points

1 comment2 min readLW link

Research Robots: When AIs Experiment on Us

Shoshannah Tekofsky7 Oct 2025 12:10 UTC

18 points

0 comments7 min readLW link

(theaidigest.org)

Top Warning Signs Your Friends are Being Oneshotted By AI

Charlie Edwards7 Oct 2025 11:56 UTC

−19 points

4 comments6 min readLW link

LLMs as a limiter of social intercourse

Adam Zerner7 Oct 2025 6:38 UTC

17 points

4 comments2 min readLW link

[Question] Generalization and the Multiple Stage Fallacy?

Zack_M_Davis7 Oct 2025 6:20 UTC

41 points

9 comments3 min readLW link

Telling the Difference Between Memories & Logical Guesses

Logan Riggs7 Oct 2025 5:46 UTC

29 points

3 comments4 min readLW link

Notes from European Progress Conference

Martin Sustrik7 Oct 2025 3:50 UTC

11 points

2 comments4 min readLW link

(www.250bpm.com)

“Intelligence” → “Relentless, Creative Resourcefulness”

Raemon7 Oct 2025 0:28 UTC

78 points

28 comments17 min readLW link

Chaos Alone is No Bar to Superintelligence

Algon6 Oct 2025 22:45 UTC

12 points

0 comments2 min readLW link

(aisafety.info)

We won’t get AIs smart enough to solve alignment but too dumb to rebel

Joe Rogero6 Oct 2025 21:49 UTC

28 points

16 comments5 min readLW link

Notes on the need to lose

Algon6 Oct 2025 21:27 UTC

2 points

11 comments2 min readLW link

Excerpts from my neuroscience to-do list

Steven Byrnes6 Oct 2025 21:05 UTC

28 points

2 comments4 min readLW link

Experience Report—ML4Good Bootcamp Singapore, Sep′25

NurAlam6 Oct 2025 18:49 UTC

5 points

0 comments4 min readLW link

Which differences between sandbagging evaluations and sandbagging safety research are important for control?

lennie6 Oct 2025 18:20 UTC

6 points

0 comments11 min readLW link

Gradual Disempowerment Monthly Roundup

Raymond Douglas6 Oct 2025 15:36 UTC

119 points

9 comments6 min readLW link

Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity

David Africa6 Oct 2025 15:26 UTC

23 points

6 comments7 min readLW link

The Origami Men

Tomás B.6 Oct 2025 15:25 UTC

189 points

14 comments16 min readLW link

Medical Roundup #5

Zvi6 Oct 2025 15:10 UTC

39 points

3 comments26 min readLW link

(thezvi.wordpress.com)

Sandbagging: distinguishing detection of underperformance from incrimination, and the implications for downstream interventions.

lennie6 Oct 2025 14:00 UTC

8 points

0 comments8 min readLW link

Why I think ECL shouldn’t make you update your cause prio

Jim Buhler6 Oct 2025 13:01 UTC

1 point

0 comments11 min readLW link

[Question] Did Tyler Robinson carry his rifle as claimed by the government?

ChristianKl6 Oct 2025 12:46 UTC

2 points

15 comments1 min readLW link

AI Science Companies: Evidence AGI Is Near

Josh Snider6 Oct 2025 10:13 UTC

6 points

3 comments1 min readLW link

(www.joshuasnider.com)

LLMs one-box when in a “hostile telepath” version of Newcomb’s Paradox, except for the one that beat the predictor

Kaj_Sotala6 Oct 2025 8:44 UTC

52 points

6 comments17 min readLW link

Alignment Faking Demo for Congressional Staffers

Alice Blair6 Oct 2025 1:44 UTC

21 points

2 comments3 min readLW link

Do Things for as Many Reasons as Possible

Philipreal6 Oct 2025 0:28 UTC

39 points

2 comments2 min readLW link

One Does Not Simply Walk Away from Omelas

Taylor G. Lunt6 Oct 2025 0:04 UTC

1 point

5 comments7 min readLW link

The quotation mark

Maxwell Peterson5 Oct 2025 23:23 UTC

21 points

8 comments13 min readLW link

The Sadism Spectrum and How to Access It

Dawn Drescher5 Oct 2025 23:09 UTC

14 points

2 comments20 min readLW link

(impartial-priorities.org)

Maybe social media algorithms don’t suck

Algon5 Oct 2025 18:47 UTC

70 points

25 comments3 min readLW link

Base64Bench: How good are LLMs at base64, and why care about it?

richbc5 Oct 2025 18:07 UTC

39 points

10 comments11 min readLW link

[Question] What can Canadians do to help end the AI arms race?

Tom9385 Oct 2025 18:03 UTC

8 points

7 comments2 min readLW link

17 years old, self-taught state control—looking for people who actually get this

Cornelius Caspian5 Oct 2025 18:02 UTC

−3 points

3 comments1 min readLW link

Behavior Best-of-N achieves Near Human Performance on Computer Tasks

Baybar5 Oct 2025 16:53 UTC

6 points

0 comments3 min readLW link

Accelerating AI Safety Progress via Technical Methods- Calling Researchers, Founders, and Funders

Martin Leitgab5 Oct 2025 16:40 UTC

1 point

0 comments1 min readLW link

Mini-Symposium on Accelerating AI Safety Progress via Technical Methods—Hybrid In-Person and Virtual

Martin Leitgab5 Oct 2025 16:05 UTC

1 point

0 comments1 min readLW link

[Question] How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks?

CanYouFeelTheBenefits5 Oct 2025 14:38 UTC

15 points

2 comments1 min readLW link

LLMs are badly misaligned

Joe Rogero5 Oct 2025 14:00 UTC

27 points

25 comments3 min readLW link

The Counterfactual Quiet AGI Timeline

Davidmanheim5 Oct 2025 9:09 UTC

71 points

5 comments9 min readLW link

AISafety.com Reading Group session 328

Søren Elverlin5 Oct 2025 7:51 UTC

5 points

0 comments1 min readLW link

How the NanoGPT Speedrun WR dropped by 20% in 3 months

larry-dial5 Oct 2025 1:05 UTC

54 points

9 comments9 min readLW link

a quick thought about AI alignment

foodforthought5 Oct 2025 0:51 UTC

10 points

4 comments1 min readLW link

Making Your Pain Worse can Get You What You Want

Logan Riggs5 Oct 2025 0:19 UTC

87 points

5 comments3 min readLW link