Chaos Alone is No Bar to Superintelligence

Algon · 6 Oct 2025 22:45 UTC

11 points

0 comments · 2 min read · LW link

(aisafety.info)

We won’t get AIs smart enough to solve alignment but too dumb to rebel

Joe Rogero · 6 Oct 2025 21:49 UTC

28 points

16 comments · 5 min read · LW link

Notes on the need to lose

Algon · 6 Oct 2025 21:27 UTC

2 points

6 comments · 2 min read · LW link

Excerpts from my neuroscience to-do list

Steven Byrnes · 6 Oct 2025 21:05 UTC

26 points

1 comment · 4 min read · LW link

Experience Report—ML4Good Bootcamp Singapore, Sep′25

NurAlam · 6 Oct 2025 18:49 UTC

2 points

0 comments · 4 min read · LW link

Which differences between sandbagging evaluations and sandbagging safety research are important for control?

lennie · 6 Oct 2025 18:20 UTC

1 point

0 comments · 11 min read · LW link

Gradual Disempowerment Monthly Roundup

Raymond Douglas · 6 Oct 2025 15:36 UTC

93 points

7 comments · 6 min read · LW link

Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity

David Africa · 6 Oct 2025 15:26 UTC

16 points

3 comments · 7 min read · LW link

The Origami Men

Tomás B. · 6 Oct 2025 15:25 UTC

138 points

9 comments · 16 min read · LW link

Medical Roundup #5

Zvi · 6 Oct 2025 15:10 UTC

26 points

2 comments · 26 min read · LW link

(thezvi.wordpress.com)

Sandbagging: distinguishing detection of underperformance from incrimination, and the implications for downstream interventions.

lennie · 6 Oct 2025 14:00 UTC

1 point

0 comments · 8 min read · LW link

Why I think ECL shouldn’t make you update your cause prio

Jim Buhler · 6 Oct 2025 13:01 UTC

2 points

0 comments · 11 min read · LW link

[Question] Did Tyler Robinson carry his rifle as claimed by the government?

ChristianKl · 6 Oct 2025 12:46 UTC

4 points

9 comments · 1 min read · LW link

AI Science Companies: Evidence AGI Is Near

Josh Snider · 6 Oct 2025 10:13 UTC

5 points

3 comments · 1 min read · LW link

(www.joshuasnider.com)

LLMs one-box when in a “hostile telepath” version of Newcomb’s Paradox, except for the one that beat the predictor

Kaj_Sotala · 6 Oct 2025 8:44 UTC

47 points

6 comments · 17 min read · LW link

Alignment Faking Demo for Congressional Staffers

Alice Blair · 6 Oct 2025 1:44 UTC

19 points

0 comments · 3 min read · LW link

Do Things for as Many Reasons as Possible

Philipreal · 6 Oct 2025 0:28 UTC

35 points

1 comment · 2 min read · LW link

One Does Not Simply Walk Away from Omelas

Taylor G. Lunt · 6 Oct 2025 0:04 UTC

4 points

5 comments · 7 min read · LW link

The quotation mark

Maxwell Peterson · 5 Oct 2025 23:23 UTC

19 points

8 comments · 13 min read · LW link

The Sadism Spectrum and How to Access It

Dawn Drescher · 5 Oct 2025 23:09 UTC

13 points

2 comments · 20 min read · LW link

(impartial-priorities.org)

Maybe social media algorithms don’t suck

Algon · 5 Oct 2025 18:47 UTC

64 points

18 comments · 3 min read · LW link

Base64Bench: How good are LLMs at base64, and why care about it?

richbc · 5 Oct 2025 18:07 UTC

31 points

6 comments · 11 min read · LW link

[Question] What can Canadians do to help end the AI arms race?

Tom938 · 5 Oct 2025 18:03 UTC

8 points

7 comments · 2 min read · LW link

17 years old, self-taught state control—looking for people who actually get this

Cornelius Caspian · 5 Oct 2025 18:02 UTC

−3 points

3 comments · 1 min read · LW link

Behavior Best-of-N achieves Near Human Performance on Computer Tasks

Baybar · 5 Oct 2025 16:53 UTC

6 points

0 comments · 3 min read · LW link

Accelerating AI Safety Progress via Technical Methods- Calling Researchers, Founders, and Funders

Martin Leitgab · 5 Oct 2025 16:40 UTC

1 point

0 comments · 1 min read · LW link

Mini-Symposium on Accelerating AI Safety Progress via Technical Methods—Hybrid In-Person and Virtual

Martin Leitgab · 5 Oct 2025 16:05 UTC

1 point

0 comments · 1 min read · LW link

[Question] How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks?

CanYouFeelTheBenefits · 5 Oct 2025 14:38 UTC

14 points

1 comment · 1 min read · LW link

LLMs are badly misaligned

Joe Rogero · 5 Oct 2025 14:00 UTC

27 points

25 comments · 3 min read · LW link

The Counterfactual Quiet AGI Timeline

Davidmanheim · 5 Oct 2025 9:09 UTC

64 points

5 comments · 9 min read · LW link

AISafety.com Reading Group session 328

Søren Elverlin · 5 Oct 2025 7:51 UTC

5 points

0 comments · 1 min read · LW link

How the NanoGPT Speedrun WR dropped by 20% in 3 months

larry-dial · 5 Oct 2025 1:05 UTC

26 points

9 comments · 9 min read · LW link

a quick thought about AI alignment

foodforthought · 5 Oct 2025 0:51 UTC

10 points

4 comments · 1 min read · LW link

Making Your Pain Worse can Get You What You Want

Logan Riggs · 5 Oct 2025 0:19 UTC

76 points

4 comments · 3 min read · LW link

Markets in Democracy: What happens when you can sell your vote?

Mike Evron · 4 Oct 2025 23:59 UTC

4 points

20 comments · 3 min read · LW link

$250 bounties for the best short stories set in our near future world & Brooklyn event to select them

Ramon Gonzalez · 4 Oct 2025 22:49 UTC

10 points

0 comments · 2 min read · LW link

What I’ve Learnt About How to Sleep

Algon · 4 Oct 2025 20:52 UTC

25 points

7 comments · 2 min read · LW link

The ‘Magic’ of LLMs: The Function of Language

Joseph Banks · 4 Oct 2025 17:45 UTC

13 points

0 comments · 7 min read · LW link

Open Philanthropy’s Biosecurity and Pandemic Preparedness Team Is Hiring and Seeking New Grantees

miriam.hinthorn · 4 Oct 2025 17:42 UTC

3 points

0 comments · 1 min read · LW link

Consider Small Walks at Work

Morpheus · 4 Oct 2025 11:53 UTC

10 points

0 comments · 3 min read · LW link

Where does Sonnet 4.5’s desire to “not get too comfortable” come from?

Kaj_Sotala · 4 Oct 2025 10:19 UTC

91 points

16 comments · 64 min read · LW link

A Workflow for System Prompted Model Organisms

michaelwaves · 3 Oct 2025 21:39 UTC

1 point

0 comments · 3 min read · LW link

Goodness is harder to achieve than competence

Joe Rogero · 3 Oct 2025 21:32 UTC

22 points

0 comments · 3 min read · LW link

Memory Decoding Journal Club: Connectomic traces of Hebbian plasticity in the entorhinal-hippocampal system

Devin Ward · 3 Oct 2025 21:24 UTC

1 point

0 comments · 1 min read · LW link

Good is a smaller target than smart

Joe Rogero · 3 Oct 2025 21:04 UTC

21 points

0 comments · 2 min read · LW link

Making Sense of Consciousness Part 6: Perceptions of Disembodiment

sarahconstantin · 3 Oct 2025 20:40 UTC

27 points

0 comments · 8 min read · LW link

(sarahconstantin.substack.com)

Recent AI Experiences

abramdemski · 3 Oct 2025 19:32 UTC

54 points

1 comment · 6 min read · LW link

Our Experience Running Independent Evaluations on LLMs: What Have We Learned?

MAlvarado · 3 Oct 2025 18:26 UTC

7 points

1 comment · 5 min read · LW link

Do One New Thing A Day To Solve Your Problems

Algon · 3 Oct 2025 17:08 UTC

102 points

5 comments · 2 min read · LW link

ENAIS is looking for an Executive Director (apply by 20th October)

gergogaspar and ENAIS · 3 Oct 2025 15:29 UTC

11 points

0 comments · 2 min read · LW link