Omelas Is Perfectly Misread

Tobias H · 2 Oct 2025 23:11 UTC
197 points
49 comments · 5 min read · LW link

Journalism about game theory could advance AI safety quickly

Chris Santos-Lang · 2 Oct 2025 23:05 UTC
4 points
0 comments · 3 min read · LW link
(arxiv.org)

In which the author is struck by an electric couplet

Algon · 2 Oct 2025 21:46 UTC
10 points
5 comments · 2 min read · LW link

Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most “classic humans” in a few decades.

Raemon · 2 Oct 2025 21:03 UTC
143 points
19 comments · 12 min read · LW link

Eliciting secret knowledge from language models

2 Oct 2025 20:57 UTC
67 points
3 comments · 2 min read · LW link
(arxiv.org)

The Four Pillars: A Hypothesis for Countering Catastrophic Biological Risk

ASB · 2 Oct 2025 20:20 UTC
8 points
0 comments · 14 min read · LW link
(defensesindepth.bio)

AI Risk: Can We Thread the Needle? [Recorded Talk from EA Summit Vancouver ’25]

Evan R. Murphy · 2 Oct 2025 19:08 UTC
6 points
0 comments · 2 min read · LW link

Checking in on AI-2027

Baybar · 2 Oct 2025 18:46 UTC
119 points
21 comments · 4 min read · LW link

Prompt Framing Changes LLM Performance (and Safety)

Kilian Merkelbach · 2 Oct 2025 18:29 UTC
4 points
0 comments · 7 min read · LW link

No, That’s Not What the Flight Costs

Max Niederman · 2 Oct 2025 17:55 UTC
45 points
15 comments · 1 min read · LW link
(maxniederman.com)

Why the Struggle for Safe AI Must Be Political

2 Oct 2025 16:38 UTC
−6 points
0 comments · 8 min read · LW link

Why AI Caste bias is more Dangerous than you think

shanzson · 2 Oct 2025 16:36 UTC
0 points
1 comment · 6 min read · LW link

Homo sapiens and homo silicus

2 Oct 2025 16:33 UTC
6 points
0 comments · 3 min read · LW link

How to Feel More Alive

Logan Riggs · 2 Oct 2025 15:45 UTC
47 points
2 comments · 4 min read · LW link

AI and Biological Risk: Forecasting Key Capability Thresholds

Alvin Ånestrand · 2 Oct 2025 14:06 UTC
7 points
4 comments · 11 min read · LW link
(forecastingaifutures.substack.com)

AI #136: A Song and Dance

Zvi · 2 Oct 2025 13:10 UTC
34 points
3 comments · 47 min read · LW link
(thezvi.wordpress.com)

Some Biology Related Things I Found Interesting

Morpheus · 2 Oct 2025 12:18 UTC
37 points
9 comments · 2 min read · LW link

Random safe AGI idea dump

sig · 2 Oct 2025 10:16 UTC
−3 points
0 comments · 3 min read · LW link

How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks?

CanYouFeelTheBenefits · 2 Oct 2025 10:02 UTC
5 points
0 comments · 1 min read · LW link

Are we an ASI thought experiment?

Amy Rose Vossberg · 2 Oct 2025 1:43 UTC
−6 points
8 comments · 1 min read · LW link

Why’s equality in logic less flexible than in category theory?

Algon · 1 Oct 2025 22:03 UTC
17 points
24 comments · 3 min read · LW link

[Linkpost] A Field Guide to Writing Styles

Linch · 1 Oct 2025 21:49 UTC
17 points
0 comments · 17 min read · LW link
(linch.substack.com)

</rant> </uncharitable> </psychologizing>

Raemon · 1 Oct 2025 21:20 UTC
53 points
11 comments · 2 min read · LW link

How I think about alignment and ethics as a cooperation protocol software

Burny · 1 Oct 2025 21:09 UTC
3 points
0 comments · 1 min read · LW link

Introducing the Mox Guest Program

1 Oct 2025 18:35 UTC
11 points
0 comments · 2 min read · LW link
(moxsf.com)

The Problem of the Concentration of Power

hazem · 1 Oct 2025 18:13 UTC
−5 points
2 comments · 2 min read · LW link

Claude Sonnet 4.5 Is A Very Good Model

Zvi · 1 Oct 2025 18:00 UTC
40 points
2 comments · 24 min read · LW link
(thezvi.wordpress.com)

My Brush with Superhuman Persuasion

Ben S. · 1 Oct 2025 17:50 UTC
18 points
13 comments · 9 min read · LW link
(thebsdetector.substack.com)

AI and Cheap Weapons

Felix C. · 1 Oct 2025 17:31 UTC
31 points
3 comments · 23 min read · LW link

But what kind of stuff can you just do?

Bastiaan · 1 Oct 2025 16:58 UTC
25 points
5 comments · 1 min read · LW link

AI Safety at the Frontier: Paper Highlights, September ’25

gasteigerjo · 1 Oct 2025 16:24 UTC
5 points
0 comments · 6 min read · LW link
(aisafetyfrontier.substack.com)

Uncertain Updates: September 2025

Gordon Seidoh Worley · 1 Oct 2025 14:50 UTC
11 points
0 comments · 1 min read · LW link
(uncertainupdates.substack.com)

[CS2881r] Optimizing Prompts with Reinforcement Learning

1 Oct 2025 14:02 UTC
2 points
0 comments · 5 min read · LW link

“Pessimization” is Just Ordinary Failure

J Bostock · 1 Oct 2025 13:48 UTC
56 points
2 comments · 6 min read · LW link

Beyond the Zombie Argument

James Diacoumis · 1 Oct 2025 13:16 UTC
7 points
23 comments · 2 min read · LW link
(jamesdiacoumis.substack.com)

Against the Inevitability of Habituation to Continuous Bliss

CanYouFeelTheBenefits · 1 Oct 2025 12:12 UTC
8 points
0 comments · 1 min read · LW link

Lectures on statistical learning theory for alignment researchers

Vanessa Kosoy · 1 Oct 2025 8:36 UTC
41 points
1 comment · 1 min read · LW link
(www.youtube.com)

Claude Sonnet 4.5: System Card and Alignment

Zvi · 30 Sep 2025 20:50 UTC
72 points
4 comments · 27 min read · LW link
(thezvi.wordpress.com)

Halfhaven virtual blogger camp

Viliam · 30 Sep 2025 20:22 UTC
87 points
6 comments · 2 min read · LW link

Masks: On the benefits and drawbacks of a society where everyone covering their face is the norm

3Nora · 30 Sep 2025 18:43 UTC
−3 points
1 comment · 3 min read · LW link

How reimagining the nature of consciousness entirely changes the AI game

Jáchym Fibír · 30 Sep 2025 18:30 UTC
−9 points
0 comments · 14 min read · LW link
(www.phiand.ai)

The Basic Case For Doom

Bentham's Bulldog · 30 Sep 2025 16:04 UTC
26 points
4 comments · 5 min read · LW link

AI Safety Research Futarchy: Using Prediction Markets to Choose Research Projects for MARS

JasonBrown · 30 Sep 2025 15:37 UTC
32 points
8 comments · 4 min read · LW link

ARENA 7.0 - Call for Applicants

30 Sep 2025 14:54 UTC
22 points
0 comments · 6 min read · LW link

The famous survivorship bias image is a “loose reconstruction” of methods used on a hypothetical dataset

Lao Mein · 30 Sep 2025 13:13 UTC
47 points
0 comments · 1 min read · LW link

[GDPval] Models Could Automate the U.S. Economy by 2027

bira · 30 Sep 2025 11:53 UTC
14 points
0 comments · 1 min read · LW link

Ethical Design Patterns

AnnaSalamon · 30 Sep 2025 11:52 UTC
210 points
39 comments · 20 min read · LW link

What is the Base Model Simulation of Human AI-Assistant Conversation?

bodry · 30 Sep 2025 7:08 UTC
5 points
0 comments · 21 min read · LW link

Firstpost: First impressions

Shell · 30 Sep 2025 2:23 UTC
14 points
1 comment · 1 min read · LW link

Exploration of Counterfactual Importance and Attention Heads

Realmbird · 30 Sep 2025 1:17 UTC
12 points
0 comments · 6 min read · LW link