Supervillain Monologues Are Unrealistic

Algon

31 Oct 2025 23:58 UTC

94 points

18 comments, 2 min read, LW link

Secretly Loyal AIs: Threat Vectors and Mitigation Strategies

Dave Banerjee

31 Oct 2025 23:31 UTC

8 points

0 comments, 19 min read, LW link

(substack.com)

Ink without haven

Dentosal

31 Oct 2025 22:50 UTC

4 points

0 comments, 2 min read, LW link

Apply to the Cambridge ERA:AI Winter 2026 Fellowship

Kyle O’Brien

31 Oct 2025 22:26 UTC

5 points

3 comments, 1 min read, LW link

FAQ: Expert Survey on Progress in AI methodology

KatjaGrace

31 Oct 2025 16:51 UTC

15 points

0 comments, 19 min read, LW link

(blog.aiimpacts.org)

Social media feeds ‘misaligned’ when viewed through AI safety framework, show researchers

Mordechai Rorvig

31 Oct 2025 16:40 UTC

13 points

3 comments, 1 min read, LW link

(www.foommagazine.org)

Crossword Halloween 2025: Manmade Horrors

jchan

31 Oct 2025 16:19 UTC

7 points

0 comments, 1 min read, LW link

Debugging Despair ~> A bet about Satisfaction and Values

FireBrito de S. Gabriel

31 Oct 2025 14:00 UTC

2 points

0 comments, 2 min read, LW link

Halfhaven Digest #3

Taylor G. Lunt

31 Oct 2025 13:41 UTC

7 points

0 comments, 2 min read, LW link

OpenAI Moves To Complete Potentially The Largest Theft In Human History

Zvi

31 Oct 2025 13:20 UTC

77 points

12 comments, 19 min read, LW link

(thezvi.wordpress.com)

A (bad) Definition of AGI

spookyuser

31 Oct 2025 7:55 UTC

4 points

0 comments, 5 min read, LW link

Modelling, Measuring, and Intervening on Goal-directed Behaviour in AI Systems

Mario Giulianelli, Raghu Arghal, Fade Chen, ndalton, Evgenii Kortukov, Calum McNamara, Angelos Nalmpantis, Moksh Nirvaan and Gabriele Sarti

31 Oct 2025 1:28 UTC

15 points

0 comments, 8 min read, LW link

Resampling Conserves Redundancy & Mediation (Approximately) Under the Jensen-Shannon Divergence

David Lorell

31 Oct 2025 1:07 UTC

42 points

8 comments, 4 min read, LW link

Centralization begets stagnation

Algon

30 Oct 2025 23:49 UTC

6 points

0 comments, 2 min read, LW link

Summary and Comments on Anthropic’s Pilot Sabotage Risk Report

GradientDissenter

30 Oct 2025 20:19 UTC

29 points

0 comments, 5 min read, LW link

Critical Fallibilism and Theory of Constraints in One Analyzed Paragraph

Elliot Temple

30 Oct 2025 20:06 UTC

2 points

0 comments, 28 min read, LW link

AI #140: Trying To Hold The Line

Zvi

30 Oct 2025 18:30 UTC

26 points

1 comment, 52 min read, LW link

(thezvi.wordpress.com)

Anthropic’s Pilot Sabotage Risk Report

dmz

30 Oct 2025 17:50 UTC

32 points

2 comments, 3 min read, LW link

(alignment.anthropic.com)

AISLE discovered three new OpenSSL vulnerabilities

Jan_Kulveit

30 Oct 2025 16:32 UTC

64 points

7 comments, 1 min read, LW link

(aisle.com)

Sonnet 4.5’s eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals

Alexa Pan and ryan_greenblatt

30 Oct 2025 15:34 UTC

144 points

22 comments, 14 min read, LW link

Steering Evaluation-Aware Models to Act Like They Are Deployed

Tim Hua, andrq, Sam Marks and Neel Nanda

30 Oct 2025 15:03 UTC

62 points

12 comments, 18 min read, LW link

On The Conservation of Rights

Roman Maksimovich

30 Oct 2025 13:48 UTC

−2 points

2 comments, 8 min read, LW link

When “HDMI-1” Lies To You

Gunnar_Zarncke

30 Oct 2025 12:23 UTC

18 points

0 comments, 1 min read, LW link

[Question] Why there is still one instance of Eliezer Yudkowsky?

RomanS

30 Oct 2025 12:00 UTC

−9 points

8 comments, 1 min read, LW link

Interview on the Hengshui Model High School

L.M.Sherlock

30 Oct 2025 10:26 UTC

21 points

2 comments, 30 min read, LW link

(lmsherlock.substack.com)

Transcendental Argumentation and the Epistemics of Discourse

0xA

30 Oct 2025 6:37 UTC

1 point

2 comments, 3 min read, LW link

Emergent Introspective Awareness in Large Language Models

Drake Thomas

30 Oct 2025 4:42 UTC

132 points

19 comments, 1 min read, LW link

(transformer-circuits.pub)

Introducing Aeonisk: an Open Source Game and Dataset with Graded Outcome Tiers of Counterfactual Reasoning

threeriversainexus

30 Oct 2025 3:02 UTC

1 point

0 comments, 4 min read, LW link

ImpossibleBench: Measuring Reward Hacking in LLM Coding Agents

Ziqian Zhong

30 Oct 2025 2:52 UTC

62 points

5 comments, 3 min read, LW link

(arxiv.org)

LLM Hallucinations: An Internal Tug of War

violazhong

30 Oct 2025 1:21 UTC

9 points

0 comments, 3 min read, LW link

Genius is Not About Genius

Algon

30 Oct 2025 0:00 UTC

14 points

1 comment, 2 min read, LW link

Quotes on OpenAI’s timelines to automated research, safety research, and safety collaborations before recursive self improvement

TheManxLoiner

29 Oct 2025 21:47 UTC

17 points

0 comments, 3 min read, LW link

An Opinionated Guide to Privacy Despite Authoritarianism

TurnTrout

29 Oct 2025 20:32 UTC

181 points

31 comments, 4 min read, LW link

(turntrout.com)

Unsureism: The Rational Approach to Religious Uncertainty

Taylor G. Lunt

29 Oct 2025 19:45 UTC

−7 points

3 comments, 5 min read, LW link

Why you shouldn’t eat meat if you hate factory farming

ceselder

29 Oct 2025 17:00 UTC

6 points

4 comments, 4 min read, LW link

The End of OpenAI’s Nonprofit Era

garrison

29 Oct 2025 16:28 UTC

41 points

0 comments, 9 min read, LW link

(www.obsolete.pub)

An intro to the Tensor Economics blog

harsimony

29 Oct 2025 16:24 UTC

15 points

0 comments, 12 min read, LW link

(splittinginfinity.substack.com)

Uncertain Updates: October 2025

Gordon Seidoh Worley

29 Oct 2025 16:10 UTC

3 points

0 comments, 1 min read, LW link

(www.uncertainupdates.com)

AI Doomers Should Raise Hell

James_Miller

29 Oct 2025 16:10 UTC

−2 points

9 comments, 6 min read, LW link

AISN #65: Measuring Automation and Superintelligence Moratorium Letter

Alice Blair and Dan H

29 Oct 2025 16:05 UTC

5 points

0 comments, 3 min read, LW link

(newsletter.safe.ai)

TBC Episode with Max Harms—Red Heart and If Anyone Builds It, Everyone Dies

Steven K Zuber

29 Oct 2025 15:49 UTC

13 points

0 comments, 1 min read, LW link

(www.thebayesianconspiracy.com)

[Question] Thresholds for Pascal’s Mugging?

MattAlexander

29 Oct 2025 14:54 UTC

22 points

12 comments, 8 min read, LW link

Please Do Not Sell B30A Chips to China

Zvi

29 Oct 2025 14:50 UTC

62 points

6 comments, 7 min read, LW link

(thezvi.wordpress.com)

Why Civilizations Are Unstable (And What This Means for AI Alignment)

Elias_Kunnas

29 Oct 2025 12:27 UTC

10 points

6 comments, 5 min read, LW link

What can we learn from parent-child-alignment for AI?

Karl von Wendt

29 Oct 2025 8:02 UTC

16 points

4 comments, 3 min read, LW link

Some data from LeelaPieceOdds

Jeremy Gillen

29 Oct 2025 4:27 UTC

69 points

21 comments, 3 min read, LW link

How Do We Evaluate the Quality of LLMs’ Mathematical Responses?

Miguel Angel

29 Oct 2025 1:37 UTC

5 points

0 comments, 13 min read, LW link

Visualizing a Platform for Live World Models

Kuil

29 Oct 2025 1:24 UTC

16 points

0 comments, 14 min read, LW link

[Question] Why Would we get Inner Misalignment by Default?

Coil

29 Oct 2025 1:23 UTC

3 points

0 comments, 2 min read, LW link

A Very Simple Model of AI Dealmaking

Cleo Nardo

29 Oct 2025 0:33 UTC

18 points

0 comments, 9 min read, LW link