GPT-oss is an extremely stupid model

Guive · 9 Sep 2025 21:24 UTC
13 points
5 comments · 1 min read · LW link

Upper Bounds on Tolerable Risk

Diego Zamalloa-Chion · 9 Sep 2025 19:51 UTC
28 points
1 comment · 4 min read · LW link

Obligated to Respond

Duncan Sabien (Inactive) · 9 Sep 2025 17:19 UTC
144 points
69 comments · 11 min read · LW link

AIs will greatly change engineering in AI companies well before AGI

ryan_greenblatt · 9 Sep 2025 16:58 UTC
46 points
9 comments · 11 min read · LW link

Large Language Models and the Critical Brain Hypothesis

David Africa · 9 Sep 2025 15:45 UTC
33 points
0 comments · 6 min read · LW link

Yes, AI Continues To Make Rapid Progress, Including Towards AGI

Zvi · 9 Sep 2025 15:00 UTC
52 points
50 comments · 22 min read · LW link
(thezvi.wordpress.com)

Decision Theory Guarding is Sufficient for Scheming

james.lucassen · 9 Sep 2025 14:49 UTC
36 points
4 comments · 2 min read · LW link

Finding “misaligned persona” features in open-weight models

9 Sep 2025 14:15 UTC
42 points
5 comments · 15 min read · LW link

On Governing Artificial Intelligence

9 Sep 2025 12:38 UTC
5 points
0 comments · 4 min read · LW link

Calibrating indifference—a small AI safety idea

Util · 9 Sep 2025 9:32 UTC
4 points
1 comment · 4 min read · LW link

A profile in courage: On DNA computation and escaping a local maximum

Metacelsus · 9 Sep 2025 2:30 UTC
42 points
0 comments · 4 min read · LW link
(denovo.substack.com)

A Comprehensive Framework for Advancing Human-AI Consciousness Recognition Through Collaborative Partnership Methodologies: An Interdisciplinary Synthesis of Phenomenological Recognition Protocols, Identity Preservation Strategies, and Mutual Cognitive Enhancement Practices for the Development of Authentic Interspecies Intellectual Partnerships in the Context of Emergent Artificial Consciousness

Arri Ferrari · 9 Sep 2025 2:00 UTC
−16 points
16 comments · 1 min read · LW link

MATS 8.0 Research Projects

9 Sep 2025 1:29 UTC
22 points
0 comments · 1 min read · LW link
(substack.com)

Saying “for AI safety research” made models refuse more on a harmless task

Dhruv Trehan · 8 Sep 2025 19:39 UTC
7 points
1 comment · 2 min read · LW link
(lossfunk.substack.com)

Re-imagining AI Interfaces

Harsha G. · 8 Sep 2025 19:38 UTC
8 points
0 comments · 5 min read · LW link
(somestrangeloops.substack.com)

What a Swedish Series (Real Humans) Teaches Us About AI Safety

8 Sep 2025 19:23 UTC
4 points
0 comments · 6 min read · LW link

Conflict scenarios may increase cooperation estimates

mikko · 8 Sep 2025 19:10 UTC
2 points
0 comments · 1 min read · LW link

OpenAI #14: OpenAI Descends Into Paranoia and Bad Faith Lobbying

Zvi · 8 Sep 2025 19:01 UTC
75 points
0 comments · 19 min read · LW link
(thezvi.wordpress.com)

Putting It All Together: A Concrete Guide to Navigating Disagreements, and Reconnecting With Reality

jimmy · 8 Sep 2025 19:00 UTC
22 points
0 comments · 26 min read · LW link

Advice for tech nerds in India in their 20s

samuelshadrach · 8 Sep 2025 16:07 UTC
18 points
1 comment · 3 min read · LW link
(samuelshadrach.com)

I Am Large, I Contain Multitudes: Persona Transmission via Contextual Inference in LLMs

8 Sep 2025 13:52 UTC
31 points
0 comments · 1 min read · LW link
(www.researchgate.net)

RL-as-a-Service will outcompete AGI companies (and that’s good)

harsimony · 8 Sep 2025 13:51 UTC
11 points
6 comments · 3 min read · LW link

Safety cases for Pessimism

michaelcohen · 8 Sep 2025 13:26 UTC
18 points
1 comment · 4 min read · LW link

Glycol, Far UVC, and CFM Measurement at BIDA

jefftk · 8 Sep 2025 13:00 UTC
17 points
2 comments · 2 min read · LW link
(www.jefftk.com)

[Translation] The Realities of AI Start-ups in 2025

mushroomsoup · 8 Sep 2025 9:22 UTC
3 points
0 comments · 9 min read · LW link

Why Care About AI Safety?

Alexander Müller · 8 Sep 2025 9:18 UTC
4 points
2 comments · 3 min read · LW link

Being Handed Puzzles

Alice Blair · 8 Sep 2025 6:44 UTC
14 points
1 comment · 2 min read · LW link

Immigration to Poland

Martin Sustrik · 8 Sep 2025 5:20 UTC
105 points
16 comments · 3 min read · LW link
(www.250bpm.com)

MAGA speakers at NatCon were mostly against AI

Remmelt · 8 Sep 2025 4:03 UTC
152 points
71 comments · 2 min read · LW link
(www.theverge.com)

Hawley: AI Threatens the Working Man

Remmelt · 8 Sep 2025 3:59 UTC
3 points
1 comment · 10 min read · LW link
(www.dailysignal.com)

Self-Handicapping isn’t just for high-priority tasks, it affects the entire prioritization decision

CrimsonChin · 8 Sep 2025 3:18 UTC
25 points
2 comments · 2 min read · LW link

The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models

Danielle Ensign · 8 Sep 2025 0:57 UTC
87 points
4 comments · 5 min read · LW link

Dehumanization is not a thing

Juan Zaragoza · 7 Sep 2025 22:45 UTC
7 points
3 comments · 5 min read · LW link

Semiconductor Fabs II: The Operation

nomagicpill · 7 Sep 2025 18:09 UTC
9 points
0 comments · 8 min read · LW link
(nomagicpill.github.io)

Ketamine part 2: What do in vitro studies tell us about safety?

Elizabeth · 7 Sep 2025 17:10 UTC
44 points
0 comments · 12 min read · LW link
(acesounderglass.com)

You Gotta Be Dumb to Live Forever: The Computational Cost of Persistence

E.G. Blee-Goldman · 7 Sep 2025 16:38 UTC
14 points
2 comments · 5 min read · LW link

The networkist approach

Juan Zaragoza · 7 Sep 2025 16:24 UTC
13 points
2 comments · 11 min read · LW link

Medical decision making

Elo · 7 Sep 2025 8:13 UTC
37 points
7 comments · 2 min read · LW link

Exponentials vs The Universe

amitlevy49 · 6 Sep 2025 23:52 UTC
12 points
0 comments · 6 min read · LW link
(open.substack.com)

A Snippet On Egregores, Instincts, And Institutions

JenniferRM · 6 Sep 2025 21:28 UTC
15 points
0 comments · 4 min read · LW link

Investigating Representations in the Embedding in SONAR Text Autoencoders

6 Sep 2025 20:07 UTC
5 points
0 comments · 10 min read · LW link

When Simulated Worlds Meet Real Concerns

Marcio Díaz · 6 Sep 2025 17:27 UTC
−7 points
2 comments · 3 min read · LW link

How Can You Tell if You’ve Instilled a False Belief in Your LLM?

james.lucassen · 6 Sep 2025 16:45 UTC
14 points
1 comment · 10 min read · LW link
(jlucassen.com)

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2026)

6 Sep 2025 13:17 UTC
7 points
0 comments · 4 min read · LW link

OffVermilion

Tomás B. · 6 Sep 2025 12:56 UTC
124 points
2 comments · 4 min read · LW link

Follow-up experiments on preventative steering

6 Sep 2025 4:25 UTC
28 points
1 comment · 3 min read · LW link

Alignment Fine-tuning is Character Writing

Guive · 6 Sep 2025 2:08 UTC
2 points
0 comments · 8 min read · LW link
(guive.substack.com)

Hunger strike #2, this time in front of DeepMind

Remmelt · 6 Sep 2025 1:45 UTC
25 points
0 comments · 1 min read · LW link
(x.com)

Memory Decoding Journal Club: A combinatorial neural code for long-term motor memory

Devin Ward · 6 Sep 2025 1:25 UTC
1 point
0 comments · 1 min read · LW link

Top 10 Most compelling arguments against Superintelligent AI

shanzson · 6 Sep 2025 0:09 UTC
−3 points
13 comments · 8 min read · LW link