AI Agent Benchmarks Are Broken

Sasha Cui · 8 Jul 2025 22:11 UTC
10 points
0 comments · 1 min read · LW link
(ddkang.substack.com)

Why Do Some Language Models Fake Alignment While Others Don’t?

8 Jul 2025 21:49 UTC
158 points
14 comments · 5 min read · LW link
(arxiv.org)

A Medium Scenario

Chapin Lenthall-Cleary · 8 Jul 2025 20:09 UTC
18 points
12 comments · 20 min read · LW link

An Opinionated Guide to Using Anki Correctly

Luise · 8 Jul 2025 20:01 UTC
156 points
58 comments · 27 min read · LW link

Lenses, Metaphors, and Meaning

8 Jul 2025 19:46 UTC
7 points
0 comments · 4 min read · LW link

Applying right-wing frames to AGI (geo)politics

Richard_Ngo · 8 Jul 2025 18:03 UTC
64 points
25 comments · 3 min read · LW link
(x.com)

The Unjournal’s “Pivotal Questions” project

david reinstein · 8 Jul 2025 15:55 UTC
6 points
1 comment · 1 min read · LW link
(forum.effectivealtruism.org)

Balsa Update: Springtime in DC

Zvi · 8 Jul 2025 15:00 UTC
61 points
6 comments · 10 min read · LW link
(thezvi.wordpress.com)

MIT FutureTech are hiring a Postdoctoral Associate to work on AI Performance and Safety

peterslattery · 8 Jul 2025 14:02 UTC
3 points
0 comments · 4 min read · LW link

Energy-Based Transformers are Scalable Learners and Thinkers

Matrice Jacobine · 8 Jul 2025 13:44 UTC
7 points
5 comments · 1 min read · LW link
(energy-based-transformers.github.io)

LLMs are Capable of Misaligned Behavior Under Explicit Prohibition and Surveillance

Igor Ivanov · 8 Jul 2025 11:50 UTC
28 points
8 comments · 7 min read · LW link

The Connection

Alexandre Variengien · 8 Jul 2025 10:53 UTC
23 points
0 comments · 24 min read · LW link
(alexandrevariengien.com)

Subversion via Focal Points: Investigating Collusion in LLM Monitoring

Olli Järviniemi · 8 Jul 2025 10:15 UTC
14 points
2 comments · 1 min read · LW link

NYT article about the Zizians including quotes from Eliezer, Anna, Ozy, Jessica, Zvi

Matrice Jacobine · 8 Jul 2025 1:42 UTC
9 points
3 comments · 1 min read · LW link
(www.nytimes.com)

A Theory of Structural Independence

Matthias G. Mayer · 7 Jul 2025 22:54 UTC
70 points
2 comments · 1 min read · LW link
(arxiv.org)

Navigating Attention

jimmy · 7 Jul 2025 21:43 UTC
10 points
2 comments · 8 min read · LW link

The Weighted Perplexity Benchmark: Tokenizer-Normalized Evaluation for Language Model Comparison

7 Jul 2025 21:43 UTC
21 points
0 comments · 7 min read · LW link
(www.morpheus.systems)

Planet X, Lord Kelvin, and the use of Structure as Fuel

David Björling · 7 Jul 2025 21:23 UTC
11 points
19 comments · 3 min read · LW link

Art, rationality, and the “feeling” for rightness

Karthik Bala · 7 Jul 2025 20:09 UTC
1 point
2 comments · 3 min read · LW link

Public anti-AI sentiment can be useful: three mechanisms

andyqhan · 7 Jul 2025 19:05 UTC
8 points
4 comments · 5 min read · LW link

Literature Review: Risks of MDMA

Elizabeth · 7 Jul 2025 19:01 UTC
67 points
8 comments · 4 min read · LW link
(acesounderglass.com)

AI Safety at the Frontier: Paper Highlights, June ’25

gasteigerjo · 7 Jul 2025 18:17 UTC
4 points
0 comments · 7 min read · LW link
(open.substack.com)

You Can’t Objectively Compare Seven Bees to One Human

J Bostock · 7 Jul 2025 18:11 UTC
58 points
26 comments · 3 min read · LW link
(jbostock.substack.com)

Economics of Claude 3 Opus Inference

7 Jul 2025 15:53 UTC
34 points
0 comments · 11 min read · LW link

On the functional self of LLMs

eggsyntax · 7 Jul 2025 15:39 UTC
95 points
35 comments · 8 min read · LW link

Notes on Righteousness and Megalopsychia

David Gross · 7 Jul 2025 15:18 UTC
12 points
0 comments · 31 min read · LW link

On Alpha School

Zvi · 7 Jul 2025 15:10 UTC
37 points
2 comments · 14 min read · LW link
(thezvi.wordpress.com)

Sleeping Beauty and the Forever Muffin

OneManyNone · 7 Jul 2025 12:05 UTC
1 point
13 comments · 16 min read · LW link

Resource guide: Unawareness, indeterminacy, and cluelessness

Anthony DiGiovanni · 7 Jul 2025 9:54 UTC
20 points
0 comments · 7 min read · LW link

On music and language

Joey Marcellino · 7 Jul 2025 9:09 UTC
18 points
6 comments · 8 min read · LW link

Manifesto for doing good science in AI

invertedpassion · 7 Jul 2025 7:33 UTC
2 points
1 comment · 5 min read · LW link

The Base Model Lens

Adam Newgas · 7 Jul 2025 0:12 UTC
7 points
0 comments · 3 min read · LW link

AXRP Episode 45 - Samuel Albanie on DeepMind’s AGI Safety Approach

DanielFilan · 6 Jul 2025 23:00 UTC
31 points
0 comments · 40 min read · LW link

[DELETED]

Cody @ Keeper · 6 Jul 2025 19:26 UTC
1 point
0 comments · 2 min read · LW link

A simple explanation of incomplete models

Cole Wyeth · 6 Jul 2025 19:09 UTC
19 points
1 comment · 5 min read · LW link

Neuroscientist survey says P(brain preservation works) is substantial

Mati_Roy · 6 Jul 2025 18:03 UTC
11 points
1 comment · 1 min read · LW link

Rational Animations’ video about scalable oversight and sandwiching

Writer · 6 Jul 2025 14:00 UTC
18 points
0 comments · 9 min read · LW link
(youtu.be)

New Paper: It is time to move on from MCQs for LLM Evaluations

shash42 · 6 Jul 2025 11:48 UTC
9 points
0 comments · 2 min read · LW link

[Question] How did you first understand cognitive biases? Looking for community experiences

Vladimir Loginov · 6 Jul 2025 10:48 UTC
8 points
3 comments · 1 min read · LW link

The Compulsion For (Pseudo-)Mechanisms

adamShimi · 6 Jul 2025 10:46 UTC
31 points
8 comments · 12 min read · LW link
(formethods.substack.com)

Nobody is Doing AI Benchmarking Right

Chapin Lenthall-Cleary · 6 Jul 2025 7:05 UTC
20 points
12 comments · 9 min read · LW link

From Unruly Stacks to Organized Shelves: Toy Model Validation of Structured Priors in Sparse Autoencoders

Yuxiao · 6 Jul 2025 7:03 UTC
8 points
0 comments · 5 min read · LW link

When the Smarter AI Lies Better: Can Debate-Based Oversight Catch Deceptive Code?

oskarkraak · 6 Jul 2025 1:21 UTC
4 points
0 comments · 5 min read · LW link
(oskarkraak.com)

Intelligence Futures

TheOtherSteven · 6 Jul 2025 1:19 UTC
13 points
3 comments · 7 min read · LW link
(syin.bearblog.dev)

Shutdown Resistance in Reasoning Models

6 Jul 2025 0:01 UTC
138 points
14 comments · 9 min read · LW link
(palisaderesearch.org)

The ultimate goal

Alvin Ånestrand · 5 Jul 2025 19:10 UTC
10 points
3 comments · 5 min read · LW link
(forecastingaifutures.substack.com)

Interview with Carl Feynman on Imminent AI Existential Risk

Liron · 5 Jul 2025 18:49 UTC
30 points
1 comment · 40 min read · LW link

Small foundational puzzle for causal theories of mechanistic interpretability

Frederik Hytting Jørgensen · 5 Jul 2025 17:46 UTC
6 points
6 comments · 2 min read · LW link

Essential LLM Assumes We’re Conscious—Outside Reasoner AGI Won’t

FlorianH · 5 Jul 2025 16:04 UTC
1 point
0 comments · 3 min read · LW link
(nearlyfar.org)

Masking on the Subway

jefftk · 5 Jul 2025 14:40 UTC
23 points
12 comments · 1 min read · LW link
(www.jefftk.com)