Three Quotes on Transformative Technology

Chris_Leong

1 Aug 2025 22:57 UTC

8 points

3 comments · 1 min read · LW link

SB-1047 Documentary: The Post-Mortem

Michaël Trazzi

1 Aug 2025 21:42 UTC

130 points

0 comments · 5 min read · LW link

Persona vectors: monitoring and controlling character traits in language models

RunjinChen and Andy Arditi

1 Aug 2025 21:19 UTC

26 points

3 comments · 5 min read · LW link

(arxiv.org)

Boots theory and Wikipedia

philh

1 Aug 2025 20:30 UTC

9 points

12 comments · 12 min read · LW link

(reasonableapproximation.net)

Podcast: Lincoln Quirk from Wave

Elizabeth

1 Aug 2025 19:00 UTC

41 points

1 comment · 1 min read · LW link

(acesounderglass.com)

AI in a vat: Fundamental limits of efficient world modelling for safe agent sandboxing

Fernando Rosas

1 Aug 2025 18:37 UTC

34 points

3 comments · 15 min read · LW link

The Dark Arts As A Scaffolding Skill For Rationality

Screwtape

1 Aug 2025 17:12 UTC

85 points

25 comments · 7 min read · LW link

Steve Petersen seeking funding

abramdemski

1 Aug 2025 17:03 UTC

87 points

0 comments · 1 min read · LW link

The Week in AI Governance

Zvi

1 Aug 2025 12:20 UTC

18 points

1 comment · 24 min read · LW link

(thezvi.wordpress.com)

Research Areas in AI Control (The Alignment Project by UK AISI)

Julian Stastny, Tomek Korbak, Mojmir, Buck and Alan Cooney

1 Aug 2025 10:27 UTC

25 points

0 comments · 18 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Methods for Post-training and Elicitation (The Alignment Project by UK AISI)

Jacob Pfau and Benjamin Hilton

1 Aug 2025 10:27 UTC

12 points

0 comments · 6 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Benchmark Design and Evaluation (The Alignment Project by UK AISI)

Jacob Pfau and Benjamin Hilton

1 Aug 2025 10:26 UTC

10 points

0 comments · 9 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Interpretability (The Alignment Project by UK AISI)

Joseph Bloom

1 Aug 2025 10:26 UTC

14 points

0 comments · 5 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Cognitive Science (The Alignment Project by UK AISI)

Geoffrey Irving

1 Aug 2025 10:26 UTC

12 points

0 comments · 6 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Learning Theory (The Alignment Project by UK AISI)

David Africa and Edmund Lau

1 Aug 2025 10:26 UTC

15 points

0 comments · 24 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Probabilistic Methods (The Alignment Project by UK AISI)

Jacob Pfau and Benjamin Hilton

1 Aug 2025 10:26 UTC

4 points

0 comments · 4 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Economic Theory and Game Theory (The Alignment Project by UK AISI)

Cecilia Wood

1 Aug 2025 10:25 UTC

4 points

0 comments · 6 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Computational Complexity Theory (The Alignment Project by UK AISI)

Simon Marshall

1 Aug 2025 10:25 UTC

6 points

0 comments · 10 min read · LW link

(alignmentproject.aisi.gov.uk)

Research Areas in Information Theory and Cryptography (The Alignment Project by UK AISI)

Simon Marshall

1 Aug 2025 10:25 UTC

6 points

0 comments · 3 min read · LW link

(alignmentproject.aisi.gov.uk)

Self-Alignment: Exploring the perspective of Analytical Psychology

JakeArgent

1 Aug 2025 10:17 UTC

4 points

0 comments · 12 min read · LW link

Research Areas in Evaluation and Guarantees in Reinforcement Learning (The Alignment Project by UK AISI)

Jacob Pfau and Benjamin Hilton

1 Aug 2025 9:53 UTC

14 points

0 comments · 11 min read · LW link

(alignmentproject.aisi.gov.uk)

The Alignment Project by UK AISI

Mojmir, Benjamin Hilton, Jacob Pfau, Geoffrey Irving, Joseph Bloom, Tomek Korbak, David Africa and Edmund Lau

1 Aug 2025 9:52 UTC

29 points

0 comments · 2 min read · LW link

(alignmentproject.aisi.gov.uk)

Prolific.com survey on AI pause

samuelshadrach

1 Aug 2025 8:33 UTC

9 points

3 comments · 7 min read · LW link

(samuelshadrach.com)

Some mistakes in thinking about AGI evolution and control

Remmelt

1 Aug 2025 8:08 UTC

7 points

0 comments · 1 min read · LW link

“Opponent shaping” as a model for manipulation and cooperation

Dan MacKinlay

1 Aug 2025 7:50 UTC

16 points

0 comments · 17 min read · LW link

(danmackinlay.name)

Two Kinds of Do Overs

jefftk

1 Aug 2025 2:30 UTC

67 points

1 comment · 2 min read · LW link

(www.jefftk.com)

Call on AI Companies: Publish Your Whistleblowing Policies

karl

31 Jul 2025 22:04 UTC

20 points

3 comments · 7 min read · LW link

Do Not Render Your Counterfactuals

AlphaAndOmega

31 Jul 2025 21:35 UTC

106 points

19 comments · 5 min read · LW link

(open.substack.com)

Emergence Is Beautiful—beauty and meaning in an entropic universe

James Stephen Brown

31 Jul 2025 19:00 UTC

8 points

0 comments · 5 min read · LW link

Sharpening the Shears: 8 Lessons from Garden Leave

Jordan Rubin

31 Jul 2025 18:57 UTC

8 points

0 comments · 4 min read · LW link

(jordanmrubin.substack.com)

AISN #60: The AI Action Plan

Corin Katzke and Dan H

31 Jul 2025 18:20 UTC

6 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

Approximating Human Preferences Using a Multi-Judge Learned System

JoseFaustino, eitan sprejer, Fernando Avalos and Augusto Bernardi

31 Jul 2025 18:01 UTC

19 points

0 comments · 13 min read · LW link

Follow-up to “My Empathy Is Rarely Kind”

johnswentworth

31 Jul 2025 17:21 UTC

81 points

42 comments · 2 min read · LW link

Book Review: The MANIAC

Annapurna

31 Jul 2025 15:18 UTC

15 points

6 comments · 2 min read · LW link

(jorgevelez.substack.com)

Red-Thing-Ism

J Bostock

31 Jul 2025 14:09 UTC

103 points

9 comments · 3 min read · LW link

AI #127: Continued Claude Code Complications

Zvi

31 Jul 2025 13:40 UTC

32 points

4 comments · 43 min read · LW link

(thezvi.wordpress.com)

I am worried about near-term non-LLM AI developments

testingthewaters

31 Jul 2025 13:15 UTC

256 points

56 comments · 5 min read · LW link

What do we do about the Inevitable?

CSDD

31 Jul 2025 10:22 UTC

−7 points

0 comments · 4 min read · LW link

[Question] Several questions about Zen koans

Said Achmiz

31 Jul 2025 6:35 UTC

24 points

21 comments · 3 min read · LW link

Beyond Hangriness: A Deeper Framework for Emotional Clarity

jaredclucas

30 Jul 2025 23:59 UTC

−7 points

0 comments · 5 min read · LW link

LLMs Are Already Misaligned: Simple Experiments Prove It

Mackam

30 Jul 2025 23:48 UTC

12 points

10 comments · 7 min read · LW link

Replicators—Pandora’s dangerous children

James Stephen Brown

30 Jul 2025 22:39 UTC

19 points

2 comments · 3 min read · LW link

Exploration hacking: can reasoning models subvert RL?

Damon Falck, Joschka Braun and Eyon Jang

30 Jul 2025 22:02 UTC

17 points

4 comments · 9 min read · LW link

[Research Note] Optimizing The Final Output Can Obfuscate CoT

lukemarks, jacob_drori, cloud and TurnTrout

30 Jul 2025 21:26 UTC

200 points

23 comments · 6 min read · LW link

A Timing Problem for Instrumental Convergence

rhys southan

30 Jul 2025 19:15 UTC

2 points

45 comments · 1 min read · LW link

(link.springer.com)

Childhood and Education: College Admissions

Zvi

30 Jul 2025 17:40 UTC

54 points

11 comments · 18 min read · LW link

(thezvi.wordpress.com)

Apply to SPAR Fall 2025—80+ projects!

agucova

30 Jul 2025 17:34 UTC

19 points

0 comments · 1 min read · LW link

Dimensions of logical time as economic strategies

tayzzyronth

30 Jul 2025 16:56 UTC

10 points

2 comments · 7 min read · LW link

On Wireheading

Dave92F1

30 Jul 2025 16:26 UTC

10 points

4 comments · 3 min read · LW link

Uncertain Updates: July 2025

Gordon Seidoh Worley

30 Jul 2025 14:50 UTC

8 points

0 comments · 2 min read · LW link

(uncertainupdates.substack.com)