The Roots of Progress wants your stories about the AI frontier

jasoncrawford

6 Jun 2025 22:52 UTC

11 points

0 comments, 5 min read, LW link

(newsletter.rootsofprogress.org)

Unsupervised Activation Steering: Find a steering vector that best represents any set of text data

Danielle Ensign

6 Jun 2025 22:37 UTC

3 points

2 comments, 1 min read, LW link

The Mirror Trap

Cameron Berg

6 Jun 2025 22:30 UTC

94 points

13 comments, 4 min read, LW link

AXRP Episode 42 - Owain Evans on LLM Psychology

DanielFilan

6 Jun 2025 20:20 UTC

13 points

0 comments, 66 min read, LW link

Apply now to Human-Aligned AI Summer School 2025

VojtaKovarik, Tomáš Gavenčiak and Jan_Kulveit

6 Jun 2025 19:31 UTC

28 points

1 comment, 2 min read, LW link

(humanaligned.ai)

The Common Pile and Comma-v0.1

Trevor Hill-Hand

6 Jun 2025 19:20 UTC

3 points

0 comments, 1 min read, LW link

Maximal Curiousity is Not Useful

Max Niederman

6 Jun 2025 19:08 UTC

11 points

0 comments, 2 min read, LW link

Making deals with AIs: A tournament experiment with a bounty

KFinn and Xodarap

6 Jun 2025 18:51 UTC

24 points

0 comments, 8 min read, LW link

DeepSeek-r1-0528 Did Not Have a Moment

Zvi

6 Jun 2025 15:40 UTC

30 points

2 comments, 15 min read, LW link

(thezvi.wordpress.com)

Lessons from a year of university AI safety field building

yix, afterless, Parv Mahajan, Andersehen, Tuna and neverix

6 Jun 2025 14:35 UTC

35 points

3 comments, 7 min read, LW link

The Demon of Interrelation

Jack

6 Jun 2025 8:19 UTC

−2 points

0 comments, 8 min read, LW link

Real-time voice translation

samuelshadrach

6 Jun 2025 7:40 UTC

2 points

0 comments, 1 min read, LW link

Liability for Misuse of Models—Dean Ball’s Proposal

Stephen Martin

6 Jun 2025 5:34 UTC

2 points

0 comments, 9 min read, LW link

How do AI agents work together when they can’t trust each other?

James Sullivan

6 Jun 2025 3:10 UTC

17 points

0 comments, 8 min read, LW link

(jamessullivan092.substack.com)

Large Language Models suffer from Anterograde Amnesia

Annapurna

6 Jun 2025 1:30 UTC

7 points

0 comments, 3 min read, LW link

(jorgevelez.substack.com)

Discontinuous Linear Functions?!

Zack_M_Davis

6 Jun 2025 0:29 UTC

46 points

11 comments, 2 min read, LW link

(zackmdavis.net)

Avoiding AI Deception: Lie Detectors can either Induce Honesty or Evasion

ChengCheng, ChrisCundy, smallsilo and AdamGleave

5 Jun 2025 23:07 UTC

22 points

2 comments, 5 min read, LW link

(far.ai)

Introducing: Meridian Cambridge’s new online lecture series covering frontier AI and AI safety

Meridian Cambridge

5 Jun 2025 21:55 UTC

1 point

0 comments, 1 min read, LW link

cheaper sodium electrolysis

bhauth

5 Jun 2025 21:49 UTC

23 points

3 comments, 4 min read, LW link

(www.bhauth.com)

Histograms are to CDFs as calibration plots are to...

Optimization Process

5 Jun 2025 20:20 UTC

35 points

9 comments, 1 min read, LW link

(optimizationprocess.com)

Integration Bandwidth: The Mechanism Behind Intelligence and Puberty

Dortex

5 Jun 2025 19:37 UTC

−1 points

4 comments, 1 min read, LW link

(osf.io)

Levels of Doom: Eutopia, Disempowerment, Extinction

Vladimir_Nesov

5 Jun 2025 19:08 UTC

34 points

1 comment, 2 min read, LW link

LLM in-context learning as (approximating) Solomonoff induction

Cole Wyeth

5 Jun 2025 17:45 UTC

31 points

3 comments, 4 min read, LW link

Fundamental Uncertainty: Chapter 2 - How do words get their meaning?

Gordon Seidoh Worley

5 Jun 2025 16:32 UTC

11 points

2 comments, 11 min read, LW link

AI Might Kill Everyone

Bentham's Bulldog

5 Jun 2025 15:37 UTC

6 points

0 comments, 4 min read, LW link

AI #119: Goodbye AISI?

Zvi

5 Jun 2025 14:00 UTC

42 points

8 comments, 60 min read, LW link

(thezvi.wordpress.com)

Powerful Predictions

Alvin Ånestrand

5 Jun 2025 10:44 UTC

2 points

0 comments, 6 min read, LW link

(forecastingaifutures.substack.com)

Potentially Useful Projects in Wise AI

Chris_Leong

5 Jun 2025 8:13 UTC

12 points

0 comments, 5 min read, LW link

Building as gardening

Itay Dreyfus

5 Jun 2025 6:41 UTC

3 points

1 comment, 4 min read, LW link

(productidentity.co)

Semiconductor Fabs I: The Equipment

nomagicpill

4 Jun 2025 22:09 UTC

19 points

0 comments, 19 min read, LW link

(nomagicpill.github.io)

The Stereotype of the Stereotype

Ike

4 Jun 2025 21:06 UTC

58 points

17 comments, 9 min read, LW link

2. Why intuitive comparisons of large-scale impact are unjustified

Anthony DiGiovanni

4 Jun 2025 20:30 UTC

25 points

0 comments, 16 min read, LW link

Dating Roundup #6

Zvi

4 Jun 2025 20:00 UTC

36 points

2 comments, 55 min read, LW link

(thezvi.wordpress.com)

Rational Prime Calendar

RickHull

4 Jun 2025 19:30 UTC

−1 points

0 comments, 3 min read, LW link

A Technique of Pure Reason

Adam Newgas

4 Jun 2025 19:07 UTC

11 points

3 comments, 2 min read, LW link

“Flaky breakthroughs” pervade inner work — but almost no one tracks them

Chris Lakin

4 Jun 2025 19:02 UTC

216 points

45 comments, 2 min read, LW link

(chrislakin.blog)

[Question] LessOnline saved my life. Now how do I let go of this house?

RedMan

4 Jun 2025 18:47 UTC

24 points

7 comments, 1 min read, LW link

Linkpost: Predicting Empirical AI Research Outcomes with Language Models

quetzal_rainbow

4 Jun 2025 18:14 UTC

10 points

1 comment, 1 min read, LW link

(arxiv.org)

Self-Coordinated Deception in Current AI Models

Avi Brach-Neufeld

4 Jun 2025 17:59 UTC

8 points

5 comments, 4 min read, LW link

To MAIM or Not to MAIM. Introducing MARS: The Nuclear Deterrent case for Hardened Datacenters

kinsman

4 Jun 2025 17:56 UTC

1 point

0 comments, 7 min read, LW link

The Belocrat: a servant leader

belos

4 Jun 2025 17:25 UTC

1 point

0 comments, 10 min read, LW link

(bestofagreatlot.substack.com)

A list of books which are adjacent to EA

marco moldo

4 Jun 2025 12:31 UTC

−1 points

0 comments, 3 min read, LW link

Philosophical Jailbreaks: Demo of LLM Nihilism

Artem Karpov

4 Jun 2025 12:03 UTC

3 points

0 comments, 5 min read, LW link

Notes from a mini-replication of the alignment faking paper

Ben_Snodin

4 Jun 2025 11:01 UTC

13 points

5 comments, 9 min read, LW link

(www.bensnodin.com)

ARENA 6.0 - Call for Applicants

JamesH, JScriven, David Quarel, CallumMcDougall and James Fox

4 Jun 2025 10:19 UTC

26 points

3 comments, 6 min read, LW link

Quickly Assessing Reward Hacking-like Behavior in LLMs and its Sensitivity to Prompt Variations

AndresCampero

4 Jun 2025 7:22 UTC

26 points

1 comment, 17 min read, LW link

Draft: A concise theory of agentic consciousness

Martin Vlach

4 Jun 2025 5:00 UTC

2 points

4 comments, 1 min read, LW link

Individual AI representatives don’t solve Gradual Disempowerement

Jan_Kulveit

4 Jun 2025 1:26 UTC

62 points

4 comments, 3 min read, LW link

Lectures on AI for high school students (and others)

Radford Neal

3 Jun 2025 23:54 UTC

6 points

0 comments, 1 min read, LW link

(radfordneal.wordpress.com)

Does the Taiwan invasion prevent mankind from obtaining the aligned ASI?

StanislavKrym

3 Jun 2025 23:35 UTC

−14 points

1 comment, 5 min read, LW link