How to help friend who needs to get better at planning?

shuffled-cantaloupe

9 Jun 2025 23:28 UTC

12 points

4 comments, 1 min read, LW link

Personal Agents: AIs as trusted advisors, caretakers, and user proxies

JWJohnston

9 Jun 2025 21:26 UTC

2 points

0 comments, 2 min read, LW link

Causation, Correlation, and Confounding: A Graphical Explainer

Tim Hua

9 Jun 2025 20:46 UTC

12 points

2 comments, 9 min read, LW link

When is it important that open-weight models aren’t released? My thoughts on the benefits and dangers of open-weight models in response to developments in CBRN capabilities.

ryan_greenblatt

9 Jun 2025 19:19 UTC

63 points

11 comments, 9 min read, LW link

METR’s Observations of Reward Hacking in Recent Frontier Models

Daniel Kokotajlo

9 Jun 2025 18:03 UTC

100 points

9 comments, 11 min read, LW link

(metr.org)

Expectation = intention = setpoint

jimmy

9 Jun 2025 17:33 UTC

35 points

16 comments, 13 min read, LW link

Identifying “Deception Vectors” In Models

Stephen Martin

9 Jun 2025 17:30 UTC

12 points

0 comments, 1 min read, LW link

(arxiv.org)

Policy Design: Ideas into Proposals

belos

9 Jun 2025 17:26 UTC

2 points

0 comments, 7 min read, LW link

(bestofagreatlot.substack.com)

Reflections on anthropic principle

Crazy philosopher

9 Jun 2025 16:51 UTC

−5 points

13 comments, 1 min read, LW link

Outer Alignment is the Necessary Compliment to AI 2027’s Best Case Scenario

Josh Hickman

9 Jun 2025 15:43 UTC

4 points

2 comments, 2 min read, LW link

The Unparalleled Awesomeness of Effective Altruism Conferences

Bentham's Bulldog

9 Jun 2025 15:32 UTC

5 points

0 comments, 6 min read, LW link

Dwarkesh Patel on Continual Learning

Zvi

9 Jun 2025 14:50 UTC

35 points

1 comment, 20 min read, LW link

(thezvi.wordpress.com)

The True Goal Fallacy

adamShimi

9 Jun 2025 14:42 UTC

50 points

1 comment, 7 min read, LW link

(formethods.substack.com)

Non-technical strategies for confronting a human-level AI competitor

Jackson Emanuel

9 Jun 2025 14:07 UTC

1 point

0 comments, 4 min read, LW link

AI companies’ eval reports mostly don’t support their claims

Zach Stein-Perlman

9 Jun 2025 13:00 UTC

209 points

13 comments, 4 min read, LW link

Against asking if AIs are conscious

AlexMennen

9 Jun 2025 6:05 UTC

16 points

35 comments, 5 min read, LW link

Beware the Delmore Effect

Lydia Nottingham

9 Jun 2025 1:08 UTC

11 points

1 comment, 1 min read, LW link

Busking with Kids

jefftk

9 Jun 2025 0:30 UTC

76 points

0 comments, 1 min read, LW link

(www.jefftk.com)

AI in Government: Resilience in an Era of AI Monoculture

prue

8 Jun 2025 21:00 UTC

2 points

0 comments, 8 min read, LW link

(www.prue0.com)

Emergence Spirals—what Yudkowsky gets wrong

James Stephen Brown

8 Jun 2025 19:02 UTC

29 points

25 comments, 9 min read, LW link

Administering immunotherapy in the morning seems to really, really matter. Why?

Abhishaike Mahajan

8 Jun 2025 16:37 UTC

35 points

0 comments, 10 min read, LW link

(www.owlposting.com)

Emergent Misalignment on a Budget

Valerio Pepe and armaan tipirneni

8 Jun 2025 15:28 UTC

55 points

0 comments, 9 min read, LW link

The Decreasing Value of Chain of Thought in Prompting

Matrice Jacobine

8 Jun 2025 15:11 UTC

11 points

0 comments, 1 min read, LW link

(papers.ssrn.com)

3. Why impartial altruists should suspend judgment under unawareness

Anthony DiGiovanni

8 Jun 2025 15:06 UTC

24 points

0 comments, 14 min read, LW link

Invitation to an IRL retreat on AI x-risks & post-rationality in Ooty, India

bhishma, Aditya and vmehra

8 Jun 2025 13:21 UTC

10 points

2 comments, 5 min read, LW link

Litanies Of The Way

Matthew McRedmond

8 Jun 2025 7:32 UTC

7 points

0 comments, 5 min read, LW link

Make Data Pipelines Debuggable by Storing All Source References

Brendan Long

8 Jun 2025 4:16 UTC

7 points

0 comments, 3 min read, LW link

(www.brendanlong.com)

Letting Kids Be Outside

jefftk

8 Jun 2025 1:30 UTC

51 points

11 comments, 5 min read, LW link

(www.jefftk.com)

LessOnline Could Use Meeting Stones

Brendan Long

8 Jun 2025 1:01 UTC

25 points

5 comments, 1 min read, LW link

MRI tracers

bhauth

7 Jun 2025 23:03 UTC

28 points

2 comments, 2 min read, LW link

(www.bhauth.com)

Second order taste

Adam Zerner

7 Jun 2025 20:26 UTC

8 points

3 comments, 4 min read, LW link

Dimensionalizing Forecast Value

Jordan Rubin

7 Jun 2025 18:45 UTC

5 points

0 comments, 6 min read, LW link

On working 80%

adrische

7 Jun 2025 17:58 UTC

87 points

7 comments, 3 min read, LW link

(github.com)

Meta Alignment: Communication Guide

Bridgett Kay

7 Jun 2025 16:09 UTC

13 points

0 comments, 5 min read, LW link

(dxmrevealed.wordpress.com)

Exploring vocabulary alignment of neurons in Llama-3.2-1B

Sergii

7 Jun 2025 11:20 UTC

4 points

0 comments, 3 min read, LW link

(grgv.xyz)

Summer ACX Meetup in Bordeaux

vi21maobk9vp

7 Jun 2025 11:08 UTC

5 points

0 comments, 1 min read, LW link

Vulnerability in Trusted Monitoring and Mitigations

Wen Xing and Perusha Moodley

7 Jun 2025 7:16 UTC

17 points

1 comment, 7 min read, LW link

Not maximizing your own happiness is a fallacy

fasf

7 Jun 2025 6:16 UTC

−39 points

7 comments, 1 min read, LW link

Agents, Simulators and Interpretability

Sean Herrington, WillPetillo, Spencer Ames, Can Narin and Adebayo Mubarak

7 Jun 2025 6:06 UTC

12 points

0 comments, 5 min read, LW link

Solo Park Play at Three

jefftk

7 Jun 2025 3:00 UTC

45 points

2 comments, 1 min read, LW link

(www.jefftk.com)

The Roots of Progress wants your stories about the AI frontier

jasoncrawford

6 Jun 2025 22:52 UTC

11 points

0 comments, 5 min read, LW link

(newsletter.rootsofprogress.org)

Unsupervised Activation Steering: Find a steering vector that best represents any set of text data

Danielle Ensign

6 Jun 2025 22:37 UTC

3 points

2 comments, 1 min read, LW link

The Mirror Trap

Cameron Berg

6 Jun 2025 22:30 UTC

94 points

13 comments, 4 min read, LW link

AXRP Episode 42 - Owain Evans on LLM Psychology

DanielFilan

6 Jun 2025 20:20 UTC

13 points

0 comments, 66 min read, LW link

Apply now to Human-Aligned AI Summer School 2025

VojtaKovarik, Tomáš Gavenčiak and Jan_Kulveit

6 Jun 2025 19:31 UTC

28 points

1 comment, 2 min read, LW link

(humanaligned.ai)

The Common Pile and Comma-v0.1

Trevor Hill-Hand

6 Jun 2025 19:20 UTC

3 points

0 comments, 1 min read, LW link

Maximal Curiousity is Not Useful

Max Niederman

6 Jun 2025 19:08 UTC

11 points

0 comments, 2 min read, LW link

Making deals with AIs: A tournament experiment with a bounty

KFinn and Xodarap

6 Jun 2025 18:51 UTC

24 points

0 comments, 8 min read, LW link

DeepSeek-r1-0528 Did Not Have a Moment

Zvi

6 Jun 2025 15:40 UTC

30 points

2 comments, 15 min read, LW link

(thezvi.wordpress.com)

Lessons from a year of university AI safety field building

yix, afterless, Parv Mahajan, Andersehen, Tuna and neverix

6 Jun 2025 14:35 UTC

35 points

3 comments, 7 min read, LW link