A Non-cynical View of Colleges and Power

robotelvis

26 Jul 2025 22:20 UTC

−1 points

0 comments · 2 min read

(messyprogress.substack.com)

The Purpose of a System is what it Rewards

robotelvis

26 Jul 2025 22:08 UTC

121 points

16 comments · 2 min read

(messyprogress.substack.com)

Status Traps

robotelvis

26 Jul 2025 22:07 UTC

5 points

0 comments · 1 min read

(messyprogress.substack.com)

AlphaGo Moment for Model Architecture Discovery (arXiv)

Person

26 Jul 2025 21:31 UTC

8 points

4 comments · 1 min read

[Question] Where are the AI safety replications?

Max Niederman

26 Jul 2025 21:29 UTC

54 points

5 comments · 1 min read

Petals

Zander_Drax

26 Jul 2025 20:23 UTC

19 points

0 comments · 6 min read

English writes numbers backwards

TurnTrout

25 Jul 2025 23:00 UTC

14 points

23 comments · 12 min read

(turntrout.com)

a 9-week trip on retatrutide

AnnaJo

25 Jul 2025 21:41 UTC

49 points

7 comments · 10 min read

How I Spent 2024 Living Like the World Was Going to End

fernando yt

25 Jul 2025 19:29 UTC

9 points

0 comments · 2 min read

(fernandoyt.substack.com)

A Bonding Platform for Rational Thinkers – Call for Suggestions and Collaboration

Martin Braquet

25 Jul 2025 19:23 UTC

4 points

4 comments · 22 min read

(martinbraquet.com)

[Question] What are the two contradictory theories of how to evaluate counterfactuals?

Said Achmiz

25 Jul 2025 18:43 UTC

29 points

16 comments · 1 min read

HPMOR: The (Probably) Untold Lore

Gretta Duleba and Eliezer Yudkowsky

25 Jul 2025 18:39 UTC

426 points

162 comments · 38 min read

Anthropic Faces Potentially “Business-Ending” Copyright Lawsuit

garrison

25 Jul 2025 17:01 UTC

57 points

15 comments · 9 min read

(www.obsolete.pub)

ChatGPT Agent: evals and safeguards

Zach Stein-Perlman

25 Jul 2025 16:30 UTC

15 points

0 comments · 3 min read

Why I Just Took The Giving What We Can Pledge

Bentham's Bulldog

25 Jul 2025 16:24 UTC

−28 points

18 comments · 3 min read

Access to agent CoT makes monitors vulnerable to persuasion

Nikita Ostrovsky, Julija Bainiaksina, Tuna and Vika

25 Jul 2025 16:09 UTC

18 points

0 comments · 4 min read

Automating AI Safety: What we can do today

Matthew Shinkle, Eyon Jang and jacquesthibs

25 Jul 2025 14:49 UTC

36 points

0 comments · 8 min read

Introducing SB53.info

MKodama

25 Jul 2025 14:48 UTC

9 points

2 comments · 7 min read

America’s AI Action Plan Is Pretty Good

Zvi

25 Jul 2025 12:10 UTC

21 points

13 comments · 27 min read

(thezvi.wordpress.com)

A website to create bets with strangers

bice

25 Jul 2025 11:06 UTC

7 points

1 comment · 1 min read

PTF 102: Conditionalization and Events

Ape in the coat

25 Jul 2025 6:07 UTC

8 points

0 comments · 8 min read

We Built a Tool to Protect Your Dataset From Simple Scrapers

TurnTrout, Edward Turner, Dipika Khullar and Roy Rinberg

25 Jul 2025 5:44 UTC

60 points

9 comments · 3 min read

The Leverage Cycle

Annapurna

24 Jul 2025 21:02 UTC

17 points

0 comments · 3 min read

(jorgevelez.substack.com)

Recommendations for future AI growth: from exponential to linear, with economic anchors

Zabor

24 Jul 2025 20:11 UTC

7 points

0 comments · 2 min read

Building and evaluating alignment auditing agents

Sam Marks, trentbrick, RowanWang, Sam Bowman, Euan Ong, Johannes Treutlein and evhub

24 Jul 2025 19:22 UTC

47 points

1 comment · 5 min read

Fullrank: Bayesian Noisy Sorting

Max Niederman

24 Jul 2025 19:03 UTC

20 points

2 comments · 3 min read

(maxniederman.com)

SenseMaking Summer School 2025, September 17-24th

Finn Clancy

24 Jul 2025 18:00 UTC

1 point

0 comments · 1 min read

The Ideological Spiral

PranavG and Gabriel Alfour

24 Jul 2025 13:00 UTC

11 points

1 comment · 10 min read

(cognition.cafe)

AI #126: Go Fund Yourself

Zvi

24 Jul 2025 13:00 UTC

34 points

3 comments · 46 min read

(thezvi.wordpress.com)

Superintelligence isn’t Approximated by a Rational Agent

Nicolas Villarreal

24 Jul 2025 11:41 UTC

13 points

11 comments · 12 min read

Taking Abundance Seriously

eeeee

24 Jul 2025 9:36 UTC

43 points

17 comments · 12 min read

Cursory Analysis of LLMs in the US Gov (July 2025)

Gatlen Culp

24 Jul 2025 8:52 UTC

8 points

0 comments · 10 min read

Reflections from Ooty retreat 2.0

Aditya and bhishma

24 Jul 2025 6:48 UTC

16 points

2 comments · 14 min read

So Shrieked ZAR

AdamLacerdo

23 Jul 2025 23:25 UTC

10 points

2 comments · 8 min read

AI Safety x Physics Grand Challenge

Lauren Greenspan and aribrill

23 Jul 2025 21:41 UTC

37 points

0 comments · 8 min read

Dear Superintelligence, please check these considerations of your unprecedented Importance

chaosmage

23 Jul 2025 20:49 UTC

17 points

0 comments · 3 min read

The Whole Check

JustisMills

23 Jul 2025 19:20 UTC

51 points

13 comments · 4 min read

(justismills.substack.com)

Women Want Safety, Men Want Respect

Gordon Seidoh Worley

23 Jul 2025 19:10 UTC

18 points

31 comments · 4 min read

(uncertainupdates.substack.com)

Dark Lord’s Answer: Review and Economics Excerpts

Towards_Keeperhood

23 Jul 2025 17:45 UTC

16 points

6 comments · 17 min read

“Behaviorist” RL reward functions lead to scheming

Steven Byrnes

23 Jul 2025 16:55 UTC

56 points

6 comments · 12 min read

Reasoning-Finetuning Repurposes Latent Representations in Base Models

Jake Ward, lccqqqqq and Neel Nanda

23 Jul 2025 16:18 UTC

35 points

1 comment · 2 min read

(arxiv.org)

Healthy AI relationships as a microcosm

Raymond Douglas

23 Jul 2025 15:59 UTC

13 points

0 comments · 2 min read

Involuntary One Boxers—Why Disposition Doesn’t (Always) Matter

Nickolas Cavagnaro

23 Jul 2025 15:45 UTC

4 points

3 comments · 4 min read

Ten AI safety projects I’d like people to work on

Julian Hazell

23 Jul 2025 15:28 UTC

5 points

2 comments · 10 min read

(thirdthing.ai)

Anti-Superpersuasion Interventions

niplav and Claude+

23 Jul 2025 15:18 UTC

21 points

1 comment · 5 min read

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

kh4dien, Helena Casademunt, Adam Karvonen, Sam Marks, Senthooran Rajamanoharan and Neel Nanda

23 Jul 2025 14:57 UTC

79 points

8 comments · 5 min read

Transformers Don’t Need LayerNorm at Inference Time: Implications for Interpretability

submarat, Joachim Schaeffer, Luca Baroni, galvsk and StefanHex

23 Jul 2025 14:55 UTC

31 points

0 comments · 7 min read

GPT Agent Is Standing By

Zvi

23 Jul 2025 14:20 UTC

25 points

1 comment · 12 min read

(thezvi.wordpress.com)

Agent 002: A story about how artificial intelligence might soon destroy humanity

Jakub Growiec

23 Jul 2025 13:56 UTC

5 points

0 comments · 26 min read

Beyond intelligence: why wisdom matters in AI systems

Chris Cooper

23 Jul 2025 11:57 UTC

6 points

0 comments · 7 min read