Truth

Kabir Kumar · 28 Aug 2025 20:53 UTC
6 points
0 comments · 2 min read · LW link
(kkumar97.blogspot.com)

Here’s 18 Applications of Deception Probes

28 Aug 2025 18:59 UTC
38 points
0 comments · 22 min read · LW link

LW@Dragoncon Meetup

Error · 28 Aug 2025 18:40 UTC
7 points
0 comments · 1 min read · LW link

If we can educate AIs, why not apply that education to people? - A Simulation with Claude

P. João · 28 Aug 2025 16:37 UTC
3 points
0 comments · 7 min read · LW link

AI #131 Part 1: Gemini 2.5 Flash Image is Cool

Zvi · 28 Aug 2025 16:20 UTC
39 points
4 comments · 30 min read · LW link
(thezvi.wordpress.com)

Von Neumann’s Fallacy and You

incident-recipient · 28 Aug 2025 15:52 UTC
98 points
29 comments · 4 min read · LW link

AI misbehaviour in the wild from Andon Labs’ Safety Report

Lukas Petersson · 28 Aug 2025 15:10 UTC
39 points
0 comments · 1 min read · LW link
(andonlabs.com)

The Other Alignment Problems: How epistemic, moral and aesthetic norms get entangled

James Diacoumis · 28 Aug 2025 11:26 UTC
3 points
0 comments · 5 min read · LW link

We should think about the pivotal act again. Here’s a better version of it.

otto.barten · 28 Aug 2025 9:29 UTC
11 points
2 comments · 3 min read · LW link

Elaborative reading

DirectedEvolution · 28 Aug 2025 8:55 UTC
20 points
0 comments · 9 min read · LW link

Profanity causes emergent misalignment, but with qualitatively different results than insecure code

megasilverfist · 28 Aug 2025 8:22 UTC
21 points
2 comments · 8 min read · LW link

Using Psycholinguistic Signals to Improve AI Safety

Jkreindler · 27 Aug 2025 22:30 UTC
−2 points
0 comments · 4 min read · LW link

Transition and Social Dynamics of a post-coordination world

Lessbroken · 27 Aug 2025 22:23 UTC
1 point
0 comments · 7 min read · LW link

Technical AI Safety research taxonomy attempt (2025)

Benjamin Plaut · 27 Aug 2025 22:17 UTC
2 points
0 comments · 2 min read · LW link

The Future of AI Agents

kavya · 27 Aug 2025 21:58 UTC
6 points
8 comments · 5 min read · LW link

Against “Model Welfare” in 2025

Haley Moller · 27 Aug 2025 21:56 UTC
−10 points
8 comments · 4 min read · LW link

Are They Starting To Take Our Jobs?

Zvi · 27 Aug 2025 18:50 UTC
44 points
6 comments · 5 min read · LW link
(thezvi.wordpress.com)

Will Any Crap Cause Emergent Misalignment?

J Bostock · 27 Aug 2025 18:20 UTC
192 points
37 comments · 3 min read · LW link

Open Global Investment as a Governance Model for AGI

Nick Bostrom · 27 Aug 2025 17:42 UTC
152 points
47 comments · 39 min read · LW link
(nickbostrom.com)

Uncertain Updates August 2025

Gordon Seidoh Worley · 27 Aug 2025 17:31 UTC
11 points
1 comment · 2 min read · LW link
(uncertainupdates.substack.com)

Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)

ryan_greenblatt · 27 Aug 2025 17:04 UTC
99 points
2 comments · 3 min read · LW link

[Anthropic] A hacker used Claude Code to automate ransomware

bohaska · 27 Aug 2025 14:57 UTC
86 points
25 comments · 3 min read · LW link
(www.anthropic.com)

AI companies have started saying safeguards are load-bearing

Zach Stein-Perlman · 27 Aug 2025 13:00 UTC
52 points
2 comments · 5 min read · LW link

Would you sell your soul to save it? (I am NOT a Christian)

AdamLacerdo · 27 Aug 2025 11:05 UTC
−21 points
8 comments · 4 min read · LW link

Legal Personhood—The Fifth Amendment (Part 2)

Stephen Martin · 27 Aug 2025 9:03 UTC
5 points
2 comments · 4 min read · LW link

Contra Yudkowsky’s Ideal Bayesian

vae · 27 Aug 2025 5:43 UTC
51 points
17 comments · 13 min read · LW link

Calibrating an Ultrasonic Humidifier for Glycol Vapors

jefftk · 27 Aug 2025 1:40 UTC
11 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Misgeneralization of Fictional Training Data as a Contributor to Misalignment

Mark Keavney · 27 Aug 2025 1:01 UTC
9 points
1 comment · 2 min read · LW link

[Question] How are you approaching cognitive security as AI becomes more capable?

james oofou · 26 Aug 2025 20:52 UTC
11 points
1 comment · 1 min read · LW link

AI Induced Psychosis: A shallow investigation

Tim Hua · 26 Aug 2025 20:03 UTC
359 points
43 comments · 26 min read · LW link

Aesthetic Preferences Can Cause Emergent Misalignment

Anders Woodruff · 26 Aug 2025 18:41 UTC
90 points
16 comments · 3 min read · LW link

ACX Fall Meetup

Adriana L · 26 Aug 2025 18:34 UTC
1 point
0 comments · 1 min read · LW link

Harmless reward hacks can generalize to misalignment in LLMs

26 Aug 2025 17:32 UTC
46 points
7 comments · 7 min read · LW link

Do-Divergence: A Bound for Maxwell’s Demon

26 Aug 2025 17:07 UTC
66 points
4 comments · 3 min read · LW link

Reports Of AI Not Progressing Or Offering Mundane Utility Are Often Greatly Exaggerated

Zvi · 26 Aug 2025 14:00 UTC
42 points
1 comment · 16 min read · LW link
(thezvi.wordpress.com)

Gamblification

Aprillion · 26 Aug 2025 11:48 UTC
23 points
15 comments · 2 min read · LW link

A speculation on enlightenment

Richard_Kennaway · 26 Aug 2025 11:23 UTC
16 points
17 comments · 2 min read · LW link

The “Sparsity vs Reconstruction Tradeoff” Illusion

26 Aug 2025 4:39 UTC
13 points
0 comments · 4 min read · LW link

Legal Personhood—The Fifth Amendment (Part 1)

Stephen Martin · 26 Aug 2025 4:05 UTC
4 points
0 comments · 5 min read · LW link

LLMs Are Trained to Assume Their Output Is Perfect

Brendan Long · 26 Aug 2025 0:24 UTC
10 points
0 comments · 5 min read · LW link

New Paper on Reflective Oracles & Grain of Truth Problem

Cole Wyeth · 26 Aug 2025 0:18 UTC
53 points
0 comments · 1 min read · LW link

Hidden Reasoning in LLMs: A Taxonomy

25 Aug 2025 22:43 UTC
65 points
10 comments · 12 min read · LW link

The NAO is Hiring for Partnerships, Response, Virology, and Wet Lab Management

jefftk · 25 Aug 2025 22:37 UTC
16 points
0 comments · 2 min read · LW link
(naobservatory.org)

ACX/SSC Meetup

teegs · 25 Aug 2025 20:21 UTC
1 point
0 comments · 1 min read · LW link

Proactive AI Control: A Case for Battery-Dependent Systems

Jesper Lindholm · 25 Aug 2025 20:04 UTC
4 points
0 comments · 13 min read · LW link

Solving irrational fear as deciding: A worked example

jimmy · 25 Aug 2025 19:44 UTC
24 points
4 comments · 7 min read · LW link

Breastfeeding and IQ: Effects shrink as you control for more confounders

Nina Panickssery · 25 Aug 2025 18:43 UTC
44 points
3 comments · 1 min read · LW link
(blog.ninapanickssery.com)

Quality Precision

Ben · 25 Aug 2025 17:58 UTC
24 points
13 comments · 3 min read · LW link

Neuroscience of human sexual attraction triggers (3 hypotheses)

Steven Byrnes · 25 Aug 2025 17:51 UTC
54 points
6 comments · 12 min read · LW link

Before LLM Psychosis, There Was Yes-Man Psychosis

johnswentworth · 25 Aug 2025 17:47 UTC
186 points
20 comments · 3 min read · LW link