AI agents and painted facades

30 Aug 2025 23:13 UTC
38 points
3 comments · 2 min read · LW link
(fulcrumresearch.ai)

ACX Everywhere fall 2025 - Newton, MA

duck_master · 30 Aug 2025 22:02 UTC
1 point
1 comment · 1 min read · LW link

[via bsky, found paper] “AI Consciousness: A Centrist Manifesto”

the gears to ascension · 30 Aug 2025 21:05 UTC
13 points
0 comments · 1 min read · LW link
(philpapers.org)

Female sexual attractiveness seems more egalitarian than people acknowledge

lc · 30 Aug 2025 18:09 UTC
53 points
27 comments · 3 min read · LW link

AI Sleeper Agents: How Anthropic Trains and Catches Them—Video

Writer · 30 Aug 2025 17:53 UTC
9 points
0 comments · 7 min read · LW link
(youtu.be)

Understanding LLMs: Insights from Mechanistic Interpretability

Stephen McAleese · 30 Aug 2025 16:50 UTC
40 points
2 comments · 30 min read · LW link

Legal Personhood—The First Amendment (Part 1)

Stephen Martin · 30 Aug 2025 13:20 UTC
4 points
0 comments · 3 min read · LW link

Method Iteration: An LLM Prompting Technique

Davey Morse · 30 Aug 2025 0:08 UTC
−12 points
1 comment · 2 min read · LW link

[Question] How to bet on myself? From expectations to robust goals

P. João · 29 Aug 2025 18:33 UTC
4 points
3 comments · 1 min read · LW link

AI Security London Hackathon

Prince Kumar · 29 Aug 2025 18:23 UTC
4 points
0 comments · 1 min read · LW link

Summary of our Workshop on Post-AGI Outcomes

29 Aug 2025 17:14 UTC
96 points
3 comments · 3 min read · LW link

Wikipedia, but written by AIs

Viliam · 29 Aug 2025 16:37 UTC
32 points
9 comments · 4 min read · LW link

60 U.K. Lawmakers Accuse Google of Breaking AI Safety Pledge

Joseph Miller · 29 Aug 2025 16:09 UTC
50 points
1 comment · 1 min read · LW link
(time.com)

AI #131 Part 2: Various Misaligned Things

Zvi · 29 Aug 2025 15:00 UTC
34 points
7 comments · 41 min read · LW link
(thezvi.wordpress.com)

The Gabian History of Mathematics

29 Aug 2025 13:48 UTC
21 points
9 comments · 2 min read · LW link
(cognition.cafe)

Qualified rights for AI agents

Gauraventh · 29 Aug 2025 12:42 UTC
4 points
1 comment · 5 min read · LW link
(robertandgaurav.substack.com)

I am trying to write the history of transhumanism-related communities

Ihor Kendiukhov · 29 Aug 2025 11:37 UTC
7 points
4 comments · 1 min read · LW link

Claude Plays… Whatever it Wants

Adam B · 29 Aug 2025 10:57 UTC
37 points
4 comments · 7 min read · LW link

Not stepping on bugs

Gauraventh · 29 Aug 2025 10:08 UTC
1 point
6 comments · 2 min read · LW link
(y1d2.com)

Defensiveness does not equal guilt

Kaj_Sotala · 29 Aug 2025 6:14 UTC
60 points
16 comments · 3 min read · LW link

Truth

Kabir Kumar · 28 Aug 2025 20:53 UTC
6 points
0 comments · 2 min read · LW link
(kkumar97.blogspot.com)

Here’s 18 Applications of Deception Probes

28 Aug 2025 18:59 UTC
38 points
0 comments · 22 min read · LW link

LW@Dragoncon Meetup

Error · 28 Aug 2025 18:40 UTC
7 points
0 comments · 1 min read · LW link

If we can educate AIs, why not apply that education to people? - A Simulation with Claude

P. João · 28 Aug 2025 16:37 UTC
3 points
0 comments · 7 min read · LW link

AI #131 Part 1: Gemini 2.5 Flash Image is Cool

Zvi · 28 Aug 2025 16:20 UTC
39 points
4 comments · 30 min read · LW link
(thezvi.wordpress.com)

Von Neumann’s Fallacy and You

incident-recipient · 28 Aug 2025 15:52 UTC
98 points
29 comments · 4 min read · LW link

AI misbehaviour in the wild from Andon Labs’ Safety Report

Lukas Petersson · 28 Aug 2025 15:10 UTC
39 points
0 comments · 1 min read · LW link
(andonlabs.com)

The Other Alignment Problems: How epistemic, moral and aesthetic norms get entangled

James Diacoumis · 28 Aug 2025 11:26 UTC
3 points
0 comments · 5 min read · LW link

We should think about the pivotal act again. Here’s a better version of it.

otto.barten · 28 Aug 2025 9:29 UTC
11 points
2 comments · 3 min read · LW link

Elaborative reading

DirectedEvolution · 28 Aug 2025 8:55 UTC
20 points
0 comments · 9 min read · LW link

Profanity causes emergent misalignment, but with qualitatively different results than insecure code

megasilverfist · 28 Aug 2025 8:22 UTC
21 points
2 comments · 8 min read · LW link

Using Psycholinguistic Signals to Improve AI Safety

Jkreindler · 27 Aug 2025 22:30 UTC
−2 points
0 comments · 4 min read · LW link

Transition and Social Dynamics of a post-coordination world

Lessbroken · 27 Aug 2025 22:23 UTC
1 point
0 comments · 7 min read · LW link

Technical AI Safety research taxonomy attempt (2025)

Benjamin Plaut · 27 Aug 2025 22:17 UTC
2 points
0 comments · 2 min read · LW link

The Future of AI Agents

kavya · 27 Aug 2025 21:58 UTC
6 points
8 comments · 5 min read · LW link

Against “Model Welfare” in 2025

Haley Moller · 27 Aug 2025 21:56 UTC
−10 points
8 comments · 4 min read · LW link

Are They Starting To Take Our Jobs?

Zvi · 27 Aug 2025 18:50 UTC
44 points
6 comments · 5 min read · LW link
(thezvi.wordpress.com)

Will Any Crap Cause Emergent Misalignment?

J Bostock · 27 Aug 2025 18:20 UTC
192 points
37 comments · 3 min read · LW link

Open Global Investment as a Governance Model for AGI

Nick Bostrom · 27 Aug 2025 17:42 UTC
152 points
47 comments · 39 min read · LW link
(nickbostrom.com)

Uncertain Updates August 2025

Gordon Seidoh Worley · 27 Aug 2025 17:31 UTC
11 points
1 comment · 2 min read · LW link
(uncertainupdates.substack.com)

Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)

ryan_greenblatt · 27 Aug 2025 17:04 UTC
99 points
2 comments · 3 min read · LW link

[Anthropic] A hacker used Claude Code to automate ransomware

bohaska · 27 Aug 2025 14:57 UTC
86 points
25 comments · 3 min read · LW link
(www.anthropic.com)

AI companies have started saying safeguards are load-bearing

Zach Stein-Perlman · 27 Aug 2025 13:00 UTC
52 points
2 comments · 5 min read · LW link

Would you sell your soul to save it? (I am NOT a Christian)

AdamLacerdo · 27 Aug 2025 11:05 UTC
−21 points
8 comments · 4 min read · LW link

Legal Personhood—The Fifth Amendment (Part 2)

Stephen Martin · 27 Aug 2025 9:03 UTC
5 points
2 comments · 4 min read · LW link

Contra Yudkowsky’s Ideal Bayesian

vae · 27 Aug 2025 5:43 UTC
51 points
17 comments · 13 min read · LW link

Calibrating an Ultrasonic Humidifier for Glycol Vapors

jefftk · 27 Aug 2025 1:40 UTC
11 points
2 comments · 1 min read · LW link
(www.jefftk.com)

Misgeneralization of Fictional Training Data as a Contributor to Misalignment

Mark Keavney · 27 Aug 2025 1:01 UTC
9 points
1 comment · 2 min read · LW link

[Question] How are you approaching cognitive security as AI becomes more capable?

james oofou · 26 Aug 2025 20:52 UTC
11 points
1 comment · 1 min read · LW link

AI Induced Psychosis: A shallow investigation

Tim Hua · 26 Aug 2025 20:03 UTC
359 points
43 comments · 26 min read · LW link