Four places where you can put LLM monitoring

9 Aug 2025 23:10 UTC
48 points
0 comments · 7 min read · LW link

Output and CoT Monitoring of Customer Service Representatives Shows Default Alignment

Brendan Long · 9 Aug 2025 21:31 UTC
21 points
0 comments · 1 min read · LW link

Live by the Claude, Die by the Claude

Brendan McCord · 9 Aug 2025 20:23 UTC
0 points
3 comments · 7 min read · LW link
(blog.cosmos-institute.org)

GPT-5 vs AI Alignment

Donatas Lučiūnas · 9 Aug 2025 20:05 UTC
−8 points
2 comments · 1 min read · LW link

Saidi, My Friend—what do we owe to each other?

James Stephen Brown · 9 Aug 2025 19:41 UTC
10 points
0 comments · 5 min read · LW link

Self-Questioning

Vadim Golub · 9 Aug 2025 19:18 UTC
−6 points
0 comments · 1 min read · LW link

Testing the Authoritarian Bias of LLMs

9 Aug 2025 18:09 UTC
9 points
1 comment · 6 min read · LW link

Working with AI: Measuring the Occupational Implications of Generative AI

Annapurna · 9 Aug 2025 16:20 UTC
5 points
0 comments · 1 min read · LW link
(jorgevelez.substack.com)

If worker coops are so productive, why aren’t they everywhere?

B Jacobs · 9 Aug 2025 14:47 UTC
35 points
19 comments · 4 min read · LW link
(bobjacobs.substack.com)

Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning.

9 Aug 2025 11:44 UTC
7 points
0 comments · 12 min read · LW link

Against functionalism: a self dialogue

Algon · 9 Aug 2025 11:19 UTC
13 points
9 comments · 1 min read · LW link

With the Future of the World in Your Hands, Think for 6.77 Years!

Dawn Drescher · 9 Aug 2025 10:44 UTC
1 point
0 comments · 10 min read · LW link
(impartial-priorities.org)

Poll on De/Accelerating AI

denkenberger · 9 Aug 2025 7:13 UTC
13 points
38 comments · 1 min read · LW link

[Event] Building What the Future Needs: A curated conference in Berlin (Sep 6, 2025) for high-impact builders and researchers

Vasilii Kondyrev · 8 Aug 2025 23:08 UTC
7 points
0 comments · 2 min read · LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward · 8 Aug 2025 22:08 UTC
1 point
0 comments · 1 min read · LW link

Making Sense of Consciousness Part 4: States of Consciousness

sarahconstantin · 8 Aug 2025 21:21 UTC
8 points
0 comments · 5 min read · LW link
(sarahconstantin.substack.com)

What would a human pretending to be an AI say?

Brendan Long · 8 Aug 2025 18:56 UTC
53 points
18 comments · 1 min read · LW link
(www.brendanlong.com)

Will morally motivated actors steer us towards a near-best future?

wdmacaskill · 8 Aug 2025 18:32 UTC
22 points
0 comments · 4 min read · LW link

How hard to achieve is eutopia?

wdmacaskill · 8 Aug 2025 16:16 UTC
22 points
0 comments · 7 min read · LW link

OpenAI’s GPT-OSS Is Already Old News

Zvi · 8 Aug 2025 12:20 UTC
39 points
4 comments · 18 min read · LW link
(thezvi.wordpress.com)

Extract-and-Evaluate Monitoring Can Significantly Enhance CoT Monitor Performance (Research Note)

8 Aug 2025 10:41 UTC
51 points
7 comments · 10 min read · LW link

The Tortoise and the Language Model (A Fable After Hofstadter)

mwatkins · 8 Aug 2025 10:39 UTC
54 points
4 comments · 3 min read · LW link

Closed Mouth, Open Opportunities

CstineSublime · 8 Aug 2025 10:32 UTC
6 points
0 comments · 4 min read · LW link

How anticipatory cover-ups go wrong

Kaj_Sotala · 8 Aug 2025 10:26 UTC
295 points
25 comments · 6 min read · LW link

Strategic Moderation Goals (a Plan B to AI alignment)

Jim Buhler · 8 Aug 2025 8:08 UTC
2 points
0 comments · 3 min read · LW link

METR’s Evaluation of GPT-5

GradientDissenter · 7 Aug 2025 22:17 UTC
141 points
15 comments · 20 min read · LW link
(metr.github.io)

ChatGPT is the Daguerreotype of AI

Alex_Altair · 7 Aug 2025 22:14 UTC
42 points
2 comments · 7 min read · LW link

Principles of AI Uncontrollability

WillPetillo · 7 Aug 2025 21:10 UTC
1 point
0 comments · 7 min read · LW link

Third-order cognition as a model of superintelligence (ironically: Meta® metacognition)

soycarts · 7 Aug 2025 20:56 UTC
2 points
5 comments · 13 min read · LW link

Yes, Rationalism is a Cult

James Camacho · 7 Aug 2025 20:43 UTC
−14 points
23 comments · 4 min read · LW link

GPT-5 is out

david reinstein · 7 Aug 2025 20:33 UTC
4 points
0 comments · 1 min read · LW link
(openai.com)

OpenAI Releases GPT-5

anaguma · 7 Aug 2025 18:41 UTC
18 points
0 comments · 1 min read · LW link
(openai.com)

Balancing exploration and resistance to memetic threats after AGI

Eric Neyman · 7 Aug 2025 18:03 UTC
26 points
5 comments · 5 min read · LW link

state of the machine

thiccythot · 7 Aug 2025 17:50 UTC
21 points
5 comments · 6 min read · LW link

Chronicles of the Gentle Singularity: A Short Story

Ihor Kendiukhov · 7 Aug 2025 13:50 UTC
21 points
0 comments · 4 min read · LW link

AI #128: Four Hours Until Probably Not The Apocalypse

Zvi · 7 Aug 2025 13:00 UTC
34 points
5 comments · 65 min read · LW link
(thezvi.wordpress.com)

No One is Really Working

Annapurna · 7 Aug 2025 11:19 UTC
5 points
9 comments · 1 min read · LW link
(www.humaninvariant.com)

[Question] Anthropic Is Going All In On Ability Without Intelligence?

Chapin Lenthall-Cleary · 7 Aug 2025 5:54 UTC
2 points
0 comments · 2 min read · LW link

Civil Service: a Victim or a Villain?

Martin Sustrik · 7 Aug 2025 5:50 UTC
67 points
27 comments · 4 min read · LW link
(www.250bpm.com)

AXRP Episode 46 - Tom Davidson on AI-enabled Coups

DanielFilan · 7 Aug 2025 5:10 UTC
11 points
0 comments · 68 min read · LW link

A Cheeky Pint with Anthropic CEO Dario Amodei

WilliamKiely · 7 Aug 2025 3:21 UTC
10 points
3 comments · 1 min read · LW link

Reproducing Absolute Zero

Lucy Wingard · 7 Aug 2025 3:01 UTC
5 points
1 comment · 4 min read · LW link

Interview with Kelsey Piper on Self-Censorship and the Vibe Shift

Zack_M_Davis · 7 Aug 2025 2:51 UTC
57 points
1 comment · 15 min read · LW link
(unremediatedgender.space)

Forbes: Fear Of Super Intelligent AI Is Driving Harvard And MIT Students To Drop Out

Nikola Jurkovic · 7 Aug 2025 2:02 UTC
19 points
0 comments · 1 min read · LW link
(www.forbes.com)

Open weights != Open source

martinkunev · 7 Aug 2025 1:04 UTC
0 points
8 comments · 3 min read · LW link

No, Rationalism Is Not a Cult

Liam Robins · 7 Aug 2025 0:39 UTC
22 points
18 comments · 10 min read · LW link
(thelimestack.substack.com)

Critiquing the Dunning-Kruger Effect

Jennifer Young · 7 Aug 2025 0:36 UTC
0 points
0 comments · 1 min read · LW link

Re: recent Anthropic safety research

Eliezer Yudkowsky · 6 Aug 2025 22:52 UTC
145 points
22 comments · 5 min read · LW link
(x.com)

It’s Owl in the Numbers: Token Entanglement in Subliminal Learning

6 Aug 2025 22:18 UTC
38 points
7 comments · 4 min read · LW link

[Question] Inscrutability was always inevitable, right?

Steven Byrnes · 6 Aug 2025 21:57 UTC
99 points
33 comments · 2 min read · LW link