David Scott Krueger (formerly: capybaralet)

Karma: 2,455

https://twitter.com/DavidSKrueger
https://www.davidscottkrueger.com/
https://therealartificialintelligence.substack.com/p/the-real-ai-deploys-itself

Antisocial media: AI’s killer app?

David Scott Krueger (formerly: capybaralet) · 3 Oct 2025 0:00 UTC
35 points
8 comments · 5 min read · LW link
(therealartificialintelligence.substack.com)

The real AI deploys itself

David Scott Krueger (formerly: capybaralet) · 25 Sep 2025 14:11 UTC
76 points
8 comments · 3 min read · LW link
(therealartificialintelligence.substack.com)

Announcing “The Real AI”: a blog

David Scott Krueger (formerly: capybaralet) · 20 Sep 2025 1:27 UTC
32 points
1 comment · 2 min read · LW link
(therealartificialintelligence.substack.com)

Detecting High-Stakes Interactions with Activation Probes

21 Jul 2025 18:21 UTC
50 points
0 comments · 4 min read · LW link

Upcoming workshop on Post-AGI Civilizational Equilibria

21 Jun 2025 15:57 UTC
25 points
0 comments · 1 min read · LW link

A review of “Why Did Environmentalism Become Partisan?”

David Scott Krueger (formerly: capybaralet) · 25 Apr 2025 5:12 UTC
24 points
0 comments · 4 min read · LW link

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

30 Jan 2025 17:03 UTC
167 points
65 comments · 2 min read · LW link
(gradual-disempowerment.ai)

A Sober Look at Steering Vectors for LLMs

23 Nov 2024 17:30 UTC
40 points
0 comments · 5 min read · LW link

[Question] Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?

David Scott Krueger (formerly: capybaralet) · 4 Sep 2024 12:40 UTC
19 points
7 comments · 1 min read · LW link

An ML paper on data stealing provides a construction for “gradient hacking”

David Scott Krueger (formerly: capybaralet) · 30 Jul 2024 21:44 UTC
21 points
1 comment · 1 min read · LW link
(arxiv.org)

[Link Post] “Foundational Challenges in Assuring Alignment and Safety of Large Language Models”

David Scott Krueger (formerly: capybaralet) · 6 Jun 2024 18:55 UTC
70 points
2 comments · 6 min read · LW link
(llm-safety-challenges.github.io)

Testing for consequence-blindness in LLMs using the HI-ADS unit test.

David Scott Krueger (formerly: capybaralet) · 24 Nov 2023 23:35 UTC
25 points
2 comments · 2 min read · LW link

“Publish or Perish” (a quick note on why you should try to make your work legible to existing academic communities)

David Scott Krueger (formerly: capybaralet) · 18 Mar 2023 19:01 UTC
112 points
49 comments · 1 min read · LW link · 1 review

[Question] What organizations other than Conjecture have (esp. public) info-hazard policies?

David Scott Krueger (formerly: capybaralet) · 16 Mar 2023 14:49 UTC
20 points
1 comment · 1 min read · LW link

A (EtA: quick) note on terminology: AI Alignment != AI x-safety

David Scott Krueger (formerly: capybaralet) · 8 Feb 2023 22:33 UTC
46 points
20 comments · 1 min read · LW link

Why I hate the “accident vs. misuse” AI x-risk dichotomy (quick thoughts on “structural risk”)

David Scott Krueger (formerly: capybaralet) · 30 Jan 2023 18:50 UTC
34 points
41 comments · 2 min read · LW link

Quick thoughts on “scalable oversight” / “super-human feedback” research

David Scott Krueger (formerly: capybaralet) · 25 Jan 2023 12:55 UTC
27 points
9 comments · 2 min read · LW link

Mechanistic Interpretability as Reverse Engineering (follow-up to “cars and elephants”)

David Scott Krueger (formerly: capybaralet) · 3 Nov 2022 23:19 UTC
28 points
3 comments · 1 min read · LW link

“Cars and Elephants”: a handwavy argument/analogy against mechanistic interpretability

David Scott Krueger (formerly: capybaralet) · 31 Oct 2022 21:26 UTC
51 points
25 comments · 2 min read · LW link

[Question] I’m planning to start creating more write-ups summarizing my thoughts on various issues, mostly related to AI existential safety. What do you want to hear my nuanced takes on?

David Scott Krueger (formerly: capybaralet) · 24 Sep 2022 12:38 UTC
9 points
10 comments · 1 min read · LW link