
Daniel Kokotajlo

Karma: 29,489

Was a philosophy PhD student, left to work at AI Impacts, then the Center on Long-Term Risk, then OpenAI. Quit OpenAI after losing confidence that it would behave responsibly around the time of AGI. Now executive director of the AI Futures Project. I subscribe to Crocker’s Rules (http://sl4.org/crocker.html) and am especially interested to hear unsolicited constructive criticism.

Some of my favorite memes:


(by Rob Wiblin)

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don't know, that's why we're showing you." White Hat: "Well, let me know if that happens!" Megan: "Based on this conversation, it already has."

(xkcd)

My EA Journey, depicted on the whiteboard at CLR:

(h/t Scott Alexander)



Alex Blechman (@AlexBlechman), Nov 8, 2021: Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale / Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus

Vitalik’s Response to AI 2027

Daniel Kokotajlo · 11 Jul 2025 21:43 UTC
116 points
53 comments · 12 min read · LW link
(vitalik.eth.limo)

My pitch for the AI Village

Daniel Kokotajlo · 24 Jun 2025 15:00 UTC
177 points
32 comments · 5 min read · LW link

METR’s Observations of Reward Hacking in Recent Frontier Models

Daniel Kokotajlo · 9 Jun 2025 18:03 UTC
99 points
9 comments · 11 min read · LW link
(metr.org)

Training AGI in Secret would be Unsafe and Unethical

Daniel Kokotajlo · 18 Apr 2025 12:27 UTC
139 points
15 comments · 6 min read · LW link

AI 2027: What Superintelligence Looks Like

3 Apr 2025 16:23 UTC
661 points
222 comments · 41 min read · LW link
(ai-2027.com)

OpenAI: Detecting misbehavior in frontier reasoning models

Daniel Kokotajlo · 11 Mar 2025 2:17 UTC
183 points
26 comments · 4 min read · LW link
(openai.com)

What goals will AIs have? A list of hypotheses

Daniel Kokotajlo · 3 Mar 2025 20:08 UTC
88 points
19 comments · 18 min read · LW link

Extended analogy between humans, corporations, and AIs.

Daniel Kokotajlo · 13 Feb 2025 0:03 UTC
36 points
2 comments · 6 min read · LW link

Why Don’t We Just… Shoggoth+Face+Paraphraser?

19 Nov 2024 20:53 UTC
158 points
58 comments · 14 min read · LW link

Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo · 17 Feb 2024 1:47 UTC
66 points
2 comments · 11 min read · LW link

AI Timelines

10 Nov 2023 5:28 UTC
300 points
136 comments · 51 min read · LW link · 2 reviews

Linkpost for Jan Leike on Self-Exfiltration

Daniel Kokotajlo · 13 Sep 2023 21:23 UTC
59 points
1 comment · 2 min read · LW link
(aligned.substack.com)

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
109 points
17 comments · 5 min read · LW link
(arxiv.org)

AGI is easier than robotaxis

Daniel Kokotajlo · 13 Aug 2023 17:00 UTC
41 points
30 comments · 4 min read · LW link

Pulling the Rope Sideways: Empirical Test Results

Daniel Kokotajlo · 27 Jul 2023 22:18 UTC
61 points
18 comments · 1 min read · LW link

[Question] What money-pumps exist, if any, for deontologists?

Daniel Kokotajlo · 28 Jun 2023 19:08 UTC
39 points
35 comments · 1 min read · LW link

The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)

Daniel Kokotajlo · 22 May 2023 5:49 UTC
55 points
5 comments · 2 min read · LW link
(thetreacherousturn.ai)

My version of Simulacra Levels

Daniel Kokotajlo · 26 Apr 2023 15:50 UTC
42 points
15 comments · 3 min read · LW link

Kallipolis, USA

Daniel Kokotajlo · 1 Apr 2023 2:06 UTC
13 points
1 comment · 1 min read · LW link
(docs.google.com)

Russell Conjugations list & voting thread

Daniel Kokotajlo · 20 Feb 2023 6:39 UTC
23 points
64 comments · 1 min read · LW link