
Daniel Kokotajlo

Karma: 27,014

Was a philosophy PhD student, left to work at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Now executive director of the AI Futures Project. I subscribe to Crocker’s Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Some of my favorite memes:


(by Rob Wiblin)

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don't know, that's why we're showing you." White Hat: "Well, let me know if that happens!" Megan: "Based on this conversation, it already has."

(xkcd)

My EA Journey, depicted on the whiteboard at CLR:

(h/t Scott Alexander)



Alex Blechman (@AlexBlechman), Nov 8, 2021: "Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale / Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus"

METR’s Observations of Reward Hacking in Recent Frontier Models

Daniel Kokotajlo, Jun 9, 2025, 6:03 PM
97 points
9 comments, 11 min read, LW link
(metr.org)

Training AGI in Secret would be Unsafe and Unethical

Daniel Kokotajlo, Apr 18, 2025, 12:27 PM
139 points
15 comments, 6 min read, LW link

AI 2027: What Superintelligence Looks Like

Apr 3, 2025, 4:23 PM
657 points
220 comments, 41 min read, LW link
(ai-2027.com)

OpenAI: Detecting misbehavior in frontier reasoning models

Daniel Kokotajlo, Mar 11, 2025, 2:17 AM
183 points
26 comments, 4 min read, LW link
(openai.com)

What goals will AIs have? A list of hypotheses

Daniel Kokotajlo, Mar 3, 2025, 8:08 PM
87 points
19 comments, 18 min read, LW link

Extended analogy between humans, corporations, and AIs.

Daniel Kokotajlo, Feb 13, 2025, 12:03 AM
36 points
2 comments, 6 min read, LW link

Why Don’t We Just… Shoggoth+Face+Paraphraser?

Nov 19, 2024, 8:53 PM
149 points
58 comments, 14 min read, LW link

Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo, Feb 17, 2024, 1:47 AM
65 points
2 comments, 11 min read, LW link

AI Timelines

Nov 10, 2023, 5:28 AM
300 points
136 comments, 51 min read, LW link, 2 reviews

Linkpost for Jan Leike on Self-Exfiltration

Daniel Kokotajlo, Sep 13, 2023, 9:23 PM
59 points
1 comment, 2 min read, LW link
(aligned.substack.com)

Paper: On measuring situational awareness in LLMs

Sep 4, 2023, 12:54 PM
109 points
16 comments, 5 min read, LW link
(arxiv.org)

AGI is easier than robotaxis

Daniel Kokotajlo, Aug 13, 2023, 5:00 PM
41 points
30 comments, 4 min read, LW link

Pulling the Rope Sideways: Empirical Test Results

Daniel Kokotajlo, Jul 27, 2023, 10:18 PM
61 points
18 comments, 1 min read, LW link

[Question] What money-pumps exist, if any, for deontologists?

Daniel Kokotajlo, Jun 28, 2023, 7:08 PM
39 points
35 comments, 1 min read, LW link

The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)

Daniel Kokotajlo, May 22, 2023, 5:49 AM
55 points
5 comments, 2 min read, LW link
(thetreacherousturn.ai)

My version of Simulacra Levels

Daniel Kokotajlo, Apr 26, 2023, 3:50 PM
42 points
15 comments, 3 min read, LW link

Kallipolis, USA

Daniel Kokotajlo, Apr 1, 2023, 2:06 AM
13 points
1 comment, 1 min read, LW link
(docs.google.com)

Russell Conjugations list & voting thread

Daniel Kokotajlo, Feb 20, 2023, 6:39 AM
23 points
63 comments, 1 min read, LW link

Important fact about how people evaluate sets of arguments

Daniel Kokotajlo, Feb 14, 2023, 5:27 AM
33 points
11 comments, 2 min read, LW link

AI takeover tabletop RPG: “The Treacherous Turn”

Daniel Kokotajlo, Nov 30, 2022, 7:16 AM
53 points
5 comments, 1 min read, LW link