
Daniel Kokotajlo

Karma: 20,877

Was a philosophy PhD student, left to work at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Not sure what I’ll do next yet. Views are my own & do not represent those of my current or former employer(s). I subscribe to Crocker’s Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Some of my favorite memes:


(by Rob Wiblin)

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don't know, that's why we're showing you." White Hat: "Well, let me know if that happens!" Megan: "Based on this conversation, it already has."

(xkcd)

My EA Journey, depicted on the whiteboard at CLR:

(h/t Scott Alexander)



Alex Blechman (@AlexBlechman), 5:49 PM · Nov 8, 2021:
"Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale
Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus"

Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo · 17 Feb 2024 1:47 UTC
63 points
2 comments · 11 min read · LW link

AI Timelines

10 Nov 2023 5:28 UTC
279 points
94 comments · 51 min read · LW link

Linkpost for Jan Leike on Self-Exfiltration

Daniel Kokotajlo · 13 Sep 2023 21:23 UTC
59 points
1 comment · 2 min read · LW link
(aligned.substack.com)

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
107 points
16 comments · 5 min read · LW link
(arxiv.org)

AGI is easier than robotaxis

Daniel Kokotajlo · 13 Aug 2023 17:00 UTC
41 points
30 comments · 4 min read · LW link

Pulling the Rope Sideways: Empirical Test Results

Daniel Kokotajlo · 27 Jul 2023 22:18 UTC
61 points
18 comments · 1 min read · LW link

[Question] What money-pumps exist, if any, for deontologists?

Daniel Kokotajlo · 28 Jun 2023 19:08 UTC
39 points
35 comments · 1 min read · LW link

The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)

Daniel Kokotajlo · 22 May 2023 5:49 UTC
55 points
5 comments · 2 min read · LW link
(thetreacherousturn.ai)

My version of Simulacra Levels

Daniel Kokotajlo · 26 Apr 2023 15:50 UTC
41 points
14 comments · 3 min read · LW link

Kallipolis, USA

Daniel Kokotajlo · 1 Apr 2023 2:06 UTC
13 points
1 comment · 1 min read · LW link
(docs.google.com)

Russell Conjugations list & voting thread

Daniel Kokotajlo · 20 Feb 2023 6:39 UTC
22 points
62 comments · 1 min read · LW link

Important fact about how people evaluate sets of arguments

Daniel Kokotajlo · 14 Feb 2023 5:27 UTC
33 points
11 comments · 2 min read · LW link

AI takeover tabletop RPG: “The Treacherous Turn”

Daniel Kokotajlo · 30 Nov 2022 7:16 UTC
53 points
5 comments · 1 min read · LW link

ACT-1: Transformer for Actions

Daniel Kokotajlo · 14 Sep 2022 19:09 UTC
52 points
4 comments · 1 min read · LW link
(www.adept.ai)

Linkpost: Github Copilot productivity experiment

Daniel Kokotajlo · 8 Sep 2022 4:41 UTC
88 points
4 comments · 1 min read · LW link
(github.blog)

Replacement for PONR concept

Daniel Kokotajlo · 2 Sep 2022 0:09 UTC
58 points
6 comments · 2 min read · LW link

Immanuel Kant and the Decision Theory App Store

Daniel Kokotajlo · 10 Jul 2022 16:04 UTC
88 points
12 comments · 5 min read · LW link

Forecasting Fusion Power

Daniel Kokotajlo · 18 Jun 2022 0:04 UTC
29 points
8 comments · 1 min read · LW link
(astralcodexten.substack.com)

Why agents are powerful

Daniel Kokotajlo · 6 Jun 2022 1:37 UTC
37 points
7 comments · 7 min read · LW link

[Question] Probability that the President would win election against a random adult citizen?

Daniel Kokotajlo · 1 Jun 2022 20:38 UTC
15 points
26 comments · 1 min read · LW link