Daniel Kokotajlo

Karma: 18,882

Philosophy PhD student, worked at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Not sure what I’ll do next yet. Views are my own & do not represent those of my current or former employer(s). I subscribe to Crocker’s Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Some of my favorite memes:


(by Rob Wiblin)

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don

(xkcd)

My EA Journey, depicted on the whiteboard at CLR:

(h/t Scott Alexander)


Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo · 17 Feb 2024 1:47 UTC
61 points
0 comments · 11 min read · LW link

AI Timelines

10 Nov 2023 5:28 UTC
264 points
74 comments · 51 min read · LW link

Linkpost for Jan Leike on Self-Exfiltration

Daniel Kokotajlo · 13 Sep 2023 21:23 UTC
58 points
1 comment · 2 min read · LW link
(aligned.substack.com)

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
106 points
16 comments · 5 min read · LW link
(arxiv.org)

AGI is easier than robotaxis

Daniel Kokotajlo · 13 Aug 2023 17:00 UTC
39 points
30 comments · 4 min read · LW link

Pulling the Rope Sideways: Empirical Test Results

Daniel Kokotajlo · 27 Jul 2023 22:18 UTC
61 points
18 comments · 1 min read · LW link

[Question] What money-pumps exist, if any, for deontologists?

Daniel Kokotajlo · 28 Jun 2023 19:08 UTC
39 points
35 comments · 1 min read · LW link

The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)

Daniel Kokotajlo · 22 May 2023 5:49 UTC
55 points
5 comments · 2 min read · LW link
(thetreacherousturn.ai)

My version of Simulacra Levels

Daniel Kokotajlo · 26 Apr 2023 15:50 UTC
41 points
14 comments · 3 min read · LW link

Kallipolis, USA

Daniel Kokotajlo · 1 Apr 2023 2:06 UTC
13 points
1 comment · 1 min read · LW link
(docs.google.com)

Russell Conjugations list & voting thread

Daniel Kokotajlo · 20 Feb 2023 6:39 UTC
22 points
62 comments · 1 min read · LW link

Important fact about how people evaluate sets of arguments

Daniel Kokotajlo · 14 Feb 2023 5:27 UTC
33 points
11 comments · 2 min read · LW link

AI takeover tabletop RPG: “The Treacherous Turn”

Daniel Kokotajlo · 30 Nov 2022 7:16 UTC
53 points
5 comments · 1 min read · LW link

ACT-1: Transformer for Actions

Daniel Kokotajlo · 14 Sep 2022 19:09 UTC
52 points
4 comments · 1 min read · LW link
(www.adept.ai)

Linkpost: Github Copilot productivity experiment

Daniel Kokotajlo · 8 Sep 2022 4:41 UTC
88 points
4 comments · 1 min read · LW link
(github.blog)

Replacement for PONR concept

Daniel Kokotajlo · 2 Sep 2022 0:09 UTC
58 points
6 comments · 2 min read · LW link

Immanuel Kant and the Decision Theory App Store

Daniel Kokotajlo · 10 Jul 2022 16:04 UTC
88 points
12 comments · 5 min read · LW link

Forecasting Fusion Power

Daniel Kokotajlo · 18 Jun 2022 0:04 UTC
29 points
8 comments · 1 min read · LW link
(astralcodexten.substack.com)

Why agents are powerful

Daniel Kokotajlo · 6 Jun 2022 1:37 UTC
37 points
7 comments · 7 min read · LW link

[Question] Probability that the President would win election against a random adult citizen?

Daniel Kokotajlo · 1 Jun 2022 20:38 UTC
15 points
26 comments · 1 min read · LW link