
Daniel Kokotajlo

Karma: 20,877

Was a philosophy PhD student, left to work at AI Impacts, then Center on Long-Term Risk, then OpenAI. Quit OpenAI due to losing confidence that it would behave responsibly around the time of AGI. Not sure what I’ll do next yet. Views are my own & do not represent those of my current or former employer(s). I subscribe to Crocker’s Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html

Some of my favorite memes:


(by Rob Wiblin)

Comic. Megan & Cueball show White Hat a graph of a line going up, not yet at, but heading towards, a threshold labelled "BAD". White Hat: "So things will be bad?" Megan: "Unless someone stops it." White Hat: "Will someone do that?" Megan: "We don't know, that's why we're showing you." White Hat: "Well, let me know if that happens!" Megan: "Based on this conversation, it already has."

(xkcd)

My EA Journey, depicted on the whiteboard at CLR:

(h/t Scott Alexander)



Alex Blechman (@AlexBlechman), 5:49 PM · Nov 8, 2021:
"Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale
Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus"

Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo · 17 Feb 2024 1:47 UTC
63 points
2 comments · 11 min read · LW link

AI Timelines

10 Nov 2023 5:28 UTC
279 points
94 comments · 51 min read · LW link

Linkpost for Jan Leike on Self-Exfiltration

Daniel Kokotajlo · 13 Sep 2023 21:23 UTC
59 points
1 comment · 2 min read · LW link
(aligned.substack.com)

Paper: On measuring situational awareness in LLMs

4 Sep 2023 12:54 UTC
107 points
16 comments · 5 min read · LW link
(arxiv.org)

AGI is easier than robotaxis

Daniel Kokotajlo · 13 Aug 2023 17:00 UTC
41 points
30 comments · 4 min read · LW link

Pulling the Rope Sideways: Empirical Test Results

Daniel Kokotajlo · 27 Jul 2023 22:18 UTC
61 points
18 comments · 1 min read · LW link

[Question] What money-pumps exist, if any, for deontologists?

Daniel Kokotajlo · 28 Jun 2023 19:08 UTC
39 points
35 comments · 1 min read · LW link

The Treacherous Turn is finished! (AI-takeover-themed tabletop RPG)

Daniel Kokotajlo · 22 May 2023 5:49 UTC
55 points
5 comments · 2 min read · LW link
(thetreacherousturn.ai)

My version of Simulacra Levels

Daniel Kokotajlo · 26 Apr 2023 15:50 UTC
41 points
14 comments · 3 min read · LW link

Kallipolis, USA

Daniel Kokotajlo · 1 Apr 2023 2:06 UTC
13 points
1 comment · 1 min read · LW link
(docs.google.com)

Russell Conjugations list & voting thread

Daniel Kokotajlo · 20 Feb 2023 6:39 UTC
22 points
62 comments · 1 min read · LW link

Important fact about how people evaluate sets of arguments

Daniel Kokotajlo · 14 Feb 2023 5:27 UTC
33 points
11 comments · 2 min read · LW link

AI takeover tabletop RPG: “The Treacherous Turn”

Daniel Kokotajlo · 30 Nov 2022 7:16 UTC
53 points
5 comments · 1 min read · LW link

ACT-1: Transformer for Actions

Daniel Kokotajlo · 14 Sep 2022 19:09 UTC
52 points
4 comments · 1 min read · LW link
(www.adept.ai)

Linkpost: Github Copilot productivity experiment

Daniel Kokotajlo · 8 Sep 2022 4:41 UTC
88 points
4 comments · 1 min read · LW link
(github.blog)

Replacement for PONR concept

Daniel Kokotajlo · 2 Sep 2022 0:09 UTC
58 points
6 comments · 2 min read · LW link

Immanuel Kant and the Decision Theory App Store

Daniel Kokotajlo · 10 Jul 2022 16:04 UTC
88 points
12 comments · 5 min read · LW link

Forecasting Fusion Power

Daniel Kokotajlo · 18 Jun 2022 0:04 UTC
29 points
8 comments · 1 min read · LW link
(astralcodexten.substack.com)

Why agents are powerful

Daniel Kokotajlo · 6 Jun 2022 1:37 UTC
37 points
7 comments · 7 min read · LW link

[Question] Probability that the President would win election against a random adult citizen?

Daniel Kokotajlo · 1 Jun 2022 20:38 UTC
15 points
26 comments · 1 min read · LW link