Daniel Kokotajlo
Philosophy PhD student, worked at AI Impacts, now works at the Center on Long-Term Risk. Research interests include acausal trade, timelines, takeoff speeds & scenarios, decision theory, history, and a bunch of other stuff. I subscribe to Crocker’s Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html
I second Tailcalled’s question, and would precisify it: When did you first code up this simulation/experiment and run it (or a preliminary version of it)? A week ago? A month ago? Three months ago? A year ago?
Well then, would you agree that Evan’s position here:
By default, in the case of deception, my expectation is that we won’t get a warning shot at all
is plausible and in particular doesn’t depend on believing in a discontinuity, at least not the kind of discontinuity we should consider unlikely? If so, then we are all on the same page. If not, then we can rehash our argument focusing on this “obvious, real-world harm” definition, which is noticeably broader than my “strong” definition and therefore makes Evan’s claim stronger and less plausible but still, I think, plausible.
(To answer your earlier question, I’ve read and spoken to several people who seem to take the attempted-world-takeover warning shot scenario seriously, i.e. people who think there’s a good chance we’ll get “strong” warning shots. Paul Christiano, for example. Though it’s possible I was misunderstanding him. I originally interpreted you as maybe being one of those people, though now it seems that you are not? At any rate these people exist.)
EDIT: I feel like we’ve been talking past each other for much of this conversation, and in an effort to prevent that from continuing, perhaps instead of answering my questions above we should just get quantitative. Consider a spectrum of warning shots from very minor to very major. Put a few examples on the spectrum for illustration. Then draw a credence distribution for the probability that we’ll have warning shots of each kind. Maybe it’ll turn out that our distributions aren’t that different from each other after all, especially if we conditionalize on slow takeoff.
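To illustrate the kind of exercise I have in mind, here’s a minimal sketch in Python. The severity labels and all the numbers are made-up placeholders for illustration, not my actual credences:

```python
# Hypothetical sketch of the "get quantitative" exercise: put a few example
# warning shots on a severity spectrum, then each of us assigns a credence
# that we get a warning shot at least that severe (conditional on slow
# takeoff). Every number and label below is a placeholder.
spectrum = [
    "customer-service bots willingly scammed",          # very minor
    "AI caught systematically deceiving its operators",
    "AI causes obvious, real-world harm",
    "AI attempts world takeover and fails",             # "strong" warning shot
]
my_credences = [0.95, 0.70, 0.40, 0.05]
your_credences = [0.95, 0.80, 0.60, 0.30]

for event, p_me, p_you in zip(spectrum, my_credences, your_credences):
    print(f"{event:50s}  me: {p_me:.2f}  you: {p_you:.2f}")
```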
It’s been a while since I thought about this, but going back to the beginning of this thread:
“It’s unlikely you’ll get a warning shot for deceptive alignment, since if the first advanced AI system is deceptive and that deception is missed during training, once it’s deployed it’s likely for all the different deceptively aligned systems to be able to relatively easily coordinate with each other to defect simultaneously and ensure that their defection is unrecoverable (e.g. Paul’s “cascading failures”).”
At a high level, you’re claiming that we don’t get a warning shot because there’s a discontinuity in capability of the aggregate of AI systems (the aggregate goes from “can barely do anything deceptive” to “can coordinate to properly execute a treacherous turn”).
I think all the standard arguments against discontinuities can apply just as well to the aggregate of AI systems as they can to individual AI systems, so I don’t find your argument here compelling.
I think the first paragraph (Evan’s) is basically right, and the second two paragraphs (your response) are basically wrong. I don’t think this has anything to do with discontinuities, at least not the kind of discontinuities that are unlikely. (Compare to the mutiny analogy.) The distinction between “strong” and “weak” warning shots matters because “weak” warning shots will probably only provoke a moderate increase in caution on the part of human institutions and AI projects, whereas “strong” warning shots would provoke a large one. I agree that we’ll probably get various “weak” warning shots, but that doesn’t change the overall picture much, precisely because they won’t provoke a major increase in caution.
I’m guessing it’s that last bit that is the crux—perhaps you think that it would actually provoke a major increase in caution, comparable to the increase we’d get if an AI tried and failed to take over, in which case this minor warning shot vs. major warning shot distinction doesn’t matter much.
Yep! I prefer my terminology but it’s basically the same concept I think.
I disagree; I think we go astray by counting things like thermostats as agents. I’m proposing that this particular feedback loop I diagrammed is really important, a much more interesting phenomenon to study than the more general category of feedback loop that includes thermostats.
Years after I first thought of it, I continue to think that this chain reaction is the core of what it means for something to be an agent, AND why agency is such a big deal, the sort of thing we should expect to arise and outcompete non-agents. Here’s a diagram:
Roughly, plans are necessary for generalizing to new situations, for being competitive in contests where there hasn’t been time for natural selection to do lots of optimization of policies. But plans are only as good as the knowledge they are based on. And knowledge doesn’t come a priori; it has to be learned from data. And, crucially, data varies a lot in quality, because most of it is irrelevant/unimportant. High-quality data, the kind that gives you useful knowledge, is hard to come by. Indeed, you may need to make a plan for how to get it. (Or more generally: being better at making plans makes you better at getting higher-quality data, which makes you more knowledgeable, which makes your plans better.)
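Here’s a deliberately simplistic toy model of the loop, just to illustrate why it compounds; the functional forms and constants are invented for illustration and aren’t meant as a serious model:

```python
# Toy illustration of the chain reaction: better plans -> higher-quality data
# -> more knowledge -> better plans. The update rules are arbitrary; the point
# is only that the loop is self-reinforcing.
plan_quality = 1.0
knowledge = 1.0
for step in range(5):
    data_quality = 0.5 * plan_quality   # better plans get you better data
    knowledge += data_quality           # knowledge is learned from data
    plan_quality = knowledge ** 0.5     # plans are only as good as your knowledge
    print(f"step {step}: plan_quality={plan_quality:.2f}, knowledge={knowledge:.2f}")
```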
Thanks for doing this! I’m excited to see this sequence grow, it’s the sort of thing that could serve the function of a journal or textbook.
OK, thanks. YMMV but some people I’ve read / talked to seem to think that before we have successful world-takeover attempts, we’ll have unsuccessful ones—”sordid stumbles.” If this is true, it’s good news, because it makes it a LOT easier to prevent successful attempts. Alas it is not true.
A much weaker version of something like this may be true, e.g. the warning shot story you proposed a while back about customer service bots being willingly scammed. It’s plausible to me that we’ll get stuff like that before it’s too late.
If you think there’s something we are not on the same page about here—perhaps what you were hinting at with your final sentence—I’d be interested to hear it.
I’m probably just being mathematically confused myself; at any rate, I’ll proceed with the p[Tk & e+] : p[Tk & e-] version since that comes more naturally to me. (I think of it like: your credence in Tk is split between two buckets, the Tk&e+ bucket and the Tk&e- bucket, and then when you update you rule out the e- bucket. So what matters is the ratio between the buckets; if it’s relatively high (compared to the ratio for other Tx’s) your credence in Tk goes up, and if it’s relatively low it goes down.)
Anyhow, I totally agree that this ratio matters and that it varies with k. In particular here’s how I think it should vary for most readers of my post:
for k>12, the ratio should be low, like 0.1.
for low k, the ratio should be higher.
for middling k, say 6<k<13, the ratio should be in between.
Thus, the update should actually shift probability mass disproportionately to the lower k hypotheses.
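Here’s a minimal numerical sketch of the bucket picture. The priors and the ratios for low and middling k are invented placeholders (the text above only pins down ~0.1 for k>12):

```python
# Toy version of the two-bucket update: credence in each Tk is split between
# a Tk&e+ bucket and a Tk&e- bucket; updating on e+ deletes the e- bucket and
# renormalizes. Priors and the first two ratios are made-up placeholders.
priors = {"low k": 0.30, "middling k (6<k<13)": 0.40, "k>12": 0.30}
ratios = {"low k": 1.00, "middling k (6<k<13)": 0.50, "k>12": 0.10}  # p[Tk&e+] : p[Tk&e-]

# Mass surviving the update is the e+ bucket: p[Tk&e+] = p[Tk] * r/(1+r).
surviving = {k: priors[k] * ratios[k] / (1 + ratios[k]) for k in priors}
total = sum(surviving.values())

for k in priors:
    print(f"{k:22s} prior {priors[k]:.2f} -> posterior {surviving[k] / total:.2f}")
# Output: low k 0.30 -> 0.48, middling 0.40 -> 0.43, k>12 0.30 -> 0.09, i.e.
# probability mass shifts disproportionately to the lower-k hypotheses.
```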
I realize we are sort of arguing in circles now. I feel like we are making progress though. Also, separately, want to hop on a call with me sometime to sort this out? I’ve got some more arguments to show you...
Thanks for this post!
If some of the more pessimistic projections about the timelines to TAI are realized, my efforts in this field will have no effect. It is going to take at least 30 years for dramatically more capable humans to be able to meaningfully contribute to work in this field. Using Ajeya Cotra’s estimate of the timeline to TAI, which estimates a 50% chance of TAI by 2052, I estimate that there is at most a 50% probability that these efforts will have an impact, and a ~25% chance that they will have a large impact.
Those odds are good enough for me.
How low would the odds have to be before you would switch to doing something else? Would you continue with your current plan if the odds were 20-10 instead of 50-25?
So what ends up mattering is the ratio p[Tk | e+] : p[Tk | e-]
I’m claiming that this ratio is likely to vary with k.
Wait, shouldn’t it be the ratio p[Tk & e+] : p[Tk & e-]? Maybe both ratios work fine for our purposes, but I certainly find it more natural to think in terms of &.
Thanks for this, this is awesome! I’m hopeful in the next few years for there to be a collection of stories like this.
This is a story where the alignment problem is somewhat harder than I expect, society handles AI more competently than I expect, and the outcome is worse than I expect. It also involves inner alignment turning out to be a surprisingly small problem. Maybe the story is 10-20th percentile on each of those axes.
I’m a bit surprised that the outcome is worse than you expect, considering that this scenario is “easy mode” for societal competence and inner alignment, which seem to me to be very important parts of the overall problem. Am I right to infer that you think outer alignment is the bulk of the alignment problem, more difficult than inner alignment and societal competence?
Some other threads to pull on:
--In this story, there aren’t any major actual wars, just simulated wars / war games. Right? Why is that? I look at the historical base rate of wars, and my intuitive model adds to that by saying that during times of rapid technological change it’s more likely that various factions will get various advantages (or even just think they have advantages) that make them want to try something risky. OTOH we haven’t had a major war for seventy years, and maybe that’s because of nukes + other factors, and maybe nukes + other factors will persist through the period of takeoff? IDK. I worry that the reasons we haven’t had a major war for seventy years may be largely luck / observer selection effects, and separately, even if that’s wrong, I worry that those reasons won’t persist through takeoff (e.g. some factions may develop ways to shoot down ICBMs, or prevent their launch in the first place, or may not care so much if there is nuclear winter).
--Relatedly, in this story the AIs seem to be mostly on the same team? What do you think is going on “under the hood” so to speak: Have they all coordinated (perhaps without even causally communicating) to cut the humans out of control of the future? Why aren’t they fighting each other as well as the humans? Or maybe they do fight each other but you didn’t focus on that aspect of the story because it’s less relevant to us?
--Yeah, society will very likely not be that competent IMO. I think that’s the biggest implausibility of this story so far.
--(Perhaps relatedly) I feel like when takeoff is that distributed, there will be at least some people/factions who create agenty AI systems that aren’t even as superficially aligned as the unaligned benchmark. They won’t even be trying to make things look good according to human judgment, much less augmented human judgment! For example, some AI scientists today seem to think that all we need to do is make our AI curious and then everything will work out fine. Others seem to think that it’s right and proper for humans to be killed and replaced by machines. Others will try strategies even more naive than the unaligned benchmark, such as putting their AI through some “ethics training” dataset, or warning their AI “If you try anything I’ll unplug you.” (I’m optimistic that these particular failure modes will have been mostly prevented via awareness-raising before takeoff, but I do a pessimistic meta-induction and infer there will be other failure modes that are not prevented in time.)
--Can you say more about how “the failure modes in this story are an important input into treachery?”
On the contrary, the graph of launch costs you link seems to depict Falcon 9 as a 15-ish-year discontinuity in cost to orbit; I think you are misled by the projection, which is based on hypothetical future systems rather than on extrapolating from actual existing systems.
I’m betting that a little buzz on my phone which I can dismiss with a tap won’t kill my focus. We’ll see.
Productivity app idea:
You set a schedule of times when you want to be productive, and a frequency, and then it pings you at random times (but at that average frequency) to bug you with questions like:
--Are you “in the zone” right now? [Y] [N]
--(if no) What are you doing? [text box] [common answer] [common answer] [...]
The point is to cheaply collect data about when you are most productive and what your main time-wasters are, while also giving you gentle nudges to stop procrastinating/browsing/daydreaming/doomscrolling/working-sluggishly, take a deep breath, reconsider your priorities for the day, and start afresh.
Probably wouldn’t work for most people but it feels like it might for me.
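Here’s a minimal sketch of the core scheduling logic, assuming a simple command-line stand-in for the phone notifications; the window length, ping frequency, and log format are all placeholders:

```python
# Minimal sketch: during a scheduled productive window, fire prompts at random
# times with a chosen average frequency (Poisson-style), and log the answers
# so you can later see when you're most productive and what your time-wasters are.
import random
import time
from datetime import datetime

PINGS_PER_HOUR = 2        # placeholder average frequency
SESSION_HOURS = 1         # placeholder scheduled window

def run_session(log_path="ping_log.tsv"):
    end = time.time() + SESSION_HOURS * 3600
    while time.time() < end:
        # Exponential inter-arrival times -> pings at random, but with the
        # chosen average frequency.
        time.sleep(random.expovariate(PINGS_PER_HOUR / 3600))
        in_zone = input("Are you 'in the zone' right now? [y/n] ").strip().lower()
        doing = "" if in_zone.startswith("y") else input("What are you doing? ")
        with open(log_path, "a") as f:
            f.write(f"{datetime.now().isoformat()}\t{in_zone}\t{doing}\n")

if __name__ == "__main__":
    run_session()
```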
No judgment; quite the opposite! You are displaying far above-average concern for the truth and for others by admitting your mistake and seeking to correct it. (Arguably, even noticing your mistake already makes you above-average, since most would rationalize it away.)
That seems plausible, but AIs can have pointers to learning from other humans too. E.g. GPT-3 read the Internet; if we were making some more complicated system, it could evolve pointers analogous to the human ones. I think.
Yeah, somewhere along that spectrum. Generally speaking, I’m skeptical of claims that we know a lot about the brain.
“(And I think the evidence against it is mounting, this being one of the key pieces.)”
(I still don’t see why.)
--I wouldn’t characterize my own position as “we know a lot about the brain.” I think we should taboo “a lot.”
--We are at an impasse here, I guess—I think there’s mounting evidence that brains use predictive coding, and mounting evidence that predictive coding is like backprop. I agree it’s not conclusive, but this paper seems to be pushing in that direction, and there are others like it IIRC. I’m guessing you’re just significantly more skeptical than I am of both predictive coding and the predictive coding --> backprop link… perhaps because the other hypotheses on my list are less plausible to you?
I guess I was thinking: Brains use predictive coding, and predictive coding is basically backprop, so brains can’t be using something dramatically better than backprop. You are objecting to the “brains use predictive coding” step? Or are you objecting that only one particular version of predictive coding is basically backprop?
But we also know there are algorithms which are way more data-efficient than NNs (while being more processing-power intensive). So wouldn’t the obvious conclusion from our observations be: humans don’t use backprop, but rather, use more data-efficient algorithms?
Are you referring to Solomonoff Induction and the like? I think the “brains use more data-efficient algorithms” is an obvious hypothesis but not an obvious conclusion—there are several competing hypotheses, outlined above. (And I think the evidence against it is mounting, this being one of the key pieces.)
I’ll grant, I’m now quite curious how the scaling argument works out. Is it plausible that human-brain-sized NNs are as data-efficient as humans?
In terms of bits/pixels/etc., humans see plenty of data in their lifetime, a bit more than the scaling laws would predict IIRC. But the scaling laws (as interpreted by Ajeya, Rohin, etc.) are about the amount of subjective time the model needs to run before you can evaluate the result. If we assume for humans it’s something like 1 second on average (because our brains are evaluating-and-updating weights etc. on about that timescale) then we have a mere 10^9 data points, which is something like 4 OOMs less than the scaling laws would predict. If instead we think it’s longer, then the gap in data-efficiency grows.
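To make the back-of-envelope explicit (the ~30-year lifetime and the exact scaling-law requirement are my rough placeholders; the 4-OOM gap is the number doing the work):

```python
# Rough arithmetic behind the "mere 10^9 data points" claim, assuming ~one
# evaluate-and-update "data point" per subjective second over ~30 years.
import math

seconds_per_year = 365 * 24 * 3600                    # ~3.15e7
human_data_points = 30 * seconds_per_year             # ~1e9
print(f"human data points ~ 10^{math.log10(human_data_points):.1f}")

# If that's ~4 OOMs short of what the scaling laws predict for a
# human-brain-sized model, the prediction is on the order of 1e13 points.
scaling_law_points = human_data_points * 10**4
print(f"scaling-law prediction ~ 10^{math.log10(scaling_law_points):.1f}")
```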
Some issues though. One, the scaling laws might not be the same for all architectures. Maybe if your context window is bigger, or you use recurrence, or whatever, the laws are different. Too early to tell, at least for me (maybe others have more confident opinions; I’d love to hear them!). Two, some data is higher-quality than other data, and plausibly human data is higher-quality than the stuff GPT-3 was fed—e.g. humans deliberately seek out data that teaches them stuff they want to know, instead of just dully staring at a firehose of random stuff. Three, it’s not clear how to apply this to humans anyway. Maybe our neurons are updating a hundred times a second or something.
I’d be pretty surprised if a human-brain-sized Transformer were able to get as good as a human at most important human tasks simply by seeing a firehose of 10^9 images or context windows of internet data. But I’d also be pretty surprised (10%) if the scaling laws turn out to be so universal that we can’t get around them, i.e. if it turns out that transformative tasks really do require a NN at least the size of a human brain, trained for at least 10^14 steps or so, where each step involves running the NN for at least a subjective week. (A subjective second I’d find more plausible. Or a subjective week (or longer) but with fewer than 10^14 steps.)
Update: After talking to various people, it appears that (contrary to what the poll would suggest) there are at least a few people who answer Question 2 (all three variants) with less than 80%. In light of those conversations, and more thinking on my own, here is my current hot take on how +12 OOMs could turn out to not be enough:
1. Maybe the scaling laws will break. Just because GPT performance has fit a steady line across 5 orders of magnitude so far (or whatever) doesn’t mean it will continue for another 5. Maybe it’ll level off for some reason we don’t yet understand. Arguably this is what happened with LSTMs? Anyhow, for timelines purposes what matters is not whether it’ll level off by the time we are spending +12 OOMs of compute, but rather whether it will level off by the time we are spending +6 OOMs of compute. I think it’s rather unlikely to level off that soon, but it might. Maybe 20% chance. If this happens, then probably Amp(GPT-7) and the like wouldn’t work (80%?). The others are less impacted, but maybe we can assume OmegaStar probably won’t work either. Crystal Nights, SkunkWorks, and Neuromorph… don’t seem to be affected by scaling laws though. If this were the only consideration, my credence would be something like a 15% chance that Crystal Nights and OmegaStar don’t work, and then, independently, maybe a 30% chance that none of the others work either, for a total answer to Question Two of something like 95%… :/ I could fairly easily be convinced that it’s more like a 40% chance instead of 15%, in which case my answer is still something like 85%… :(
2. Maybe the horizon-length framework plus scaling laws really will turn out to be a lot more solid than I think. In other words, maybe +12 OOMs is enough to get us some really cool chatbots and whatnot but not anything transformative or PONR-inducing; for those tasks we need long-horizon training… (Medium horizons can be handled by +12 OOMs.) Unsurprisingly to those who’ve read my sequence on takeoff and takeover, I do not think this is very plausible; I’m gonna say something like 10%. (Remember, it has to apply not just to standard ML stuff like OmegaStar, but also to amplified GPT-7 and also to Crystal Nights and whatnot. It has to be basically an Iron Law of Learning.) Happily this is independent of point 1, so that makes for a total answer to Q2 of something more like 85%.
3. There’s always unknown unknowns. I include “maybe we are data-limited” in this category. Or maybe it turns out that +12 OOMs is enough, and actually +8 OOMs is enough, but we just don’t have the industrial capacity or energy production capacity to scale up nearly that far in the next 20 years or so. I prefer to think of these things as add-ons to the model that shift our timelines back by a couple years, rather than as things that change our answer to Question Two. Unknown unknowns that change our answer to Question Two seem like, well, the thing I mentioned in the text—maybe there’s some super special special sauce that not even Crystal Nights or Neuromorph can find etc. etc. and also Skunkworks turns out to be useless. Yeah… I’m gonna put 5% in this category. Total answer to Question Two is 80% and I’m feeling pretty reasonable about it.
4. There’s biases. Some people I talked to basically said “Yeah, 80%+ seems right to me too, but I think we should correct for biases and assume everything is more difficult than it appears; if it seems to us that it’s very likely to be enough, that means it’s 50% likely to be enough.” I don’t currently endorse this, because I think that the biases pushing in the opposite direction—biases of respectability, anti-weirdness, optimism, etc.--are probably on the whole stronger. Also, the people in the past who used the human-brain milestone to forecast AI seem to have been surprisingly right; of course it’s too early to say, but reality really is looking exactly like it should look if they were right...
5. There’s deference to the opinions of others, e.g. AI scientists in academia, economists forecasting GWP trends, the financial markets… My general response is “fuck that.” If you are interested I can say more about why I feel this way; ultimately I do in fact make a mild longer-timelines update as a result of this but I do so grudgingly. Also, Roodman’s model actually predicts 2037, not 2047. And that’s not even taking into account how AI-PONR will probably be a few years beforehand!
So all in all my credence has gone down from 90% to 80% for Question Two, but I’ve also become more confident that I’m basically correct, that I’m not the crazy one here. Because now I understand the arguments people gave, the models people have, for why the number might be less than 80%, and I have evaluated them and they don’t seem that strong.
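For concreteness, here’s a rough reconstruction of how the numbers in points 1-3 combine, treating the three failure modes as roughly independent (as the estimates above implicitly do):

```python
# Rough reconstruction of the arithmetic in points 1-3 above.
p_fail_scaling  = 0.15 * 0.30  # point 1: Crystal Nights & OmegaStar fail AND the remaining paths fail
p_fail_horizon  = 0.10         # point 2: long-horizon training is an Iron Law of Learning
p_fail_unknowns = 0.05         # point 3: unknown unknowns / missing special sauce

p_enough = (1 - p_fail_scaling) * (1 - p_fail_horizon) * (1 - p_fail_unknowns)
print(f"P(+12 OOMs is enough) ~ {p_enough:.0%}")  # ~82%, i.e. roughly the 80% above
```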
I’d love to hear more thoughts and takes by the way, if you have any please comment!