Ilya Sutskever and Jan Leike resign from OpenAI [updated]

Ilya Sutskever and Jan Leike, who led OpenAI’s alignment work, have resigned. It seems Superalignment will now be led by John Schulman. Jakub Pachocki has replaced Sutskever as Chief Scientist.

Reasons are unclear (as usual when safety people leave OpenAI).

The NYT piece (archive) and others I’ve seen don’t really have details.

OpenAI announced Sutskever’s departure in a blogpost.

Sutskever and Leike confirmed their departures in tweets.


Updates:

Friday May 17:

Superalignment dissolves.

Leike tweets, including:

I have been disagreeing with OpenAI leadership about the company’s core priorities for quite some time, until we finally reached a breaking point.

I believe much more of our bandwidth should be spent getting ready for the next generations of models, on security, monitoring, preparedness, safety, adversarial robustness, (super)alignment, confidentiality, societal impact, and related topics.

These problems are quite hard to get right, and I am concerned we aren’t on a trajectory to get there.

Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.

Building smarter-than-human machines is an inherently dangerous endeavor. OpenAI is shouldering an enormous responsibility on behalf of all of humanity.

But over the past years, safety culture and processes have taken a backseat to shiny products.

Daniel Kokotajlo talks to Vox:

“I joined with substantial hope that OpenAI would rise to the occasion and behave more responsibly as they got closer to achieving AGI. It slowly became clear to many of us that this would not happen,” Kokotajlo told me. “I gradually lost trust in OpenAI leadership and their ability to responsibly handle AGI, so I quit.”

Kelsey Piper says:

I have seen the extremely restrictive off-boarding agreement that contains nondisclosure and non-disparagement provisions former OpenAI employees are subject to. It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it.

More.

TechCrunch says:

requests for . . . compute were often denied, blocking the [Superalignment] team from doing their work [according to someone on the team].

Piper is back:

OpenAI . . . says that going forward, they *won’t* strip anyone of their equity for not signing the secret NDA.

(This is slightly good, but OpenAI should free all past employees from their non-disparagement obligations.)

Saturday May 18:

OpenAI leaders Sam Altman and Greg Brockman tweet a response to Leike. It doesn’t really say anything.

Separately, Altman tweets:

we have never clawed back anyone’s vested equity, nor will we do that if people do not sign a separation agreement (or don’t agree to a non-disparagement agreement). vested equity is vested equity, full stop.

there was a provision about potential equity cancellation in our previous exit docs; although we never clawed anything back, it should never have been something we had in any documents or communication. this is on me and one of the few times i’ve been genuinely embarrassed running openai; i did not know this was happening and i should have.

the team was already in the process of fixing the standard exit paperwork over the past month or so. if any former employee who signed one of those old agreements is worried about it, they can contact me and we’ll fix that too. very sorry about this.

This seems to contradict various claims, including (1) that OpenAI threatened to take all of your equity if you didn’t sign the non-disparagement agreement when you left (the relevant question for evaluating OpenAI’s transparency/integrity isn’t whether OpenAI actually took people’s equity; it’s whether OpenAI threatened to) and (2) that Daniel Kokotajlo gave up all of his equity. (Note: OpenAI equity isn’t really equity; it’s “PPUs,” and I think the relevant question isn’t whether you own the PPUs but rather whether you’re allowed to sell them.)

No comment from OpenAI on freeing everyone from non-disparagement obligations.

It’s surprising that Altman says he “did not know this was happening.” I think Gwern and LW have been discussing this for a while. [Update: I failed to find this; I forget exactly why I feel like I was already aware of the non-disparagement agreements or where former OpenAI staff said “no comment” about such things.] Surely Altman knew that departing employees were signing non-disparagement agreements and would rather not… Or maybe he is speaking narrowly about vested equity, and OpenAI’s pseudo-equity is structured such that he’s saying something technically true.