AI strategy & governance. ailabwatch.org. Looking for new projects.
Zach Stein-Perlman (Zachary Stein-Perlman)
Introducing AI Lab Watch
Ilya Sutskever and Jan Leike resign from OpenAI
FLI open letter: Pause giant AI experiments
The public supports regulating AI for safety
DeepMind: Model evaluation for extreme risks
Kat, Emerson, and Drew’s reputation is not your concern insofar as you’re basically certain that your post is basically true. If you thought there was a decent chance that your post was basically wrong and Nonlinear would find proof in the next week, publishing now would be inappropriate.
When destroying someone’s reputation you have an extra obligation to make sure what you’re saying is true. I think you did that in this case—just clarifying norms.
The commitment—“20% of the compute we’ve secured to date” (in July 2023), to be used “over the next four years”—may be quite little in 2027, with compute use increasing exponentially. I’m confused about why people think it’s a big commitment.
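To make the scale concrete, here is a minimal back-of-the-envelope sketch; the 3x-per-year growth factor is purely an assumed illustrative number, not a claim about OpenAI’s actual compute trajectory:

```python
# Back-of-the-envelope sketch (all numbers are illustrative assumptions):
# if total compute grows ~3x per year, then 20% of the mid-2023 stock,
# spread over four years, is a small share of the compute available over 2023-2027.
growth = 3.0                     # assumed annual growth factor (hypothetical)
compute_2023 = 1.0               # normalize compute secured as of mid-2023 to 1
commitment = 0.2 * compute_2023  # "20% of the compute we've secured to date"

total_2023_to_2027 = sum(compute_2023 * growth**t for t in range(5))  # 2023..2027
print(f"commitment / four-year compute: {commitment / total_2023_to_2027:.2%}")
# ~0.17% under these assumed numbers
```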
Safety-wise, they claim to have run it through their Preparedness Framework and red-teaming by external experts.
I’m disappointed and I think they shouldn’t get much credit PF-wise: they haven’t published their evals, published a report on results, or even published a high-level “scorecard.” They are not yet meeting the commitments in their beta Preparedness Framework — some stuff is unclear but at the least publishing the scorecard is an explicit commitment.
(It’s now been six months since they published the beta PF!)
[Edit: not to say that we should feel much better if OpenAI were successfully implementing its PF—the thresholds are way too high and it says nothing about internal deployment.]
OpenAI: Preparedness framework
Questions for labs
Update: Greg Brockman quit.
Update: Sam and Greg say:
Sam and I are shocked and saddened by what the board did today.
Let us first say thank you to all the incredible people who we have worked with at OpenAI, our customers, our investors, and all of those who have been reaching out.
We too are still trying to figure out exactly what happened. Here is what we know:
- Last night, Sam got a text from Ilya asking to talk at noon Friday. Sam joined a Google Meet and the whole board, except Greg, was there. Ilya told Sam he was being fired and that the news was going out very soon.
- At 12:19pm, Greg got a text from Ilya asking for a quick call. At 12:23pm, Ilya sent a Google Meet link. Greg was told that he was being removed from the board (but was vital to the company and would retain his role) and that Sam had been fired. Around the same time, OpenAI published a blog post.
- As far as we know, the management team was made aware of this shortly after, other than Mira who found out the night prior.
The outpouring of support has been really nice; thank you, but please don’t spend any time being concerned. We will be fine. Greater things coming soon.
Update: three more resignations, including Jakub Pachocki.
Sam Altman’s firing as OpenAI CEO was not the result of “malfeasance or anything related to our financial, business, safety, or security/privacy practices” but rather a “breakdown in communications between Sam Altman and the board,” per an internal memo from chief operating officer Brad Lightcap seen by Axios.
Update: Sam is planning to launch something (no details yet).
Update: Sam may return as OpenAI CEO.
Update: Tigris.
Update: talks with Sam and the board.
Update: Mira wants to hire Sam and Greg in some capacity; board still looking for a permanent CEO.
Update: Emmett Shear is interim CEO; Sam won’t return.
Update: lots more resignations (according to an insider).
Update: Sam and Greg leading a new lab in Microsoft.
Update: total chaos.
Ben has also been quietly fixing errors in the post, which I appreciate, but people are going around right now attacking us for things that Ben got wrong, because how would they know he quietly changed the post?
This is why every time newspapers get caught making a mistake they issue a public retraction the next day to let everyone know. I believe Ben should make these retractions more visible.
I used a diff checker to find the differences between the current post and the original post. There seem to be two:
“Alice worked there from November 2021 to June 2022” became “Alice travelled with Nonlinear from November 2021 to June 2022 and started working for the org from around February”
“using Lightcone funds” became “using personal funds”
Possibly I made a mistake, or Ben made edits and you saw them and then Ben reverted them—if so, I encourage you/anyone to point to another specific edit, possibly on other archive.org versions.
Update: Kat guesses she was thinking of changes from a near-final draft rather than changes from the first published version.
DeepMind: Evaluating Frontier Models for Dangerous Capabilities
I largely agree. But I think not-stacking is only slightly bad because I think the “crappy toy model [where] every alignment-visionary’s vision would ultimately succeed, but only after 30 years of study along their particular path” is importantly wrong; I think many new visions have a decent chance of succeeding more quickly, and if we pursue enough different visions we get a good chance of at least one paying off quickly.
Edit: even if alignment researchers could stack into just a couple paths, I think we might well still choose to go wide.
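A minimal numeric sketch of the “go wide” argument; the per-vision success probability and independence are assumptions for illustration, not estimates:

```python
# Toy model of "going wide" (p is an assumed illustrative number): if each
# research vision independently has probability p of paying off quickly,
# pursuing n visions gives probability 1 - (1 - p)**n of at least one quick payoff.
p = 0.1
for n in (1, 5, 10, 20):
    print(f"{n:2d} visions -> P(at least one quick payoff) = {1 - (1 - p)**n:.0%}")
# 1 -> 10%, 5 -> 41%, 10 -> 65%, 20 -> 88%
```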
OpenAI-Microsoft partnership
Please tell us what you think! Love it/hate it/think it should be different? Let us know.
I think it’s a fine experiment but… right now I’m closest to “hate it,” at least if it was used for all posts (I’d be much happier if it was only for question-posts, or only if the author requested it or a moderator thought it would be particularly useful, or something).
It makes voting take longer (with not much value added).
It makes reading comments take longer (with not much value added). You learn very little from these votes beyond what you learn from reading the comment.
It’s liable to make the more OCD among us go crazy. Worrying about how other people vote on your writing is bad enough. I, for one, would write worse comments in expectation if I was always thinking about making everyone else believe that my comments were true and well-aimed and clear and truth-seeking &c.
If this system was implemented in general, I would almost always prefer not to interact with it, so I would strongly request a setting to hide all non-karma voting from my view.
Edit in response to Rafael: for me at least the downside isn’t anxiety but mental effort to optimize for comment quality rather than votes and mental effort to ignore votes on my own comments. I’m not sure if the distinction matters; regardless, I’d be satisfied with the ability to hide non-karma votes.
Slowing AI: Reading list
Slowing AI: Foundations
Harry let himself be pulled, but as Hermione dragged him away, he said, raising his voice even louder, “It is entirely possible that in a thousand years, the fact that FHI was at Oxford will be the only reason anyone remembers Oxford!”
We already have a Schelling point for “infohazard”: Bostrom’s paper. Redefining “infohazard” now is needlessly confusing. (And most of the time I hear “infohazard” it’s in the collectively-destructive smallpox-y sense, and as Buck notes this is more important and common.)