Kat, Emerson, and Drew’s reputation is not your concern insofar as you’re basically certain that your post is basically true. If you thought there was a decent chance that your post was basically wrong and Nonlinear would find proof in the next week, publishing now would be inappropriate.
When destroying someone’s reputation you have an extra obligation to make sure what you’re saying is true. I think you did that in this case—just clarifying norms.
Update: Greg Brockman quit.
Update: Sam and Greg say:
Sam and I are shocked and saddened by what the board did today.
Let us first say thank you to all the incredible people who we have worked with at OpenAI, our customers, our investors, and all of those who have been reaching out.
We too are still trying to figure out exactly what happened. Here is what we know:
- Last night, Sam got a text from Ilya asking to talk at noon Friday. Sam joined a Google Meet and the whole board, except Greg, was there. Ilya told Sam he was being fired and that the news was going out very soon.
- At 12:19pm, Greg got a text from Ilya asking for a quick call. At 12:23pm, Ilya sent a Google Meet link. Greg was told that he was being removed from the board (but was vital to the company and would retain his role) and that Sam had been fired. Around the same time, OpenAI published a blog post.
- As far as we know, the management team was made aware of this shortly after, other than Mira who found out the night prior.
The outpouring of support has been really nice; thank you, but please don’t spend any time being concerned. We will be fine. Greater things coming soon.
Update: three more resignations including Jakub Pachocki.
Sam Altman’s firing as OpenAI CEO was not the result of “malfeasance or anything related to our financial, business, safety, or security/privacy practices” but rather a “breakdown in communications between Sam Altman and the board,” per an internal memo from chief operating officer Brad Lightcap seen by Axios.
Update: Sam is planning to launch something (no details yet).
Update: Sam may return as OpenAI CEO.
Update: Tigris.
Update: talks with Sam and the board.
Update: Mira wants to hire Sam and Greg in some capacity; board still looking for a permanent CEO.
Update: Emmett Shear is interim CEO; Sam won’t return.
Update: lots more resignations (according to an insider).
Update: Sam and Greg leading a new lab in Microsoft.
Update: total chaos.
Ben has also been quietly fixing errors in the post, which I appreciate, but people are going around right now attacking us for things that Ben got wrong, because how would they know he quietly changed the post?
This is why every time newspapers get caught making a mistake they issue a public retraction the next day to let everyone know. I believe Ben should make these retractions more visible.
I used a diff checker to find the differences between the current post and the original post. There seem to be two:
“Alice worked there from November 2021 to June 2022” became “Alice travelled with Nonlinear from November 2021 to June 2022 and started working for the org from around February”
“using Lightcone funds” became “using personal funds”
Possibly I made a mistake, or Ben made edits and you saw them and then Ben reverted them—if so, I encourage you/anyone to point to another specific edit, possibly on other archive.org versions.
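(For anyone who wants to reproduce this kind of check, here’s a minimal sketch of what I mean by a diff checker, assuming you’ve saved the archive.org snapshot and the current post as local text files; the filenames are placeholders.)

```python
import difflib
import sys

# Load the two saved versions of the post (placeholder filenames).
with open("post_original.txt") as f:
    original = f.readlines()
with open("post_current.txt") as f:
    current = f.readlines()

# Print a unified diff; any changed passages show up as -/+ lines.
sys.stdout.writelines(
    difflib.unified_diff(original, current, fromfile="original", tofile="current")
)
```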
Update: Kat guesses she was thinking of changes from a near-final draft rather than changes from the first published version.
I largely agree. But I think not-stacking is only slightly bad because I think the “crappy toy model [where] every alignment-visionary’s vision would ultimately succeed, but only after 30 years of study along their particular path” is importantly wrong; I think many new visions have a decent chance of succeeding more quickly and if we pursue enough different visions we get a good chance of at least one paying off quickly.
Edit: even if alignment researchers could stack into just a couple paths, I think we might well still choose to go wide.
Please tell us what you think! Love it/hate it/think it should be different? Let us know.
I think it’s a fine experiment but… right now I’m closest to “hate it,” at least if it was used for all posts (I’d be much happier if it was only for question-posts, or only if the author requested it or a moderator thought it would be particularly useful, or something).
It makes voting take longer (with not much value added).
It makes reading comments take longer (with not much value added). You learn very little from these votes beyond what you learn from reading the comment.
It’s liable to make the more OCD among us go crazy. Worrying about how other people vote on your writing is bad enough. I, for one, would write worse comments in expectation if I was always thinking about making everyone else believe that my comments were true and well-aimed and clear and truth-seeking &c.
If this system was implemented in general, I would almost always prefer not to interact with it, so I would strongly request a setting to hide all non-karma voting from my view.
Edit in response to Rafael: for me at least the downside isn’t anxiety but mental effort to optimize for comment quality rather than votes and mental effort to ignore votes on my own comments. I’m not sure if the distinction matters; regardless, I’d be satisfied with the ability to hide non-karma votes.
Harry let himself be pulled, but as Hermione dragged him away, he said, raising his voice even louder, “It is entirely possible that in a thousand years, the fact that FHI was at Oxford will be the only reason anyone remembers Oxford!”
Has anyone collected their public statements on various AI x-risk topics anywhere?
A bit, not shareable.
Helen is an AI safety person. Tasha is on the Effective Ventures board. Ilya leads superalignment. Adam signed the CAIS statement.
I ended up doing some quick google searches for AI opinion polls
I collected such polls here, if you want to see more. Most people say they want to regulate AI.
Or! This idea sounds superficially reasonable and even (per the appendix) gets praise from a few people, but is actually useless or harmful. Currently working out a hypothesis for how that could be the case...
This was the press release; the actual order has now been published.
One safety-relevant part:
4.2. Ensuring Safe and Reliable AI. (a) Within 90 days of the date of this order, to ensure and verify the continuous availability of safe, reliable, and effective AI in accordance with the Defense Production Act, as amended, 50 U.S.C. 4501 et seq., including for the national defense and the protection of critical infrastructure, the Secretary of Commerce shall require:
(i) Companies developing or demonstrating an intent to develop potential dual-use foundation models to provide the Federal Government, on an ongoing basis, with information, reports, or records regarding the following:
(A) any ongoing or planned activities related to training, developing, or producing dual-use foundation models, including the physical and cybersecurity protections taken to assure the integrity of that training process against sophisticated threats;
(B) the ownership and possession of the model weights of any dual-use foundation models, and the physical and cybersecurity measures taken to protect those model weights; and
(C) the results of any developed dual-use foundation model’s performance in relevant AI red-team testing based on guidance developed by NIST pursuant to subsection 4.1(a)(ii) of this section, and a description of any associated measures the company has taken to meet safety objectives, such as mitigations to improve performance on these red-team tests and strengthen overall model security. Prior to the development of guidance on red-team testing standards by NIST pursuant to subsection 4.1(a)(ii) of this section, this description shall include the results of any red-team testing that the company has conducted relating to lowering the barrier to entry for the development, acquisition, and use of biological weapons by non-state actors; the discovery of software vulnerabilities and development of associated exploits; the use of software or tools to influence real or virtual events; the possibility for self-replication or propagation; and associated measures to meet safety objectives; and
(ii) Companies, individuals, or other organizations or entities that acquire, develop, or possess a potential large-scale computing cluster to report any such acquisition, development, or possession, including the existence and location of these clusters and the amount of total computing power available in each cluster.
(b) The Secretary of Commerce, in consultation with the Secretary of State, the Secretary of Defense, the Secretary of Energy, and the Director of National Intelligence, shall define, and thereafter update as needed on a regular basis, the set of technical conditions for models and computing clusters that would be subject to the reporting requirements of subsection 4.2(a) of this section. Until such technical conditions are defined, the Secretary shall require compliance with these reporting requirements for:
(i) any model that was trained using a quantity of computing power greater than 10^26 integer or floating-point operations, or using primarily biological sequence data and using a quantity of computing power greater than 10^23 integer or floating-point operations; and
(ii) any computing cluster that has a set of machines physically co-located in a single datacenter, transitively connected by data center networking of over 100 Gbit/s, and having a theoretical maximum computing capacity of 10^20 integer or floating-point operations per second for training AI.
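To make those interim thresholds concrete, here’s a rough back-of-the-envelope check. This is my own illustration, not from the order, and all of the hardware numbers are made up.

```python
# Interim reporting thresholds from Sec. 4.2(b) of the order.
MODEL_FLOP_THRESHOLD = 1e26        # any model
BIO_MODEL_FLOP_THRESHOLD = 1e23    # models trained primarily on biological sequence data
CLUSTER_FLOPS_THRESHOLD = 1e20     # theoretical max FLOP/s of a co-located cluster

def training_flop(chips: int, peak_flops_per_chip: float, utilization: float, seconds: float) -> float:
    """Total training compute ~= chips * peak FLOP/s per chip * utilization * wall-clock seconds."""
    return chips * peak_flops_per_chip * utilization * seconds

# Hypothetical run: 50,000 accelerators at 1e15 peak FLOP/s, 40% utilization, ~90 days.
flop = training_flop(50_000, 1e15, 0.4, 90 * 24 * 3600)
print(f"{flop:.2e} training FLOP; above model threshold: {flop > MODEL_FLOP_THRESHOLD}")
# ~1.56e+26 training FLOP; above model threshold: True

# Hypothetical cluster: 128,000 of the same accelerators in one datacenter.
cluster_flops = 128_000 * 1e15
print(f"{cluster_flops:.2e} FLOP/s; above cluster threshold: {cluster_flops > CLUSTER_FLOPS_THRESHOLD}")
# 1.28e+20 FLOP/s; above cluster threshold: True
```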
But other labs are even less safe, and not far behind.
Yes, alignment is largely an unsolved problem, and progress on it is mostly an exogenous function of time. But to a large extent we’re safer with safety-interested labs developing powerful AI: this will boost model-independent alignment research, make particular critical models more likely to be aligned/controlled, help generate legible evidence that alignment is hard (insofar as such evidence exists), and maybe enable progress to pause at a critical moment.
Covid-19 will be one more disease among many, and life will be marginally worse, but by about April you shouldn’t act substantially differently than if it no longer existed.
This seems quite bold given our history of variants emerging. And if Omicron infects billions, then prima facie there’s great opportunity for mutation. I’d be interested to hear your credence in the following proposition:
From 1 May 2022 to 1 Jan 2030, Zvi won’t act substantially differently due to risk of SARS-CoV-2 infection.
Additionally, “one more disease among many” suggests (to me) that it won’t cause 100K+ more deaths in the following few years, which also seems bold. [edit: American deaths, see replies for more]
Some nitpicks:
You write like Stockfish 14 is a probabilistic function from game-state to next-move, the thing-which-has-an-Elo. But I think Stockfish 14 running on X hardware for Y time is the real probabilistic function from game-state to next-move (see e.g. the inclusion of hardware in Elo rankings here). And you probably played with hardware and time such that its Elo is substantially below 3549 (a quick expected-score sketch follows these nitpicks).
I think a human with Stockfish’s Elo would be much better at beating you at queen odds, since (not certain about these):
Stockfish is optimized for standard chess and human grandmasters are probably better at transferring to odds-chess.
Stockfish roughly tries to maximize P(win) against optimal play or Stockfish-level play, or maximize number of moves before losing once it knows you have a winning strategy. Human grandmasters would adapt to be better against your skill level (e.g. by trying to make positions more complex), and would sometimes correctly make choices that would be bad against Stockfish or optimal play but good against weaker players.
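The expected-score sketch mentioned above: under the standard Elo formula, expected score depends only on the rating gap, so the question is what effective rating the engine actually had on your hardware and time settings. The ratings below are made-up examples.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score (win = 1, draw = 0.5) of player A vs. player B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

print(expected_score(3549, 1500))  # ~0.99999: full-strength Stockfish vs. a club player
print(expected_score(2800, 1500))  # ~0.9994: even a much weaker config still dominates
```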
fwiw my guess is that OP didn’t ask its grantees to do open-source LLM biorisk work at all; I think its research grantees generally have lots of freedom.
(I’ve worked for an OP-funded research org for 1.5 years. I don’t think I’ve ever heard of OP asking us to work on anything specific, nor of us working on something because we thought OP would like it. Sometimes we receive restricted, project-specific grants, but I think those projects were initiated by us. Oh, one exception: Holden’s standards-case-studies project.)
What would a good RSP look like?
Clear commitments along the lines of “we promise to run these 5 specific tests to evaluate these 10 specific dangerous capabilities.”
Clear commitments regarding what happens if the evals go off (e.g., “if a model scores above a 20 on the Hubinger Deception Screener, we will stop scaling until it has scored below a 10 on the relatively conservative Smith Deception Test.”)
Clear commitments regarding the safeguards that will be used once evals go off (e.g., “if a model scores above a 20 on the Cotra Situational Awareness Screener, we will use XYZ methods and we believe they will be successful for ABC reasons.”)
Clear evidence that these evals will exist, will likely work, and will be conservative enough to prevent catastrophe
Some way of handling race dynamics (such that Bad Guy can’t just be like “haha, cute that you guys are doing RSPs. We’re either not going to engage with your silly RSPs at all, or we’re gonna publish our own RSP but it’s gonna be super watered down and vague”).
Yeah, of course this would be nice. But the reason that ARC and Anthropic didn’t write this ‘good RSP’ isn’t that they’re reckless; it’s that writing such an RSP is a hard open problem. It would be great to have “specific tests” for various dangerous capabilities, or “some way of handling race dynamics,” but nobody knows what those are.
Of course the specific object-level commitments Anthropic has made so far are insufficient. (Fortunately, they committed to make more specific object-level commitments before reaching ASL-3, and ASL-3 is reasonably well-specified [edit: and almost certainly below x-catastrophe-level].) I praise Anthropic’s RSP and disagree with your vibe because I don’t think you or I or anyone else could write much better commitments. (If you have specific commitments-labs-should-make in mind, please share them!)
(Insofar as you’re just worried about comms and what-people-think-about-RSPs rather than how-good-RSPs-are, I’m agnostic.)
I built an (unpublished) TAI timelines model
I’d be excited to see this if it’s substantially different from existing published models. (Edit: yay, it’s https://www.lesswrong.com/posts/4ufbirCCLsFiscWuY/a-proposed-method-for-forecasting-ai)
I account for potential coordinated delays, catastrophes, and a 15% chance that we’re fundamentally wrong about all of this stuff.
+1 to noting this explicitly; everyone should distinguish between their conditional on no major disruptions and their unconditional models.
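A minimal illustration of the distinction, using the quoted 15%; the conditional and “if wrong” numbers below are placeholders, not anyone’s actual forecast.

```python
p_framework_right = 0.85   # from the quoted "15% chance that we're fundamentally wrong"
p_tai_if_right = 0.60      # hypothetical forecast, conditional on the framework being right
p_tai_if_wrong = 0.10      # hypothetical; if the framework is wrong, much lower / unknown

# The unconditional forecast mixes the two cases:
p_tai = p_framework_right * p_tai_if_right + (1 - p_framework_right) * p_tai_if_wrong
print(p_tai)  # 0.525 -- noticeably different from the conditional 0.60
```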
In addition to the bill, CAIP has a short summary and a long summary.
Unfortunately, I can’t talk about the game itself, as that’s forbidden by the rules.
You two can just change the rules… I’m confused by this rule.
Based on the past week’s worth of papers, it seems quite likely that we are now in a fast takeoff, and that we have 2-5 years until Moore’s law and organizational prioritization put these systems at AGI.
What makes you say this? What should I read to appreciate how big a deal for AGI the recent papers are?
We already have a Schelling point for “infohazard”: Bostrom’s paper. Redefining “infohazard” now is needlessly confusing. (And most of the time I hear “infohazard” it’s in the collectively-destructive smallpox-y sense, and as Buck notes this is more important and common.)