Comments on OpenAI's "Planning for AGI and beyond"

Sam Altman shared me on a draft of his OpenAI blog post Planning for AGI and beyond, and I left some comments, reproduced below without typos and with some added hyperlinks. Where the final version of the OpenAI post differs from the draft, I’ve noted that as well, making text Sam later cut red and text he added blue.

My overall sense is that Sam deleted text and occasionally rephrased sentences so as to admit more models (sometimes including mine), but didn’t engage with the arguments enough to shift his own probability mass around on the important disagreements.

Our disagreements are pretty major, as far as I can tell. With my comments, I was hoping to spark more of a back-and-forth. Having failed at that, I’m guessing part of the problem is that I didn’t phrase my disagreements bluntly or strongly enough, while also noting various points of agreement, which might have overall made it sound like I had only minor disagreements.

To help with that, I’ve added blunter versions below, in a bunch of cases where I don’t think the public version of the post fully takes into account the point I was trying to make.

(Though I don’t want this to take away from the positive aspects of the post, since I think these are very important as well. I put in a bunch of positive comments on the original draft, in large part because I think it’s worth acknowledging and reinforcing whatever process/author drafted the especially reasonable paragraphs.)

I don’t expect Sam to hear me make a blunt claim and instantly update to agreeing with me, but I put more probability on us converging over time, and clearly stating our disagreements for the benefit of readers, if he understands my claims, knows I still disagree, and feels license to push back on specific things I’ve re-asserted.

Formatting note: The general format is that I include the text of Sam’s original draft (that I commented on), my comment, and the text of the final post. That said, I don’t want to go blasting someone’s old private drafts across the internet just because they shared that draft with me, so in some cases, the original text is redacted, at Sam’s request.

Sam’s draft: Our mission is to ensure that [cut: AGI] benefits all of humanity. [cut: The creation of AGI should be a tremendous shared triumph that everyone contributes to and benefits from; it will be the result of the collective technological and societal progress of humanity over millennia.]

My comment: +1

Sam’s post: Our mission is to ensure that [added: artificial general intelligence—AI systems that are generally smarter than humans—] benefits all of humanity.

Sam’s draft: [cut: Of course our current progress could hit a wall, but] if AGI is successfully created, this technology could help us elevate humanity by increasing abundance, turbocharging [cut: our] economy, and aiding in the discovery of new scientific knowledge.

My comment: seems to me an understatement :-p

(unlocking nanotech; uploading minds; copying humans; interstellar probes that aren’t slowed down by needing to cradle bags of meat, and that can have the minds beamed to them; energy abundance; ability to run civilizations on computers in the cold of space; etc. etc., are all things that i expect to follow from automated scientific & technological development)

(seems fine to avoid the more far-out stuff, and also fine to only say things that you personally believe, but insofar as you also expect some of this tech to be within-reach in 50 sidereal years after AGI, i think it’d be virtuous to acknowledge)

Sam’s post: If AGI is successfully created, this technology could help us elevate humanity by increasing abundance, turbocharging [added: the global] economy, and aiding in the discovery of new scientific knowledge [added: that changes the limits of possibility].

Blunter follow-up: seems to undersell the technological singularity, and the fact that the large-scale/coarse-grain shape of the future will be governed by superintelligences.

Sam’s draft: On the other hand, AGI would also come with serious risk of misuse [cut: and] drastic accidents. Because the upside of AGI is so great, we do not believe [cut: it’s] possible or desirable for society to stop its development forever; instead, [cut: we] have to figure out how to get it right. [1]

My comment: +1

Sam’s post: On the other hand, AGI would also come with serious risk of misuse, drastic accidents, [added: and societal disruption]. Because the upside of AGI is so great, we do not believe [added: it is] possible or desirable for society to stop its development forever; instead, [added: society and the developers of AGI] have to figure out how to get it right. [C]

Blunter follow-up: still +1, with the caveat that accident risk >> misuse risk

Sam’s draft: 1) We want AGI to empower humanity to maximally flourish in the universe. We don’t expect the future to be an unqualified utopia, but we want to maximize the good and minimize the bad, and for AGI to be an amplifier of [cut: human will].

My comment: i think i agree with the sentiment here, but i disagree with parts of the literal denotation

for one, i sure hope for an unqualified utopia, and think there’s a chance that superintelligent assistance could figure out how to get one (cf fun theory).

(it is ofc important to note that “superintelligences puppet the humans through the motions of a utopia” is not in fact a utopia, and that the future will undoubtedly include tradeoffs (including continuing to let people make their own mistakes and learn their own lessons), and so in that sense i agree that it wouldn’t be an “unqualified utopia”, even in the best case)

...though i don’t currently expect us to do that well, so i don’t technically disagree with the literal phrasing you chose there.

i do have qualms about “we want AGI to be an amplifier of human will”. there’s a bunch of ways that this seems off-kilter to me. my basic qualm here is that i think getting a wonderful future is more of a fragile operation than simply cranking up everybody’s “power level” simultaneously, roughly analogously to how spoiling a child isn’t the best way for them to grow up.

i’d stand full-throatedly behind “we want AGI to be an amplifier of all the best parts of humanity”.

also, i ofc ultimately want AGI that are also people, to be humanity’s friends as we explore the universe and so on. (though, stating the obvious, i think we should aim to avoid personhood in our early AGIs, for various reasons.)

Sam’s post: 1. We want AGI to empower humanity to maximally flourish in the universe. We don’t expect the future to be an unqualified utopia, but we want to maximize the good and minimize the bad, and for AGI to be an amplifier of [added: humanity].

Blunter follow-up: we can totally get an unqualified utopia. also this “amplifier of humanity” thing sounds like an applause light—though i endorse certain charitable interpretations that i can wring from it (that essentially amount to CEV (as such things usually do)), at the same time i disendorse other interpretations.

Sam’s draft: 2) We want the benefits of, access to, and governance of AGI to be widely and fairly shared.

My comment: +1 to benefits of. i have lots more qualms about “access to” and “governance of”.

re “access to”, my guess is that early AGIs will be able to attain a decisive strategic advantage over the rest of the world entire. saying “everyone should have equal access” seems to me like saying “a nuclear bomb in every household”; it just sounds kinda mad.

i’d agree that once the world has exited the acute risk period, it’s critical for access to AGI tech to be similarly available to all. but that is, in my book, a critical distinction.

(so access-wise, i agree long-term, but not short-term.)

governance-wise, my current state is something like: in the short term, using design-by-committee to avert the destruction of the world sounds like a bad idea; and in the long term, i think you’re looking at stuff at least as crazy as people running thousands of copies of their own brain at 1000x speedup and i think it would be dystopian to try to yoke them to, like, the will of the flesh-bodied American taxpayers (or whatever).

there’s something in the spirit of “distributed governance” that i find emotionally appealing, but there’s also lots and lots of stuff right nearby, that would be catastrophic, dystopian, or both, and that implementation would be likely to stumble into in practice. so i have qualms about that one.

Sam’s post: [unchanged]

Blunter follow-up: full-throated endorsement of “benefits of”. widely sharing access and governance in the short-term seems reckless and destructive.

Sam’s draft: 3) We want to successfully navigate massive risks. In confronting these risks, we acknowledge that what seems right in theory [cut: frequently] plays out more strangely than expected in practice. We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to [cut: avoid a] “one shot to get it right” [cut: scenario].

My comment: i don’t think that this “continuously deploy weak systems” helps avoid the “you have one shot”-type problems that i predict we’ll face in the future.

(this also strikes me as a rationalization for continuing to do the fun/cool work of pushing the capabilities envelope, which i currently think is net-bad for everyone)

Sam’s post: 3. We want to successfully navigate massive risks. In confronting these risks, we acknowledge that what seems right in theory [added: often] plays out more strangely than expected in practice. We believe we have to continuously learn and adapt by deploying less powerful versions of the technology in order to [added: minimize] “one shot to get it right” [added: scenarios].

Blunter follow-up: i think it takes a bunch more than “continuously deploy weak systems” to address the “one shot to get it right” scenarios, and none of the leading orgs (OAI included) seem to me to be on track to acquire the missing parts.

Sam’s draft: a gradual transition to a world with AGI is better than a sudden one

My comment: for the record, i don’t think continuous deployment really smooths out the sharp changes that i expect in the future. (i’m not trying to argue the point here, just noting that there are some people who are predicting a sort of sharp change that they think is ~unrelated to your choice of continuous deployment.)

Sam’s post: [unchanged]

Blunter follow-up: insofar as this sentence is attempting to reply to MIRI-esque concerns, i don’t think it’s a very good reply (for the reasons alluded to in the original comment).

Sam’s draft: A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place.

My comment: i’m skeptical, similar to the above.

Sam’s post: [unchanged]

Sam’s draft: It also allows [cut: us to learn as much as we can from our deployments,] for society and AI to co-evolve, and to collectively figure out what [cut: we] want while the stakes are relatively low.

My comment: stating the obvious, other ways of learning as much as you can from the systems you have include efforts in transparency, legibility, and interpretability.

Sam’s post: It also allows for society and AI to co-evolve, and [added: for people] collectively to figure out what [added: they] want while the stakes are relatively low.

Sam’s draft: As our systems get closer to AGI, we are becoming increasingly [cut: more] cautious with the creation and deployment of our models.

My comment: +1

Sam’s post: As our systems get closer to AGI, we are becoming increasingly cautious with the creation and deployment of our models.

Blunter follow-up: +1 as a thing y’all should do. my guess is that you need to do it even faster and more thoroughly than you have been.

a more general blunt note: me +1ing various statements does not mean that i think the corporate culture has internalized the corresponding points, and it currently looks likely to me that OpenAI is not on track to live up to the admirable phrases in this post, and are instead on track to get everyone killed.

i still think it’s valuable that the authors of this post are thinking about these points, and i hope that these sorts of public endorsements increase the probability that the corresponding sentiments actually end up fully internalized in the corporate culture, but i want to be clear that the actions are what ultimately matter, not the words.

Sam’s draft: Our decisions will require much more caution than society usually applies to new technologies, and more caution than many users would like. Some people in the AI field think the risks of AGI (and successor systems) are fictitious; we would be delighted if they turn out to be right, but we are going to operate as if these risks are existential.

My comment: hooray

Sam’s post: [unchanged]

Blunter follow-up: to be clear, my response to the stated sentiment is “hooray”, and i’m happy to see this plainly and publicly stated, but i have strong doubts about whether and how this sentiment will be implemented in practice.

Sam’s draft: At some point, the balance between the upsides and downsides of deployments (such as empowering malicious actors, creating social and economic disruptions, and accelerating an unsafe race) could shift, in which case we would significantly change our plans.

My comment: this is vague enough that i don’t quite understand what it’s saying; i’d appreciate it being spelled out more

Sam’s post: At some point, the balance between the upsides and downsides of deployments (such as empowering malicious actors, creating social and economic disruptions, and accelerating an unsafe race) could shift, in which case we would significantly change our plans [added: around continuous deployment].

Sam’s draft: Second, we are working towards creating increasingly aligned [cut: (i.e., models that reliably follow their users’ intentions)] and steerable models. Our shift from models like the first version of GPT-3 to ChatGPT and InstructGPT is an early example of this.

My comment: ftr, i think there’s a bunch of important notkilleveryoneism work that won’t be touched upon by this approach

Sam’s post: Second, we are working towards creating increasingly aligned and steerable models. Our shift from models like the first version of GPT-3 to InstructGPT and ChatGPT is an early example of this.

Sam’s draft: Importantly, we think we often have to make progress on AI safety and capabilities together [cut: (and that] it’s a false dichotomy to talk about them separately; they are correlated in many ways[cut: )]. Our best safety work has come from working with our most capable models.

My comment: i agree that they’re often connected, but i also think that traditionally the capabilities runs way out ahead of the alignment, and that e.g. if capabilities progress was paused now, there would be many years’ worth of alignment work that could be done to catch up (e.g. by doing significant work on transparency, legibility, and interpretability). and i think that if we do keep running ahead with the current capabilities/alignment ratio (or even a slightly better one), we die.

(stating the obvious: this is not to say that transparency/legibility/interpretability aren’t also intertwined with capabilities; it’s all intertwined to some degree. but one can still avoid pushing the capabilities frontier, and focus on the alignment end of things. and one can still institute a policy of privacy, to further avoid burning the commons.)

Sam’s post: Importantly, we think we often have to make progress on AI safety and capabilities together. It’s a false dichotomy to talk about them separately; they are correlated in many ways. Our best safety work has come from working with our most capable models. [added: That said, it’s important that the ratio of safety progress to capability progress increases.]

Sam’s draft: We have a clause in our charter about assisting other organizations instead of racing them in late-stage AGI development. [redacted statement that got cut]

My comment: i think it’s really cool of y’all to have this; +1

Sam’s post: We have a clause in our Charter about assisting other organizations [added: to advance safety] instead of racing [added: with] them in late-stage AGI development.

Sam’s draft: We have a cap on the returns our shareholders can earn so that we aren’t incentivized to attempt to capture value without bound and risk deploying something potentially catastrophically dangerous (and of course as a way to share the benefits with society)

My comment: also rad

Sam’s post: [unchanged]

Sam’s draft: We believe that the future of humanity should be determined by humanity. [redacted draft version of “and that it’s important to share information about progress with the public”]

My comment: +1 to “the future of humanity should be determined by humanity”.

My comment (#2): i agree with some of the sentiment of [redacted sentence in Sam’s draft], but note that things get weird in the context of a global arms race for potentially-civilization-ending tech. i, for one, am in favor of people saying “we are now doing our AGI research behind closed doors, because we don’t think it would be used wisely if put out in the open”.

Sam’s post: We believe that the future of humanity should be determined by humanity, [added: and that it’s important to share information about progress with the public].

Blunter follow-up: The +1 is based on a charitable read where “the future of humanity should be determined by humanity” irons out into CEV, as such things often do.

Blunter follow-up (#2): seems to me like a lot of weight is being put on “information about progress”.

there’s one read of this claim that says something like “the average human should know that a tiny cluster of engineers are about to gamble with everybody’s fate”, which does have a virtuous ring to it. and i wouldn’t personally argue for hiding that fact from anyone. but this is not a difficult fact for savvy people to learn from public information today.

is Sam arguing that there’s some concrete action in this class that the field has an unmet obligation to do, like dropping flyers in papua new guinea? this currently strikes me as a much more niche concern than the possible fast-approaching deaths of everyone (including papua new guineans!) at the hands of unfriendly AI, so i find it weird to mix those two topics together in a post about how to ensure a positive singularity.

possibly Sam has something else in mind, but if so i encourage more concreteness about what that is and why it’s important.

Sam’s draft: [redacted statement that got cut] There should be great scrutiny of all efforts attempting to build AGI and public consultation for major decisions.

My comment: totally agree that people shouldn’t try to control the world behind closed doors. that said, i would totally endorse people building defensive technology behind closed doors, in attempts to (e.g.) buy time.

(ideally this would be done by state actors, which are at least semilegitimate. but if someone’s building superweapons, and you can build technology that thwarts them, and the state actors aren’t responding, then on my ethics it’s ok to build the technology that thwarts them, so long as this also does not put people at great risk by your own hands.)

(of course, most people building powerful tech that think they aren’t putting the world at great risk by their own hands, are often wrong; there’s various types of thinking on this topic that should simply not be trusted; etc. etc.)

[Post-hoc note: It’s maybe worth noting that on my ethics there’s an enormous difference between “a small cabal of humans exerts direct personal control over the world” and “run a CEV sovereign”, and i’m against the former but for the latter, with the extra caveat that nobody should be trying to figure out a CEV sovereign under time-pressure, nor launching an AGI unilaterally simply because they managed to convince themselves it was a CEV sovereign.]

My comment (#2): on “There should be great scrutiny of all efforts attempting to build AGI and public consultation for major decisions”, i agree with something like “it’s ethically important that the will of all humans goes into answering the questions of where superintelligence should guide the future”. this is separate from endorsing any particular design-by-committee choice of governance.

(for instance, if everybody today casts a vote for their favorite government style that they can think of, and then the AGI does the one that wins the most votes, i think that would end up pretty bad.)

which is to say, there’s some sentiment in sentences like this that i endorse, but the literal denotation makes me uneasy, and feels kinda like an applause-light. my qualms could perhaps be assuaged by a more detailed proposal, that i could either endorse or give specific qualms about.

Sam’s post: There should be great scrutiny of all efforts attempting to build AGI and public consultation for major decisions.

Sam’s draft: The first AGI will be just a point along the continuum of intelligence. We think it’s likely that progress will continue from there, possibly sustaining the rate of progress we’ve seen over the past decade for a long period of time. If this is true, the world could become extremely different from how it is today, and the risks could be extraordinary. A misaligned superintelligent AGI could cause grievous harm to the world; an autocratic regime with a decisive superintelligence lead could do that too.

My comment: +1

Sam’s post: [unchanged]

Sam’s draft: AI that can accelerate science is a special case worth thinking about, and perhaps more impactful than everything else. It’s possible that AGI capable enough to accelerate its own progress could cause major changes to happen surprisingly quickly (and even if the transition starts slowly, we expect it to happen pretty quickly in the final stages).

My comment: +1

Sam’s post: [unchanged]

Sam’s draft: We think a slower takeoff is easier to make safe, and coordination among AGI efforts to slow down at critical junctures will likely be important (even in a world where we don’t need to do this to solve technical alignment problems, slowing down may be important to give society enough time to adapt).

My comment: (sounds nice in principle, but i will note for the record that the plan i’ve heard for this is “do continuous deployment and hope to learn something”, and i don’t expect that to help much in slowing down or smoothing out a foom.)

Sam’s post: [unchanged]

Sam’s draft: [redacted version with phrasing similar to the public version]

My comment: +1

Sam’s post: Successfully transitioning to a world with superintelligence is perhaps the most important—and hopeful, and scary—project in human history. Success is far from guaranteed, and the stakes (boundless downside and boundless upside) will hopefully unite all of us.

Sam’s draft: [redacted statement calling back to “Our approach to alignment research”]

My comment: i think this basically doesn’t work unless the early systems are already very aligned. (i have various drafts queued about this, as usual)

Sam’s post: [sentence deleted]
