Developmental Stages of GPTs

Epistemic Sta­tus: I only know as much as any­one else in my refer­ence class (I build ML mod­els, I can grok the GPT pa­pers, and I don’t work for OpenAI or a similar lab). But I think my the­sis is origi­nal.

Re­lated: Gw­ern on GPT-3

For the last several years, I’ve gone around saying that I’m worried about transformative AI (an AI capable of making an impact on the scale of the Industrial Revolution; the concept is agnostic on whether it has to be AGI or self-improving), because I think we might be one or two cognitive breakthroughs away from building one.

GPT-3 has made me move up my timelines, because it makes me think we might need zero more cognitive breakthroughs, just more refinement, efficiency, and computing power: basically, GPT-6 or GPT-7 might do it. My reason for thinking this comes from comparing GPT-3 to GPT-2, and reflecting on what the differences say about the “missing pieces” for transformative AI.

My Thesis:

The differ­ence be­tween GPT-2 and GPT-3 has made me sus­pect that there’s a le­gi­t­i­mate com­par­i­son to be made be­tween the scale of a net­work ar­chi­tec­ture like the GPTs, and some analogue of “de­vel­op­men­tal stages” of the re­sult­ing net­work. Fur­ther­more, it’s plau­si­ble to me that the func­tions needed to be a trans­for­ma­tive AI are cov­ered by a mod­er­ate num­ber of such de­vel­op­men­tal stages, with­out re­quiring ad­di­tional struc­ture. Thus GPT-N would be a trans­for­ma­tive AI, for some not-too-large N, and we need to re­dou­ble our efforts on ways to al­ign such AIs.

The the­sis doesn’t strongly im­ply that we’ll reach trans­for­ma­tive AI via GPT-N es­pe­cially soon; I have wide un­cer­tainty, even given the the­sis, about how large we should ex­pect N to be, and whether the scal­ing of train­ing and of com­pu­ta­tion slows down progress be­fore then. But it’s also plau­si­ble to me now that the timeline is only a few years, and that no fun­da­men­tally differ­ent ap­proach will suc­ceed be­fore then. And that scares me.

Architecture and Scaling

GPT, GPT-2, and GPT-3 use nearly the same ar­chi­tec­ture; each pa­per says as much, with a sen­tence or two about minor im­prove­ments to the in­di­vi­d­ual trans­form­ers. Model size (and the amount of train­ing com­pu­ta­tion) is re­ally the only differ­ence.

GPT took roughly 1 petaflop/s-day to train 117M parameters, GPT-2 took roughly 10 petaflop/s-days to train 1.5B parameters, and the largest version of GPT-3 took roughly 3,000 petaflop/s-days to train 175B parameters. By comparison, AlphaStar seems to have taken about 30,000 petaflop/s-days of training in mid-2019, and if the compute used in AI research has kept growing at its recent pace, about 10x that should be feasible today. The upshot is that OpenAI may not be able to afford it, but if Google really wanted to make GPT-4 this year, they could afford to do so.
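To make the scaling concrete, here is a small sketch that only does arithmetic on the rounded figures quoted above. The numbers are the ones in this post, not independently verified, and the names `MODELS`, `growth`, and `scaling_table` are made up for illustration.

```python
# Rounded figures as quoted in the post (assumptions, not verified here).
MODELS = {
    "GPT": {"params": 117e6, "pfs_days": 1},
    "GPT-2": {"params": 1.5e9, "pfs_days": 10},
    "GPT-3": {"params": 175e9, "pfs_days": 3000},
}
ALPHASTAR_PFS_DAYS = 30_000  # the post's estimate for mid-2019


def growth(prev, curr, key):
    """Ratio of one quantity between two consecutive models."""
    return MODELS[curr][key] / MODELS[prev][key]


def scaling_table():
    """Return (label, parameter ratio, compute ratio) for each generation step."""
    names = list(MODELS)
    return [
        (f"{prev} -> {curr}", growth(prev, curr, "params"), growth(prev, curr, "pfs_days"))
        for prev, curr in zip(names, names[1:])
    ]


if __name__ == "__main__":
    for label, param_ratio, compute_ratio in scaling_table():
        print(f"{label}: {param_ratio:.1f}x parameters, {compute_ratio:.0f}x training compute")
    ratio = ALPHASTAR_PFS_DAYS / MODELS["GPT-3"]["pfs_days"]
    print(f"AlphaStar vs GPT-3 training compute: {ratio:.0f}x")
```

On these figures, each generation grew in parameter count by one to two orders of magnitude, and the AlphaStar estimate is about ten times the GPT-3 training run.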

Analogues to Developmental Stages

There are all sorts of (more or less well-defined) de­vel­op­men­tal stages for hu­man be­ings: image track­ing, ob­ject per­ma­nence, vo­cab­u­lary and gram­mar, the­ory of mind, size and vol­ume, emo­tional aware­ness, ex­ec­u­tive func­tion­ing, et cetera.

I was first re­minded of de­vel­op­men­tal stages a few years ago, when I saw the lay­ers of ab­strac­tion gen­er­ated in this fea­ture vi­su­al­iza­tion tool for GoogLeNet.

We don’t have fea­ture vi­su­al­iza­tion for lan­guage mod­els, but we do have gen­er­a­tive out­puts. And as you scale up an ar­chi­tec­ture like GPT, you see higher lev­els of ab­strac­tion. Gram­mar gets mas­tered, then con­tent (re­mov­ing ab­surd but gram­mat­i­cal re­sponses), then tone (first rough genre, then spook­ily ac­cu­rate au­tho­rial voice). Topic co­her­ence is mas­tered first on the phrase level, then the sen­tence level, then the para­graph level. So too with nar­ra­tive flow.

Gw­ern’s po­etry ex­per­i­ments (GPT-2, GPT-3) are good ex­am­ples. GPT-2 could more or less con­tinue the me­ter of a poem and use words that fit the ex­ist­ing theme, but even its best efforts can get stuck in topic loops:

…Me­thinks I see her in her bliss­ful dreams:
—Or, fancy-like, in some mirage she lies,
Ma­jes­tic yet ma­jes­tic, and of seem­s
The image of the un­con­quer­able skies.
Me­thinks I see her in her bliss­ful dreams:
—Or, fancy-like, in some ma­jes­tic cell,
Where lordly ser­aphs strew their balmy dream­s
On the still night, or in their golden shell.


In the dark the sun doth gleam,
And in the dark the moon doth seem
But now the evening is be­gun—
Gone is the sun upon the earth!
The silver moon doth like a cup
Of blood-red wine, and as that cup
Is drained of life, doth quench no drop.
What man will drink such wine?

GPT-3, though, has the sense of go­ing some­where, at least on the topic level. (Prompts in bold.)

O time thy pyra­mids, thy palaces for­got,
Pen­sive through na­tions wan­der with no more state
Than in thy days, when wan­der­ing was thy trade:
Mourn for Pa­traos, for the glory fled
Of him and thee, and in their stead left, left
To other eyes a length of rol­ling sand,
And such memo­ri­als as the trav­el­ler reads,
Half guessed in shade, of sculp­tured walls half round,
And long-de­faced metopes, and friezes old.

[EDIT: Pre­vi­ously I also in­cluded its com­ple­tion of a fa­mous Emily Dick­in­son poem here, but as benkuhn pointed out, GPT-3 had sim­ply mem­o­rized the poem and re­cited it. I’m re­ally em­bar­rassed, and also kind of shocked that I looked at the ac­tual text of “Be­cause I could not stop for Death” and thought, “yup, that looks like some­thing GPT-3 could pro­duce”.]

(One last shocking bit is that, while GPT-2 had to be fine-tuned by taking the general model and training it some more on a poetry-only dataset, you’re seeing what GPT-3’s model does with no fine-tuning, with just a prompt that sounds poetic!)

Similarly, GPT-3’s ability to write fiction is impressive: unlike GPT-2, it doesn’t lose track of the plot and it has sensible things happen; it just can’t plan its way to a satisfying resolution.

I’d be some­what sur­prised if GPT-4 shared that last prob­lem.

What’s Next?

How could one of the GPTs be­come a trans­for­ma­tive AI, even if it be­comes a bet­ter and bet­ter imi­ta­tor of hu­man prose style? Sure, we can imag­ine it be­ing used mal­i­ciously to auto-gen­er­ate tar­geted mis­in­for­ma­tion or things of that sort, but that’s not the real risk I’m wor­ry­ing about here.

My real worry is that causal in­fer­ence and plan­ning are start­ing to look more and more like plau­si­ble de­vel­op­men­tal stages that GPT-3 is mov­ing to­wards, and that these were ex­actly the things I pre­vi­ously thought were the ob­vi­ous ob­sta­cles be­tween cur­rent AI paradigms and trans­for­ma­tive AI.

Learn­ing causal in­fer­ence from ob­ser­va­tions doesn’t seem qual­i­ta­tively differ­ent from learn­ing ar­ith­metic or cod­ing from ex­am­ples (and not only is GPT-3 ac­cu­rate at adding three-digit num­bers, but ap­par­ently at writ­ing JSX code to spec), only more com­plex in de­gree.
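For a sense of what “learning arithmetic from examples” looks like in practice, here is a minimal sketch of a few-shot addition test. Everything in it is an assumption for illustration: the prompt wording, the helper names, and especially `complete`, which stands in for whatever function sends a prompt to a language model and returns its continuation. No real API is being described.

```python
def make_addition_prompt(shots, query):
    """Build a few-shot prompt: solved examples, then one unsolved question."""
    blocks = [f"Q: What is {a} plus {b}?\nA: {a + b}" for a, b in shots]
    a, b = query
    blocks.append(f"Q: What is {a} plus {b}?\nA:")
    return "\n\n".join(blocks)


def exact_match_accuracy(complete, problems, shots):
    """Fraction of problems where the model's continuation is exactly the sum.

    `complete` is a hypothetical callable: prompt string in, continuation out.
    """
    correct = 0
    for a, b in problems:
        answer = complete(make_addition_prompt(shots, (a, b))).strip()
        correct += int(answer == str(a + b))
    return correct / len(problems)
```

The model never sees a rule for addition here, only a handful of worked examples in the prompt; whatever it gets right, it gets right by continuing the pattern.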

One might claim that causal in­fer­ence is harder to glean from lan­guage-only data than from di­rect ob­ser­va­tion of the phys­i­cal world, but that’s a moot point, as OpenAI are us­ing the same ar­chi­tec­ture to learn how to in­fer the rest of an image from one part.

Plan­ning is more com­plex to as­sess. We’ve seen GPTs as­cend from co­her­ence of the next few words, to the sen­tence or line, to the para­graph or stanza, and we’ve even seen them write work­ing code. But this can be done with­out plan­ning; GPT-3 may sim­ply have a good enough dis­tri­bu­tion over next words to prune out those that would lead to dead ends. (On the other hand, how sure are we that that’s not the same as plan­ning, if plan­ning is just prun­ing on a high enough level of ab­strac­tion?)

The big­ger point about plan­ning, though, is that the GPTs are get­ting feed­back on one word at a time in iso­la­tion. It’s hard for them to learn not to paint them­selves into a cor­ner. It would make train­ing more finicky and ex­pen­sive if we ex­panded the time hori­zon of the loss func­tion, of course. But that’s a straight­for­ward way to get the seeds of plan­ning, and surely there are other ways.
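The “one word at a time” point can be seen in a toy version of the training objective. This is a sketch under simplifying assumptions, not OpenAI’s training code: words stand in for tokens, the probabilities are made up, and each predicted distribution is assumed to be conditioned on the true preceding text.

```python
import math


def next_word_losses(step_distributions, target_words):
    """Return per-position negative log-likelihoods and their mean.

    step_distributions[t] is a toy stand-in for the model's predicted
    distribution over the next word at position t (dict: word -> probability).
    """
    per_position = [
        -math.log(dist[word])
        for dist, word in zip(step_distributions, target_words)
    ]
    return per_position, sum(per_position) / len(per_position)


# Hypothetical, made-up probabilities for a two-word continuation.
distributions = [
    {"the": 0.5, "a": 0.5},
    {"door": 0.25, "corner": 0.75},
]
per_position, mean_loss = next_word_losses(distributions, ["the", "door"])
```

Each term in the sum looks only at the probability the model gave to the actual next word at that position. Nothing in the objective scores where a continuation ends up several words later, so avoiding dead ends is rewarded only indirectly. Expanding the time horizon would mean adding terms that score longer spans, which this sketch does not attempt.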

With causal mod­el­ing and plan­ning, you have the ca­pa­bil­ity of ma­nipu­la­tion with­out ex­ter­nal mal­i­cious use. And the re­ally wor­ri­some ca­pa­bil­ity comes when it mod­els its own in­ter­ac­tions with the world, and makes plans with that taken into ac­count.

Could GPT-N turn out aligned, or at least harmless?

GPT-3 is trained sim­ply to pre­dict con­tinu­a­tions of text. So what would it ac­tu­ally op­ti­mize for, if it had a pretty good model of the world in­clud­ing it­self and the abil­ity to make plans in that world?

One might hope that, because it’s learning to imitate humans in an unsupervised way, it would end up fairly human, or at least act that way. I very much doubt this, for the following reason:

  • Two hu­mans are fairly similar to each other, be­cause they have very similar ar­chi­tec­tures and are learn­ing to suc­ceed in the same en­vi­ron­ment.

  • Two con­ver­gently evolved species will be similar in some ways but not oth­ers, be­cause they have differ­ent ar­chi­tec­tures but the same en­vi­ron­men­tal pres­sures.

  • A mimic species will be similar in some ways but not oth­ers to the species it mimics, be­cause even if they share re­cent an­ces­try, the en­vi­ron­men­tal pres­sures on the poi­sonous one are differ­ent from the en­vi­ron­men­tal pres­sures on the mimic.

What we have with the GPTs is the first deep learn­ing ar­chi­tec­ture we’ve found that scales this well in the do­main (so, prob­a­bly not that much like our par­tic­u­lar ar­chi­tec­ture), learn­ing to mimic hu­mans rather than grow­ing in an en­vi­ron­ment with similar pres­sures. Why should we ex­pect it to be any­thing but very alien un­der the hood, or to con­tinue act­ing hu­man once its ac­tions take us out­side of the train­ing dis­tri­bu­tion?

Moreover, there may be much more going on under the hood than we realize; it may take much more general cognitive power to learn and imitate the patterns of humans than it takes us to execute those patterns.

Next, we might imag­ine GPT-N to just be an Or­a­cle AI, which we would have bet­ter hopes of us­ing well. But I don’t ex­pect that an ap­prox­i­mate Or­a­cle AI could be used safely with any­thing like the pre­cau­tions that might work for a gen­uine Or­a­cle AI. I don’t know what in­ter­nal op­ti­miz­ers GPT-N ends up build­ing along the way, but I’m not go­ing to count on there be­ing none of them.

I don’t ex­pect that GPT-N will be al­igned or harm­less by de­fault. And if N isn’t that large be­fore it gets trans­for­ma­tive ca­pac­ity, that’s sim­ply ter­rify­ing.

What Can We Do?

While the short timeline sug­gested by the the­sis is very bad news from an AI safety readi­ness per­spec­tive (less time to come up with bet­ter the­o­ret­i­cal ap­proaches), there is one silver lin­ing: it at least re­duces the chance of a hard­ware over­hang. A pro­ject or coal­i­tion can fea­si­bly wait and take a bet­ter-al­igned ap­proach that uses 10x the time and ex­pense of an un­al­igned ap­proach, as long as they have that amount of re­source ad­van­tage over any com­peti­tor.

Un­for­tu­nately, the the­sis also makes it less likely that a fun­da­men­tally differ­ent ar­chi­tec­ture will reach trans­for­ma­tive sta­tus be­fore some­thing like GPT does.

I don’t want to take away from MIRI’s work (I still sup­port them, and I think that if the GPTs pe­ter out, we’ll be glad they’ve been con­tin­u­ing their work), but I think it’s an es­sen­tial time to sup­port pro­jects that can work for a GPT-style near-term AGI, for in­stance by in­cor­po­rat­ing spe­cific al­ign­ment pres­sures dur­ing train­ing. In­tu­itively, it seems as if Co­op­er­a­tive In­verse Re­in­force­ment Learn­ing or AI Safety via De­bate or Iter­ated Am­plifi­ca­tion are in this class.

We may also want to do a lot of work on how bet­ter to mold a GPT-in-train­ing into the shape of an Or­a­cle AI.

It would also be very use­ful to build some GPT fea­ture “vi­su­al­iza­tion” tools ASAP.

In the mean­time, uh, en­joy AI Dun­geon, I guess?