Factored Cognition

Note: This post (originally published here) is the transcript of a presentation about a project worked on at the non-profit Ought. It is included in the sequence because it contains a very clear explanation of some of the key ideas behind iterated amplification.


The presentation below motivates our Factored Cognition project from an AI alignment angle and describes the state of our work as of May 2018. Andreas gave versions of this presentation at CHAI (4/25), a Deepmind-FHI seminar (5/24) and FHI (5/25).

I’ll talk about Factored Cognition, our current main project at Ought. This is joint work with Ozzie Gooen, Ben Rachbach, Andrew Schreiber, Ben Weinstein-Raun, and (as board members) Paul Christiano and Owain Evans.

Before I get into the details of the project, I want to talk about the broader research program that it is part of. And to do that, I want to talk about research programs for AGI more generally.

Right now, the dominant paradigm for researchers who explicitly work towards AGI is what you could call “scalable learning and planning in complex environments”. This paradigm substantially relies on training agents in simulated physical environments to solve tasks that are similar to the sorts of tasks animals and humans can solve, sometimes in isolation and sometimes in competitive multi-agent settings.

To be clear, not all tasks are physical tasks. There’s also interest in more abstract environments, as in the case of playing Go, proving theorems, or participating in goal-based dialog.

For our purposes, the key characteristic of this research paradigm is that agents are optimized for success at particular tasks. To the extent that they learn particular decision-making strategies, those are learned implicitly. We only provide external supervision, and it wouldn’t be entirely wrong to call this sort of approach “recapitulating evolution”, even if this isn’t exactly what is going on most of the time.

As many people have pointed out, it could be difficult to become confident that a system produced through this sort of process is aligned—that is, that all its cognitive work is actually directed towards solving the tasks it is intended to help with. The reason for this is that alignment is a property of the decision-making process (what the system is “trying to do”), but that is unobserved and only implicitly controlled.

Aside: Could more advanced approaches to transparency and interpretability help here? They’d certainly be useful in diagnosing failures, but unless we can also leverage them for training, we might still be stuck with architectures that are difficult to align.

What’s the alternative? It is what we could call internal supervision—supervising not just input-output behavior, but also cognitive processes. There is some prior work, with Neural Programmer-Interpreters perhaps being the most notable instance of that class. However, depending on how you look at it, there is currently much less interest in such approaches than in end-to-end training, which isn’t surprising: A big part of the appeal of AI over traditional programming is that you don’t need to specify how exactly problems are solved.

In this talk, I’ll discuss an alternative research program for AGI based on internal supervision. This program is based on imitating human reasoning and meta-reasoning, and will be much less developed than the one based on external supervision and training in complex environments.

The goal for this alternative program is to codify reasoning that people consider “good” (“helpful”, “corrigible”, “conservative”). This could include some principles of good reasoning that we know how to formalize (such as probability theory and expected value calculations), but could also include heuristics and sanity checks that are only locally valid.

For a system built this way, it could be substantially easier to become confident that it is aligned. Any bad outcomes would need to be produced through a sequence of human-endorsed reasoning steps. This is far from a guarantee that the resulting behavior is good, but seems like a much better starting point. (See e.g. Dewey 2017.)

The hope would be to (wherever possible) punt on solving hard problems such as what decision theory agents should use, and how to approach epistemology and value learning, and instead to build AI systems that inherit our epistemic situation, i.e. that are uncertain about those topics to the extent that we are uncertain.

I’ve described external and internal supervision as different approaches, but in reality there is a spectrum, and it is likely that practical systems will combine both.

However, the right end of the spectrum—and especially approaches based on learning to reason from humans—is more neglected right now. Ought aims to specifically make progress on automating human-like or human-endorsed deliberation.

A key challenge for these approaches is scalability: Even if we could learn to imitate how humans solve particular cognitive tasks, that wouldn’t be enough. In most cases where we figured out how to automate cognition, we didn’t just match human ability, but exceeded it, sometimes by a large margin. Therefore, one of the features we’d want an approach to AI based on imitating human meta-reasoning to have is a story for how we could use that approach to eventually exceed human ability.

Aside: Usually, I fold “aligned” into the definition of “scalable” and describe Ought’s mission as “finding scalable ways to leverage ML for deliberation”.

What does it mean to “automate deliberation”? Unlike in more concrete settings such as playing a game of Go, this is not immediately clear.

For Go, there’s a clear task (choose moves based on a game state), there’s relevant data (recorded human games), and there’s an obvious objective (to win the game). For deliberation, none of these are obvious.

As a task, we’ll choose question-answering. This encompasses basically all other tasks, at least if questions are allowed to be big (i.e. can point to external data).

The data we’ll train on will be recorded human actions in cognitive workspaces. I’ll show an example in a couple of slides. The basic idea is to make thinking explicit by requiring people to break it down into small reasoning steps, to limit contextual information, and to record what information is available at each step.

An important point here is that our goal is not to capture human reasoning exactly as it is in day-to-day life, but to capture a way of reasoning that people would endorse. This is important, because the strategies we need to use to make thinking explicit will necessarily change how people think.

Finally, the objective will be to choose cognitive actions that people would endorse after deliberation.

Note the weird loop—since our task is automating deliberation, the objective is partially defined in terms of the behavior that we are aiming to improve throughout the training process. This suggests that we might be able to set up training dynamics where the supervision signal always stays a step ahead of the current best system, analogous to GANs and self-play.

We can decompose the problem of automating deliberation into two parts:

  1. How can we make deliberation sufficiently explicit that we could in principle replicate it using machine learning? In other words, how do we generate the appropriate kind of training data?

  2. How do we actually automate it?

In case you’re familiar with Iterated Distillation and Amplification: The two parts roughly correspond to amplification (first part) and distillation (second part).

The core concept behind our approach is that of a cognitive workspace. A workspace is associated with a question and a human user is tasked with making progress on thinking through that question. To do so, they have multiple actions available:

  • They can reply to the question.

  • They can edit a scratchpad, writing down notes about intermediate results and ideas on how to make progress on this question.

  • They can ask sub-questions that help them answer the overall question.

Sub-questions are answered in the same way, each by a different user. This gives rise to a tree of questions and answers. The size of this tree is controlled by a budget that is associated with each workspace and that the corresponding user can distribute over sub-questions.
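To make this structure concrete, here is a minimal Python sketch of a workspace with the three action types and a budget. The names and details are made up for illustration; this is not Ought’s actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Workspace:
    question: str
    budget: int                        # how many sub-questions this workspace may still pay for
    scratchpad: str = ""               # free-form notes about intermediate results and ideas
    subquestions: List["Workspace"] = field(default_factory=list)
    answer: Optional[str] = None

    def edit_scratchpad(self, note: str) -> None:
        self.scratchpad += note + "\n"

    def ask(self, subquestion: str, sub_budget: int) -> "Workspace":
        # Asking a sub-question spends part of this workspace's budget on a child workspace.
        assert 0 < sub_budget < self.budget, "sub-questions must get a strictly smaller budget"
        self.budget -= sub_budget
        child = Workspace(question=subquestion, budget=sub_budget)
        self.subquestions.append(child)
        return child

    def reply(self, answer: str) -> None:
        self.answer = answer
```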

The approach we’ll take to automating cognition is based on recording and imitating actions in such workspaces. Apart from information passed in through the question and through answers to sub-questions, each workspace is isolated from the others. If we show each workspace to a different user and limit the total time for each workspace to be short, e.g. 15 minutes, we factor the problem-solving process in a way that guarantees that there is no unobserved latent state that is accumulated over time.

There are a few more technicalities that are important to making this work in practice.

The most important one is probably the use of pointers. If we can only ask plain-text questions and sub-questions, the bandwidth of the question-answering system is severely limited. For example, we can’t ask “Why did the protagonist crash the car in book X?”, because book X would be too large to pass in as a literal question. Similarly, we can’t delegate “Write an inspiring essay about architecture”, because the essay would be too large to pass back.

We can lift this restriction by allowing users to create and pass around pointers to data structures. A simple approach for doing this is to replace plain text everywhere with messages that consist of text interspersed with references to other messages.
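One way to picture such messages is sketched below; the Message and Pointer types are hypothetical stand-ins, not the project’s actual data model.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass(frozen=True)
class Pointer:
    """An opaque reference to another message; the user sees only e.g. '#1' until they expand it."""
    id: int

@dataclass
class Message:
    # A message is text interspersed with references to other messages.
    parts: List[Union[str, Pointer]]

def render(msg: Message) -> str:
    """Show a message as a user would see it, with pointers left unexpanded."""
    return "".join(part if isinstance(part, str) else f"#{part.id}" for part in msg.parts)

# The full text of "book X" stays behind pointer #1 instead of being pasted into the question.
question = Message(["Why did the protagonist crash the car in ", Pointer(1), "?"])
print(render(question))  # -> Why did the protagonist crash the car in #1?
```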

The combination of pointers and short per-workspace time limits leads to a system where many problems are best tackled in an algorithmic manner. For example, in many situations all a workspace may be doing is mapping a function (represented as a natural language message) over a list (a message with linked list structure), without the user knowing or caring about the content of the function and list.

Now let’s try to be a bit more precise about the parts of the system we’ve seen.

One component is the human policy, which we treat as a stateless map from contexts (immutable versions of workspaces) to actions (such as asking a particular sub-question).

Coming up with a single such action should take the human at most a few minutes.

The other main component is the transition function, which consumes a context and an action and generates a set of new contexts.

For example, if the action is to ask a sub-question, there will be two new contexts:

  1. The successor of the parent context that now has an additional reference to a sub-question.

  2. The initial context for the newly generated sub-question workspace.

Composed together, the human policy and the transition function define a kind of evaluator: A map from a context to a set of new contexts.
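In code, the division of labor might look roughly like the sketch below. The types and the toy transition rules are assumptions made for illustration; the point is only the shape of the interfaces: the policy maps a context to an action, the transition function maps a context and an action to new contexts, and composing them gives the evaluator.

```python
from typing import Callable, List, NamedTuple

class Context(NamedTuple):
    question: str
    observations: tuple      # immutable record of what this workspace has seen so far

class Action(NamedTuple):
    kind: str                # e.g. "ask", "reply", "scratchpad"
    content: str

# The human policy: a stateless map from contexts to actions.
Policy = Callable[[Context], Action]

def transition(context: Context, action: Action) -> List[Context]:
    """Consume a context and an action, and generate the set of new contexts."""
    if action.kind == "ask":
        successor = Context(context.question,
                            context.observations + (f"asked: {action.content}",))
        child = Context(action.content, ())
        return [successor, child]   # (1) parent with the new sub-question, (2) fresh sub-workspace
    if action.kind == "reply":
        return []                   # replying closes this workspace
    successor = Context(context.question,
                        context.observations + (f"note: {action.content}",))
    return [successor]

def evaluator(policy: Policy, context: Context) -> List[Context]:
    """Human policy and transition function composed: a map from a context to new contexts."""
    return transition(context, policy(context))
```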

In what follows, nodes (depicted as circles) refer to workspaces. Note that both inputs and outputs of workspaces can be messages with pointers, i.e. can be very large objects.

I’ll mostly collapse workspaces to just questions and answers, so that we can draw entire trees of workspaces more easily.

By iteratively applying the evaluator, we generate increasingly large trees of workspaces. Over the course of this process, the answer to the root question will become increasingly informed by answers to sub-computations, and should thus become increasingly correct and helpful. (What exactly happens depends on how the transition function is set up, and what instructions we give to the human users.)

This process is essentially identical to what Paul Christiano refers to as amplification: A single amplification step augments an agent (in our case, a human question-answerer) by giving it access to calls to itself. Multiple amplification steps generate trees of agents assisting each other.
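As a toy illustration of that iteration (using the same kind of evaluator interface as in the sketch above, with made-up names), repeatedly applying an evaluator expands the frontier of contexts level by level, which is roughly what successive amplification steps do to the tree of workspaces:

```python
from typing import Callable, List, TypeVar

Ctx = TypeVar("Ctx")

def expand(evaluate: Callable[[Ctx], List[Ctx]], root: Ctx, steps: int) -> List[Ctx]:
    """Apply an evaluator (context -> new contexts) repeatedly, one tree level per step.

    Illustration only: a real system would also route answers from sub-questions
    back up to their parent workspaces.
    """
    frontier = [root]
    for _ in range(steps):
        frontier = [new_ctx for ctx in frontier for new_ctx in evaluate(ctx)]
    return frontier
```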

I’ll now walk through a few examples of different types of thinking by recursive decomposition.

The longer-term goal behind these examples is to understand: How decomposable is cognitive work? That is, can amplification work—in general, or for specific problems, with or without strong bounds on the capability of the resulting system?

Perhaps the easiest non-trivial case is arithmetic: To multiply two numbers, we can use the rules of addition and multiplication to break down the multiplication into a few multiplications of smaller numbers and add up the results.

If we wanted to scale to very large numbers, we’d have to represent each number as a nested pointer structure instead of plain text as shown here.
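As a rough sketch of the kind of decomposition such a tree of workspaces might mirror (this is ordinary recursion, not the workspace system itself), multiplication can be reduced to smaller multiplications plus addition:

```python
def multiply(x: int, y: int) -> int:
    """Multiply by splitting y into its last digit and the rest.

    Each call corresponds to a workspace that asks two smaller sub-questions
    ("what is x * (y // 10)?" and "what is x * (y % 10)?") and combines the answers.
    """
    if y < 10:
        return x * y    # small enough to answer directly
    return multiply(x, y // 10) * 10 + multiply(x, y % 10)

assert multiply(123, 456) == 123 * 456
```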

We can also implement other kinds of algorithms. Here, we’re given a sequence of numbers as a linked list and we sum it up one by one. This ends up looking pretty much the same as how you’d sum up a list of numbers in a purely functional programming language such as Lisp or Scheme.
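A sketch of that list-summing example in the same spirit, with plain recursion standing in for the tree of sub-questions:

```python
from typing import NamedTuple, Optional

class Cons(NamedTuple):
    head: int
    tail: Optional["Cons"]   # None marks the empty list, like nil in Scheme

def sum_list(lst: Optional[Cons]) -> int:
    # Each call mirrors a workspace: "is the list empty?", "what is the sum of the tail?",
    # then combine the head with the answer to the sub-question.
    if lst is None:
        return 0
    return lst.head + sum_list(lst.tail)

numbers = Cons(1, Cons(2, Cons(3, None)))
assert sum_list(numbers) == 6
```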

Indeed, we can implement any algorithm using this framework—it is computationally universal. One way to see this is to implement an evaluator for a programming language, e.g. following the example of the meta-circular evaluator in SICP.

As a consequence, if there’s a problem we can’t solve using this sort of framework, it’s not because the framework can’t run the program required to solve it. It’s because the framework can’t come up with the program by composing short-term tasks.

Let’s start moving away from obviously algorithmic examples. This example shows how one could generate a Fermi estimate of a quantity by combining upper and lower bounds for the estimates of component quantities.

This example hints at how one might implement conditioning for probability distributions. We could first generate a list of all possible outcomes together with their associated probabilities, then filter the list of outcomes to only include those that satisfy our condition, and renormalize the resulting (sub-)distribution such that the probabilities of all outcomes sum to one again.
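A minimal sketch of that enumerate-filter-renormalize recipe, i.e. deliberately naive exact inference over an explicitly listed distribution:

```python
from typing import Callable, Dict, Hashable

def condition(dist: Dict[Hashable, float],
              predicate: Callable[[Hashable], bool]) -> Dict[Hashable, float]:
    """Keep only the outcomes that satisfy the condition, then renormalize to sum to one."""
    kept = {outcome: p for outcome, p in dist.items() if predicate(outcome)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("conditioning on an event with probability zero")
    return {outcome: p / total for outcome, p in kept.items()}

# Example: two fair coin flips, conditioned on seeing at least one heads.
two_flips = {("H", "H"): 0.25, ("H", "T"): 0.25, ("T", "H"): 0.25, ("T", "T"): 0.25}
posterior = condition(two_flips, lambda outcome: "H" in outcome)
# Each of the three remaining outcomes now has probability 1/3.
```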

The general principle here is that we’re happy to run very expensive computations as long as they’re semantically correct. What I’ve described for conditioning is more or less the textbook definition of exact inference, but in general that is computationally intractable for distributions with many variables. The reason we’re happy with expensive computations is that eventually we won’t instantiate them explicitly, but rather emulate them using cheap ML-based function approximators.

If we want to use this framework to implement agents that can eventually exceed human capability, we can’t use most human object-level knowledge, but rather need to set up a process that can learn human-like abilities from data in a more scalable way.

Consider the example of understanding natural language: If we wanted to determine whether a pair of sentences is a contradiction, entailment, or neutral (as in the SNLI dataset), we could simply ask the human to judge—but this won’t scale to languages that none of the human judges know.

Alternatively, we can break down natural language understanding into (very) many small component tasks and try to solve the task without leveraging the humans’ native language understanding facilities much. For example, we might start by computing the meaning of a sentence as a function of the meanings of all possible pairs of sub-phrases.

As in the case of probabilistic inference, this will be computationally intractable, and getting the decomposition right in the first place is substantially harder than solving the object-level task.

Here’s a class of problems that seems particularly challenging for factored cognition: Problems where people would usually learn concepts over an extended period of time.

Consider solving a problem that is posed halfway through a math textbook. Usually, the textbook reader would have solved many simpler problems up to this point and would have built up conceptual structures and heuristics that then allow them to solve this new problem. If we need to solve the problem by composing work done by a large collection of humans, none of whom can spend more than 15 minutes on the task, we’ll have to replace this intuitive, implicit process with an externalized, explicit alternative.

It’s not entirely clear to me how to do that, but one way to start would be to build up knowledge about the propositions and entities that are part of the problem statement by effectively applying semantic parsing to the relevant parts of the textbook, so that we can later ask whether (e.g.) a proposition with meaning X implies a proposition with meaning Y, where both X and Y are large nested pointer structures that encode detailed meaning representations.

If this reminds you of Good Old-Fashioned AI, it is not by accident. We’re essentially trying to succeed where GOFAI failed, and our primary advantage is that we’re okay with exponentially expensive computations, because we’re not planning to ever run them directly. More on that soon.

So far, the workspaces we’ve looked at were quite sparse. All questions and answers were limited to a sentence or two. This “low-bandwidth” setting is not the only way to use the system—we could alternatively instruct the human users to provide more detail in their questions and to write longer answers.

For the purpose of automation, low bandwidth has advantages, both in the short term (where it makes automation easier) and in the long term (where it reduces a particular class of potential security vulnerabilities).

Empirical evidence from experiments with humans will need to inform this choice as well, and the correct answer is probably at least slightly more high-bandwidth than the examples shown so far.

Here’s a kind of reasoning that I feel relatively optimistic we can implement using factored cognition: Causal reasoning, both learning causal structures from data and computing the results of interventions and counterfactuals.

The particular tree of workspaces shown here doesn’t really illustrate this, but I can imagine implementing Pearl-style algorithms for causal inference in a way where each step locally makes sense and slightly simplifies the overall problem.

The final example, meta-reasoning, is in some ways the most important one: If we want factored cognition to eventually produce very good solutions to problems—perhaps being competitive with any other systematic approach—then it’s not enough to rely on the users directly choosing a good object-level decomposition for the problem at hand. Instead, they’ll need to go meta and use the system to reason about what decompositions would work well, and how to find them.

One general pattern for this is that users can ask something like “What approach should we take to problem #1?” as a first sub-question, get back an answer #2, and then ask “What is the result of executing approach #2 on question #1?” as a second sub-question. As we increase the budget for the meta-question, the object-level approach can change radically.

And, of course, we could also go meta twice, ask about approaches to solving the first meta-level problem, and the same consideration applies: Our meta-level approach to finding good object-level approaches could improve substantially as we invest more budget in meta-meta.
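In terms of the toy vocabulary from earlier, the pattern might look like the following sketch, where the hypothetical `answer` function stands in for the whole recursive question-answering system:

```python
from typing import Callable

# 'answer' stands in for the whole recursive question-answering system:
# it takes a question and a budget and returns an answer (here, just a string).
Answer = Callable[[str, int], str]

def solve_via_meta(answer: Answer, question: str, budget: int) -> str:
    """Spend part of the budget deciding *how* to approach the question,
    then spend the rest executing that approach."""
    meta_budget = budget // 2
    approach = answer(f"What approach should we take to: {question}?", meta_budget)
    return answer(f"What is the result of executing the approach '{approach}' on: {question}?",
                  budget - meta_budget)
```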

So far, I’ve shown one particular instantiation of factored cognition: a way to structure workspaces, a certain set of actions, and a corresponding implementation of the transition function that generates new workspace versions.

By varying each of these components, we can generate other ways to build systems in this space. For example, we might include actions for asking clarifying questions. I’ve written about these degrees of freedom on our taxonomy page.

Here’s one example of an alternate system. This is a straightforward Javascript port of parts of Paul Christiano’s ALBA implementation.

Workspaces are structured as sequences of observations and actions. All actions are commands that the user types, including ask, reply, view (for expanding a pointer), and reflect (for getting a pointer to the current context).

The command-line version is available on Github.

A few days ago, we open-sourced Patchwork, a new command-line app for recursive question-answering, where we paid particular attention to building it in a way that provides a good basis for multiple users and automation. To see a brief screencast, take a look at the README.

Suppose decomposition worked and we could solve difficult problems using factored cognition—how could we transition from only using human labor to partial automation and eventually full automation? I’ll discuss a few approaches, starting from very basic ideas that we can implement now and progressing to ones that will not be tractable using present-day ML.

Let’s again consider a tree of workspaces, and in each workspace, one or more humans taking one or more actions.

For simplicity, I’ll pretend that there is just a single action per workspace. This allows me to treat nodes and actions interchangeably below. Nothing substantial changes if there are multiple actions.

I’ll also pretend that all humans are essentially identical, which is obviously false, but allows me to consider the simpler problem of learning a single human policy from data.

As a first step towards automation, we’ll memoize the human H. That is, whenever we would show a context to H, we first check whether we’ve shown this context to some other H before, and if so, we directly reuse the action that was taken previously.

This is a big win if many contexts are simple. For example, it may be very common to want to map a function over a list, and this operation will always involve the same kinds of sub-questions (check if the list is empty, if not get the first element, apply the function to the first element, etc.). Ideally, we only do this sort of work once and then reuse it in the future. Memoization gets us part of the way there.
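Because contexts are immutable and the policy is treated as a stateless map, memoizing H amounts to a cache keyed on the context. A minimal sketch, assuming contexts are hashable values:

```python
from typing import Callable, Dict, Hashable, TypeVar

Action = TypeVar("Action")

def memoize_policy(human: Callable[[Hashable], Action]) -> Callable[[Hashable], Action]:
    """Wrap the human policy H so that each distinct context is shown to a human only once;
    later occurrences of the same context reuse the recorded action."""
    cache: Dict[Hashable, Action] = {}

    def policy(context: Hashable) -> Action:
        if context not in cache:
            cache[context] = human(context)   # ask a human only on a cache miss
        return cache[context]

    return policy
```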

A significant step up in difficulty, we can try to imitate the behavior of H even in cases where the new context doesn’t match any element of our “training set” exactly.

Of course, for general question-answering, we won’t be able to fully automate the human policy any time soon. This means that any imitation algorithms we do apply will need to make choices about whether a context under consideration is the sort of situation where they can make good predictions about what a human would do, or whether to abstain.

If algorithms can make multiple choices in sequence, we need algorithms that are well-calibrated about when their actions are appropriate, and that in particular have very few false positives. Otherwise, even a relatively low probability of false positives could cascade into sequences of inappropriate actions.
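One way to picture this requirement is a policy that only acts when its confidence clears a high threshold and otherwise abstains and falls back to a human. The model interface below is an assumption made for illustration:

```python
from typing import Callable, Tuple

# Assumed interfaces: the model returns (proposed_action, confidence in [0, 1]);
# the human policy always returns an action.
Model = Callable[[str], Tuple[str, float]]
Human = Callable[[str], str]

def cautious_policy(model: Model, human: Human, threshold: float = 0.99) -> Callable[[str], str]:
    """Take the model's action only when it is very confident; otherwise abstain and defer
    to the human. A high threshold keeps false positives rare, which matters because errors
    can cascade across sequences of actions."""
    def policy(context: str) -> str:
        action, confidence = model(context)
        return action if confidence >= threshold else human(context)
    return policy
```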

We’ve tried to isolate and study this particular problem—making well-calibrated predictions in AI-complete domains—in a separate project called Predicting Slow Judgments. So far, we’ve found it challenging to make non-trivial predictions about human responses for the dataset we’ve collected there.

How useful would it be to be able to automate some fraction of human actions? If the total number of actions needed to solve a task is exponentially large (e.g. because we’re enumerating all potential sub-phrases of a paragraph of text), even being able to automate 90% of all actions wouldn’t be enough to make this approach computationally tractable. To get to tractability in that regime, we need to automate entire subtrees. (And we need to do so using an amount of training data that is not itself exponentially large—an important aspect that this talk won’t address at all.)

Let’s reconsider amplification. Recall that in this context, each node represents the question-answer behavior implemented by a workspace operated on by some agent (to start with, a human). This agent can pose sub-questions to other agents who may or may not themselves get to ask such sub-questions, as indicated by whether they have nodes below them or not.

Each step grows the tree of agents by one level, so after n steps, we have a tree whose size is exponential in n (on the order of b^n nodes if each agent asks b sub-questions). This process will become intractable before long.

(The next few slides describe Paul Christiano’s Iterated Distillation and Amplification approach to training ML systems.)

Instead of iterating amplification, let’s pause after one step. We started out with a single agent (a human) and then built a composite system using multiple agents (also all humans). This composite system is slower than the one we started out with. This slowdown perhaps isn’t too bad for a single step, but it will add up over the course of multiple steps. To iterate amplification many times, we need to avoid this slowdown. What can we do?

The basic idea is to train an ML-based agent to imitate the behavior of the composite system. A simple (but insufficient!) approach would be to generate training data—questions and answers—based on the behavior of the composite system, and to train a supervised learner using this dataset.

In practice, this sort of training (“distillation”) would probably need to involve not just imitation, but more advanced techniques, including adversarial training and approaches to interpretability that allow the composite system (the “overseer”) to reason about the internals of its fast ML-based successor.

If we wanted to implement this training step in rich domains, we’d need ML techniques that are substantially better than the state of the art as of May 2018, and even then, some domains would almost certainly resist efficient distillation.

But, hypothetically, if we could implement faithful distillation, we would have a much better starting point for the next amplification step: We could compose together multiple instances of the fast ML-based learner, and the result would be a tree of agents that is only as large as the one we built in the first step (3 nodes, say), but exhibits the question-answer behavior of an agent that has multiple advisors, each of which is as capable as the entire tree at the first step.

We can repeat whatever training process we used in the first step to get a yet better distilled system that “imitates” the behavior of the overseer composed of the systems trained in the previous step.

Through repeated amplification and distillation, we could hope to eventually satiate the representational and computational abilities of whatever ML system we’re using in the distillation step, while guiding it towards implementing question-answer behavior that corresponds to what H would do if they had a large number of well-resourced assistants.
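Schematically, the amplify-then-distill loop might be written like the sketch below. Everything here is a stand-in: `human_with_assistants` represents the overseer answering questions while delegating sub-questions, and `distill` hides all of the hard ML (which, as noted above, would involve more than plain imitation):

```python
from typing import Callable, Dict, List

Agent = Callable[[str], str]   # a question-answerer: question -> answer

def amplify(agent: Agent, human_with_assistants: Callable[[str, Agent], str]) -> Agent:
    """One amplification step: an overseer answers questions while being allowed
    to delegate sub-questions to the current agent."""
    return lambda question: human_with_assistants(question, agent)

def distill(slow_agent: Agent, questions: List[str]) -> Agent:
    """Train a fast agent to imitate the slow composite agent.
    Here we just record question -> answer pairs; in reality this is the hard ML step."""
    table: Dict[str, str] = {q: slow_agent(q) for q in questions}
    return lambda question: table.get(question, "don't know")

def iterated_amplification_and_distillation(initial: Agent,
                                             human_with_assistants: Callable[[str, Agent], str],
                                             questions: List[str],
                                             steps: int) -> Agent:
    agent = initial
    for _ in range(steps):
        overseer = amplify(agent, human_with_assistants)   # slower, but more capable
        agent = distill(overseer, questions)               # fast again, hopefully faithful
    return agent
```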

In practice, we might not want to implement this process as a series of distinct systems, and instead run self-play where a single system serves both as the overseer and as the system-to-be-trained.

If Iterated Amplification and Distillation can work, we might be able to approximate the results of running some computations that would naively take exponential time: after n steps of amplification and distillation, we’d use a fast ML-based approximator to run computations that would take time exponential in n (on the order of b^n) if we instantiated them explicitly.

As a particularly interesting special case, this might include the kinds of human-guided computations that arise from people taking actions in cognitive workspaces.

There are many open questions for the scheme described above, both on whether we can make reasoning explicit, and on whether we can automate it efficiently even if it is made explicit. While I’ve talked a bit about automation, anything beyond basic automation is out of scope for Ought right now, so I’ll focus on open questions related to decomposition.

For decomposition, the two main questions we ask ourselves are:

  1. Can factored cognition recover the ability of a single human working over time for essentially all important tasks?

  2. If so, can we exceed the capability of other systematic approaches to problem-solving if we just use sufficiently large budgets, i.e. compose sufficiently many small workspaces in sufficiently large trees? Equivalently, can we reach essentially arbitrarily high capability if we execute sufficiently many amplification steps?

Our plan is to study both of these questions using a set of challenge problems.

The idea behind these challenge problems is to pick problems that are particularly likely to stretch the capabilities of problem solving by decomposition:

  1. When people tackle tricky math or programming puzzles, they sometimes give up, go to bed, and the next day in the shower they suddenly know how to solve it. Can we solve such puzzles even if no single individual spends more than 15 minutes on the problem?

  2. We’ve already seen a math textbook example earlier. We want to know more generally whether we can replicate the effects of learning over time, and are planning to study this using different kinds of textbook problems.

  3. Similarly, when people reason about evidence, e.g. about whether a statement that a politician made is true, they seem to make incremental updates to opaque internal models and may use heuristics that they find difficult to verbalize. If we instead require all evidence to be aggregated explicitly, can we still match or exceed their fact-checking capabilities?

  4. All examples of problems we’ve seen are one-off problems. However, ultimately we want to use automated systems to interact with a stateful world, e.g. through dialog. Abstractly, we know how to approach this situation, but we’d like to try it in practice, e.g. on personal questions such as “Where should I go on vacation?”.

  5. For systems to scale to high capability, we’ve noted earlier that they will need to reason about cognitive strategies, not just object-level facts. Prioritizing tasks for a user might be a domain particularly suitable for testing this, since the same kind of reasoning (what to work on next) could be used on both object- and meta-level.

If we make progress on the feasibility of factored cognition and come to believe that it might be able to match and eventually exceed “normal” thinking, we’d like to move towards learning more about how this process would play out.

What would the human policy—the map from contexts to actions—look like that would have these properties? What concepts would be part of this policy? For scaling to high capability, it probably can’t leverage most of the object-level knowledge people have. But what else? Abstract knowledge about how to reason? About causality, evidence, agents, logic? And how big would this policy be—could we effectively treat it as a lookup table, or are there many questions and answers in distinct domains that we could only really learn to imitate using sophisticated ML?

What would happen if we scaled up by iterating this learned human policy many times? What instructions would the humans that generate our training data need to follow for the resulting system to remain corrigible, even if run with extremely large amounts of computation (as might be the case if distillation works)? Would the behavior of the resulting system be chaotic, strongly dependent on its initial conditions, or could we be confident that there is a basin of attraction that all careful ways of setting up such a system converge to?