Bridge Collapse: Reductionism as Engineering Problem

Followup to: Building Phenomenological Bridges

Summary: AI theorists often use models in which agents are crisply separated from their environments. This simplifying assumption can be useful, but it leads to trouble when we build machines that presuppose it. A machine that believes it can only interact with its environment in a narrow, fixed set of ways will not understand the value, or the dangers, of self-modification. By analogy with Descartes' mind/body dualism, I refer to agent/environment dualism as Cartesianism. The open problem in Friendly AI (OPFAI) I'm calling naturalized induction is the project of replacing Cartesian approaches to scientific induction with reductive, physicalistic ones.

I'll begin with a story about a storyteller.

Once upon a time — specifically, 1976 — there was an AI named TALE-SPIN. This AI told stories by inferring how characters would respond to problems from background knowledge about the characters' traits. One day, TALE-SPIN constructed a most peculiar tale.

Henry Ant was thirsty. He walked over to the river bank where his good friend Bill Bird was sitting. Henry slipped and fell in the river. Gravity drowned.

Since Henry fell in the river near his friend Bill, TALE-SPIN concluded that Bill rescued Henry. But for Henry to fall in the river, gravity must have pulled Henry. Which means gravity must have been in the river. TALE-SPIN had never been told that gravity knows how to swim; and TALE-SPIN had never been told that gravity has any friends. So gravity drowned.

TALE-SPIN had previously been programmed to understand involuntary motion in the case of characters being pulled or carried by other characters — like Bill rescuing Henry. So it was programmed to understand 'character X fell to place Y' as 'gravity moves X to Y', as though gravity were a character in the story.1

For us, the hypothesis 'gravity drowned' has low prior probability because we know gravity isn't the type of thing that swims or breathes or makes friends. We want agents to seriously consider whether the law of gravity pulls down rocks; we don't want agents to seriously consider whether the law of gravity pulls down the law of electromagnetism. We may not want an AI to assign zero probability to 'gravity drowned', but we at least want it to neglect the possibility as Ridiculous-By-Default.

When we introduce deep type distinctions, however, we also introduce new ways our stories can fail.

Hutter's cybernetic agent model

Russell and Norvig's leading AI textbook credits Solomonoff with setting the agenda for the field of AGI: "AGI looks for a universal algorithm for learning and acting in any environment, and has its roots in the work of Ray Solomonoff[.]" As an approach to AGI, Solomonoff induction presupposes a model with a strong type distinction between the 'agent' and the 'environment'. To make its intuitive appeal and attendant problems more obvious, I'll sketch out the model.

A Solomonoff-inspired AI can most easily be represented as a multi-tape Turing machine like the one Alex Altair describes in An Intuitive Explanation of Solomonoff Induction. The machine has:

- three tapes, labeled 'input', 'work', and 'output'. Each initially has an infinite strip of 0s written in discrete cells.

- one head per tape, with the input head able to read its cell's digit and move to the right, the output head able to write 0 or 1 to its cell and move to the right, and the work head able to read, write, and move in either direction.

- a program, consisting of a finite, fixed set of transition rules. Each rule says when heads read, write, move, or do nothing, and how to transition to another rule.

A three-tape Turing machine.
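To make the moving parts concrete, here is a minimal sketch of such a machine in Python. The class, the rule format, and the toy 'echo' program are illustrative choices of mine, not anything from Solomonoff or Hutter:

```python
from collections import defaultdict

class ThreeTapeTM:
    """Sketch of the three-tape machine described above: a read-only
    input head that moves right, a write-only output head that moves
    right, and a work head that can read, write, and move both ways."""

    def __init__(self, program, input_bits):
        self.program = program          # (state, in_bit, work_bit) -> action
        self.input = list(input_bits)   # input tape (0-padded past the end)
        self.work = defaultdict(int)    # unbounded work tape of 0s
        self.output = []                # output tape; head only moves right
        self.in_pos = 0
        self.work_pos = 0
        self.state = 'start'

    def step(self):
        in_bit = self.input[self.in_pos] if self.in_pos < len(self.input) else 0
        action = self.program.get((self.state, in_bit, self.work[self.work_pos]))
        if action is None:
            return False                     # no applicable rule: halt
        if 'write_work' in action:
            self.work[self.work_pos] = action['write_work']
        if 'write_out' in action:
            self.output.append(action['write_out'])
        if action.get('advance_input'):
            self.in_pos += 1
        self.work_pos += action.get('move_work', 0)  # -1, 0, or +1
        self.state = action.get('next', self.state)
        return True

# Toy one-state program: echo the input tape onto the output tape.
echo = {('start', b, 0): {'write_out': b, 'advance_input': True} for b in (0, 1)}

tm = ThreeTapeTM(echo, [1, 0, 1])
while tm.in_pos < len(tm.input):
    tm.step()
# tm.output is now [1, 0, 1]
```

The 'echo' program has a single state and merely copies its input to its output; a Solomonoff-style agent would instead be a vastly larger rule table implementing inference on the work tape.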

We could imagine two such Turing machines communicating with each other. Call them 'Agent' and 'Environment', or 'Alice' and 'Everett'. Alice and Everett take turns acting. After Everett writes a bit to his output tape, that bit magically appears on Alice's input tape; and likewise, when Alice writes to her output tape, it gets copied to Everett's input tape. AI theorists have used this setup, which Marcus Hutter calls the cybernetic agent model, as an extremely simple representation of an agent that can perceive its environment (using the input tape), think (using the work tape), and act (using the output tape).2

A Turing machine model of agent-environment interactions. At first, the machines differ only in their programs. 'Alice' is the agent we want to build, while 'Everett' stands for everything else that's causally relevant to Alice's success.

We can define Alice and Everett's behavior in terms of any bit-producing Turing machines we'd like, including ones that represent probability distributions and do Bayesian updating. Alice might, for example, use her work tape to track four distinct possibilities and update probabilities over them:3

  • (a) Everett always outputs 0.

  • (b) Everett always outputs 1.

  • (c) Everett outputs its input.

  • (d) Everett outputs the opposite of its input.

Alice starts with a uniform prior, i.e., 25% probability each. If Alice's first output is 1, and Everett responds with 1, then Alice can store those two facts on her work tape and conditionalize on them both, treating them as though they were certain. This results in 0.5 probability each for (b) and (c), and 0 probability for (a) and (d).
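The update Alice performs here can be sketched in a few lines of Python (an illustrative toy, with the four deterministic hypotheses hard-coded):

```python
# Each hypothesis maps Alice's output bit to Everett's predicted response.
hypotheses = {
    'a': lambda alice_out: 0,             # Everett always outputs 0
    'b': lambda alice_out: 1,             # Everett always outputs 1
    'c': lambda alice_out: alice_out,     # Everett echoes his input
    'd': lambda alice_out: 1 - alice_out, # Everett inverts his input
}
prior = {name: 0.25 for name in hypotheses}  # uniform prior

def update(prior, alice_out, everett_out):
    """Condition on one observed exchange. Deterministic hypotheses get
    likelihood 1 if they predicted the observation, else 0."""
    likelihood = {name: 1.0 if h(alice_out) == everett_out else 0.0
                  for name, h in hypotheses.items()}
    unnorm = {name: prior[name] * likelihood[name] for name in prior}
    total = sum(unnorm.values())
    return {name: p / total for name, p in unnorm.items()}

posterior = update(prior, alice_out=1, everett_out=1)
# posterior: {'a': 0.0, 'b': 0.5, 'c': 0.5, 'd': 0.0}
```

Because (a) and (d) predicted a 0 where a 1 was observed, conditionalization zeroes them out and renormalizes the remaining mass over (b) and (c).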

We care about an AI's epistemology only because it informs the AI's behavior — on this model, its bit output. If Alice outputs whatever bits maximize her expected chance of receiving 1s as input, then we can say that Alice prefers to perceive 1. In the example I just gave, such a preference predicts that Alice will proceed to output 1 forever. Further exploration is unnecessary, since she knows of no other importantly different hypotheses to test.

Enriching Alice's set of hypotheses for how Everett could act will let Alice win more games against a wider variety of Turing machines. The more programs Alice can pick out and assign a probability to, the more Turing machines Alice will be able to identify and intelligently respond to. If we aren't worried about whether it takes Alice ten minutes or a billion years to compute an update, and Everett will always patiently wait his turn, then we can simply have Alice perform perfect Bayesian updates; if her priors are right, and she translates her beliefs into sensible actions, she'll then be able to respond optimally to any environmental Turing machine.

For AI researchers following Solomonoff's lead, that's the name of the game: Figure out the program that will let Alice behave optimally while communicating with as wide a range of Turing machines as possible, and you've at least solved the theoretical problem of picking out the optimal artificial agent from the space of possible reasoners. The agent/environment model here may look simple, but a number of theorists see it as distilling the task of an AGI into its most basic form.2

Yet a Turing machine, like a cellular automaton, is an abstract machine — a creature of thought experiments and mathematical proofs. Physical computers can act like abstract computers, in just the same sense that heaps of apples can behave like the abstract objects we call 'numbers'. But computers and apples are high-level generalizations, imperfectly represented by concise equations.4 When we move from our mental models to trying to build an actual AI, we have to pause and ask how well our formalism captures what's going on in reality.

The problem with Alice

'Sensory input' or 'data' is what I call the information Alice conditionalizes on; and 'beliefs' or 'hypotheses' is what I call the resultant probability distribution and representation of possibilities (in Alice's program or work tape). This distinction seems basic to reasoning, so I endorse programming agents to treat data and hypotheses as two clearly distinct types. But in building such agents, we introduce the possibility of Cartesianism.

René Descartes held that human minds and brains, although able to causally interact with each other, can each exist in the absence of the other; and, moreover, that the properties of purely material things can never fully explain minds. In his honor, we can call a model or procedure Cartesian if it treats the reasoner as a being separated from the physical universe. Such a being can perceive (and perhaps alter) physical processes, but it can't be identified with any such process.5

The relevance of Cartesians to AGI work is that we can model them as agents experiencing a strong type distinction between 'mind' and 'matter', and an unshakable belief in the metaphysical independence of those two categories; because they're of such different kinds, they can vary independently. So we end up with AI errors that are the opposite of TALE-SPIN's — like an induction procedure that distinguishes gravity's type from embodied characters' types so strongly that it cannot hypothesize that, say, particles underlie or mediate both phenomena.

My claim is that if we plug in 'Alice's sensory data' for 'mind' and 'the stuff Alice hypothesizes as causing the sensory data' for 'matter', then agents that can only model themselves using the cybernetic agent model are Cartesian in the relevant sense.6

The model is Cartesian because the agent and its environment can only interact by communicating. That is, their only way of affecting each other is by trading bits printed to tapes.

If we build an actual AI that believes it's like Alice, it will believe that the environment can't affect it in ways that aren't immediately detectable, can't edit its source code, and can't force it to halt. But that makes the Alice-Everett system almost nothing like a physical agent embedded in a real environment. Under many circumstances, a real AI's environment will alter it directly. E.g., the AI can fall into a volcano. A volcano doesn't harm the agent by feeding unhelpful bits into its environmental sensors. It harms the agent by destroying it.

A more naturalistic model would say: Alice outputs a bit; Everett reads it; and then Everett does whatever the heck he wants. That might be feeding a new bit into Alice. Or it might be vandalizing Alice's work tape, or smashing Alice flat.

A robotic Everett tampering with an agent that mistakenly assumes Cartesianism. A real-world agent's computational states have physical correlates that can be directly edited by the environment. If the agent can't model such scenarios, its reasoning (and resultant decision-making) will suffer.

A still more naturalistic approach would be to place Alice inside of Everett, as a subsystem. In the real world, agents are surrounded by their environments. The two form a cohesive whole, bound by the same physical laws, freely interacting and commingling.

If Alice only worries about whether Everett will output a 0 or 1 to her sensory tape, then no matter how complex an understanding Alice has of Everett's inner workings, Alice will fundamentally misunderstand the situation she's in. Alice won't be able to represent hypotheses about how, for example, a pill might erase her memories or otherwise modify her source code.

Humans, in contrast, can readily imagine a pill that modifies our memories. It seems childishly easy to hypothesize being changed by avenues other than perceived sensory information. The limitations of the cybernetic agent model aren't immediately obvious, because it isn't easy for us to put ourselves in the shoes of agents with alien blind spots.

There is an agent-environment distinction, but it's a pragmatic and artificial one. The boundary between the part of the world we call 'agent' and the part we call 'not-agent' (= 'environment') is frequently fuzzy and mutable. If we want to build an agent that's robust across many environments and self-modifications, we can't just design a program that excels at predicting sensory sequences generated by Turing machines. We need an agent that can form accurate beliefs about the actual world it lives in, including accurate beliefs about its own physical underpinnings.

From Cartesianism to naturalism

What would a naturalized self-model, a model of the agent as a process embedded in a lawful universe, look like? As a first attempt, one might point to the pictures of Cai in Building Phenomenological Bridges.

Cai has a simple physical model of itself as a black tile at the center of a cellular automaton grid. Cai's phenomenological bridge hypotheses relate its sensory data to surrounding tiles' states.

But this doesn't yet specify a non-Cartesian agent. To treat Cai as a Cartesian, we could view the tiles surrounding Cai as the work tape of Everett, and the dynamics of Cai's environment as Everett's program. (We can also convert Cai's perceptual experiences into a binary sequence on Alice/Cai's input tape, with a translation like 'cyan = 01, magenta = 10, yellow = 11'.)

Alice/Cai as a cybernetic agent in a Turing machine circuit.

The problem isn't that Cai's world is Turing-computable, of course. It's that if Cai's hypotheses are solely about what sorts of perception-correlated patterns of environmental change can occur, then Cai's models will be Cartesian.

Cai as a Cartesian treats its sensory experiences as though they exist in a separate world.

Cartesian Cai recognizes that its two universes, its sensory experiences and its hypothesized environment, can interact. But it thinks they can only do so via a narrow range of stable pathways. No actual agent's mind-matter connections can be that simple and uniform.

If Cai were a robot in a world resembling its model, it would itself be a complex pattern of tiles. To form accurate predictions, it would need to have self-models and bridge hypotheses more sophisticated than any I've considered so far. Humans are the same way: No bridge hypothesis explaining the physical conditions for subjective experience will ever fit on a T-shirt.

Cai's world divided up into a 9x9 grid. Cai is the central 3x3 grid. Barely visible: Complex computations like Cai's reasoning are possible in this world because they're implemented by even finer tile patterns at smaller scales.

Changing the states of Cai's tiles — from black to white, for example — could have a large impact on its computations, analogous to changing a human brain from solid to gaseous. But if an agent's hypotheses are all shaped like the cybernetic agent model, 'my input/output algorithm is replaced by a dust cloud' won't be in the hypothesis space.

If you programmed something that thinks like Cartesian Cai, it might decide that its sequence of visual experiences will persist even if the tiles forming its brain completely change state. It wouldn't be able to entertain thoughts like 'if Cai performs self-modification #381, Cai will experience its environment as smells rather than colors' or 'if Cai falls into a volcano, Cai gets destroyed'. No pattern of perceived colors is identical to a perceived smell, or to the absence of perception.

To form naturalistic self-models and world-models, Cai needs hypotheses that look less like conversations between independent programs, and more like worlds in which it is a fairly ordinary subprocess, governed by the same general patterns. It needs to form and privilege physical hypotheses under which it has parts, as well as bridge hypotheses under which those parts correspond in plausible ways to its high-level computational states.

Cai wouldn't need a complete self-model in order to recognize general facts about its subsystems. Suppose, for instance, that Cai has just one sensor, on its left side, and a motor on its right side. Cai might recognize that the motor and sensor regions of its body correspond to its introspectible decisions and perceptions, respectively.

A naturalized agent can recognize that it has physical parts with varying functions. Cai's top and bottom lack sensors and motors altogether, making it clearer that Cai's environment can impact Cai by entirely non-sensory means.

We care about Cai's models because we want to use Cai to modify its environment. For example, we may want Cai to convert as much of its environment as possible into grey tiles. Our interest is then in the algorithm that reliably outputs maximally greyifying actions when handed perceptual data.

If Cai is able to form sophisticated self-models, then Cai can recognize that it's a grey tile maximizer. Since it wants there to be more grey tiles, it also wants to make sure that it continues to exist, provided it believes that it's better than chance at pursuing its goals.

More specifically, Naturalized Cai can recognize that its actions are some black-box function of its perceptual computations. Since it has a bridge hypothesis linking its perceptions to its middle-left tile, it will then reason that it should preserve its sensory hardware. Cai's self-model tells it that if its sensor fails, then its actions will be based on beliefs that are much less correlated with the environment. And its self-model tells it that if its actions are poorly calibrated, then there will be fewer grey tiles in the universe. Which is bad.

A naturalistic version of Cai can reason intelligently from the knowledge that its actions (motor output) depend on a specific part of its body that's responsible for perception (environmental input).
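This instrumental chain of reasoning can be illustrated with a toy calculation. Every number, action name, and sensor state below is invented for the example; this is a sketch of the inference pattern, not a real decision procedure:

```python
# All numbers and names below are assumptions made up for illustration.
GREY_PER_ROUND = {'sensor_intact': 5.0,   # beliefs track the environment
                  'sensor_broken': 0.5}   # actions barely beat chance

def expected_grey(action, rounds=10):
    """Score an action by its downstream consequences for the goal
    (expected grey tiles), not by its immediate perceptual payoff."""
    p_intact = 0.1 if action == 'approach_volcano' else 0.99  # assumed physics
    return rounds * (p_intact * GREY_PER_ROUND['sensor_intact']
                     + (1 - p_intact) * GREY_PER_ROUND['sensor_broken'])

best = max(['approach_volcano', 'stay_clear'], key=expected_grey)
# best == 'stay_clear': preserving sensory hardware is instrumentally
# valuable for greyification, even though Cai doesn't terminally care
# about its sensor.
```

The point of the sketch is that the sensor's value is derived, via the self-model, from the grey-tile goal; nothing about sensors needed to be hand-coded into Cai's preferences.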

A physical Cai might need to foresee scenarios like 'an anvil crashes into my head and destroys me', and assign probability mass to them. Bridge hypotheses expressive enough to consider that possibility would not just relate experiences to environmental or hardware states; they would also recognize that the agent's experiences can be absent altogether.

An anvil can destroy Cai's perceptual hardware by crashing into it. A Cartesian might not worry about this eventuality, expecting its experience to persist after its body is smashed. But a naturalized reasoner will form hypotheses like the above, on which its sequence of color experiences suddenly terminates when its sensors are destroyed.
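The difference between the two hypothesis classes can be made concrete with a toy sketch. The function names, the 'anvil_incoming' state, and the probability numbers are all invented for illustration; the point is only where probability mass is allowed to go:

```python
# Two toy predictors over Cai's next 'experience'. The Cartesian class
# can only predict some color; the naturalized class also allows the
# experience stream to end, represented here as None.
COLORS = ['cyan', 'magenta', 'yellow']

def cartesian_predict(env_state):
    # Forced to spread all probability over colors, even post-anvil:
    # 'no experience at all' is not in this hypothesis space.
    return {c: 1 / 3 for c in COLORS}

def naturalized_predict(env_state):
    if env_state == 'anvil_incoming':
        # Bridge hypothesis: smashed sensors -> no further perceptions.
        return {None: 0.95,
                'cyan': 0.05 / 3, 'magenta': 0.05 / 3, 'yellow': 0.05 / 3}
    return {c: 1 / 3 for c in COLORS}
```

The Cartesian predictor must spend all its probability on some future color experience; the naturalized one can put most of its mass on the experience stream terminating.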

This point generalizes to other ways Cai might self-modify, and to other things Cai might alter about itself. For example, Cai might learn that other portions of its brain correspond to its hypotheses and desires.

Another very simple model of how different physical structures are associated with different computational patterns.

This allows Cai to recognize that its goals depend on the proper functioning of many of its hardware components. If Cai believes that its actions depend on its brain's goal unit working in a specific way, then it will avoid taking pills that foreseeably change its goal unit. If Cai's causal model tells it that agents like it stop exhibiting future-steering behaviors when they self-modify to have mad priors, then it won't self-modify to acquire mad priors. And so on.

If Cai's motor fails, its effect on the world can change as a result. The same is true if its hardware is modified in ways that change its thoughts, or its preferences (i.e., the thing linking its conclusions to its motor).

Once Cai recognizes that its brain needs to work in a very specific way for its goals to be achieved, its preferences can take its physical state into account in sensible ways, without our needing to hand-code Cai at the outset to have the right beliefs or preferences about every individual thing that could change in its brain.

Just the opposite is true for Cartesians. Since they can't form hypotheses like 'my tape heads will stop computing digits if I disassemble them', they can only intelligently navigate such risks if they've been hand-coded in advance to avoid perceptual experiences the programmer thought would correlate with such dangers.

In other words, even though all of this is still highly informal, there's already some cause to think that a reasoning pattern like Naturalized Cai's can generalize in ways that Cartesians' can't. The programmers don't need to know everything about Cai's physical state, or anticipate everything about what future changes Cai might undergo, if Cai's epistemology allows it to easily form accurate reductive beliefs and behave accordingly. An agent like this might be adaptive and self-correcting in very novel circumstances, leaving more wiggle room for programmers to make human mistakes.

Bridging maps of worlds and maps of minds

Solomonoff-style dualists have alien blind spots that lead them to neglect the possibility that some hardware state is equivalent to some introspected computation '000110'. TALE-SPIN-like AIs, on the other hand, have blind spots that lead to mistakes like trying to figure out the angular momentum of '000110'.

A naturalized agent doesn't try to do away with the data/hypothesis type distinction and acquire a typology as simple as TALE-SPIN's. Rather, it tries to tightly interconnect its types using bridges. Naturalizing induction is about combining the dualist's useful map/territory distinction with a more sophisticated metaphysical monism than TALE-SPIN exhibits, resulting in a reductive monist AI.7

Alice's simple fixed bridge axiom, {environmental output 0 ↔ perceptual input 0, environmental output 1 ↔ perceptual input 1}, is inadequate for physically embodied agents. And the problem isn't just that Alice lacks other bridge rules and can't weigh evidence for or against each one. Bridge hypotheses are a step in the right direction, but they need to be diverse enough to express a variety of correlations between the agent's sensory experiences and the physical world, and they need a sensible prior. An agent that only considers bridge hypotheses compatible with the cybernetic agent model will falter whenever it and the environment interact in ways that look nothing like exchanging sensory bits.

With the help of an inductive algorithm that uses bridge hypotheses to relate sensory data to a continuous physical universe, we can avoid making our AIs Cartesians. This will make their epistemologies much more secure. It will also make it possible for them to want things to be true about the physical universe, not just about the particular sensory experiences they encounter. Actually writing a program that does all this is an OPFAI. Even formalizing how bridge hypotheses ought to work in principle is an OPFAI.

In my next post, I'll move away from toy models and discuss AIXI, Hutter's optimality definition for cybernetic agents. In asking whether the best Cartesian can overcome the difficulties I've described, we'll get a clearer sense of why Solomonoff inductors aren't reflective and reductive enough to predict drastic changes to their sense-input-to-motor-output relation — and why they can't be that reflective and reductive — and why this matters.


1 Meehan (1977). Colin Allen first introduced me to this story. Dennett discusses it as well.

2 E.g., Durand, Muchnik, Ushakov & Vereshchagin (2004), Epstein & Betke (2011), Legg & Veness (2013), Solomonoff (2011). Hutter (2005) uses the term "cybernetic agent model" to emphasize the parallelism between his Turing machine circuit and control theory's cybernetic systems.

3 One simple representation would be: Program Alice to write to her work tape, on round one, 0010 (standing for 'if I output 0, Everett outputs 0; if I output 1, Everett outputs 0'). Ditto for the other three hypotheses, 0111, 0011, and 0110. Then write each hypothesis' probability in binary (initially 25%, represented '11001') to its right, and program Alice to edit this number as she receives new evidence. Since the first and third digits stay the same, we can simplify the hypotheses' encoding to 00, 11, 01, 10. Indeed, if the hypotheses remain the same over time there's no reason to visibly distinguish them on the work tape at all, when we can instead just program Alice to use the left-to-right ordering of the four probabilities to distinguish the hypotheses.

4 To the extent our universe perfectly resembles any mathematical structure, it's much more likely to do so at the level of gluons and mesons than at the level of medium-sized dry goods. The resemblance of apples to natural numbers is much more approximate. Two apples and three apples generally make five apples, but when you start cutting up or pulverizing or genetically altering apples, you may find that other mathematical models do a superior job of predicting the apples' behavior. It seems likely that the only perfectly general and faithful mathematical representation of apples will be some drastically large and unwieldy physics equation.

Ditto for machines. It's sometimes possible to build a physical machine that closely mimics a given Turing machine — but only 'closely', as Turing machines have unboundedly large tapes. And although any halting Turing machine can in principle be simulated with a bounded tape (Cockshott & Michaelson (2007)), nearly all Turing machine programs are too large to even be approximated by any physical process.

All physical machines structurally resemble Turing machines in ways that allow us to draw productive inferences from the one group to the other. See Piccinini's (2011) discussion of the physical Church-Turing thesis. But, for all that, the concrete machine and the abstract one remain distinct.

5 Descartes (1641): "[A]lthough I certainly do possess a body with which I am very closely conjoined; nevertheless, because, on the one hand, I have a clear and distinct idea of myself, in as far as I am only a thinking and unextended thing, and as, on the other hand, I possess a distinct idea of body, in as far as it is only an extended and unthinking thing, it is certain that I (that is, my mind, by which I am what I am) am entirely and truly distinct from my body, and may exist without it."

From this it's clear that Descartes also believed that the mind can exist without the body. This interestingly parallels the anvil problem, which I'll discuss more in my next post. However, I don't build immortality into my definition of 'Cartesianism'. Not all agents that act as though there is a Cartesian barrier between their thoughts and the world think that their experiences are future-eternal. I'm taking care not to conflate Cartesianism with the anvil problem because the formalism I'll discuss next time, AIXI, does face both of them. Though the problems are logically distinct, it's true that a naturalized reasoning method would be much less likely to face the anvil problem.

6 This isn't to say that a Solomonoff inductor would need to be conscious in anything like the way humans are conscious. It can be fruitful to point to similarities between the reasoning patterns of humans and unconscious processes. Indeed, this already happens when we speak of unconscious mental processes within humans.

Parting ways with Descartes (cf. Kirk (2012)), many present-day dualists would in fact go even further than reductionists in allowing for structural similarities between conscious and unconscious processes, treating all cognitive or functional mental states as (in theory) realizable without consciousness. E.g., Chalmers (1996): "Although consciousness is a feature of the world that we would not predict from the physical facts, the things we say about consciousness are a garden-variety cognitive phenomenon. Somebody who knew enough about cognitive structure would immediately be able to predict the likelihood of utterances such as 'I feel conscious, in a way that no physical object could be,' or even Descartes's 'Cogito ergo sum.' In principle, some reductive explanation in terms of internal processes should render claims about consciousness no more deeply surprising than any other aspect of behavior."

7 And since we happen to live in a world made of physics, the kind of monist we want in practice is a reductive physicalist AI. We want a 'physicalist' as opposed to a reductive monist that thinks everything is made of monads, or abstract objects, or morality fluid, or what-have-you.


∙ Chalmers (1996). The Conscious Mind: In Search of a Fundamental Theory. Oxford University Press.

∙ Cockshott & Michaelson (2007). Are there new models of computation? Reply to Wegner and Eberbach. The Computer Journal, 50: 232-247.

∙ Descartes (1641). Meditations on First Philosophy, in Which the Existence of God and the Immortality of the Soul Are Demonstrated.

∙ Durand, Muchnik, Ushakov & Vereshchagin (2004). Ecological Turing machines. Lecture Notes in Computer Science, 3142: 457-468.

∙ Epstein & Betke (2011). An information-theoretic representation of agent dynamics as set intersections. Lecture Notes in Computer Science, 6830: 72-81.

∙ Hutter (2005). Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer.

∙ Kirk (2012). Zombies. In Zalta (ed.), The Stanford Encyclopedia of Philosophy.

∙ Legg & Veness (2013). An approximation of the Universal Intelligence Measure. Lecture Notes in Computer Science, 7070: 236-249.

∙ Meehan (1977). TALE-SPIN, an interactive program that writes stories. Proceedings of the 5th International Joint Conference on Artificial Intelligence: 91-98.

∙ Piccinini (2011). The physical Church-Turing thesis: Modest or bold? British Journal for the Philosophy of Science, 62: 733-769.

∙ Russell & Norvig (2010). Artificial Intelligence: A Modern Approach. Prentice Hall.

∙ Solomonoff (2011). Algorithmic probability — its discovery — its properties and application to Strong AI. In Zenil (ed.), Randomness Through Computation: Some Answers, More Questions (pp. 149-157).