Do Sufficiently Advanced Agents Use Logic?

This is a continuation of a discussion with Vanessa from the MIRIxDiscord group. I’ll make some comments on things Vanessa has said, but those should not be considered a summary of the discussion so far. My comments here are also informed by discussion with Sam.

1: Logic as Proxy

1a: The Role of Prediction

Vanessa has said that predictive accuracy is sufficient; consideration of logic is not needed to judge (partial) models. A hypothesis should ultimately ground out to perceptual information. So why is there any need to consider other sorts of “predictions” it can make? (IE, why should we think of it as possessing internal propositions which have a logic of their own?)

But similarly, why should agents use predictive accuracy to learn? What’s the argument for it? Ultimately, predicting perceptions ahead of time should only be in service of achieving higher reward.

We could instead learn from reward feedback alone. A (partial) “hypothesis” would really be a (partial) strategy, helping us to generate actions. We would judge strategies on (something like) average reward achieved, not even trying to predict precise reward signals. The agent still receives incoming perceptual information, and strategies can use it to update internal states and to inform actions. However, strategies are not asked to produce any predictions. (The framework I’m describing is, of course, model-free RL.)
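As a toy illustration of the kind of setup I mean (the environment, the Strategy class, and run_episode below are all invented for this sketch; real model-free RL is much richer than this):

```python
# Toy sketch of learning from reward feedback alone (model-free RL):
# partial "strategies" consume percepts, keep internal state, and emit
# actions, and are judged only by the average reward they achieve.
import random

class Strategy:
    """A (partial) strategy: percept + internal state -> action."""
    def __init__(self, bias):
        self.bias = bias      # which direction this strategy favors
        self.state = 0.0      # internal state, updated from percepts

    def act(self, percept):
        self.state = 0.9 * self.state + 0.1 * percept  # track recent percepts
        return 1 if self.state * self.bias > 0 else 0

def run_episode(strategy, steps=50):
    """Hidden signal occasionally flips; reward for matching its sign."""
    signal, total = 1.0, 0.0
    for _ in range(steps):
        percept = signal + random.gauss(0, 0.5)
        action = strategy.act(percept)
        total += 1.0 if action == (1 if signal > 0 else 0) else 0.0
        if random.random() < 0.05:
            signal = -signal
    return total / steps

# Judge strategies only on average reward achieved; no predictions required.
for bias in (-1.0, 1.0):
    avg = sum(run_episode(Strategy(bias)) for _ in range(200)) / 200
    print(f"bias={bias:+.1f}: average reward {avg:.2f}")
```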

Intuitively, it seems as if this is missing something. A model-based agent can learn a lot about the world just by watching, taking no actions. However, individual strategies can implement prediction-based learning within themselves. So, it seems difficult to say what benefit model-based RL provides beyond model-free RL, besides a better prior over strategies.

It might be that we can’t say anything recommending model-based learning over model-free in a standard bounded-regret framework. (I actually haven’t thought about it much, but the argument that model-free strategies can implement models internally seems potentially strong. Perhaps you just can’t get much in the AIXI framework because there are no good loss bounds in that framework at all, as Vanessa mentions.) However, if so, this seems like a weakness of standard bounded-regret frameworks. Predicting the world seems to be a significant aspect of intelligence; we should be able to talk about this formally somehow.

Granted, it doesn’t make sense for bounded agents to pursue predictive accuracy above all else. There is a computational trade-off, and you don’t need to predict something which isn’t important. My claim is something like: you should try to predict when you don’t yet have an effective strategy. After you have an effective strategy, you don’t really need to generate predictions. Before that, you need to generate predictions because you’re still grappling with the world, trying to understand what’s basically going on.

If we’re trying to understand intelligence, the idea that model-free learners can internally manage these trade-offs (by choosing strategies which judiciously choose to learn from predictions when it is efficacious to do so) seems less satisfying than a proper theory of learning from prediction. What is fundamental vs non-fundamental to intelligence can get fuzzy, but learning from prediction seems like something we expect any sufficiently intelligent agent to do (whether it was built-in or learned behavior).

On the other hand, judging hypotheses on their predictive accuracy is kind of a weird thing to do if what you ultimately want a hypothesis to do for you is generate actions. It’s like this: you’ve got two tasks, task A and task B. Task A is what you really care about, but it might be quite difficult to tackle on its own. Task B is really very different from task A, but you can get a lot of feedback on task B. So you ask your hypotheses to compete on task B, and judge them on that in addition to task A. Somehow you’re expecting to get a lot of information about task A from performance on task B. And indeed, it seems you do: predictive accuracy of a hypothesis is somehow a useful proxy for efficacy of that hypothesis in guiding action.

(It should also be noted that a reward-learning framework presumes we get feedback about utility at all. If we get no feedback about reward, then we’re forced to only judge hypotheses by predictions, and make what inferences about utility we will. A dire situation for learning theory, but a situation where we can still talk about rational agency more generally.)

1b: The Analogy to Logic

My argument is going to be that if achieving high reward is task A, and predicting perception is task B, logic can be task C. Like task B, it is very different from task A. Like task B, it nonetheless provides useful information. Like task B, it seems to me that a theory of (boundedly) rational agency is missing something without it.

The basic picture is this. Perceptual prediction provides a lot of good feedback about the quality of cognitive algorithms. But if you really want to train up some good cognitive algorithms for yourself, it is helpful to do some imaginative play on the side.

One way to visualize this is an agent making up math puzzles in order to strengthen its reasoning skills. This might suggest a picture where the puzzles are always well-defined (terminating) computations. However, there’s no special dividing line between decidable and undecidable problems: any particular restriction to a decidable class might rule out some interesting (decidable but non-obviously so) stuff which we could learn from. So we might end up just going with any computations (halting or no).

Similarly, we might not restrict ourselves to entirely well-defined propositions. It makes a lot of sense to test cognitive heuristics on scenarios closer to life.

Why do I think sufficiently advanced agents are likely to do this?

Well, just as it seems important that we can learn a whole lot from prediction before we ever take an action in a given type of situation, it seems important that we can learn a whole lot by reasoning before we even observe that situation. I’m not formulating a precise learning-theoretic conjecture, but intuitively, it is related to whether we could reasonably expect the agent to get something right on the first try. Good perceptual prediction alone does not guarantee that we can correctly anticipate the effects of actions we have never tried before, but if I see an agent generate an effective strategy in a situation it has never intervened in before (but has had opportunity to observe), I expect that internally it is learning from perception at some level (even if it is model-free in overall architecture). Similarly, if I see an agent quickly pick up a reasoning-heavy game like chess, then I suspect it of learning from hypothetical simulations at some level.

Again, “on the first try” is not supposed to be a formal learning-theoretic requirement; I realize you can’t exactly expect anything to work on the first try with learning agents. What I’m getting at has something to do with generalization.

2: Learning-Theoretic Criteria

Part of the frame has been learning-theory-vs-logic. One might interpret my closing remarks from the previous section that way; I don’t know how to formulate my intuition learning-theoretically, but I expect that reasoning helps agents in particular situations. It may be that the phenomena of the previous section cannot be understood learning-theoretically, and only amount to a “better prior over strategies” as I mentioned. However, I don’t want it to be a learning-theory-vs-logic argument. I would hope that something learning-theoretic can be said in favor of learning from perception, and in favor of learning from logic. Even if it can’t, learning theory is still an important component here, regardless of the importance of logic.

I’ll try to say something about how I think learning theory should interface with logic.

Vanessa said some relevant things in a comment, which I’ll quote in full:

Heterodox opinion: I think the entire MIRIesque (and academic philosophy) approach to decision theory is confused. The basic assumption seems to be, that we can decouple the problem of learning a model of the world from the problem of taking a decision given such a model. We then ignore the first problem, and assume a particular shape for the model (for example, causal network) which allows us to consider decision theories such as CDT, EDT etc. However, in reality the two problems cannot be decoupled. This is because the type signature of a world model is only meaningful if it comes with an algorithm for how to learn a model of this type.
For example, consider Newcomb’s paradox. The agent makes a decision under the assumption that Omega behaves in a certain way. But, where did the assumption come from? Realistic agents have to learn everything they know. Learning normally requires a time sequence. For example, we can consider the iterated Newcomb’s paradox (INP). In INP, any reinforcement learning (RL) algorithm will converge to one-boxing, simply because one-boxing gives it the money. This is despite RL naively looking like CDT. Why does it happen? Because in the learned model, the “causal” relationships are not physical causality. The agent comes to believe that taking the one box causes the money to appear there.
In Newcomb’s paradox EDT succeeds but CDT fails. Let’s consider an example where CDT succeeds and EDT fails: the XOR blackmail. The iterated version would be IXB. In IXB, classical RL doesn’t guarantee much because the environment is more complex than the agent (it contains Omega). To overcome this, we can use RL with incomplete models. I believe that this indeed solves both INP and IXB.
Then we can consider e.g. counterfactual mugging. In counterfactual mugging, RL with incomplete models doesn’t work. That’s because the assumption that Omega responds in a way that depends on a counterfactual world is not in the space of models at all. Indeed, it’s unclear how can any agent learn such a fact from empirical observations. One way to fix it is by allowing the agent to precommit. Then the assumption about Omega becomes empirically verifiable. But, if we do this, then RL with incomplete models can solve the problem again.
The only class of problems that I’m genuinely unsure how to deal with is game-theoretic superrationality. However, I also don’t see much evidence the MIRIesque approach has succeeded on that front. We probably need to start with just solving the grain of truth problem in the sense of converging to ordinary Nash (or similar) equilibria (which might be possible using incomplete models). Later we can consider agents that observe each other’s source code, and maybe something along the lines of this can apply.

Besides the MIRI-vs-learning frame, I agree with a lot of this. I wrote a comment elsewhere making some related points about the need for a learning-theoretic approach. Some of the points also relate to my CDT=EDT sequence; I have been arguing that CDT and EDT don’t behave as people broadly imagine (often not having the bad behavior which people broadly imagine). Some of those arguments were learning-theoretic while others were not, but the conclusions were similar either way.
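Vanessa’s INP point, in particular, is easy to see in a toy simulation. The sketch below is mine, not hers: it assumes a perfectly accurate Omega and a bare-bones reward-averaging learner, with the usual dollar amounts from the story.

```python
# Toy sketch of the iterated Newcomb's problem (INP): a simple
# reward-averaging learner converges to one-boxing, because one-boxing
# is what actually gets it the money.  Omega is assumed to predict the
# current round's action perfectly.
import random

values = {"one-box": 0.0, "two-box": 0.0}   # running average reward per action
counts = {"one-box": 0, "two-box": 0}

for t in range(10000):
    # epsilon-greedy choice between the two available actions
    if random.random() < 0.1:
        action = random.choice(list(values))
    else:
        action = max(values, key=values.get)

    # Omega fills the opaque box with $1M exactly when the agent one-boxes
    opaque = 1_000_000 if action == "one-box" else 0
    reward = opaque + (1_000 if action == "two-box" else 0)

    counts[action] += 1
    values[action] += (reward - values[action]) / counts[action]

print(values)   # one-box averages ~1,000,000; two-box averages ~1,000
```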

In any case, I think the following criterion (originally mentioned to me by Jack Gallagher) makes sense:

A decision problem should be conceived as a sequence, but the algorithm deciding what to do on a particular element of the sequence should not know/care what the whole sequence is.

Asymptotic decision theory was the first major proposal to conceive of decision problems as sequences in this way. Decision-problem-as-sequence allows decision theory to be addressed learning-theoretically; we can’t expect a learning agent to necessarily do well in any particular case (because it could have a sufficiently poor prior, and so still be learning in that particular case), but we can expect it to eventually perform well (provided the problem meets some “fairness” conditions which make it learnable).

As for the second part of the criterion, requiring that the agent is ignorant of the overall sequence when deciding what to do on an instance: this captures the idea of learning from logic. Providing the agent with the sequence is cheating, because you’re essentially giving the agent your interpretation of the situation.

Jack mentioned this criterion to me in a discussion of averaging decision theory (AvDT), in order to explain why AvDT was cheating.

AvDT is based on a fairly simple idea: look at the average performance of a strategy so far, rather than its expected performance on this particular problem. Unfortunately, “performance so far” requires things to be defined in terms of a training sequence (counter to the logical-induction philosophy of non-sequential learning).

I created AvDT to try to address some shortcomings of asymptotic decision theory (let’s call it AsDT). Specifically, AsDT does not do well in counterlogical mugging. AvDT is capable of doing well in counterlogical mugging. However, it depends on the training sequence. Counterlogical mugging requires the agent to decide on the “probability” of Omega asking for money vs paying up, to figure out whether participation is worth it overall. AvDT solves this problem by looking at the training sequence to see how often Omega pays up. So, the problem of doing well in decision problems is “reduced” to specifying good training sequences. This (1) doesn’t obviously make things easier, and (2) puts the work on the human trainers.
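To make concrete what “looking at the training sequence” buys you, here is a toy sketch (my own simplification, not the actual AvDT construction; the payoffs and the hand-written sequence are invented):

```python
# Toy sketch: decide whether to pay up in a mugging-style problem by the
# average performance of each policy on a hand-specified training sequence.
# The payoffs and the sequence are invented for illustration.

# True: Omega's coin favored the agent (Omega would pay a committed payer).
# False: the coin went the other way and Omega asks for $100.
training_sequence = [True, False, True, False, False, True, False, True]

def payoff(policy_pays, omega_favors):
    if omega_favors:
        return 10_000 if policy_pays else 0   # Omega rewards would-be payers
    else:
        return -100 if policy_pays else 0     # paying up costs $100

def average_performance(policy_pays):
    rewards = [payoff(policy_pays, o) for o in training_sequence]
    return sum(rewards) / len(rewards)

# AvDT-style verdict: pick whichever policy has the better average so far.
print("pay:", average_performance(True), "refuse:", average_performance(False))
```

Swap in a different hand-written sequence and the verdict can flip; that is the sense in which the work has been pushed onto whoever specifies the sequence.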

Jack is saying that the system should be looking through logic on its own to find analogous scenarios to generalize from. When judging whether a system gets counterlogical mugging right, we have to define counterlogical mugging as a sequence to enable learning-theoretic analysis; but the agent has to figure things out on its own.

This is a somewhat subtle point. A realistic agent experiences the world sequentially, and learns by treating its history as a training sequence of sorts. This is physical time. I have no problem with this. What I’m saying is that if an agent is also learning from analogous circumstances within logic, as I suggested sophisticated agents will do in the first part, then Jack’s condition should come into play. We aren’t handed, from on high, a sequence of logically defined scenarios which we can locate ourselves within. We only have regular physical time, plus a bunch of hypothetical scenarios which we can define and whose relevance we have to determine.

This gets back to my earlier intuition about agents having a reasonable chance of getting certain things right on the first try. Learning-theoretic agents don’t get things right on the first try. However, agents who learn from logic have “lots of tries” before their first real try in physical time. If you can successfully determine which logical scenarios are relevantly analogous to your own, you can learn what to do just by thinking. (Of course, you still need a lot of physical-time learning to know enough about your situation to do that!)

So, getting back to Vanessa’s point in the comment I quoted: can we solve MIRI-style decision problems by considering the iterated problem, rather than the single-shot version? To a large extent, I think so: in logical time, all games are iterated games. However, I don’t want to have to set an agent up with a training sequence in which it encounters those specific problems many times. For example, finding good strategies in chess via self-play should come naturally from the way the agent thinks about the world, rather than being an explicit training regime which the designer has to implement. Once the rules for chess are understood, the bottleneck should be thinking time rather than (physical) training instances.