“embedded self-justification,” or something like that



Some­times I won­der what the MIRI-type crowd thinks about some is­sue re­lated to their in­ter­ests. So I go to al­ign­ment­fo­rum.org, and quickly get in over my head, lost in a labyrinth of is­sues I only half un­der­stand.

I can never tell whether they’ve never thought about the things I’m think­ing about, or whether they sped past them years ago. They do seem very smart, that’s for sure.

But if they have terms for what I’m think­ing of, I lack the abil­ity to find those terms among the twists of their mir­rored hal­lways. So I go to tum­blr.com, and just start typ­ing.

parable (1/​3)

You’re an “agent” try­ing to take good ac­tions over time in a phys­i­cal en­vi­ron­ment un­der re­source con­straints. You know, the usual.

You cur­rently spend a lot of re­sources do­ing a par­tic­u­lar com­pu­ta­tion in­volved in your de­ci­sion pro­ce­dure. Your best known al­gorithm for it is O(N^n) for some n.

You’ve worked on the design of decision algorithms before, and you think this one could perhaps be improved. But to find an improvement, you’d have to shift some resources away from running the algorithm for a time, putting them into decision algorithm design instead.

You do this. Al­most im­me­di­ately, you dis­cover an O(N^(n-1)) al­gorithm. Given the large N you face, this will dra­mat­i­cally im­prove all your fu­ture de­ci­sions.

Clearly (…“clearly”?), the choice to in­vest more in al­gorithm de­sign was a good one.

Could you have an­ti­ci­pated this be­fore­hand? Could you have acted on that knowl­edge?
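To put rough numbers on the trade in this parable, here is a minimal sketch. The cost model (a constant factor of 1, and a one-off design cost measured in the same units as running the algorithm) and the names `cost_per_decision`, `break_even_decisions` and `design_cost` are assumptions for illustration, not part of the parable. It only evaluates the investment in hindsight, once the O(N^(n-1)) algorithm has been found; it says nothing about whether you could have predicted the discovery.

```python
def cost_per_decision(N: int, exponent: int) -> int:
    """Steps for one run of an O(N^exponent) algorithm, with constant factor 1 (an assumption)."""
    return N ** exponent


def break_even_decisions(N: int, n: int, design_cost: int) -> int:
    """Smallest number of future decisions after which the design effort has paid for itself."""
    saving = cost_per_decision(N, n) - cost_per_decision(N, n - 1)
    if saving <= 0:
        raise ValueError("no saving per decision: need N > 1")
    # Integer ceiling division, to stay exact for very large N.
    return -(-design_cost // saving)


# Hypothetical figures: N = 1000, cubic to quadratic, design cost of 10**10 steps.
decisions_needed = break_even_decisions(N=1000, n=3, design_cost=10**10)
```

With these made-up figures the saving per decision is 999,000,000 steps, so the investment is repaid after 11 decisions. The hard part is everything the sketch takes as given: that the faster algorithm exists, and what it costs to find.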

parable (2/​3)

Oh, you’re so very clever! By now you’ve re­al­ized you need, above and be­yond your reg­u­lar de­ci­sion pro­ce­dure to guide your ac­tions in the out­side world, a “meta-de­ci­sion-pro­ce­dure” to guide your own de­ci­sion-pro­ce­dure-im­prove­ment efforts.

Your meta-de­ci­sion-pro­ce­dure does re­quire its own re­source over­head, but in ex­change it tells you when and where to spend re­sources on R&D. All your al­gorithms are faster now. Your de­ci­sions are bet­ter, their guid­ing ap­prox­i­ma­tions less lossy.

All this, from a meta-de­ci­sion-pro­ce­dure that’s only a first draft. You frown over the re­source over­head it charges, and won­der whether it could be im­proved.

You try shift­ing some re­sources away from “reg­u­lar de­ci­sion pro­ce­dure de­sign” into “meta-de­ci­sion-pro­ce­dure-de­sign.” Al­most im­me­di­ately, you come up with a faster and bet­ter pro­ce­dure.

Could you have an­ti­ci­pated this be­fore­hand? Could you have acted on that knowl­edge?

parable (3/​3)

Oh, you’re so very clever! By now you’ve re­al­ized you need, above and be­yond your meta-meta-meta-de­ci­sion-pro­ce­dure, a “meta-meta-meta-meta-de­ci­sion-pro­ce­dure” to guide your meta-meta-meta-de­ci­sion-pro­ce­dure-im­prove­ment efforts.

Way down on the ob­ject level, you have not moved for a very long time, ex­cept to oc­ca­sion­ally up­date your meta-meta-meta-meta-ra­tio­nal­ity blog.

Way down on the ob­ject level, a dumb and fast preda­tor eats you.

Could you have an­ti­ci­pated this be­fore­hand? Could you have acted on that knowl­edge?

the boundary

You’re an “agent” try­ing to take good ac­tions, et cetera. Your ac­tions are guided by some sort of over­all “model” of how things are.

There are, in­evitably, two parts to your model: the in­te­rior and the bound­ary.

The in­te­rior is ev­ery­thing you treat as fair game for iter­a­tive and re­flec­tive im­prove­ment. For “op­ti­miza­tion,” if you want to put it that way. Facts in the in­te­rior are sub­ject to ra­tio­nal scrutiny; pro­ce­dures in the in­te­rior have been judged and se­lected for their qual­ity, us­ing some fur­ther pro­ce­dure.

The bound­ary is the out­most shell, where re­source con­straints force the regress to stop. Per­haps you have a tar­get and an op­ti­miza­tion pro­ce­dure. If you haven’t tested the op­ti­miza­tion pro­ce­dure against al­ter­na­tives, it’s in your bound­ary. If you have, but you haven’t tested your op­ti­miza­tion-pro­ce­dure-test­ing-pro­ce­dure against al­ter­na­tives, then it’s in your bound­ary. Et cetera.

You are a busi­ness. You do ret­ro­spec­tives on your pro­jects. You’re so very clever, in fact, that you do ret­ro­spec­tives on your ret­ro­spec­tive pro­cess, to im­prove it over time. But how do you im­prove these retro-ret­ros? You don’t. They’re in your bound­ary.

Of ev­ery­thing you know and do, you trust the bound­ary the least. You have ap­plied less scrutiny to it than any­thing else. You sus­pect it may be shame­fully sub­op­ti­mal, just like the pre­vi­ous bound­ary, be­fore you pushed it into the in­te­rior.
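One way to picture the interior and the boundary is as a chain of “judged by” links that has to end somewhere. The sketch below is only an illustration: the `Procedure` class, the `judged_by` field and the function names are hypothetical, and it assumes the chain is finite and has no cycles.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Procedure:
    name: str
    # The procedure that was used to test this one against alternatives, if any.
    judged_by: Optional["Procedure"] = None


def justification_chain(p: Procedure) -> list[str]:
    """Keep asking "why trust this?" until you reach something never tested."""
    chain = [p.name]
    while p.judged_by is not None:
        p = p.judged_by
        chain.append(p.name)
    return chain


def boundary(p: Procedure) -> str:
    """The last link in the chain: used, but never judged by anything further."""
    return justification_chain(p)[-1]


# The business example: projects, retrospectives, and retro-retros.
retro_retro = Procedure("retrospective on the retrospective process")
retro = Procedure("project retrospective", judged_by=retro_retro)
project = Procedure("project", judged_by=retro)
```

Here `boundary(project)` returns the retro-retro. Adding one more level of review moves the retro-retro into the interior, but the chain still ends at some procedure whose `judged_by` is `None`.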

em­bed­ded self-justification

You would like to look back on the re­sources you spend – each sec­ond, each joule – and say, “I spent it the right way.” You would like to say, “I have a the­ory of what it means to de­cide well, and I ap­plied it, and so I de­cided well.”

Why did you spend it as you did, then? You can­not an­swer, ever, with­out your an­swer in­vok­ing some­thing on the bound­ary.

How did you spend that second? On looking for a faster algorithm. Why? Because your R&D allocation procedure told you to. Why follow that procedure? Because it’s done better than others in the past. How do you know? Because you’ve compared it to others. Which others? Under what assumptions? Oh, your procedure-experimentation procedure told you. And how do you know it works? Eventually you come to the boundary, and throw up your hands: “I’m doing the best I can, okay!”

If you lived in a sim­ple and trans­par­ent world, maybe you could just find the op­ti­mal policy once and for all. If you re­ally were liter­ally the ban­dit among the slot ma­chines – and you knew this, perfectly, with cre­dence 1 – maybe you could solve for the op­ti­mal ex­plore/​ex­ploit be­hav­ior and then do it.
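For the literal bandit, “solve for the optimal behavior and then do it” can actually be carried out in small cases. The sketch below is a toy under stated assumptions, none of which come from the text above: two slot machines that each pay 0 or 1, independent uniform Beta(1, 1) priors on their payout probabilities, and a known, finite number of pulls. It computes the Bayes-optimal expected reward by backward induction; the function names are made up for this example.

```python
from functools import lru_cache


def posterior_mean(successes: int, failures: int) -> float:
    """Posterior mean payout of an arm under a uniform Beta(1, 1) prior."""
    return (successes + 1) / (successes + failures + 2)


@lru_cache(maxsize=None)
def arm_values(s1: int, f1: int, s2: int, f2: int, pulls_left: int) -> tuple[float, float]:
    """Expected total future reward of pulling arm 1 or arm 2 now, then acting optimally.

    (s, f) are the successes and failures observed so far on each arm.
    """
    if pulls_left == 0:
        return (0.0, 0.0)

    def after(a1: int, b1: int, a2: int, b2: int) -> float:
        # Value of the best continuation from the updated state.
        return max(arm_values(a1, b1, a2, b2, pulls_left - 1))

    p1 = posterior_mean(s1, f1)
    p2 = posterior_mean(s2, f2)
    pull1 = p1 * (1 + after(s1 + 1, f1, s2, f2)) + (1 - p1) * after(s1, f1 + 1, s2, f2)
    pull2 = p2 * (1 + after(s1, f1, s2 + 1, f2)) + (1 - p2) * after(s1, f1, s2, f2 + 1)
    return (pull1, pull2)


def optimal_value(horizon: int) -> float:
    """Bayes-optimal expected total reward starting with no observations."""
    return max(arm_values(0, 0, 0, 0, horizon))
```

With a horizon of 2, `optimal_value(2)` is 13/12, slightly above the 1.0 you would expect from committing to one machine in advance; the gain comes from using the first pull as information. Even in this toy, the number of states grows roughly with the fourth power of the horizon, and the whole calculation rests on knowing the priors and the horizon exactly.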

But your world isn’t like that. You know this, and know that you know it. Even if you could ob­tain a perfect model of your world and be­ings like you, you wouldn’t be able to fit it in­side your own head, much less run it fast enough to be use­ful. (If you had a magic amulet, you might be able to fit your­self in­side your own head, but you live in re­al­ity.)

In­stead, you have de­tailed pic­tures of spe­cific frag­ments of the world, in the in­te­rior and sub­ject to con­tin­u­ous re­fine­ment. And then you have pic­tures of the pic­ture-mak­ing pro­cess, and so on. As you go fur­ther out, the pic­tures get coarser and sim­pler, be­cause their do­main of de­scrip­tion be­comes ever vaster, while your re­sources re­main finite, and you must nour­ish each level with a por­tion of those re­sources be­fore the level above it even be­comes think­able.

At the end, at the bound­ary, you have the coars­est pic­ture, a sort of car­toon. There is a smil­ing stick figure, per­haps wear­ing a lab coat to in­di­cate sci­en­tific-ra­tio­nal val­ues. It reaches for the lever of a slot ma­chine, la­beled “ac­tion,” while peer­ing into a sketch of an os­cillo­scope, la­beled “ob­ser­va­tions.” A sin­gle ar­row curls around, point­ing from the di­a­gram back into the di­a­gram. It is la­beled “op­ti­miza­tion,” and dec­o­rated with cute lit­tle sparkles and hearts, to con­vey its won­der­ful­ness. The mar­gins of the page are lit­tered with equa­tions, de­scribing the lit­tlest of toy mod­els: ban­dit prob­lems, Dutch book sce­nar­ios, Nash equil­ibria un­der perfect in­for­ma­tion.

In the in­te­rior, there are much richer, more beau­tiful pic­tures that are oth­er­wise a lot like this one. In the in­te­rior, meta-learn­ing al­gorithms buzz away on a GPU, us­ing the lat­est and great­est pro­ce­dures for find­ing pro­ce­dures, jus­tified in pre­cise terms in your lat­est pa­per. You ges­ture at a white­board as you pri­ori­tize op­tions for im­prov­ing the al­gorithms. Your pri­ori­ti­za­tion frame­work has gone through rigor­ous test­ing.

Why, in the end, do you do all of it? Be­cause you are the lit­tle stick figure in the lab coat.


What am I try­ing to get at, here?

Oc­ca­sion­ally peo­ple talk about the rele­vance of com­pu­ta­tional com­plex­ity is­sues to AI and its limits. Gw­ern has a good page on why these con­cerns can’t place use­ful bounds on the po­ten­tial of ma­chine in­tel­li­gence in the way peo­ple some­times ar­gue they do.

Yet, some­how I feel an un­scratched itch when I read ar­gu­ments like Gw­ern’s there. They an­swer the ques­tion I think I’m ask­ing when I seek them out, but at the end I feel like I re­ally meant to ask some other ques­tion in­stead.

Given com­pu­ta­tional con­straints, how “su­per­hu­man” could an AI be? Well, it could just do what we do, but sped up – that is, it could have the same re­source effi­ciency but more re­sources per unit time. That’s enough to be scary. It could also find more effi­cient al­gorithms and pro­ce­dures, just as we do in our own re­search – but it would find them ever faster, more effi­ciently.

What re­mains unan­swered, though, is whether there is any use­ful way of talk­ing about do­ing this (the whole thing, in­clud­ing the self-im­prove­ment R&D) well, do­ing it ra­tio­nally, as op­posed to do­ing it in a way that sim­ply “seems to work” af­ter the fact.

How would an AI’s own policy for in­vest­ment in self-im­prove­ment com­pare to our own (to yours, to your so­ciety’s)? Could we look at it and say, “this is bet­ter”? Could the AI do so? Is there any­thing bet­ter than sim­ply bum­bling around in con­cept-space, in a man­ner that per­haps has many in­ter­nal struc­tures of self-jus­tifi­ca­tion but is not known to work as a whole? Is there such a thing as (ap­prox­i­mate) knowl­edge about the right way to do all of it that is still small enough to fit in­side the agent on which it passes judg­ment?

Can you represent your overall policy, your outermost strategy-over-strategies, considered as a response to your entire situation, in a way that is not a cartoon, a way real enough to defend itself?

What is really known about the best way to spend the next unit of resources? I mean, known at the level of the resource-spenders, not as a matter of external judgment? Can anything definite be said about the topic in general except “it is possible to do better or worse, and it is probably possible to do better than we do now”? If not, what standard of rationality do we have left to apply beyond toy models, to ourselves or our successors?