# “embedded self-justification,” or something like that

preamble

Sometimes I wonder what the MIRI-type crowd thinks about some issue related to their interests. So I go to alignmentforum.org, and quickly get in over my head, lost in a labyrinth of issues I only half understand.

I can never tell whether they’ve never thought about the things I’m thinking about, or whether they sped past them years ago. They do seem very smart, that’s for sure.

But if they have terms for what I’m thinking of, I lack the ability to find those terms among the twists of their mirrored hallways. So I go to tumblr.com, and just start typing.

parable (1/3)

You’re an “agent” trying to take good actions over time in a physical environment under resource constraints. You know, the usual.

You currently spend a lot of resources doing a particular computation involved in your decision procedure. Your best known algorithm for it is O(N^n) for some n.

You’ve worked on the design of decision algorithms before, and you think this could perhaps be improved. But to find a better algorithm, you’d have to shift some resources away from running the current one for a time, putting them into decision algorithm design instead.

You do this. Almost immediately, you discover an O(N^(n-1)) algorithm. Given the large N you face, this will dramatically improve all your future decisions.

Clearly (…“clearly”?), the choice to invest more in algorithm design was a good one.

Could you have anticipated this beforehand? Could you have acted on that knowledge?

parable (2/3)

Oh, you’re so very clever! By now you’ve realized you need, above and beyond your regular decision procedure to guide your actions in the outside world, a “meta-decision-procedure” to guide your own decision-procedure-improvement efforts.

Your meta-decision-procedure does require its own resource overhead, but in exchange it tells you when and where to spend resources on R&D. All your algorithms are faster now. Your decisions are better, their guiding approximations less lossy.

All this, from a meta-decision-procedure that’s only a first draft. You frown over the resource overhead it charges, and wonder whether it could be improved.

You try shifting some resources away from “regular decision procedure design” into “meta-decision-procedure design.” Almost immediately, you come up with a faster and better procedure.

Could you have anticipated this beforehand? Could you have acted on that knowledge?

parable (3/3)

Oh, you’re so very clever! By now you’ve realized you need, above and beyond your meta-meta-meta-decision-procedure, a “meta-meta-meta-meta-decision-procedure” to guide your meta-meta-meta-decision-procedure-improvement efforts.

Way down on the object level, you have not moved for a very long time, except to occasionally update your meta-meta-meta-meta-rationality blog.

Way down on the object level, a dumb and fast predator eats you.

Could you have anticipated this beforehand? Could you have acted on that knowledge?

the boundary

You’re an “agent” trying to take good actions, et cetera. Your actions are guided by some sort of overall “model” of how things are.

There are, inevitably, two parts to your model: the interior and the boundary.

The interior is everything you treat as fair game for iterative and reflective improvement. For “optimization,” if you want to put it that way. Facts in the interior are subject to rational scrutiny; procedures in the interior have been judged and selected for their quality, using some further procedure.

The boundary is the outermost shell, where resource constraints force the regress to stop. Perhaps you have a target and an optimization procedure. If you haven’t tested the optimization procedure against alternatives, it’s in your boundary. If you have, but you haven’t tested your optimization-procedure-testing-procedure against alternatives, then it’s in your boundary. Et cetera.

You are a business. You do retrospectives on your projects. You’re so very clever, in fact, that you do retrospectives on your retrospective process, to improve it over time. But how do you improve these retro-retros? You don’t. They’re in your boundary.

Of everything you know and do, you trust the boundary the least. You have applied less scrutiny to it than anything else. You suspect it may be shamefully suboptimal, just like the previous boundary, before you pushed it into the interior.

embedded self-justification

You would like to look back on the resources you spend – each second, each joule – and say, “I spent it the right way.” You would like to say, “I have a theory of what it means to decide well, and I applied it, and so I decided well.”

Why did you spend it as you did, then? You cannot answer, ever, without your answer invoking something on the boundary.

How did you spend that second? On looking for a faster algorithm. Why? Because your R&D allocation procedure told you to. Why follow that procedure? Because it’s done better than others in the past. How do you know? Because you’ve compared it to others. Which others? Under what assumptions? Oh, your procedure-experimentation procedure told you. And how do you know it works? Eventually you come to the boundary, and throw up your hands: “I’m doing the best I can, okay!”

If you lived in a simple and transparent world, maybe you could just find the optimal policy once and for all. If you really were literally the bandit among the slot machines – and you knew this, perfectly, with credence 1 – maybe you could solve for the optimal explore/exploit behavior and then do it.
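For contrast, the idealized bandit case can at least be attacked directly. Here is a minimal sketch (function name and parameters are my own, not from the post) of an epsilon-greedy player facing known-stationary slot machines; in this toy world, unlike ours, the entire “model” fits in a few lines:

```python
import random

def eps_greedy_bandit(payouts, pulls=10000, eps=0.1, seed=0):
    """Epsilon-greedy play against stationary slot machines.

    `payouts` are the true expected rewards -- the 'credence 1' world
    imagined above, where the optimal policy could in principle simply
    be solved for once and then followed forever.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(payouts)
    counts = [0] * len(payouts)
    total = 0.0
    for _ in range(pulls):
        if rng.random() < eps:
            arm = rng.randrange(len(payouts))  # explore
        else:
            # exploit the arm that currently looks best
            arm = max(range(len(payouts)), key=lambda i: estimates[i])
        reward = payouts[arm] + rng.gauss(0, 0.1)  # noisy pull
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total += reward
    return total / pulls
```

The point of the sketch is how little machinery the toy problem needs: the agent’s whole “boundary” is the fixed epsilon rule, and nothing in the environment ever punishes it for not reflecting on that rule.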

But your world isn’t like that. You know this, and know that you know it. Even if you could obtain a perfect model of your world and beings like you, you wouldn’t be able to fit it inside your own head, much less run it fast enough to be useful. (If you had a magic amulet, you might be able to fit yourself inside your own head, but you live in reality.)

Instead, you have detailed pictures of specific fragments of the world, in the interior and subject to continuous refinement. And then you have pictures of the picture-making process, and so on. As you go further out, the pictures get coarser and simpler, because their domain of description becomes ever vaster, while your resources remain finite, and you must nourish each level with a portion of those resources before the level above it even becomes thinkable.

At the end, at the boundary, you have the coarsest picture, a sort of cartoon. There is a smiling stick figure, perhaps wearing a lab coat to indicate scientific-rational values. It reaches for the lever of a slot machine, labeled “action,” while peering into a sketch of an oscilloscope, labeled “observations.” A single arrow curls around, pointing from the diagram back into the diagram. It is labeled “optimization,” and decorated with cute little sparkles and hearts, to convey its wonderfulness. The margins of the page are littered with equations, describing the littlest of toy models: bandit problems, Dutch book scenarios, Nash equilibria under perfect information.

In the interior, there are much richer, more beautiful pictures that are otherwise a lot like this one. In the interior, meta-learning algorithms buzz away on a GPU, using the latest and greatest procedures for finding procedures, justified in precise terms in your latest paper. You gesture at a whiteboard as you prioritize options for improving the algorithms. Your prioritization framework has gone through rigorous testing.

Why, in the end, do you do all of it? Because you are the little stick figure in the lab coat.

coda

What am I trying to get at, here?

Occasionally people talk about the relevance of computational complexity issues to AI and its limits. Gwern has a good page on why these concerns can’t place useful bounds on the potential of machine intelligence in the way people sometimes argue they do.

Yet, somehow I feel an unscratched itch when I read arguments like Gwern’s there. They answer the question I think I’m asking when I seek them out, but at the end I feel like I really meant to ask some other question instead.

Given computational constraints, how “superhuman” could an AI be? Well, it could just do what we do, but sped up – that is, it could have the same resource efficiency but more resources per unit time. That’s enough to be scary. It could also find more efficient algorithms and procedures, just as we do in our own research – but it would find them ever faster, more efficiently.

What remains unanswered, though, is whether there is any useful way of talking about doing this (the whole thing, including the self-improvement R&D) well, doing it rationally, as opposed to doing it in a way that simply “seems to work” after the fact.

How would an AI’s own policy for investment in self-improvement compare to our own (to yours, to your society’s)? Could we look at it and say, “this is better”? Could the AI do so? Is there anything better than simply bumbling around in concept-space, in a manner that perhaps has many internal structures of self-justification but is not known to work as a whole? Is there such a thing as (approximate) knowledge about the right way to do all of it that is still small enough to fit inside the agent on which it passes judgment?

Can you represent your overall policy, your outermost strategy-over-strategies considered as a response to your entire situation, in a way that is not a cartoon, a way real enough to defend itself?

What is really known about the best way to spend the next unit of resources? I mean, known at the level of the resource-spenders, not as a matter of external judgment? Can anything definite be said about the topic in general except “it is possible to do better or worse, and it is probably possible to do better than we do now”? If not, what standard of rationality do we have left to apply beyond toy models, to ourselves or our successors?

• It seems to me that there are roughly two types of “boundary” to think about: ceilings and floors.

• Floors are aka the foundations. Maybe a system is running on a basically Bayesian framework, or (alternately) logical induction. Maybe there are some axioms, like ZFC. Going meta on floors involves the kind of self-reference stuff which you hear about most often: Gödel’s theorem and so on. Floors are, basically, pretty hard to question and improve (though not impossible).

• Ceilings are fast heuristics. You have all kinds of sophisticated beliefs in the interior, but there’s a question of which inferences you immediately make, without doing any meta to consider what direction to think in. (IE, you do generally do some meta to think about what direction to think in; but, this “tops out” at some level, at which point the analysis has to proceed without meta.) Ceilings are relatively easy to improve. For example, the AlphaGo move proposal network and evaluation network (if I recall the terms correctly). These have cheap updates which can be made frequently, via observing the results of reasoning. These incremental updates then help the more expensive tree-search reasoning to be even better.
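That “cheap updates via observing the results of reasoning” loop can be sketched abstractly. This is a toy illustration of the idea, not AlphaGo’s actual training procedure, and all the names are hypothetical:

```python
def slow_search(state):
    """Stand-in for expensive deliberate reasoning (e.g. tree search).

    Here it just returns a pretend ground-truth value; in a real system
    it would be a deep lookahead seeded by the fast heuristic below.
    """
    return state % 7

class FastHeuristic:
    """The 'ceiling': a cheap evaluator, incrementally nudged toward
    whatever the expensive reasoning concluded."""

    def __init__(self, lr=0.5):
        self.table = {}   # state -> estimated value
        self.lr = lr

    def evaluate(self, state):
        # cheap: used constantly, with no meta-level deliberation
        return self.table.get(state, 0.0)

    def update_from_search(self, state):
        # expensive reasoning done occasionally; its verdict trains
        # the fast evaluation via a small incremental step
        target = slow_search(state)
        old = self.evaluate(state)
        self.table[state] = old + self.lr * (target - old)

h = FastHeuristic()
for _ in range(20):  # repeated cheap updates converge on the search's verdict
    h.update_from_search(12)
```

The division of labor is the point: the incremental update rule itself is fixed (part of the ceiling’s fixed structure), while the heuristic’s content keeps improving from the reasoning it serves.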

Both floors and ceilings have a flavor of “the basic stuff that’s actually happening” – the interior is built out of a lot of boundary stuff, and small changes to the boundary will create large shifts in the interior. However, floors and ceilings are very different. Tweaking the floor is relatively dangerous, while tweaking the ceiling is relatively safe. Returning to the AlphaGo analogy, the floor is like the model of the game which allows tree search. The floor is what allows us to create a ceiling. Tweaks to the floor will tend to create large shifts in the ceiling; tweaks to the ceiling will not change the floor at all.

(Perhaps other examples won’t have as clear a floor/ceiling division as AlphaGo; or, perhaps they still will.)

What remains unanswered, though, is whether there is any useful way of talking about doing this (the whole thing, including the self-improvement R&D) well, doing it rationally, as opposed to doing it in a way that simply “seems to work” after the fact.
[...] Is there anything better than simply bumbling around in concept-space, in a manner that perhaps has many internal structures of self-justification but is not known to work as a whole? [...]
Can you represent your overall policy, your outermost strategy-over-strategies considered as a response to your entire situation, in a way that is not a cartoon, a way real enough to defend itself?

My intuition is that the situation differs, somewhat, for floors and ceilings.

• For floors, there are fundamental logical-paradox-flavored barriers. This relates to MIRI research on tiling agents.

• For ceilings, there are computational-complexity-flavored barriers. You don’t expect to have a perfect set of heuristics for fast thinking. But, you can have strategies relating to heuristics which have universal-ish properties. Like, logical induction is an “uppermost ceiling” (takes the fixed point of recursive meta) such that, in some sense, you know you’re doing the best you can do in terms of tracking which heuristics are useful; you don’t have to spawn further meta-analysis on your heuristic-forming heuristics. HOWEVER, it is also very very slow and impractical for building real agents. It’s the agent that gets eaten in your parable. So, there’s more to be said with respect to ceilings as they exist in reality.

• Thanks, the floor/ceiling distinction is helpful.

I think “ceilings as they exist in reality” is my main interest in this post. Specifically, I’m interested in the following:

• any resource-bound agent will have ceilings, so an account of embedded rationality needs a “theory of having good ceilings”

• a “theory of having good ceilings” would be different from the sorts of “theories” we’re used to thinking about, involving practical concerns at the fundamental desiderata level rather than as a matter of implementing an ideal after it’s been specified

In more detail: it’s one thing to be able to assess quick heuristics, and it’s another (and better) one to be able to assess quick heuristics quickly. It’s possible (maybe) to imagine a convenient situation where the theory of each “speed class” among fast decisions is compressible enough to distill down to something which can be run in that speed class and still provide useful guidance. In this case there’s a possibility for the theory to tell us why our behavior as a whole is justified, by explaining how our choices are “about as good as can be hoped for” during necessarily fast/simple activity that can’t possibly meet our more powerful and familiar notions of decision rationality.

However, if we can’t do this, it seems like we face an exploding backlog of justification needs: every application of a fast heuristic now requires a slow justification pass, but we’re constantly applying fast heuristics and there’s no room for the slow pass to catch up. So maybe a stronger agent could justify what we do, but we couldn’t.

I expect helpful theories here to involve distilling-into-fast-enough-rules on a fundamental level, so that “an impractically slow but working version of the theory” is actually a contradiction in terms.

• The way I understand your division of floors and ceilings, the ceiling is simply the highest level of meta there is, and the agent has *typically* no way of questioning it. The ceiling is just “what the algorithm is programmed to do.” AlphaGo was programmed to update the network weights in a certain way in response to the training data.

What you call the floor for AlphaGo, i.e. the move evaluations, is not even a boundary (in the sense nostalgebraist defines it); that would just be the object-level (no meta at all) policy.

I think this structure will be the same for any known agent algorithm, where by “known” I mean “we know how it works,” rather than “we know that it exists.” However, humans seem to be different? When I try to introspect it all seems to be mixed up, with object-level heuristics influencing meta-level updates. The ceiling and the floor are all mixed together. Or maybe not? Maybe we are just the same, i.e. having a definite top level, hard-coded, highest level of meta. Some evidence of this is that sometimes I just notice emotional shifts and/or decisions being made in my brain, and I just know that no normal reasoning I can do will have any effect on this shift/decision.

• What you call the floor for AlphaGo, i.e. the move evaluations, is not even a boundary (in the sense nostalgebraist defines it); that would just be the object-level (no meta at all) policy.

I think in general the idea of the object-level policy with no meta isn’t well-defined, if the agent at least does a little meta all the time. In AlphaGo, it works fine to shut off the meta; but you could imagine a system where shutting off the meta would put it in such an abnormal state (like it’s on drugs) that the observed behavior wouldn’t mean very much in terms of its usual operation. Maybe this is the point you are making about humans not having a good floor/ceiling distinction.

But, I think we can conceive of the “floor” more generally. If the ceiling is the fixed structure, e.g. the update for the weights, the “floor” is the lowest-level content – e.g. the weights themselves. Whether thinking at some meta-level or not, these weights determine the fast heuristics by which a system reasons.

I still think some of what nostalgebraist said about boundaries seems more like the floor than the ceiling.

The space “between” the floor and the ceiling involves constructed meta-levels, which are larger computations (i.e. not just a single application of a heuristic function), but which are not fixed. This way we can think of the floor/ceiling spectrum as small-to-large: the floor is what happens in a very small amount of time; the ceiling is the whole entire process of the algorithm (learning and interacting with the world); the “interior” is anything in between.

Of course, this makes it sort of trivial, in that you could apply the concept to anything at all. But the main interesting thing is how an agent’s subjective experience seems to interact with floors and ceilings. IE, we can’t access floors very well because they happen “too quickly”, and besides, they’re the thing that we do everything with (it’s difficult to imagine what it would mean for a consciousness to have subjective “access to” its neurons/transistors). But we can observe the consequences very immediately, and reflect on that. And the fast operations can be adjusted relatively easily (e.g. updating neural weights). Intermediate-sized computational phenomena can be reasoned about, and accessed interactively, “from the outside” by the rest of the system. But the whole computation can be “reasoned about but not updated” in a sense, and becomes difficult to observe again (not “from the outside” the way smaller sub-computations can be observed).

• I can never tell whether they’ve never thought about the things I’m thinking about, or whether they sped past them years ago. They do seem very smart, that’s for sure.

Whenever I have a great idea, it turns out that someone at MIRI considered it five years earlier. This simultaneously makes me feel very smart and rather disappointed. With that being said, here are some relevant things:

Thing #1:

Oh, you’re so very clever! By now you’ve realized you need, above and beyond your regular decision procedure to guide your actions in the outside world, a “meta-decision-procedure” to guide your own decision-procedure-improvement efforts.

This is a nitpick but it’s an important one in understanding how meta-stuff works here: If you’ve decided that you need a decision procedure to decide when to update your decision procedure, then whatever algorithm you used to make that decision is already meta. This is because your decision procedure is thinking self-referentially. Given this, why would it need to build a whole new procedure for thinking about decision procedures when it could just improve itself?

This has a number of advantages because it means that anything you learn about how to make decisions can also be directly used to help you make decisions about how you make decisions – ad infinitum.

Thing #2:

You are a business. You do retrospectives on your projects. You’re so very clever, in fact, that you do retrospectives on your retrospective process, to improve it over time. But how do you improve these retro-retros? You don’t. They’re in your boundary.

This case reminded me a lot of Eliezer on Where Recursive Justification Hits Rock Bottom, except placed in a context where you can modify your level of recursion.

You need to justify that your projects are good so you do retrospectives. But you need to justify why your retrospectives are good so you do retrospectives on those. But you need to justify why your retro-retros are good too, right? To quote Eliezer:

Should I trust my brain? Obviously not; it doesn’t always work. But nonetheless, the human brain seems much more powerful than the most sophisticated computer programs I could consider trusting otherwise. How well does my brain work in practice, on which sorts of problems?

So there are a couple questions here. The easy question:

Q: How do I justify the way I’m investing my resources?

A: You don’t. You just invest them using the best of your ability and hope for the best.

And the more interesting question:

Q: What is the optimal level of meta-justification I use in investing my resources?

A1: This still isn’t technically knowable information. However, there are plenty of unjustified priors that might be built into you which cause you to make a decision. For instance, you might keep going up the meta-levels enough rounds until you see diminishing returns and then stop. Or you might just never go above three levels of meta because you figure that’s excessive. Depends on the AI.
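The first of those priors – climb until diminishing returns – can at least be written down as a stopping rule. A hypothetical sketch (the gain estimates fed into it are exactly the kind of unjustified input A1 describes):

```python
def choose_meta_depth(gain_at_level, threshold=0.05, max_depth=10):
    """Climb meta-levels while each level still pays for itself.

    `gain_at_level(n)` is a (hypothetical) estimate of the improvement
    from adding an n-th level of meta; we stop at the first level whose
    marginal gain falls below `threshold`. The threshold and the gain
    estimates are the unjustified priors doing all the work here.
    """
    depth = 0
    for n in range(1, max_depth + 1):
        if gain_at_level(n) < threshold:
            break
        depth = n
    return depth
```

For halving returns per level, `choose_meta_depth(lambda n: 1.0 / 2 ** n)` stops after four levels; a gain function that never clears the threshold yields zero levels of meta. The rule itself, of course, sits on the boundary.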

A2: Given that Thing #1 is true, you don’t need any meta-decision algorithms – you just need a self-referential decision algorithm. In this case, we just have the answer to the easy question: You use the full capabilities of your decision algorithm and hope for the best (and sometimes your decision algorithm makes decisions about itself instead of decisions about physical actions).

• I don’t understand Thing #1. Perhaps, in the passage you quote from my post, the phrase “decision procedure” sounds misleadingly generic, as if I have some single function I use to make all my decisions (big and small) and we are talking about modifications to that function.

(I don’t think that is really possible: if the function is sophisticated enough to actually work in general, it must have a lot of internal sub-structure, and the smaller things it does inside itself could be treated as “decisions” that aren’t being made using the whole function, which contradicts the original premise.)

Instead, I’m just talking about the ordinary sort of case where you shift some resources away from doing X to thinking about better ways to do X, where X isn’t the whole of everything you do.

Re: Q/A/A1, I guess I agree that these things are (as best I can tell) inevitably pragmatic. And that, as EY says in the post you link, “I’m managing the recursion to the best of my ability” can mean something better than just “I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary.” But then this seems to threaten the Embedded Agency programme, because it would mean we can’t make theoretically grounded assessments or comparisons involving agents as strong as ourselves or stronger.

(The discussion of self-justification in this post was originally motivated by the topic of external assessment, on the premise that if we are powerful enough to assess a proposed AGI in a given way, it must also be powerful enough to assess itself in that way. And contrapositively, if the AGI can’t assess itself in a given way then we can’t assess it in that way either.)

• (I don’t think that is really possible: if the function is sophisticated enough to actually work in general, it must have a lot of internal sub-structure, and the smaller things it does inside itself could be treated as “decisions” that aren’t being made using the whole function, which contradicts the original premise.)

Even if the decision function has a lot of sub-structure, I think that in the context of AGI:

• (less important point) It is unlikely that we will be able to directly separate substructures of the function from the whole function. This is because I’m assuming the function is using some heuristic approximating logical induction to think about itself, and this has extremely broad uses across basically every aspect of the function.

• (more important point) It doesn’t matter if it’s a sub-structure or not. The point is that some part of the decision function is already capable of reasoning about either improving itself or about improving other aspects of the decision function. So whatever method it uses to anticipate whether it should try self-improvement is already baked in in some way.

Re: Q/A/A1, I guess I agree that these things are (as best I can tell) inevitably pragmatic. And that, as EY says in the post you link, “I’m managing the recursion to the best of my ability” can mean something better than just “I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary.” But then this seems to threaten the Embedded Agency programme, because it would mean we can’t make theoretically grounded assessments or comparisons involving agents as strong as ourselves or stronger.

So “I work on exactly N levels and then my decisions at level N+1 are utterly arbitrary” is not exactly true because, in all relevant scenarios, we’re the ones who build the AI. It’s more like “I work on exactly N levels and then my decisions at level N+1 were deemed irrelevant by the selection pressures that created me, which granted me this decision-function that deemed further levels irrelevant.”

If we’re okay with leveraging normative or empirical assumptions about the world, we should be able to assess AGI (or have the AGI assess itself) with methods that we’re comfortable with.

In some sense, we have practical examples of what this looks like. N, the level of meta, can be viewed as a hyperparameter of our learning system. However, in data science, hyperparameters perform differently for different problems, so people often use Bayesian optimization to iteratively pick the best hyperparameters. But, you might say, our Bayesian hyperparameter optimization process requires its own priors – it too has hyperparameters!

But no one really bothers to optimize these, for a couple of reasons:

#1. As we increase the level of meta in a particular optimization process, we tend to see diminishing returns on the improved model performance.

#2. Meta-optimization is prohibitively expensive: Each N-level meta-optimizer generally needs to consider multiple possibilities of (N-1)-level optimizers in order to pick the best one. Inductively, this means your N-level meta-optimizer’s computational cost is around x^N, where x represents the number of (N-1)-level optimizers each N-level optimizer needs to consider.
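The induction in #2 gives the multiplicative blow-up directly. A small sketch (names are hypothetical) just to make the x^N growth concrete:

```python
def meta_optimizer_cost(base_cost, x, n_levels):
    """Total cost of an n-level meta-optimizer when each level must
    evaluate `x` candidate optimizers one level down.

    Level 0 costs `base_cost`; level N costs x times the cost of a
    level-(N-1) optimizer, so the whole stack costs
    base_cost * x**n_levels -- the x^N blow-up described above.
    """
    cost = base_cost
    for _ in range(n_levels):
        cost *= x  # each added meta-level multiplies the cost by x
    return cost
```

So even a modest branching factor of five candidates per level turns a unit-cost base optimizer into a 125-unit three-level stack, which is why climbing meta-levels gets abandoned quickly in practice.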

But #1 can’t actually be proved. It’s just an assumption that we think is true because we have a strong observational prior for it being true. Maybe we should question how human brains generate their priors but, at the end of the day, the way we do this questioning is still determined by our hard-coded algorithms for dealing with probability.

The upshot is that, when we look at problems similar to the one we face with embedded agency, we still use the Eliezerian approach. We just happen to be very confident in our boundary, for reasons that cannot be rigorously justified.

• I don’t understand your argument for why #1 is impossible. Consider a universe that’ll undergo heat death in a billion steps. Consider the agent that implements “Take an action if PA+<steps remaining> can prove that it is good,” using some provability checker algorithm that takes some steps to run. If there is some faster provability checker algorithm, it’s provable that it’ll do better using that one, so it switches when it finds that proof.

• Just a quick note: Sometimes there is a way out of this kind of infinite regress by implementing an algorithm that approximates the limit. Of course, you can also be put back into an infinite regress by asking if there is a better approximation.

• A lot of what you write here seems related to my notion of Turing Reinforcement Learning. In Turing RL we consider an AI comprising a “core” RL agent and an “envelope” which is a computer on which the core can run programs (somewhat similarly to neural Turing machines). From the point of view of the core, the envelope is a component of its environment (in addition to its usual I/O), about which it has somewhat stronger priors than about the rest. Such a system learns how to make optimal use of the envelope’s computing resources. Your “boundary” corresponds to the core, which is the immutable part of the algorithm that produces everything else. Regarding the “justification” of why a particular core algorithm is correct, the justification should come from regret bounds we prove about this algorithm w.r.t. some prior over incomplete models. Incomplete models are the solution to “even if you could obtain a perfect model of your world and beings like you, you wouldn’t be able to fit it inside your own head.” Instead of obtaining a perfect model, the agent learns all patterns (incomplete models) in the world that it can fit into its head, and exploits these patterns for gain. More precisely, in Turing RL the agent starts with some small class of patterns that the core can fit into its head, and bootstraps from those to a larger class of patterns, accounting for a cost-benefit analysis of resource use. This way, the regret bound satisfied by the core algorithm should lead to even stronger guarantees for the system as a whole (for example this).

• ‘Doing it well’ seems to be very load-bearing there. I think you’re sneaking in an ‘all’ in the background? Like, in order to be defined as superintelligent it must do better at all domains than X or something?

My current answer is something hand-wavy about the process just trying to un-Goodhart itself (assuming that the self and world model as given start off Goodharted) and the chips fall where they may.

• It’s not really about doing well/better in all domains, it’s about being able to explain how you can do well at all of the things you do, even if that isn’t nearly everything. And making that explanation complete enough to be convincing, as an argument about the real world assessed using your usual standards, while still keeping it limited enough to avoid self-reference problems.

• Why did you spend it as you did, then? You cannot answer, ever, without your answer invoking something on the boundary.

It seems like if the O(N^(n-1)) algorithm (algorithm 2) is better than the O(N^n) algorithm (algorithm 1), then there is an amount of time such that, after that time has elapsed (after adopting the new algorithm), the reduction in resource consumption will equal the cost spent finding the new algorithm. This might be called “breaking even,” and doesn’t seem to invoke something on the boundary.
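That break-even point is easy to compute once the costs are named. A sketch under the parable’s assumptions (the function name and the example numbers are illustrative, not from the post):

```python
import math

def break_even_decisions(rnd_cost, cost_old, cost_new):
    """Number of decisions needed before the R&D spend pays for itself.

    Each decision made with the new algorithm saves
    (cost_old - cost_new) resource units, so after
    ceil(rnd_cost / saving) decisions the cumulative reduction in
    resource consumption equals the cost of finding the new algorithm.
    """
    saving = cost_old - cost_new
    if saving <= 0:
        return None  # the 'better' algorithm never pays for itself
    return math.ceil(rnd_cost / saving)

# With N = 100 and n = 3: the old O(N^n) step costs ~100**3 = 1_000_000
# units per decision, the new O(N^(n-1)) step ~100**2 = 10_000, so even
# a 5_000_000-unit R&D detour is repaid within a handful of decisions.
```

The catch the parable points at, of course, is that `rnd_cost` is only knowable after the fact; the formula describes breaking even, not the ex-ante choice to try.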

• I think that this infinite regress can be converted into a loop. Consider an infinite sequence of layers, in which the job of layer n+1 is to optimise layer n. Each layer is a piece of programming code. After the first couple of layers, these layers will start to look very similar. You could have layer 3 being able to optimize both layer 2 and layer 3.

One model is that your robot just sits and thinks for an hour. At the end of that hour, it designs what it thinks is the best code it can come up with, and runs that. To the original AI, anything outside the original hour is external; it is answering the question “what pattern of bits on this hard disk will lead to the best outcome?” It can take all these balances and tradeoffs into account in whatever way it likes. If it hasn’t come up with any good ideas yet, it could copy its code, add a crude heuristic that makes it run randomly when thinking (to avoid the predators), and think for longer.