It seems to me that there are roughly two types of “boundary” to think about: ceilings and floors.
Floors are aka the foundations. Maybe a system is running on a basically Bayesian framework, or (alternately) logical induction. Maybe there are some axioms, like ZFC. Going meta on floors involves the kind of self-reference stuff which you hear about most often: Gödel’s theorem and so on. Floors are, basically, pretty hard to question and improve (though not impossible).
Ceilings are fast heuristics. You have all kinds of sophisticated beliefs in the interior, but there’s a question of which inferences you immediately make, without doing any meta to consider what direction to think in. (I.e., you do generally do some meta to think about what direction to think in; but this “tops out” at some level, at which point the analysis has to proceed without meta.) Ceilings are relatively easy to improve. For example, the AlphaGo policy network (which proposes moves) and value network (which evaluates positions). These have cheap updates which can be made frequently, by observing the results of reasoning. These incremental updates then help the more expensive tree-search reasoning to be even better.
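A toy sketch of that relationship, in case it helps: a cheap heuristic (here just a table of move scores) proposes moves, an expensive “search” evaluates them, and the heuristic is nudged toward the search result. Everything here is illustrative, not AlphaGo’s actual architecture; it only shows the shape of “cheap, frequent updates from expensive reasoning”.

```python
import random

random.seed(0)

MOVES = ["a", "b", "c"]
heuristic = {m: 0.0 for m in MOVES}  # the fast "ceiling": instant move scores

def expensive_search(move):
    """Stand-in for tree search: slow but more accurate evaluation."""
    true_value = {"a": 0.2, "b": 0.9, "c": 0.5}  # hidden ground truth
    return true_value[move] + random.gauss(0, 0.05)

def step(lr=0.2, explore=0.3):
    # The heuristic instantly proposes a move (with some exploration)...
    if random.random() < explore:
        move = random.choice(MOVES)
    else:
        move = max(MOVES, key=lambda m: heuristic[m])
    # ...the expensive search evaluates it, and the cheap heuristic gets
    # an incremental update from the result of that reasoning.
    heuristic[move] += lr * (expensive_search(move) - heuristic[move])

for _ in range(300):
    step()

best = max(MOVES, key=lambda m: heuristic[m])
print(best)  # the cheap heuristic has learned to favor the strongest move
```

The interior (the search) stays expensive; only the boundary-level scores get the frequent, cheap tweaks.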
Both floors and ceilings have a flavor of “the basic stuff that’s actually happening”—the interior is built out of a lot of boundary stuff, and small changes to the boundary will create large shifts in the interior. However, floors and ceilings are very different. Tweaking the floor is relatively dangerous, while tweaking the ceiling is relatively safe. Returning to the AlphaGo analogy, the floor is like the model of the game which allows tree search. The floor is what allows us to create a ceiling. Tweaks to the floor will tend to create large shifts in the ceiling; tweaks to the ceiling will not change the floor at all.
(Perhaps other examples won’t have as clear a floor/ceiling division as AlphaGo; or, perhaps they still will.)
What remains unanswered, though, is whether there is any useful way of talking about doing this (the whole thing, including the self-improvement R&D) well, doing it rationally, as opposed to doing it in a way that simply “seems to work” after the fact.
[...] Is there anything better than simply bumbling around in concept-space, in a manner that perhaps has many internal structures of self-justification but is not known to work as a whole? [...]
Can you represent your overall policy, your outermost strategy-over-strategies considered a response to your entire situation, in a way that is not a cartoon, a way real enough to defend itself?
My intuition is that the situation differs, somewhat, for floors and ceilings.
For floors, there are fundamental logical-paradox-flavored barriers. This relates to MIRI research on tiling agents.
For ceilings, there are computational-complexity-flavored barriers. You don’t expect to have a perfect set of heuristics for fast thinking. But, you can have strategies relating to heuristics which have universal-ish properties. Like, logical induction is an “uppermost ceiling” (takes the fixed point of recursive meta) such that, in some sense, you know you’re doing the best you can do in terms of tracking which heuristics are useful; you don’t have to spawn further meta-analysis on your heuristic-forming heuristics. HOWEVER, it is also very very slow and impractical for building real agents. It’s the agent that gets eaten in your parable. So, there’s more to be said with respect to ceilings as they exist in reality.
Thanks, the floor/ceiling distinction is helpful.
I think “ceilings as they exist in reality” is my main interest in this post. Specifically, I’m interested in the following:
- any resource-bound agent will have ceilings, so an account of embedded rationality needs a “theory of having good ceilings”
- a “theory of having good ceilings” would be different from the sorts of “theories” we’re used to thinking about, involving practical concerns at the fundamental desiderata level rather than as a matter of implementing an ideal after it’s been specified
In more detail: it’s one thing to be able to assess quick heuristics, and it’s another (and better) one to be able to assess quick heuristics quickly. It’s possible (maybe) to imagine a convenient situation where the theory of each “speed class” among fast decisions is compressible enough to distill down to something which can be run in that speed class and still provide useful guidance. In this case there’s a possibility for the theory to tell us why our behavior as a whole is justified, by explaining how our choices are “about as good as can be hoped for” during necessarily fast/simple activity that can’t possibly meet our more powerful and familiar notions of decision rationality.
However, if we can’t do this, it seems like we face an exploding backlog of justification needs: every application of a fast heuristic now requires a slow justification pass, but we’re constantly applying fast heuristics and there’s no room for the slow pass to catch up. So maybe a stronger agent could justify what we do, but we couldn’t.
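The backlog point is really just a rate argument, which a few lines make concrete: if the fast heuristic fires once per step but each slow justification pass takes k steps, the queue of unjustified applications grows without bound whenever k > 1. The numbers below are arbitrary; only the rates matter.

```python
# "Exploding backlog" sketch: one fast-heuristic application per step,
# one slow justification completed every k steps.

def backlog_after(steps, k):
    """Unjustified heuristic applications after `steps` steps."""
    produced = steps        # fast heuristic fires every step
    justified = steps // k  # slow pass completes one every k steps
    return produced - justified

print(backlog_after(1000, k=1))   # slow pass keeps up: backlog is 0
print(backlog_after(1000, k=10))  # backlog is 900 and grows linearly
```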
I expect helpful theories here to involve distilling-into-fast-enough-rules on a fundamental level, so that “an impractically slow but working version of the theory” is actually a contradiction in terms.
The way I understand your division of floors and ceilings, the ceiling is simply the highest level of meta there is, which the agent *typically* has no way of questioning. The ceiling is just “what the algorithm is programmed to do”. AlphaGo is programmed to update the network weights in a certain way in response to the training data.
What you call the floor for AlphaGo, i.e. the move evaluations, is not even a boundary (in the sense nostalgebraist defines it); that would just be the object-level (no meta at all) policy.
I think this structure will be the same for any known agent algorithm, where by “known” I mean “we know how it works”, rather than “we know that it exists”. However, humans seem to be different? When I try to introspect, it all seems to be mixed up, with object-level heuristics influencing meta-level updates. The ceiling and the floor are all mixed together. Or maybe not? Maybe we are just the same, i.e. we have a definite, hard-coded, highest level of meta. Some evidence for this is that sometimes I notice emotional shifts and/or decisions being made in my brain, and I just know that no normal reasoning I can do will have any effect on that shift/decision.
What you call the floor for AlphaGo, i.e. the move evaluations, is not even a boundary (in the sense nostalgebraist defines it); that would just be the object-level (no meta at all) policy.
I think in general the idea of the object-level policy with no meta isn’t well-defined, if the agent is always doing at least a little meta. In AlphaGo, it works fine to shut off the meta; but you could imagine a system where shutting off the meta would put it in such an abnormal state (like it’s on drugs) that the observed behavior wouldn’t mean very much in terms of its usual operation. Maybe this is the point you are making about humans not having a good floor/ceiling distinction.
But, I think we can conceive of the “floor” more generally. If the ceiling is the fixed structure, e.g. the update for the weights, the “floor” is the lowest-level content—e.g. the weights themselves. Whether thinking at some meta-level or not, these weights determine the fast heuristics by which a system reasons.
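A minimal sketch of this reading of the distinction, with a one-weight linear model standing in for the fast heuristic: the “ceiling” is the fixed update rule (never modified at runtime), and the “floor” is the low-level content it operates on, the weights themselves. All the names and numbers are illustrative.

```python
# Ceiling = fixed structure (the update rule); floor = mutable content
# (the weights that implement the fast heuristic).

def update(weight, x, target, lr=0.1):
    """The ceiling: a fixed gradient-style update rule for squared error."""
    prediction = weight * x
    return weight + lr * (target - prediction) * x

weight = 0.0  # the floor: low-level content, cheaply adjustable
for _ in range(100):
    weight = update(weight, x=1.0, target=2.0)

print(round(weight, 2))  # the floor has been tweaked toward the target, 2.0
```

Nothing in the loop ever touches `update` itself; only the floor changes, which is the sense in which the ceiling is “fixed”.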
I still think some of what nostalgebraist said about boundaries seems more like the floor than the ceiling.
The space “between” the floor and the ceiling involves constructed meta-levels, which are larger computations (i.e., not just a single application of a heuristic function), but which are not fixed. This way we can think of the floor/ceiling spectrum as small-to-large: the floor is what happens in a very small amount of time; the ceiling is the entire process of the algorithm (learning and interacting with the world); the “interior” is anything in between.
Of course, this makes it sort of trivial, in that you could apply the concept to anything at all. But the main interesting thing is how an agent’s subjective experience seems to interact with floors and ceilings. I.e., we can’t access floors very well because they happen “too quickly”, and besides, they’re the thing that we do everything with (it’s difficult to imagine what it would mean for a consciousness to have subjective “access to” its neurons/transistors). But we can observe the consequences almost immediately, and reflect on that. And the fast operations can be adjusted relatively easily (e.g. updating neural weights). Intermediate-sized computational phenomena can be reasoned about, and accessed interactively, “from the outside” by the rest of the system. But the whole computation can be “reasoned about but not updated” in a sense, and becomes difficult to observe again (not “from the outside” the way smaller sub-computations can be observed).