# Embedded World-Models

*(A longer text-based version of this post is also available on MIRI’s blog here, and the bibliography for the whole sequence can be found here.)*

*(Edit: This post had 15 slides added on Saturday 10th November.)*


This post feels quite similar to things I have written in the past to justify my lack of enthusiasm about idealizations like AIXI and logically-omniscient Bayes. But I would go further: I think that grappling with embeddedness properly will *inevitably* make theories of this *general type* irrelevant or useless, so that “a theory like this, except for embedded agents” is not a thing that we can reasonably want. To specify what I mean, I’ll use this paragraph as a jumping-off point:

Most “theories of rational belief” I have encountered (including Bayesianism in the sense I think is meant here) are framed at the level of an evaluator outside the universe, and have essentially no content when we try to transfer them to individual embedded agents. This is because these theories tend to be derived in the following way:

We want a theory of the best possible behavior for agents.

We have some class of “practically achievable” strategies S, which can actually be implemented by agents. We note that an agent’s observations provide some information about the quality of different strategies s∈S. So if it were possible to follow a rule like R≡ “find the best s∈S given your observations, and then follow that s,” this rule would spit out very good agent behavior.

Usually we soften this to a performance-weighted average rather than a hard argmax, but the principle is the same: if we could search over all of S, the rule R that says “do the search and then follow what it says” can be competitive with the very best s∈S. (Trivially so, since it has access to the best strategies, along with all the others.)

But usually R∉S. That is, the strategy “search over all practical strategies and follow the best ones” is not a *practical* strategy. But we argue that this is fine, since we are constructing a theory of *ideal* behavior. It doesn’t have to be practically implementable.

For example, in Solomonoff, S is defined by computability while R is allowed to be uncomputable. In the LIA construction, S is defined by polytime complexity while R is allowed to run slower than polytime. In logically-omniscient Bayes, finite sets of hypotheses can be manipulated in a finite universe but the full Boolean algebra over hypotheses generally cannot.
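The derivation above can be made concrete for a toy case. The following is a minimal sketch, assuming a tiny finite S of three hypothetical bit predictors (`always_half`, `mostly_ones`, `copy_last`, all invented for illustration) and log loss as the quality measure. It implements the softened rule R as a Bayesian mixture over S and lets one check the usual guarantee that, under a uniform prior, R's total loss exceeds that of the best s∈S by at most log|S|.

```python
import math

# Hypothetical "practical strategies": each maps a history of bits to P(next bit = 1).
def always_half(history):
    return 0.5

def mostly_ones(history):
    return 0.9

def copy_last(history):
    # Predict that the last bit repeats, with some hedging.
    if not history:
        return 0.5
    return 0.8 if history[-1] == 1 else 0.2

STRATEGIES = [always_half, mostly_ones, copy_last]

def run_mixture(bits, strategies, prior=None):
    """Run a Bayesian mixture over strategies.

    Returns (total log loss of the mixture, list of total log losses per strategy).
    """
    n = len(strategies)
    weights = list(prior) if prior else [1.0 / n] * n
    mix_loss = 0.0
    losses = [0.0] * n
    history = []
    for bit in bits:
        preds = [s(history) for s in strategies]
        # Probability each strategy assigned to the bit that actually occurred.
        likes = [p if bit == 1 else 1.0 - p for p in preds]
        p_mix = sum(w * l for w, l in zip(weights, likes))
        mix_loss += -math.log(p_mix)
        for i, l in enumerate(likes):
            losses[i] += -math.log(l)
        # Bayes update: performance-weighted reweighting instead of a hard argmax.
        weights = [w * l / p_mix for w, l in zip(weights, likes)]
        history.append(bit)
    return mix_loss, losses

bits = [1, 1, 1, 0, 1, 1, 1, 1]
mix_loss, losses = run_mixture(bits, STRATEGIES)
print(mix_loss, min(losses), math.log(len(STRATEGIES)))
```

Note that `run_mixture` is not itself one of the three predictors, which is the toy analogue of R∉S. Here R stays cheap only because S is tiny and finite.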

I hope the framework I’ve just introduced helps clarify what I find unpromising about these theories. By construction, any agent you can actually design and run is a *single* element of S (a “practical strategy”), so every fact about rationality that can be incorporated into agent design gets “hidden inside” the individual s∈S, and the only things you can learn from the “ideal theory” R are things which can’t fit into a practical strategy.

For example, suppose (reasonably) that model averaging and complexity penalties are broadly good ideas that lead to good results. But all of the model averaging and complexity penalization that can be done *computably* happens inside some Turing machine or other, at the level “below” Solomonoff. Thus Solomonoff *only* tells you about the extra advantage you can get by doing these things *uncomputably*. Any kind of nice Bayesian average over Turing machines that can happen computably is (of course) just another Turing machine.

This also explains why I find it misleading to say that good practical strategies constitute “approximations to” an ideal theory of this type. Of course, since R just says to follow the best strategies in S, if you are following a very good strategy in S your behavior will tend to be close to that of R. But this cannot be attributed to *any* of the searching over S that R does, since you are not doing a search over S; you are executing a *single member* of S and ignoring the others. Any searching that can be done practically collapses down to a single practical strategy, and any that doesn’t is not practical. Concretely, this talk of approximations is like saying that a very successful chess player “approximates” the rule “consult all possible chess players, then weight their moves by past performance.” Yes, the skilled player will *play similarly* to this rule, but they are not *following* it, not even approximately! They are only themselves, not any other player.

Any theory of ideal rationality that wants to be a guide for embedded agents will have to be constrained in the same ways the agents are. But theories of ideal rationality usually get *all of their content* by going to a level above the agents they judge. So this new theory would have to be a very different sort of thing.

I disagree. This is like saying, “we don’t need fluid dynamics, we just need airplanes!”. General mathematical formalizations like AIXI are just as important as special theories that apply more directly to real-world problems, like embedded agents. Without a grounded formal theory, we’re stumbling in the dark. You simply need to understand it for what it is: a generalized theory, then most of the apparent paradoxes evaporate.

Kolmogorov complexity tells us there is no such thing as a universal lossless compression algorithm, yet people happily “zip” data every day. That doesn’t mean Kolmogorov wasted his time coming up with his general ideas about complexity. Real world data tends to have a lot of structure because we live in a low-entropy universe. When you take a photo or record audio, it doesn’t look or sound like white noise because there’s structure in the universe. In math-land, the vast majority of bit-strings would look and sound like incompressible white noise.
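The contrast between structured and random data is easy to check empirically. Below is a small sketch using Python's standard `zlib` module; the sample text and the helper name `compressed_ratio` are assumptions made for illustration. Repetitive text should shrink to a small fraction of its size, while bytes from `os.urandom` should not shrink at all (the output is typically a few bytes larger than the input).

```python
import os
import zlib

def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)

# Highly structured input: the same sentence repeated many times.
structured = b"the quick brown fox jumps over the lazy dog. " * 200
# Input with no exploitable structure: bytes from the OS random source.
random_bytes = os.urandom(len(structured))

print("structured:", compressed_ratio(structured))
print("random:    ", compressed_ratio(random_bytes))
```

The counting argument behind this is that there are only 2^n − 1 bit-strings shorter than n bits, so no lossless scheme can map all 2^n strings of length n to shorter ones.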

The same holds true for AIXI. The vast majority of problems drawn from problem space would essentially be, “map this string of random bits to some other string of random bits” in which case, the best you can hope for is a brute-force tree-search of all the possibilities weighted by Occam’s razor (i.e. Solomonoff inductive inference).

I can’t speak to the motivations or processes of others, but these sound like assumptions without much basis. The reason I tend to define intelligence outside of the environment is because it generalizes much better. There are many problems where the system providing the solution can be decoupled both in time and space from the agent acting upon said solution. Agents solving problems in real-time are a special case, not a general case. The general case is: an intelligent system produces a solution/policy to a problem and an agent in an environment acts upon that solution/policy. An intelligent system might spend all night planning how to most efficiently route mail trucks the next morning; the drivers then follow those routes. A real-time model in which the driver has to plan her routes while driving is a special case. You can think of it as the driver’s brain coming up with the solution/policy and the driver acting on it in situ.

You could make the case that the driver has to do on-line/real-time problem solving to navigate the roads and avoid collisions, etc. in which case the full solution would be a hybrid of real-time and off-line formulation (which is probably representative of most situations). Either way, constraining your definition of intelligence to only in-situ problem solving excludes many valid examples of intelligence.

Also, it doesn’t seem like you understand what Solomonoff inductive inference is. The weighted average is used because there will typically be multiple world models that explain your experiences at any given point in time, and Occam’s razor says to favor shorter explanations that give the same result, so you give each model’s predictions a weight that shrinks exponentially with the length of the model (2^(−length), with length usually measured in bits).

I think you’re confusing behavior with implementation. When people talk about neural nets being “universal function approximators” they’re talking about the input-output behavior, not the implementation. Obviously the implementation of an XOR gate is different than a neural net that approximates an XOR gate.

I’m definitely not treating these as interchangeable; my argument is about how, in a certain set of cases, they are importantly *not* interchangeable.

Specifically, I’m arguing that certain characterizations of ideal behavior cannot help us explain why any given implementation approximates that behavior well or poorly.

I don’t understand how the rest of your points engage with my argument. Yes, there is a good reason Solomonoff does a weighted average and not an argmax; I don’t see how this affects my argument one way or the other. Yes, fully general theories can be valuable even when they’re not practical to apply directly to real problems; I was arguing that a *specific type of* fully general theory lacks *a specific type of* practical value, one which people sometimes expect that type of theory to have.

In that case, your argument lacks value in its own right because it is vague and confusing. I don’t know any theories that fall in the “specific type” of general theory you tried to describe. You used Solomonoff as an example when it doesn’t match your description.

When someone develops a formalization, they have to explicitly state its context and any assumptions. If someone expects to use Kolmogorov complexity theory to write the next hit game, they’re going to have a bad time. That’s not Kolmogorov’s fault.

Of course it can. It provides a different way of constructing a solution. You can start with an ideal then add assumptions that allow you to arrive at a more practicable implementation.

For instance, in computer vision, determining how a depth camera is moving in a scene is very difficult if you use an ideal formalization directly, but if you assume that the differences between two point-clouds are due primarily to affine transformations, then you can use the computationally cheap iterative-closest-point method based on Procrustes analysis to approximate the formal solution. Then, when you observe anomalous behavior, your usual suspects will be the list of assumptions you made to render the problem tractable. Are there non-affine transformations dominating the deltas between point clouds? Maybe that’s causing my computer vision system to glitch. Maybe I need some way to detect such situations and/or some sort of fall-back.
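To make the "ideal plus simplifying assumptions" pattern concrete, here is a minimal pure-Python sketch of iterative closest point in 2D. It assumes the motion is rigid (rotation plus translation), which is narrower than the affine case mentioned above, and that the motion is small enough for nearest-neighbour matching to find the right correspondences. The function names and sample points are hypothetical; a real depth-camera pipeline would work in 3D with a numerical library.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src[i] onto dst[i].

    This is the 2D Procrustes solution for known correspondences.
    """
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy  # centered source point
        bx, by = qx - dx, qy - dy  # centered target point
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))

def apply(theta, t, pts):
    """Rotate each point by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in pts]

def icp_2d(src, dst, iterations=10):
    """Alternate nearest-neighbour matching and Procrustes fitting."""
    theta, t = 0.0, (0.0, 0.0)
    for _ in range(iterations):
        moved = apply(theta, t, src)
        # Assumption: the nearest target point is the true correspondence.
        matched = [min(dst, key=lambda q: (q[0] - m[0]) ** 2 + (q[1] - m[1]) ** 2)
                   for m in moved]
        theta, t = best_rigid_2d(src, matched)
    return theta, t

src = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0), (-3.0, 2.0), (2.0, -4.0)]
dst = apply(0.1, (0.3, -0.2), src)  # small known motion
print(icp_2d(src, dst))
```

If the true motion were large, or not rigid, the nearest-neighbour assumption would be the first suspect when the estimate goes wrong, which is the debugging habit the paragraph above describes.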

Not only that, but there are many other reasons to formalize ideas like intelligence other than to guide the practical implementation of intelligent systems. You can explore the concept of intelligence and its bounds.

Again, if you understand a tool for what it is, there’s no problem. Of course trying to use a purely formalized theory directly to solve real-world problems is going to yield confusing results. Trying to engineer a bridge using the standard model of particle physics is going to be just as difficult. It’s not a fault of the theory, nor does it mean studying the theory is pointless. The problem is that you want it to be something it’s not.

It’s hard to engage much with your argument because it’s made up of vague straw men:

I have no solid context to engage you about. If you’re talking about AIXI, then you’ve misunderstood AIXI because it isn’t about choosing strategies out of a set of all strategies. In fact, you’ve got Solomonoff inductive inference completely wrong too:

Solomonoff inductive inference is defined in the context of an agent observing an environment. That’s all. It doesn’t take actions. It just observes and predicts. There is no set of strategies. There is no rule for selecting a strategy, and given your definition of S and R:

It doesn’t even make sense that R would be incomputable given that S is computable.

When you say:

On what grounds do you even justify the claim that the chess player’s behavior is “not even approximately” following the rule of “consult all possible chess players, then weight their moves by past performance.”?

Actually, what vanilla AIXI would prescribe is a full tree traversal similar to the minimax algorithm, which is, of course, impractical. However, there are things you can do to approximate a full tree traversal more practically. You can build approximate models based on experience like “given the state of the board, what moves should I consider” which prunes the width of the tree, and “given the state of the board, how likely am I to win” which limits the depth of the tree. So instead of considering every possible move at every possible step of the game to every possible conclusion, you only consider 3-4 possible moves per step and only maybe 4-5 steps into the future, possibly with fewer moves per step as the search goes deeper.

Did you edit your original comment? Because I could have sworn you said something more disparaging about the use of “arbitrary” weights. At any rate, it’s not a “performance-weighted average” as it isn’t about performance. It’s about uncertainty.

Thanks, this is a very clear framework for understanding your objection. Here’s the first counterargument that comes to mind: Minimax search is a theoretically optimal algorithm for playing chess, but is too computationally costly to implement in practice. One could therefore argue that all that matters is computationally feasible heuristics, and modeling an ideal chess player as executing a minimax search adds nothing to our knowledge of chess. OTOH, doing a minimax search of the game tree for some bounded number of moves, then applying a simple board-evaluation heuristic at the leaf nodes, is a pretty decent algorithm in practice.
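The bounded-search idea can be sketched on a game much smaller than chess. The example below assumes a simple subtraction game chosen purely for illustration (players alternately remove 1 to 3 stones, and whoever takes the last stone wins) and uses a deliberately uninformative heuristic that returns 0 at the depth cutoff. It is written in negamax form, which is equivalent to minimax for two-player zero-sum games.

```python
def moves(n):
    """Legal moves from a pile of n stones: remove 1, 2 or 3."""
    return [k for k in (1, 2, 3) if k <= n]

def negamax(n, depth):
    """Value of the position for the player to move, searched to a bounded depth."""
    if n == 0:
        return -1.0  # the opponent took the last stone, so the player to move has lost
    if depth == 0:
        return 0.0   # cutoff: placeholder heuristic meaning "unknown"
    return max(-negamax(n - k, depth - 1) for k in moves(n))

def best_move(n, depth):
    """Pick the move with the best depth-limited value (requires depth >= 1)."""
    return max(moves(n), key=lambda k: -negamax(n - k, depth - 1))

print(best_move(5, 3))  # leaves a pile of 4
print(best_move(7, 3))  # leaves a pile of 4
```

With a large enough depth the cutoff is never reached and the search becomes exact, but its cost grows exponentially with depth; a better leaf heuristic is what lets a shallow search play well in richer games.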

Furthermore, it seems like there’s a pattern where, the more general the algorithmic problem you want to solve is, the more your solution is compelled to resemble some sort of brute-force search. There are all kinds of narrow abilities we’d like an AGI to have that depend on the detailed structure of the physical world, but it’s not obvious that any such structure, beyond hypotheses about what is feasibly computable, could be usefully exploited to solve the kinds of problem laid out in this sequence. So it may well be that the best approach turns out to involve some sort of bounded search over simpler strategies, plus lots and lots of compute.

I’ve written previously about this kind of argument; see here (scroll down to the non-blockquoted text). tl;dr: we can often describe the same optimum in multiple ways, with each way giving us a different series that approximates the optimum in the limit. Whether any one series does well or poorly when truncated to N terms can’t be explained by saying “it’s a truncation of the optimum,” since they all are; these truncation properties are facts about the different series, not about the optimum. I illustrate with different series expansions for π.
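The point about truncations can be checked numerically. The sketch below compares two standard series that both converge to π, the Leibniz series and the Nilakantha series, truncated to the same number of terms. These two were chosen for illustration and are not necessarily the series used in the linked discussion.

```python
import math

def leibniz(n_terms):
    """pi = 4 * (1 - 1/3 + 1/5 - ...), truncated to n_terms terms."""
    return 4.0 * sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

def nilakantha(n_terms):
    """pi = 3 + 4/(2*3*4) - 4/(4*5*6) + ..., where the leading 3 counts as one term."""
    total = 3.0
    for k in range(1, n_terms):
        total += (-1) ** (k + 1) * 4.0 / ((2 * k) * (2 * k + 1) * (2 * k + 2))
    return total

for n in (5, 10, 50):
    print(n, abs(leibniz(n) - math.pi), abs(nilakantha(n) - math.pi))
```

Both functions are "truncations of π" in the same sense, yet at ten terms their errors differ by more than two orders of magnitude; the difference comes from the series, not from the limit they share.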

You may be right, and there are interesting conversations to be had about when solutions will tend to look like search and when they won’t. But this doesn’t feel like it really addresses my argument, which is not about “what kind of algorithm should you use” but about the weirdness of the injunction to optimize over a space containing every procedure you could ever do, including all of the *optimization* procedures you could ever do. There is a logical / definitional weirdness here that can’t be resolved by arguments about what sorts of (logically / definitionally unproblematic) algorithms are good or bad in what domains.

My most recent preprint discusses multi-agent Goodhart ( https://arxiv.org/abs/1810.10862 ) and uses the example of poker, along with a different argument somewhat related to the embedded agent problem, to say why the optimization over strategies needs to include optimizing over the larger solution space.

To summarize and try to clarify how I think it relates, strategies for game-playing must at least implicitly include a model of the other player’s actions, so that an agent can tell which strategies will work against them. We need uncertainty in that model, because if we do something silly like assume they are rational Bayesian agents, we are likely to act non-optimally against their actual strategy. But the model of the other agent itself needs to account for their model of our strategy, including uncertainty about our search procedure for strategies—otherwise the space is clearly much too large to optimize over.

Does this make sense? (I may need to expand on this and clarify my thinking...)

Epistemic Status: Attempting to bridge what I see as a missing inferential link in the post / sequence.

(This is a point which I picked up on because I am familiar with what Abram was thinking about 3 years ago, and I was surprised it didn’t get mentioned. Maybe it was assumed to be obvious, maybe it’s not as relevant as I assumed, but I think some others will find the point worth a bit more explaining.)

The reason we care about the relative size of the world and the model is that we have a deep reason to think that a model smaller than the world cannot perform optimally: the Conant-Ashby Theorem, which states “every good regulator of a system must be a model of that system.” For a great explanation of this idea, there is a paper that Abram pointed me to years ago, “Every good key must be a model of the lock it opens (The Conant & Ashby Theorem Revisited).” To quote from there:

“What all of this means, more or less, is that the pursuit of a goal by some dynamic agent (Regulator) in the face of a source of obstacles (System) places at least one particular and unavoidable demand on that agent, which is that the agent’s behaviors must be executed in such a reliable and predictable way that they can serve as a representation (Model) of that source of obstacles.”

To lay the connection out explicitly, if the agent’s model of the world is not isomorphic to the world, the actions chosen will be sub-optimal. This is bad if we assume the world is not isomorphic to a simple model (and this sequence is laying out reasons that, for reflexive agents, there cannot be such a computational model).

Abram has made a major update to the post above, adding material on self-reference and the grain of truth problem. The corresponding text on the MIRI Blog version has also been expanded, with some extra material on those topics plus logical uncertainty.

New material on paradoxes of self-reference

Revised material on logical uncertainty

Some of these issues (obviously) are not limited to AI. Specifically, the problem of how to deal with multi-level models and “composability” was the subject of an applied research project for military applications by my dissertation chair, Paul Davis, here: https://www.rand.org/content/dam/rand/pubs/monographs/2004/RAND_MG101.pdf

“The appealing imagery of arbitrary plug-and-play is fatally flawed for complex models… The more-complex [lower level] model components have typically been developed for particular purposes and depend on context-sensitive assumptions, some of which are tacit.”

This issue has formed the basis of a fair amount of his later work as well, but this work focuses on practical advice, rather than conceptual understanding of the limitations. Still, that type of work may be useful as inspiration.

This article talks about multi-level models, where you somehow switch between cheaper models and more-accurate models depending on your needs. Would it be useful to generalize this idea to switching between multiple “same-level” models that are differentiated by something other than cheap vs. accurate?

For example, one might have one model that groups individual people together into “families”, another that groups them into “organizations”, and a third that groups them into “ideologies”. None of those models seems to be strictly “higher” than another (e.g. neither families nor ideologies are composed of each other), and different models might be useful for different problems.

One could also imagine combining all of those into one unified model, of course. But it might be wasteful to model all of them for problems where you only really care about one.

I feel like humans do something like this.

If multiple “same-level” models can coexist, then one strategy for holding onto your values while inventing new models might be to always hold onto whichever model the values were originally defined in, even if you add more models alongside it.

I think this article is too vague, because for almost all claims in it I am not sure if I understand the author correctly. Below I am posting my notes. If you want to help me and others clarify understanding of this article, consider answering questions in bold, or, if you see a mistake in my notes, correcting it. Also I hope my notes help the author as a piece of feedback. I’ve only finished 2/3 of the article so far, but I am posting my notes now because I might become less interested in this later.

Also it’s unfortunate that, unlike in the https://intelligence.org/2018/11/02/embedded-models/ version of this article, we don’t have hyperlinks to explanations of various concepts here. Perhaps you could add them under the corresponding images? Or have images themselves be hyperlinks or reference links (like in academic articles) to the bottom of the document where all relevant links would be stored grouped by image number.

The post says an embedded agent can’t hold an exact model of the environment in its head, can’t think through the consequences of every potential course of action, can’t hold in its head every possible way the environment could be.

I think this may not be necessarily true and I am not sure what assumptions the author used here.

It seems the whole article assumes countable probability spaces (even before the AIXI part). I wonder why and I wonder how realizability is defined for an uncountable probability space.

Regarding relative bounded loss and what this bound is for, my best guess is as follows. Here I use non-conditional probability notation p(x) instead of π(x). Let e be the elementary event that is actually true. Let “expert” h be a (not necessarily elementary) event, such that p(e,h)>0. Then the loss of the expert is $L_h = -\log p(e \mid h)$. The loss of the prior is $L = -\log p(e)$. For their difference it holds that $L - L_h = \log \frac{p(e,h)}{p(e)\,p(h)} \le -\log p(h)$, since p(e,h)≤p(e).

Remember, $p(h \mid e) = \frac{p(e \mid h)\,p(h)}{p(e)}$. It follows that the probability of h increases given evidence e if and only if p(e∣h)>p(e), i.e. h “is even a little bit more correct than you”.

But I don’t understand the bit about copying the expert h precisely before losing more than log π(h). If the expert is an event, how can you “copy” it?

I don’t understand this. Our probability space is the Cartesian product of the set of all possible UTM programs and the set of all possible UTM working tape initial configurations. Or, equivalently, the set of outputs of UTM under these conditions. Hence our whole hypothesis space only includes computable worlds.

What does “can learn to act like any algorithm” mean here?

“It’s getting bounded loss on its predictive accuracy as compared with any computable predictor.” Huh? Does predictor here mean expert h? If yes, what does it mean that h is computable and why? All in all, is the author claiming it’s impossible to have a better computable predictor than AIXI with Solomonoff prior, even if it has non-computable worlds in the probability space?

What do these mean? I only know informally what calibration means related to forecasting.

How is AIXI even defined without realizability, i.e. when the actual world isn’t in the probability space, or it has zero prior probability?

Is this about the world changing because of the agent just thinking? Or something else?

From the former paragraph I don’t understand anything except that (the author claims) game theory has more problems with grain of truth / realizability, than AIXI. After the latter paragraph, my best guess is: for any game, if there is no pure strategy equilibrium in it, then we say it has no grain of truth, because for every possible outcome rational agents wouldn’t choose it.

Weights represent possible worlds, therefore they are on the scales right from the beginning (the prior); we never put new weights on the scales.

My probably incorrect guess of what the author is saying is: some agent which acts like AIXI but, instead of updating on pieces of evidence as soon as he receives them, stockpiles them, and at some points (boundedly) searches for proofs that these pieces of evidence are in favor of some hypothesis, and performs an update only when he finds them. But still, why oscillation?

I interpret it as: there are infinitely many theorems, hence an agent with a finite amount of space or a finite number of computation steps can’t process all of them.

No idea what the second quoted paragraph means.

All in all, I doubt that high level world models are necessary. And it is not at all clear what is meant by “high level” or “things” here. Perhaps embedded agents can (boundedly) reason about the world in other ways, e.g. by modeling only part of the world.

https://intelligence.org/files/OntologicalCrises.pdf explains the ontological crisis idea better. Suppose our AIXI-like agent thinks the world is an elementary outcome of some parameterized probability distribution with the parameter θ. θ is either 1 or 2. We call the set of elementary outcomes with θ=1 the first ontology (e.g. possible worlds running on classical mechanics), and the set of elementary outcomes with θ=2 the second ontology (e.g. possible worlds running on superstring theory). The programmer has only programmed the agent’s utility function for the θ=1 part, i.e. a function u from ontology 1 to real numbers. The agent keeps track of which value of θ is more probable and chooses actions by considering only the current ontology. If at some point he decides that the second ontology is more useful, he switches to it. The agent should then extrapolate the utility function to the θ=2 part. How can he do it?

I can follow most of this, but I’m confused about one part of the premise.

What if the agent created a low-resolution simulation of its behavior, called it Approximate Self, and used that in its predictions? Is the idea that this is doable, but represents an unacceptably large loss of accuracy? Are we in a ‘no approximation’ context where any loss of accuracy is to be avoided?

My perspective: It seems to me that humans also suffer from the problem of embedded self-reference. I suspect that humans deal with this by thinking about a highly approximate representation of their own behavior. For example, when I try to predict how a future conversation will go, I imagine myself saying things that a ‘reasonable person’ might say. Could a machine use an analogous form of non-self-referential approximation?

Great piece, thanks for posting.

In order to do this, the agent needs to be able to reason approximately about the results of their own computations, which is where logical uncertainty comes in

What if the agent, being a quantum mechanical intelligence, CAN temporarily tunnel out of the environment long enough to make certain key observations/measurements? It could be both in the embedded environment AND out of it at the same time, as a hyper wavefunction or in the form of its own pilot wave. Thinking as a human is a quantum mechanical process to a degree. “You cannot change a system from within” is a psychological norm; however, if the agent is quantum mechanical in nature then it is likely neither particle nor wave but something undeterminable by other agents. The agent might be in quantum flux indefinitely, n’est-ce pas? Hence the incompleteness theorem in both physics and mathematics.

Not sure why the above comment was downvoted to −15. It’s a fair question, even if the person asking seems to misinterpret both quantum mechanics and mathematical logic. Quantum mechanics seems to be an accurate description of the “lower levels” of the agent’s model of the universe, and mathematical logic is a useful meta-model that helps us construct better quality models of the universe. They are not, as far as I know, interrelated, and there is no “hence”. Additionally, while quantum mechanics is a good description of the microscopic world, it is much less useful at the level of living organisms (though ion channel opening and closing reflects the underlying quantum-mechanical tunneling), so there is no indication that human thinking is inherently quantum mechanical and could not be some day implemented by a classical computer without a huge complexity penalty.