This is the second part of a two-post series explaining the Internal Model Principle and how it might relate to AI Safety, particularly to Agent Foundations research. In the first post, we constructed a simplified version of the IMP that was easier to understand and focused on building intuition about the theorem’s assumptions. In this second post, we explain the general version of the theorem as stated by Cai&Wonham[1] and discuss how it relates to alignment-relevant questions such as the agent-structure problem and selection theorems.
We present the more general setup and explain the Feedback and Regulation conditions. With those, we prove a first result. Then, we discuss the meaning of this first result and add another condition, called “Observability”, to derive another result. After this, we provide a worked-out example and discuss possible extensions of the theorem.
Setup
The IMP from Cai&Wonham[1] originally models the following control theory situation:
Suppose there’s, for example, a chemical power plant in a given environment. The plant’s temperature should be kept constant, but it is affected by the environment’s temperature, since the two exchange heat. So there is a temperature system made of heaters and air conditioners inside the plant, operated by a controller. The controller receives information about the environment’s temperature and sends a signal to the temperature system inside the plant to counteract the effect of the environment:
If the environment is hot, air conditionings are turned on.
If the environment is cold, heaters are turned on.
Naturally, one thing that could happen is that the controller receives information from the environment and only then, after that, controls the temperature system. The theorem shows that, under specific circumstances, the controller actually “foresees” the environment’s temperature and controls the plant’s temperature concurrently with the environment’s temperature changing. We use “foresee” because what the theorem really shows is that, under some circumstances, the controller is autonomous (i.e., it doesn’t use outside information to decide how to control the temperature system) and it controls the temperature system in a way that counteracts the environment’s effect (even without using information from the environment).
In Cai&Wonham’s[1] book:
“Environment (E), Controller (C) and Plant (P) are dynamic systems which generate, via suitable output maps, the reference, control and output signals respectively. The objective of regulation is to ensure that the output signal coincides (eventually) with the reference, namely the system ‘tracks’. To this end the output is ‘fed back’ and compared (via ⊗) to the reference, and the resulting tracking error signal used to ‘drive’ the controller. The latter in turn controls the plant, causing its output to approach the reference, so that the tracking error eventually (perhaps as t→∞) approaches ‘zero’. Our aim is to show that this setup implies, when suitably formalized, that the controller incorporates a model of the Environment: this statement is the Internal Model Principle.”
We model the world as any set X. For example, we could consider the whole world as X=E×C×P, comprised of environment, controller and plant. We assume there’s a discrete and deterministic world dynamics α:X→X that specifies how the world evolves.
We could also consider P and C as a single joint system called internal system (or just controller), and then we’d have X=E×C (we’re calling C×P just C).
We could consider, for example, X=E×C×P (left) or X=E×C (right)
We say a system is autonomous if its next state depends only on its previous state. For example, if A and B are sets, then the function f:A→A is autonomous and the function g:A×B→A is not. We’ll just say “A is autonomous” as a slight abuse of terminology. We’ll usually write XS⊂S to denote that XS is an autonomous system inside S (S itself doesn’t need to be autonomous), for S=E,C,P. We assume the environment is autonomous (intuitively, because we can’t control the environment; it’s a given) with update rule αE:XE→XE, where XE⊆E. In fact, we assume the whole environment is autonomous, so E=XE.
We’ll denote by K⊂X the set of good states (that is, the set we want our system to be in after a large number of time-steps). It could be, for example, the set of states where the plant’s temperature is constant, for any environment state. An important thing to notice is that K is not necessarily the set of states that the system converges to, because K need not be α-invariant: we usually say a system converges to a part of the state space if it remains in that part after convergence, and it may not be the case that x∈K implies α(x)∈K (we haven’t added any specific condition ensuring that). We don’t care about what happened “prior to convergence”, so all the states and assumptions we care about will be true on K.
When we consider X=E×C, the set of good states K⊆X is a region on the E×C-plane. Note that K is not necessarily a rectangle and it can be any region of the plane.
Since K⊂X, we can project states out of K. For example, if X=E×C×P, we can write the canonical projection πC:E×C×P→C given by πC((xE,xC,xP))=xC. More generally, we could have any function γ:X→C acting as a “projection”. We define XC:=γ(K)⊆C, that is, XC’s elements are the controller states that we get from projecting good states to controller states. Recall that earlier we said we would use the “XS” notation to denote autonomous systems, and one of the conclusions of the theorem is that XC will be autonomous, i.e., we will be able to define a function αC:XC→XC that is also interesting to us. Finally, note that, by definition, γ|K:K→XC is surjective.
So until now in our setup we have:
X world, α:X→X
XE environment, αE:XE→XE
K⊆X set of good states
γ:X→XC , XC denoting all the controller states we’re interested in
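To make the setup concrete, here is a minimal finite sketch. The specific sets and maps below are our own toy choices, not part of the theorem:

```python
# Toy instance of the setup: a finite world X, its dynamics alpha,
# the good states K, and a "projection" gamma onto controller states.
X = {0, 1, 2, 3}                          # world states
alpha = {0: 1, 1: 0, 2: 3, 3: 2}          # world dynamics alpha : X -> X
K = {0, 1}                                # good states K ⊆ X
gamma = {0: "a", 1: "b", 2: "a", 3: "b"}  # gamma : X -> C

# X_C: the controller states obtained by projecting good states
X_C = {gamma[x] for x in K}
print(sorted(X_C))  # → ['a', 'b']
```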
At this point, the reader might be tempted to ask two things:
Ok, the controller states are obtained through the world via a “projection” γ, but what about the environment states?
Is it always true that the environment and world dynamics are consistent with each other? For example, if X=XE×XC, it could be that πE(α(xE,xC))≠αE(πE(xE,xC))
To answer these questions, we’ll first play around with the X=XE×XC setup, and then arrive at a property that can be abstractly imposed to other setups.
If X=XE×XC, we have that XE=πE(X). Here, it’s not true that XE⊆X, but intuitively XE is “a part” of X, or “like a subset” of X. We want a general property relating XE and X that captures this. In mathematics, we use the notion of insertion to work with this. If A⊂B, one can trivially define an injection from A to B (the inclusion map, for example). If A⊈B, we can still think of A as “like a subset” of B if one can define an injective function i:A→B. So in general we can just ask for XE and X to be such that there’s an injective function iE:XE→X. This partly answers question 1. Now, we’ll look closer at insertions to understand whether any insertion is allowed.
(For the following, if you’re not familiar with equivalence relations and partitions, go read this section of my first post)
Note that πE:X→XE is not necessarily invertible, but πE:XmodkerπE→XE is a bijection (exercise for the reader), so invertible.
Each point xE determines a line [xE]={x∈X;πE(x)=xE}. Then, we can define π−1E(xE)=[xE]
XmodkerπE is a set containing lines parallel to the XE-axis, so while πE:X→XE maps points of X to points of XE, πE:XmodkerπE→XE maps lines to points of XE. The inverse π−1E:XE→XmodkerπE takes a point xE∈XE and maps it to the line that projects onto xE. Hence, if we want to define an insertion iE:XE→X, we can do the following:
Given xE, compute π−1E(xE), this is a line.
Choose any point in the line and define it to be iE(xE).
By this procedure, if XE=R, for example, we can define an uncountable amount of insertions between XE and X.
Note, however, that we want the environment to be part of the world, and thus to evolve consistently with the world. For a given xE∈XE, we have two ways of producing updates of the world now:
Compute iE(xE)∈X, then compute α(iE(xE))∈X
Compute αE(xE), and then iE(αE(xE))∈X
They don’t necessarily need to be the same.
Thus, to answer question 1, we ask that XE and X are such that there’s iE:XE→X injective.
To answer question 2, we ask that this injection iE is such that iE(αE(xE))=α(iE(xE)),∀xE∈XE.
We define ~XE:=iE(XE).
Note that, if XE⊆X, this is the same as asking α|XE=αE, so deep down we’re asking something related to XE being α-invariant. Indeed, ~XE is α-invariant because:
if x∈~XE, then x=iE(xE) and α(x)=α(iE(xE))=iE(αE(xE))∈~XE
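These two requirements, an injective iE and the compatibility iE∘αE=α∘iE, can be checked directly on a finite toy world. The specific choices below, including inserting along a fixed controller state, are our own illustration:

```python
N = 4
XE = range(N)  # toy environment states

def alpha_E(e):
    return (e + 1) % N          # autonomous environment dynamics

def alpha(s):
    e, c = s
    return (alpha_E(e), c)      # toy world dynamics on X = XE x XC

def i_E(e):
    return (e, 0)               # insert the environment along controller state 0

# Compatibility: i_E(alpha_E(e)) == alpha(i_E(e)) for all e
assert all(i_E(alpha_E(e)) == alpha(i_E(e)) for e in XE)

# ~X_E := i_E(XE) is then alpha-invariant, as argued above
XE_tilde = {i_E(e) for e in XE}
assert all(alpha(s) in XE_tilde for s in XE_tilde)
print("insertion compatible, ~X_E invariant")
```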
Assumptions
Feedback Condition
We want to state something that enables us to prove that the controller is autonomous and models the environment. In the first post, we argued that one way to ensure this is by asking for the feedback condition:
kerγ≤kerγ∘α
We want to be able to model a situation where the controller, prior to convergence, needs information from the environment to “learn” how the environment works. Then, after a large number of time-steps, the controller learns the environment’s behaviour and becomes autonomous, so instead of asking full feedback, we only ask the controller to be autonomous on good states, that is,
kerγ|K≤kerγ∘α|K
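Viewing kernels as sets of pairs that a map identifies, the condition kerγ|K≤kerγ∘α|K becomes a subset inclusion, which we can test on a finite toy model (the data below is our own choice):

```python
from itertools import product

def ker(f, S):
    """Kernel of f on S: the pairs of states that f identifies."""
    return {(x, y) for x, y in product(S, repeat=2) if f(x) == f(y)}

# Toy good-state set and maps, chosen only to make the condition concrete
K = {0, 1, 2, 3}
alpha = {0: 1, 1: 0, 2: 3, 3: 2}
gamma = {0: "a", 1: "a", 2: "b", 3: "b"}

# "ker gamma|K ≤ ker gamma∘alpha|K" as subset inclusion of pair-sets
feedback = ker(gamma.get, K) <= ker(lambda x: gamma[alpha[x]], K)
print(feedback)  # → True
```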
Regulation Condition
We want all environment states to be good states, i.e, to have reached convergence. Since XE is not necessarily a subset of X, we ask
~XE⊆K
The Controller Models the Environment
With these assumptions and this setup, we’re already able to prove the first part of the theorem, which states that the controller models/tracks the environment.
Conclusion (1)
There exists a unique mapping αC:XC→XC determined by αC∘γ|K=γ∘α|K
Proof:
Let xC∈XC, then xC=γ(x) for some x∈K. Define αC(xC)=γ∘α(x). αC is well defined because if xC=γ(x)=γ(x′), then (x,x′)∈kerγ|K⟹(x,x′)∈kerγ∘α|K⟹γ(α(x))=γ(α(x′)).
The mapping αC is uniquely determined by αC∘γ|K=γ∘α|K: let αC,1 and αC,2 satisfy αC,i∘γ|K=γ∘α|K. Then, for any xC∈XC, xC=γ(x) for some x∈K, since γ|K is surjective, and thus αC,1(xC)=αC,1(γ(x))=γ(α(x))=αC,2(γ(x))=αC,2(xC)
This guarantees that we have an autonomous dynamics αC:XC→XC and that it is unique.
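The proof of Conclusion (1) is constructive: pick any preimage x∈K of xC and set αC(xC):=γ(α(x)); the feedback condition guarantees the choice doesn’t matter. A sketch on toy data of our own:

```python
K = {0, 1, 2, 3}
alpha = {0: 1, 1: 0, 2: 3, 3: 2}
gamma = {0: "a", 1: "a", 2: "b", 3: "b"}

# Build alpha_C(x_C) := gamma(alpha(x)) for any x in K with gamma(x) = x_C.
# If two preimages disagreed, the feedback condition would be violated.
alpha_C = {}
for x in K:
    x_C, nxt = gamma[x], gamma[alpha[x]]
    assert alpha_C.setdefault(x_C, nxt) == nxt, "feedback condition violated"

# Check the defining equation alpha_C ∘ gamma|K = gamma ∘ alpha|K
assert all(alpha_C[gamma[x]] == gamma[alpha[x]] for x in K)
print(sorted(alpha_C.items()))  # → [('a', 'a'), ('b', 'b')]
```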
Conclusion (2)
It’s true that αC∘γ|~XE=γ∘α|~XE
Proof: Let x∈~XE, since ~XE⊂K by the regulation condition, x∈K and hence αC∘γ(x)=γ∘α(x)=γ∘α|~XE(x), since x∈~XE.
Note that if XE⊆X, we have α|XE=αE, so this states that αC∘γ|XE=γ∘αE, that is, the controller tracks the environment. For more comments on why αC∘γ|XE=γ∘αE means the controller tracks/models the environment, check out my first post and the section below.
Note that α|~XE can be thought of as the environment dynamics inserted into the world.
We say the “internal model” in the controller is the pair consisting of the set XM:=γ|~XE(~XE) and the rule αC|XM
In what sense is this a ‘model’?
We now digress about the meaning of the theorem above.
In this section, it’s helpful to think that XE⊆X, so that αE=α|XE. The second conclusion of the theorem is αC∘γ|XE=γ|XE∘αE, that is, αC evolves according to αE autonomously: suppose the environment is in state xE∈XE at time-step t; its next state is αE(xE), and the internal system’s state (which can be calculated via γ) is γ|XE(αE(xE)) at time-step t+1. On the other hand, γ(xE) is the internal system’s state at time-step t, and at time-step t+1 it is αC(γ(xE)). The theorem ensures these two expressions for the internal system’s state at time-step t+1 are equal.
This is qualitatively different from the internal system receiving information about the environment and one time-step later updating to track the environment. The internal system is updating accordingly to the environment in the same time-step.
To clarify this, consider the analogy of a dog running after a beetle. The dog is initially at position d0 and the beetle at position b0. One thing that might happen is that the dog sees the beetle’s position and starts moving there. When the dog arrives, the beetle has already left that position and is now at a new one. So the dog starts moving to this new position, and when it arrives, the beetle has already left again, and so on.
Another qualitatively different thing that can happen is that the dog actually understands (i.e, “models”) how the beetle will move. So instead of moving to the beetle’s current position, it moves to where it predicts the beetle will be in the future, so that it can catch it.
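We can simulate the two dogs on a toy circular track (the dynamics below are our own illustration): the lagging dog copies the beetle’s last observed position, while the modeling dog applies the beetle’s own update rule to its current position.

```python
N = 10  # circular track with N positions

def beetle_step(b):
    return (b + 1) % N  # the beetle always moves one step forward

b = lagging_dog = modeling_dog = 0
for t in range(5):
    lagging_dog = b                           # move to where the beetle *was*
    b = beetle_step(b)                        # beetle moves in the same time-step
    modeling_dog = beetle_step(modeling_dog)  # apply the beetle's own rule

print(b, modeling_dog, lagging_dog)  # → 5 5 4
```

The modeling dog stays exactly on the beetle at every time-step without ever observing it after initialization, while the lagging dog is always one step behind.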
The theorem also states that αC is unique, so there can’t be a different dynamics in a system satisfying perfect regulation and the feedback structure. Hence, the dynamics on the internal model is necessarily the one that models the environment.
In the dog-beetle analogy, the theorem guarantees that the second scenario will necessarily happen if the dog’s behavior satisfies our hypotheses.
The Controller Faithfully Models the Environment
To understand this section better, we recommend reading the first post, where we explain the key ideas used here more intuitively.
In the first post, we were able to prove an analogous result with only the feedback structure condition. There, we didn’t need to worry about the difference between environment, world and good states. That theorem is a particular case of this post’s theorem, where XE=K=X and αE=α|XE.
As in the first post, the results proved above include pathological cases where, for example, the controller could be XC={xC,1,xC,2} and the environment ~XE=N. Even if we had αC∘γ|~XE=γ∘α|~XE, the controller wouldn’t have the “expressivity” to faithfully model the environment: it doesn’t have enough states to do so. We’ll now introduce the observability condition and prove a lemma to conclude that the controller faithfully models the environment if, in addition to the prior assumptions, observability is also satisfied.
Observability can be stated as
inf{kerγ∘αn|~XE;n=0,1,2,…}=⊥
For intuition on what this means and why we ask for it, check out this section of the first post.
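Concretely, with kernels as pair-sets, observability says the intersection (the inf) of the kernels kerγ∘αn contains only the trivial pairs (x,x), i.e., iterating the dynamics and observing through γ eventually separates every pair of states. A finite sketch with toy data of our own:

```python
from itertools import product

def ker(f, S):
    """Kernel of f on S: the pairs of states that f identifies."""
    return {(x, y) for x, y in product(S, repeat=2) if f(x) == f(y)}

# Toy inserted environment with cyclic dynamics and a coarse observation gamma
XE = {0, 1, 2, 3}
alpha = {e: (e + 1) % 4 for e in XE}
gamma = {0: "lo", 1: "lo", 2: "hi", 3: "hi"}

def gamma_alpha_n(n):
    def f(x):
        for _ in range(n):
            x = alpha[x]
        return gamma[x]
    return f

# inf of the kernels = intersection of the pair-sets; bottom = only (x, x) pairs
meet = set.intersection(*(ker(gamma_alpha_n(n), XE) for n in range(4)))
print(meet == {(x, x) for x in XE})  # → True
```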
Generalized Feedback Lemma
We’ll prove a generalization of the feedback structure condition, in a way analogous to the first post; we’ll then use it together with observability to conclude that the controller faithfully models the environment. The lemma states that kerγ|~XE≤kerγ∘αn|~XE for every n≥0, which implies inf{kerγ∘αn|~XE;n=0,1,2,…}=kerγ|~XE. We prove it by induction on n.
(Base case): If k=1, by feedback condition, we knowkerγ|K≤kerγ∘α|K. Since ~XE⊆K by regulation condition, (x,y)∈kerγ|~XE⟹(x,y)∈kerγ|K⟹(x,y)∈kerγ∘α|K, but since ~XE is α-invariant, and (x,y)∈~XE, then α(x),α(y)∈~XE, so (x,y)∈kerγ∘α|~XE.
(Induction Hypothesis): Suppose now the theorem holds up until n.
(Inductive step) Let x,y∈~XE, then, by definition of kernel of a function, (x,y)∈kerγ∘αn|~XE⟺(αn(x),αn(y))∈kerγ|~XE
Let s:=αn(x),t:=αn(y). Since ~XE is α-invariant, we know s,t∈~XE. So the equivalence above gives us (s,t)∈kerγ|~XE. By the base case, (s,t)∈kerγ∘α|~XE, i.e., γ(α(s))=γ(α(t)). But (α(s),α(t))=(αn+1(x),αn+1(y)), so (x,y)∈kerγ∘αn+1|~XE, and the result follows by induction.
Conclusion (3)
γ|~XE is injective
Proof:
By the generalized feedback lemma, inf{kerγ∘αn|~XE;n=0,1,2,…}=kerγ|~XE
By observability, inf{kerγ∘αn|~XE;n=0,1,2,…}=⊥
So kerγ|~XE=⊥, and hence γ|~XE is injective
In the next section, we explain why γ|~XE being an injection gives meaning to “faithfully modeling”.
Cardinality and expressivity
In the first post, we concluded γ:X→XC satisfying the theorem’s hypothesis is necessarily a bijection. In terms of cardinality, this means, by definition, that |XC|=|X|.
In this post, we concluded that γ|~XE:~XE→XC is an injection. In cardinality terms, |XE|≤|XC|.
Note that the elements of XE fully represent the description of each environment state. Say the environment is the outside of a chemical plant, and say the chemical reactions being done there depend on the temperature and pressure of the plant, which in turn depend on the temperature and pressure of the environment. Then elements xE∈XE could be ordered pairs xE=(TE,PE) comprising the external temperature and pressure. On the other hand, if there’s an additional relevant environment feature, say, the water delivered from the environment needs to be treated (for example, because otherwise the water used in the reactions will be contaminated), one would want to consider an environment state as xE=(TE,PE,WE), where WE stands for “quality of water”. In other words, WE is a relevant feature, and if we don’t include it in the environment, it’s as if the environment is not “expressive” enough to model the whole system. Thus even if the theorem applies and the controller of the plant develops an internal model, it could be that it runs poorly, i.e., has bad overall performance (note, though, that we haven’t discussed what it means for an internal model to have good performance in reality).
The point here is that the elements of XE are the only piece of our framework that encompasses the notion of “expressivity”. All of the “expressivity” in our setup comes from the sets XE and XC (for the controller, you could also say the expressivity comes from γ, since XC is determined by γ, i.e., γ(K)=XC).
The example above illustrated one way the states can show expressivity, that is, by the structure of the set of states: If X⊆R14, we perhaps have 14 different relevant features, that could be independent or not. If X⊆N10, we have 10 features with natural values.
Another way the states have expressivity is via the state set cardinality, as we talked in the beginning of this section: if X has |X|=10 (that is, 10 elements), it can only represent 10 different states. If |X|=|N|, then there’s a countable quantity of different states. Recall that cardinality is defined via mappings:
There is an injection f:X→Y ⟺ |X|≤|Y|
There is a surjection f:X→Y ⟺ |X|≥|Y|
There is a bijection f:X→Y ⟺ |X|=|Y|
Thus, a state set X exhibits expressivity in two ways:
Via X’s internal structure
Via X’s cardinality
Our theorem states that a necessary condition for the internal system to have an internal model of the environment is |XE|≤|XC|, that is, in order for the internal system to model the environment, it must exhibit at least as much expressivity as the environment (in the second sense of expressivity). In other words, if the internal system isn’t at least as expressive as the environment, the internal system necessarily can’t have an internal model of the environment.
Summarizing the whole theorem-meaning discussion,
The semantics of the word “model” in the theorem is faithful tracking/simulation. That is, for states in ~XE (which the internal system views as XM), the internal system doesn’t wait for the environment and then update a time-step later following it. Instead, it updates according to the environment in the same time step. We’ll discuss more about that in a later section
Conclusion (1) guarantees αC - the autonomous dynamics on the controller—is well defined and unique. Conclusion (2) states that αC simulates αE and Conclusion (3) says this simulation is faithful.
Comparing the less general version of the theorem in the first post with this version,
On the left, the setup of the first post. On the right, the setup of the second post—we’re calling ~XE:=XE, since generally XE may not be a subset of X
This theorem does apply for systems where γ is not a bijection. In fact, γ is injective on XE and surjective on K. It also seems useful to model a wider range of systems.
Our previous version of the theorem is a particular case of this, with XE=X=K and αE=α.
~XE⊆K and XC⊆γ(K) are the conditions ensuring all the states involved in the internal modeling are good states. Thus, this theorem is supposed to apply after the regulation has “converged” in some sense (such as t→∞).
Thus, modeling here means tracking faithfully.
Example
Consider a one-dimensional circular grid of size N (i.e., a row of N squares such that the first square is connected to the last square) with a moving target and an agent whose goal is to pursue that target. The target and the agent can move left and right and are always inside some square of the grid.
The squares of our grid will be the set N:={0,…,N−1}.
We want the world to be able to describe the agent and the moving target, each moving in this N sized grid, so we will consider a state of the world as an ordered pair (i,j) with i,j∈{0,…,N−1}=N and so we’ll define the world as X:=N×N. Here, the first coordinate of the pair represents the position of the moving target and the second coordinate represents the position of the agent.
We will consider the environment to be the position of the moving target on the grid, thus XE:=N.
We assume the moving target always moves one square to the right. That is, αE:XE→XE is given by αE(i):=i+1modN, where the modN is to account for the fact that the world is circular.
We’ll define the dynamics of the world α:X→X by α(i,j)=(i+1modN,j+1modN) if i=j and α(i,j)=(i+1modN,jmodN) if i≠j
Consider jE:XE→X defined by jE(xE):=(xE,xE). Then ~XE:=jE(XE)⊆X and jE∘αE(xE)=jE(xE+1modN)=(xE+1modN,xE+1modN)=α((xE,xE))=α(jE(xE)), so jE∘αE=α∘jE.
In other words, jE is an insertion compatible with the dynamics, and α|~XE plays the role of the environment dynamics inserted into the world. Thus we can use the internal model theorem with ~XE=jE(XE) as the environment instead of XE directly.
For the agent,
XC should represent the states the agent can be in, hence XC:={0,…,N−1}
γ:X→XC defined by γ((i,j)):=j, the agent’s position
Clearly, γ is a surjection: given xC∈XC, any x=(i,xC)∈X, with i∈{0,…,N−1}, satisfies γ(x)=xC
We want our agent to pursue the moving target, so we will define K:={(i,j)∈X|i=j}
This setup satisfies our hypotheses:
~XE={(xE,xE)∈X|xE∈XE}⊆K={(i,j)∈X|i=j} and α~XE=α|~XE
γ(K)=N=XC
kerγ|K≤kerγ∘α|K.
Suppose x,y∈K such that (x,y)∈kerγ⟺γ(x)=γ(y). Since x,y∈K,x=(i,i) and y=(j,j). Then, i=γ(x)=γ(y)=j⟹i=j. Now, γ∘α(i,i)=γ∘α(j,j) since i=j. Thus, (x,y)∈kerγ∘α.
~XE is α-invariant.
Let (xE,xE)∈~XE, then α((xE,xE))=(xE+1modN,xE+1modN)=jE(xE+1) if xE+1≤N−1 and =jE(0) if xE+1=N. That is, α((xE,xE))=jE(x′E)∈~XE for some x′E∈XE.
inf kerγ∘αn|~XE=⊥
Since feedback holds and ~XE⊆K, we have kerγ|~XE≤kerγ∘α|~XE. Because ~XE is α-invariant, the generalized feedback lemma applies and gives inf kerγ∘αn|~XE=kerγ|~XE. Finally, γ|~XE is injective (γ((xE,xE))=xE), so kerγ|~XE=⊥.
Since the theorem’s assumptions are all true, we already know that there’s a unique αC determined by αC∘γ|K=γ∘α|K.
That is, for x=(j,j)∈K, xC=γ(x)=j and αC(xC)=γ(α(x))=γ((j+1modN,j+1modN))=j+1modN. So αC(j)=j+1modN: the controller’s autonomous dynamics is exactly the target’s dynamics.
Then the pair XM=γ|~XE(~XE) and αC|XM is our internal model.
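The whole example can be verified mechanically. The sketch below (for a small N) checks regulation, feedback and observability, then extracts αC via the construction in Conclusion (1):

```python
from itertools import product

N = 5
X = set(product(range(N), repeat=2))        # world states (target, agent)

def alpha(s):                               # world dynamics from the example
    i, j = s
    return ((i + 1) % N, (j + 1) % N) if i == j else ((i + 1) % N, j)

def gamma(s):
    return s[1]                             # project onto the agent's position

K = {(i, j) for (i, j) in X if i == j}      # good states: agent on the target
XE_tilde = {(e, e) for e in range(N)}       # inserted environment j_E(x_E) = (x_E, x_E)

def ker(f, S):
    return {(x, y) for x, y in product(S, repeat=2) if f(x) == f(y)}

assert XE_tilde <= K                                          # regulation
assert ker(gamma, K) <= ker(lambda s: gamma(alpha(s)), K)     # feedback
assert ker(gamma, XE_tilde) == {(s, s) for s in XE_tilde}     # observability (bottom)

# Extract alpha_C from alpha_C ∘ gamma|K = gamma ∘ alpha|K
alpha_C = {gamma(s): gamma(alpha(s)) for s in K}
print(alpha_C == {j: (j + 1) % N for j in range(N)})  # → True
```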
Discussion and further work
We presented the Internal Model Theorem statement, which basically says that, in a setup where an external system passes signals to an autonomous internal system and these signals satisfy observability, feedback structure and regulation imply that the controller necessarily has an internal model of the external system.
Some critiques:
As we discussed in the first post, the update rules involved are all deterministic. Thus, the theorem can’t represent non-deterministic scenarios such as making decisions under uncertainty.
Note, though, that the theorem is supposed to model regulation after some sort of equilibria or convergence. I expect intuitively many systems in these situations have at least an approximately deterministic behavior
It’s really hard to come up with non-trivial examples for the theorem. This difficulty is particularly present when trying to come up with examples that satisfy Feedback and Observability together
In fact, I’d be glad to see non-trivial examples of this theorem. If you know or construct one, please write in the comments or send it
IMP as a selection theorem
A selection theorem is a theorem stating something along the lines of “this is the type of agent we expect to find in environments with this specific type of selection pressure”. Currently, the theorem doesn’t say much, because the controller is not properly an agent: it’s an autonomous system, but it doesn’t act on the world. Phrased as a selection-theorem wannabe, the IMP states that in environments where the controller is autonomous on good states, the controller tracks the environment via an internal model.
We can think of extending the theorem in two directions
If we could prove “systems that approximately regulate their internal state have approximate internal models”, we would get a better version of a selection theorem.
I’d expect this extended theorem to work for a wider range of real world systems.
This still wouldn’t make the controller look more agentic because it still wouldn’t properly act on the world.
“The controller acts upon the environment but still has an internal model of the environment.”
This would make the controller more agentic.
The first point above would make the theorem more useful to real systems, while the last would make the controller feel more like an agent. I’d think those two extensions would provide a selection theorem.
Relevance to Agent Structure problem
The agent-like structure problem is the problem of determining whether, given a policy that robustly optimizes far-away regions of the state space into small chunks of the state space, this policy has agent structure (by agent structure we informally mean having an internal model and a search process). Another way to phrase this question is “under which types of environments is the implication above true?”
Alex gave a loose formalism to answer this question, making some notions more precise:
“If we take some class of behaving things and apply a filter for agent-like behavior, do we end up selecting things with agent-like architecture (or structure)? Under which conditions does agent-like behavior imply agent-like structure?”
This loose formalism consists of a policy and an environment. The policy receives an observation from the environment, updates its internal state and acts on the environment, changing the environment’s state. Then the environment sends a new observation to the policy, and so on, in discrete time-steps. The policy is, thus, a function that sends each policy state and observation to a policy state. Analogously, the environment sends each environment state and action to an environment state.
We can define a class of different policies and different environments.
In Alex’s words “Here we “run” the policy πi in each environment, producing a collection of trajectories of states. Each state has some measure of goodness or badness, indicated here by the subscript and color. The policy does well (“optimizes”) in some environment, and poorly in others.”
The idea of the formalism is to be able to define a function that associates to each policy in the policy class a number in [0,1], thought of as its “degree of agent structure”. We expect policies that perform well in a wide range of environments to be highly “agentic”. Based on the performance of a policy in a wide range of environments, we want to be able to tell the degree of agent structure this policy has.
In Alex’s words,
“One result we might wish to have is that if you tell me how many parameters are in your ML model, how long you ran it for, and what its performance was, then I could tell you the minimum amount of agent structure it must have”.
The important idea here is that we expect to be able to define a function ϵ depending on the parameters, training, performance or other relevant variable such that it’s always true that the structure of a given policy is ≥1−ϵ. Since we want the structure function to be between 0 and 1, ϵ=1 would always make this statement true, but we want this function to be zero in the limit, in some sense of limit.
More concretely, in the agent structure setup, we want to be able to:
Have a measure of performance of a given policy in a wide range of environments.
Filter the policy class (because a lookup table containing “optimal” moves for each environment in the environment class would have very good performance, but we would expect a lookup table to be the policy with the least agent structure amongst all policies).
Define ϵ appropriately such that its limit is zero in some sense of asymptotics.
The IMP setup fails these conditions because
It doesn’t encompass a measure of performance of different policies. Actually, we’ve already proved the dynamics of the internal system is unique under IMP conditions, so we would actually have only one policy.
The IMP’s setup doesn’t seem to consider a policy acting on an environment. The internal system (the analogue of the policy here) can be influenced by the environment (via γ), but can’t influence the environment. In a self-driving car analogy, it’s as if the car being at position x=10 or x=20, moving or not, corresponded to the same environment state.
We wish to extend the IMP in two different ways, addressing those two problems:
Modify the IMP to show that systems which approximately regulate their internal state must have approximate models of their environments.
This could give us a notion of performance and different policies.
Rework the IMP so that it applies to controllers regulating the external environment (rather than regulating their internal state).
This would solve the fact that in current IMP, the internal system doesn’t interact with the environment.
We expect that if one can extend the theorem in these two directions, it might give some insight into the agent structure problem.
Distilling the Internal Model Principle part II
This post was written during the agent foundations fellowship with Alex Altair funded by the LTFF.
Introduction
This is the second part of a two posts series explaining the Internal Model Principle and how it might relate to AI Safety, particularly to Agent Foundations research. In the first post, we constructed a simplified version of IMP that was easier to understand and focused on building intuition about the theorem’s assumptions. In this second post, we explain the general version of the theorem as stated by Cai&Wonham[1] and discuss how it relates to alignment-relevant questions such as agent-structure problem and selection theorems.
We present the more general setup and explain Feedback and Regulation conditions. With those, we prove a first result. Then, we discuss the meaning of this first result and add another condition called “Observability” to derive another result. After this, we provide a worked out example and discuss possible extensions of the theorem.
Setup
The IMP from Cai&Wonham[1] originally models the following control theory situation:
Suppose there’s, for example, a chemical powerplant in a given environment. The temperature of the plant should be constant and the plant’s temperature is affected by the environment’s temperature, since they exchange heat, so there is a temperature system made of heaters and air conditioning systems inside the plant, controlled by a controller. This controller controls this systems: it receives information from the environment’s temperature and passes a signal to the temperature inside the plant to counter-act the effect of the environment:
If the environment is hot, air conditionings are turned on.
If the environment is cold, heaters are turned on.
Naturally, one thing that could happen is that the controller could receive information from the environment and then, after that, control the temperature system. The theorem shows that, under specific circumstances, the controller actually “foresee” the environment’s temperature and control the plant temperature concurrently to the environment’s temperature changing. We use “foresee” because what the theorem really shows is that, under some circumstances, the controller is autonomous (i.e, it doesn’t use outside information to decide how to control the temperature system) and it controls the temperature system in a way that counter-acts the environment’s effect (even without using information from the environment).
In Cai&Wonham’s[1] book,
We model the world as any set X. For example, we could consider the whole world as X=E×C×P, comprised of environment, controller and plant. We assume there’s a discrete and deterministic world dynamics α:X→X that specifies how the world evolves.
We could also consider P and C as a single joint system called internal system (or just controller), and then we’d have X=E×C (we’re calling C×P just C).
We say a system is autonomous if its next state depends only on its own previous state. For example, if A and B are sets, then the function f:A→A is autonomous and the function g:A×B→A is not. We’ll just say “A is autonomous” as a slight abuse of terminology. We’ll usually write XS⊆S to denote that XS is an autonomous system inside S, while S itself need not be autonomous, for S=E,C,P. We assume the environment is autonomous (intuitively, because we can’t control the environment: it’s a given) with update rule αE:XE→XE, where XE⊆E. In fact, since we assume the whole environment is autonomous, we take E=XE.
We’ll denote by K⊆X the set of good states (that is, the set we want our system to be in after a large number of time-steps). It could be, for example, the set of states where the plant’s temperature is constant, for any environment state. An important thing to notice is that K is not necessarily the set of states the system converges to, because K need not be α-invariant: we usually say a system converges to a part of the state space if it remains in that part after convergence, but it may not be the case that x∈K implies α(x)∈K; we haven’t added any condition ensuring that. We don’t care about what happened “prior to convergence”, so all the states and assumptions we care about will live on K.
Since K⊆X, we can project states out of K. For example, if X=E×C×P, we can write the canonical projection πC:E×C×P→C given by πC((xE,xC,xP))=xC. More generally, we could have any function γ:X→C acting as a “projection”. We define XC:=γ(X)⊆C; that is, XC’s elements are the controller states we get from projecting world states to controller states. Recall that earlier we said we would use the “XS” notation to denote autonomous systems, and one of the conclusions of the theorem is that XC will be autonomous, i.e., we will be able to define a function αC:XC→XC that is also interesting to us. Finally, note that, by definition, γ:X→XC is surjective.
So far, our setup consists of:
X world, α:X→X
XE environment, αE:XE→XE
K⊆X set of good states
γ:X→XC, with XC denoting all the controller states we’re interested in
At this point, the reader might be tempted to ask two things:
Ok, the controller states are obtained through the world via a “projection” γ, but what about the environment states?
Is it always true that the environment and world dynamics are consistent with each other? For example, if X=XE×XC, it could be that πE(α(xE,xC))≠αE(πE(xE,xC))
To answer these questions, we’ll first play around with the X=XE×XC setup, and then arrive at a property that can be abstractly imposed to other setups.
If X=XE×XC, we have that XE=πE(X). Here, it’s not true that XE⊆X, but it’s clear that XE is “a part” of X, or “like a subset” of X. We want a general property relating XE and X that captures XE being “like a subset” of X. In mathematics, we use the notion of insertion for this. If A⊆B, one can trivially define an injection from A into B (the inclusion, for example). If A⊈B, we can still think of A as “like a subset” of B if one can define an injective function i:A→B. So in general we can just ask for XE and X to be such that there’s an injective function iE:XE→X. This partly answers question 1. Now, we’ll look closer at insertions to understand whether any insertion is allowed.
(For the following, if you’re not familiar with equivalence relations and partitions, go read this section of my first post)
Note that πE:X→XE is not necessarily invertible, but πE:XmodkerπE→XE is a bijection (exercise for the reader), hence invertible.
XmodkerπE is a set whose elements are the fibers of πE: lines parallel to the XC-axis. So while πE:X→XE maps points of X to points of XE, πE:XmodkerπE→XE maps lines to points of XE. The inverse π−1E:XE→XmodkerπE takes a point xE∈XE and maps it to the specific line such that, when projected over XE, it lands on xE. Hence, if we want to define an insertion iE:XE→X, we can do the following:
Given xE, compute π−1E(xE), this is a line.
Choose any point in the line and define it to be iE(xE).
By this procedure, if XE=R, for example, we can define uncountably many insertions between XE and X.
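To make the fiber-choosing procedure above concrete, here is a minimal finite sketch in Python. All sets and names here are illustrative toys (not taken from the text): the world is XE×XC, and each choice function over the fibers yields a different insertion.

```python
from itertools import product

# Toy finite version: X = XE x XC, with the projection pi_E.
XE = [0, 1, 2]
XC = ["a", "b"]
X = list(product(XE, XC))

def pi_E(x):
    return x[0]

def fiber(x_E):
    # The "line" over x_E: all world states projecting to x_E.
    return [x for x in X if pi_E(x) == x_E]

def make_insertion(choice):
    # choice maps each x_E to an index into its fiber; each choice
    # function defines a different insertion i_E: XE -> X.
    return {x_E: fiber(x_E)[choice(x_E)] for x_E in XE}

i1 = make_insertion(lambda x_E: 0)  # always pick the first point of the fiber
i2 = make_insertion(lambda x_E: 1)  # always pick the second point

# Both are injections: distinct x_E land in distinct fibers...
assert len(set(i1.values())) == len(XE)
assert len(set(i2.values())) == len(XE)
# ...and both are right inverses of the projection.
assert all(pi_E(i1[x_E]) == x_E for x_E in XE)
assert all(pi_E(i2[x_E]) == x_E for x_E in XE)
```

With two points per fiber there are already 2³ distinct insertions here; with infinite fibers the count becomes uncountable, as the text notes.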
Note, however, that we want the environment to be part of the world, and thus to evolve consistently with the world. For a given xE∈XE, we now have two ways of producing updates of the world:
Compute iE(xE)∈X, then compute α(iE(xE))∈X
Compute αE(xE), and then iE(αE(xE))∈X
They don’t necessarily need to be the same.
Thus, to answer question 1, we ask that XE and X are such that there’s iE:XE→X injective.
To answer question 2, we ask that this injection iE is such that iE(αE(xE))=α(iE(xE)),∀xE∈XE.
We define ~XE:=iE(XE).
Note that, if XE⊆X, this is the same as asking α|XE=αE, so deep down we’re asking something related to XE being α-invariant. Indeed, ~XE is α-invariant because:
if x∈~XE, then x=iE(xE) and α(x)=α(iE(xE))=iE(αE(xE))∈~XE
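The compatibility condition iE∘αE=α∘iE, and the α-invariance of ~XE that follows from it, can be checked mechanically on a small toy system. The dynamics below are hypothetical (an environment counting mod 3, with a controller that steps in lockstep), chosen only so the condition holds:

```python
# Toy check of the compatibility condition i_E(alpha_E(x_E)) == alpha(i_E(x_E)).
XE = [0, 1, 2]

def alpha_E(x_E):
    # Hypothetical environment dynamics: count mod 3.
    return (x_E + 1) % 3

def alpha(x):
    # Hypothetical world dynamics on pairs (env, controller).
    e, c = x
    return ((e + 1) % 3, (c + 1) % 3)

def i_E(x_E):
    # Insertion: pair the environment with a controller in lockstep.
    return (x_E, x_E)

# The compatibility (equivariance) condition holds for this insertion...
assert all(i_E(alpha_E(x)) == alpha(i_E(x)) for x in XE)

# ...and therefore X~E := i_E(XE) is alpha-invariant, as the text argues.
tilde_XE = {i_E(x) for x in XE}
assert all(alpha(x) in tilde_XE for x in tilde_XE)
```

Note that other insertions (say, pairing xE with a fixed controller state) would fail the first assertion for these dynamics: the condition really does rule out some insertions.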
Assumptions
Feedback Condition
We want to state something that enables us to prove that the controller is autonomous and that it models the environment. In the first post, we argued that one way to ensure this is by asking for the feedback condition:
kerγ≤kerγ∘α

We want to be able to model a situation where the controller, prior to convergence, needs information from the environment to “learn” how the environment works. Then, after a large number of time-steps, the controller has learned the environment’s behaviour and becomes autonomous. So instead of asking for full feedback, we only ask the controller to be autonomous on good states, that is,
kerγ|K≤kerγ∘α|K

Regulation Condition
We want all environment states to be good states, i.e., to have reached convergence. Since XE is not necessarily a subset of X, we ask
~XE⊆K

The Controller Models the Environment
With these assumptions and this setup, we’re already able to prove the first part of the theorem, which states that the controller models/tracks the environment.
Conclusion (1)
There exists a unique mapping αC:XC→XC determined by αC∘γ|K=γ∘α|K
Proof:
Let xC∈XC; then xC=γ(x) for some x∈K. Define αC(xC):=γ∘α(x). αC is well defined because if xC=γ(x)=γ(x′), then (x,x′)∈kerγ|K⟹(x,x′)∈kerγ∘α|K⟹γ(α(x))=γ(α(x′)).
The mapping αC is uniquely determined by αC∘γ|K=γ∘α|K: let αC,1 and αC,2 satisfy αC,i∘γ|K=γ∘α|K. Then, for any xC∈XC, xC=γ(x) for some x∈K, since γ is surjective, and thus αC,1(xC)=αC,1(γ(x))=γ(α(x))=αC,2(γ(x))=αC,2(xC).
This guarantees that we have an autonomous αC:XC→XC and that it is uniquely determined.
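The construction in the proof is directly computable on finite sets. The sketch below uses a hypothetical world of pairs on {0,1,2} (not the text’s chemical plant): it checks the feedback condition on K and then builds αC as a dictionary via αC(γ(x)):=γ(α(x)).

```python
# Finite sketch of Conclusion (1): building alpha_C from gamma and alpha on K.
# Hypothetical world: pairs (env, ctrl); good states K are those with ctrl == env.
N = 3
K = [(i, i) for i in range(N)]

def gamma(x):
    # "Projection" onto controller states.
    return x[1]

def alpha(x):
    # World dynamics (only its restriction to K matters here).
    i, j = x
    return ((i + 1) % N, (j + 1) % N)

# Feedback on K: gamma(x) == gamma(y) must imply gamma(alpha(x)) == gamma(alpha(y)).
for x in K:
    for y in K:
        if gamma(x) == gamma(y):
            assert gamma(alpha(x)) == gamma(alpha(y))

# Then alpha_C(gamma(x)) := gamma(alpha(x)) is well defined on X_C = gamma(K):
alpha_C = {gamma(x): gamma(alpha(x)) for x in K}

# And it satisfies alpha_C ∘ gamma|K == gamma ∘ alpha|K by construction.
assert all(alpha_C[gamma(x)] == gamma(alpha(x)) for x in K)
```

Here alpha_C comes out as {0: 1, 1: 2, 2: 0}: an autonomous controller update that never consults the environment coordinate.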
Conclusion (2)
It’s true that αC∘γ|~XE=γ∘α|~XE
Proof: Let x∈~XE, since ~XE⊂K by the regulation condition, x∈K and hence αC∘γ(x)=γ∘α(x)=γ∘α|~XE(x), since x∈~XE.
Note that if XE⊆X, we have α|XE=αE, so this states that αC∘γ|XE=γ∘αE, that is, the controller tracks the environment. For more comments on why αC∘γ|XE=γ∘αE means the controller tracks/models the environment, check out my first post and the section below.
Note that α|~XE can be thought of as the environment dynamics inserted into the world.
We say the “internal model” in the controller is the pair consisting of the set XM:=γ|~XE(~XE) and the rule αC|XM
In what sense is this a ‘model’?
We now digress about the meaning of the theorem above.
On this session, it’s helpful to think that XE⊆X, thus αE=α|XE. The second conclusion of the theorem is αC∘γ|XE=γ|XE∘αE, that is, αC evolves according to αE autonomously: Consider the environment is in state xE∈XE at time-step t, its next state is αE(xE) and the internal system state (which can be calculated via γ) is γ|XE(αE(xE)) in time-step t+1. On the other hand, γ(xE) is the system’s state in time-step t, and in time-step t+1 it is αC(γ(xE)). The theorem ensures those two expressions for the internal system’s state in time-step t+1 are equal.
This is qualitatively different from the internal system receiving information about the environment and updating one time-step later to track it. The internal system updates according to the environment in the same time-step.
To clarify this, consider the analogy of a dog running after a beetle. The dog starts at position d0 and the beetle at position b0. One thing that might happen is that the dog sees the beetle’s position and starts to move there. When the dog arrives, the beetle has already left that position and is now somewhere new. So the dog starts moving to the new position, and when it arrives, the beetle has already left again, and so on.
Another qualitatively different thing that can happen is that the dog actually understands (i.e, “models”) how the beetle will move. So instead of moving to the beetle’s current position, it moves to where it predicts the beetle will be in the future, so that it can catch it.
The theorem also states that αC is unique, so there can’t be some other dynamics on a system that satisfies perfect regulation and the feedback structure. Hence, the dynamics on the internal model will necessarily be the one that models the environment.
In the dog-beetle analogy, the theorem guarantees that the second scenario will necessarily happen if the dog’s behavior satisfies our hypotheses.
The Controller Faithfully Models the Environment
To understand this section better, we recommend reading the first post, where we explain the key ideas used here more intuitively.
In our first post, we were able to prove an analogous result with only the feedback structure condition. There, we didn’t need to worry about the difference between the environment and the world, or about good states. That theorem is the particular case of this post’s theorem in which XE=K=X and αE=α|XE.
Analogous to the first post, the results proved above include pathological cases where, for example, the controller could be XC={xC,1,xC,2} and the environment ~XE=N. Even if we had αC∘γ|~XE=γ∘α|~XE, the controller wouldn’t have the “expressivity” to faithfully model the environment: it doesn’t have enough states to do so. We’ll now introduce the observability condition and prove a lemma, concluding that the controller faithfully models the environment if, in addition to the prior assumptions, observability is also satisfied.
Observability can be stated as
inf{kerγ∘αn|~XE;n=0,1,2,…}=⊥

For intuition on what this means and why we ask for it, check out this section of the first post.
Generalized Feedback Lemma
We’ll prove a generalization of the feedback structure condition in a way analogous to the first post; we’ll then use it together with observability to conclude that the controller faithfully models the environment.
The result we want to prove is
kerγ|~XE≤kerγ∘α|~XE≤kerγ∘α2|~XE≤kerγ∘α3|~XE≤…≤kerγ∘αk|~XE≤…

We’ll prove it by induction on k.
(Base case): If k=1, by the feedback condition we know kerγ|K≤kerγ∘α|K. Since ~XE⊆K by the regulation condition, (x,y)∈kerγ|~XE⟹(x,y)∈kerγ|K⟹(x,y)∈kerγ∘α|K. Since ~XE is α-invariant and x,y∈~XE, we have α(x),α(y)∈~XE, so (x,y)∈kerγ∘α|~XE.
(Induction Hypothesis): Suppose now the theorem holds up until n.
(Inductive step) Let x,y∈~XE, then, by definition of kernel of a function, (x,y)∈kerγ∘αn|~XE⟺(αn(x),αn(y))∈kerγ|~XE
Let s:=αn(x),t:=αn(y). Since ~XE is α-invariant, we know s,t∈~XE. So the equivalence above gives us (s,t)∈kerγ|~XE. By the base case, we have that (α(s),α(t))∈kerγ∘α|~XE, but (α(s),α(t))=(αn+1(x),αn+1(y)), so the result follows by induction.
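The kernel chain proved above can be checked numerically on a toy system, representing each kernel ker γ∘αn|~XE as a set of pairs, with “≤” (refinement) becoming literal set containment. The world, γ, and α below are illustrative choices of ours, with γ deliberately non-injective so the kernels are non-trivial:

```python
# Finite sketch of the generalized feedback lemma's kernel chain.
N = 4
tilde_XE = [(i, i) for i in range(N)]  # a hypothetical alpha-invariant set

def alpha(x):
    i, j = x
    return ((i + 1) % N, (j + 1) % N)

def gamma(x):
    # A non-injective "projection", so kernels contain off-diagonal pairs.
    return x[1] % 2

def iterate(f, n, x):
    for _ in range(n):
        x = f(x)
    return x

def kernel(f, S):
    # ker f as a set of pairs (x, y) with f(x) == f(y).
    return {(x, y) for x in S for y in S if f(x) == f(y)}

# kernels[n] represents ker (gamma ∘ alpha^n) restricted to tilde_XE.
kernels = [kernel(lambda x, n=n: gamma(iterate(alpha, n, x)), tilde_XE)
           for n in range(5)]

# The chain ker gamma <= ker gamma∘alpha <= ... holds as set containment.
assert all(kernels[n] <= kernels[n + 1] for n in range(4))
```

For these particular dynamics every containment is in fact an equality, which the lemma permits: it only forbids a later kernel from being strictly finer than an earlier one.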
Conclusion (3)
γ|~XE is injective
Proof:
By generalized feedback lemma, inf{kerγ∘αn|~XE;n=0,1,2,…}=kerγ|~XE
By observability, inf{kerγ∘αn|K;n=0,1,2,…}=⊥, but since ~XE⊂K, kerγ∘αn|K≤kerγ∘αn|~XE, for any n. Hence, inf{kerγ∘αn|~XE;n=0,1,2,…}≤inf{kerγ∘αn|K;n=0,1,2,…}=⊥
So kerγ|~XE=⊥, and hence γ|~XE is injective.
In the next section, we explain why γ|~XE being an injection provides the meaning of faithfully modeling.
Cardinality and expressivity
In the first post, we concluded γ:X→XC satisfying the theorem’s hypothesis is necessarily a bijection. In terms of cardinality, this means, by definition, that |XC|=|X|.
In this post, we concluded that γ restricted to the environment is an injection into XC. In cardinality terms, |XE|≤|XC|.
Note that the elements of XE fully represent the description of each environment state. Say the environment is the outside of a chemical plant, and say the chemical reactions being done there depend on the temperature and pressure of the plant, which in turn depend on the temperature and pressure of the environment. Then, elements xE∈XE could be ordered pairs xE=(TE,PE) comprising the external temperature and pressure. On the other hand, if there’s an additional relevant environment feature, say, that the water delivered from the environment needs to be treated (because otherwise the water used in the reactions will be contaminated), one would want to consider an environment state as xE=(TE,PE,WE), where WE stands for “quality of water”. In other words, WE is a relevant feature, and if we don’t include it in the environment, it’s as if the environment is not “expressive” enough to model the whole system. Thus even if the theorem applies and the controller of the plant develops an internal model, that model could run poorly, i.e., have bad overall performance (note, though, that we haven’t discussed what it means for an internal model to have good performance in reality).
The point here is that the sets XE and XC are the only pieces of our framework that encompass the notion of “expressivity” (for the controller, you could also say the expressivity comes from γ, since XC is determined by γ, i.e., γ(X)=XC).
The example above illustrated one way the states can show expressivity, that is, by the structure of the set of states: If X⊆R14, we perhaps have 14 different relevant features, that could be independent or not. If X⊆N10, we have 10 features with natural values.
Another way the states exhibit expressivity is via the state set’s cardinality, as we discussed at the beginning of this section: if |X|=10 (that is, X has 10 elements), it can only represent 10 different states. If |X|=|N|, then there’s a countable quantity of different states. Recall that cardinality is defined via mappings:
There exists an injection f:X→Y ⟺ |X|≤|Y|
There exists a surjection f:X→Y ⟺ |X|≥|Y|
There exists a bijection f:X→Y ⟺ |X|=|Y|
Thus, a state set X exhibits expressivity in two ways:
Via X’s internal structure
Via X’s cardinality
Our theorem states that a necessary condition for the internal system to have an internal model of the environment is |XE|≤|XC|; that is, in order for the internal system to model the environment, it must exhibit at least as much expressivity as the environment (in the second sense of expressivity). In other words, if the internal system isn’t at least as expressive as the environment, it necessarily can’t have an internal model of the environment.
Summarizing the whole theorem-meaning discussion,
The semantics of the word “model” in the theorem is faithful tracking/simulation. That is, for states in ~XE (which the internal system views as XM), the internal system doesn’t wait for the environment and then update a time-step later following it. Instead, it updates according to the environment in the same time-step. We’ll discuss this more in a later section.
Conclusion (1) guarantees αC (the autonomous dynamics on the controller) is well defined and unique. Conclusion (2) states that αC simulates αE, and Conclusion (3) says this simulation is faithful.
Comparing the less general version of the theorem in the first post with this version,
This theorem does apply to systems where γ is not a bijection: γ is injective on ~XE and surjective on K. It thus seems useful for modeling a wider range of systems.
Our previous version of the theorem is a particular case of this, with XE=X=K and αE=α.
~XE⊆K and XC⊆γ(K) are the conditions ensuring all the states involved in the internal modeling are good states. Thus, this theorem is supposed to apply after the regulation has “converged” in some sense (such as t→∞).
Thus, modeling here means tracking faithfully.
Example
Consider a one-dimensional circular grid of size N (i.e., a row of N squares where the first square is connected to the last) with a moving target and an agent whose goal is to pursue that target. The target and the agent can move left and right and are always inside some square of the grid.
The squares of our grid will be the set N:={0,…,N−1}.
We want the world to be able to describe the agent and the moving target, each moving in this N sized grid, so we will consider a state of the world as an ordered pair (i,j) with i,j∈{0,…,N−1}=N and so we’ll define the world as X:=N×N. Here, the first coordinate of the pair represents the position of the moving target and the second coordinate represents the position of the agent.
We will consider the environment to be the position of the moving target on the grid, thus XE:=N.
We assume the moving target always moves one square to the right. That is, αE:XE→XE is given by αE(i):=i+1modN, where the modN is to account for the fact that the world is circular.
We’ll define the dynamics of the world α:X→X by α(i,j)=(i+1modN,j+1modN) if i=j and α(i,j)=(i+1modN,jmodN) if i≠j
Consider jE:XE→X defined by jE(xE):=(xE,xE). Then ~XE:=jE(XE)⊆X and jE∘αE(xE)=jE(αE(xE))=jE(xE+1modN)=(xE+1modN,xE+1modN)=α((xE,xE))=α(jE(xE)), so jE∘αE=α∘jE.
In other words, defining α~XE:=α|~XE, the insertion jE is injective and satisfies the compatibility condition jE∘αE=α∘jE. Thus we can use the internal model theorem with ~XE=jE(XE) as the environment instead of XE directly.
For the agent,
XC should represent the states the agent can be in, hence XC:={0,…,N−1}
γ:X→XC defined by γ((i,j)):=j, the agent’s position
Clearly, γ is a surjection: given j∈XC, for any i∈{0,…,N−1} the state x=(i,j)∈X satisfies γ(x)=j.
We want our agent to pursue the moving target, so we will define K:={(i,j)∈X|i=j}
This setup satisfies our hypotheses:
~XE={(xE,xE)∈X|xE∈XE}⊆K={(i,j)∈X|i=j} and α~XE=α|~XE
γ(K)=N=XC
kerγ|K≤kerγ∘α|K.
Suppose x,y∈K are such that (x,y)∈kerγ, i.e., γ(x)=γ(y). Since x,y∈K, x=(i,i) and y=(j,j). Then i=γ(x)=γ(y)=j, so i=j. Hence γ∘α(i,i)=γ∘α(j,j), and thus (x,y)∈kerγ∘α.
~XE is α-invariant.
Let (xE,xE)∈~XE, then α((xE,xE))=(xE+1modN,xE+1modN)=jE(xE+1) if xE+1≤N−1 and =jE(0) if xE+1=N. That is, α((xE,xE))=jE(x′E)∈~XE for some x′E∈XE.
infkerγ∘αn|~XE=⊥
This follows from the feedback and regulation conditions, the α-invariance of ~XE, and the injectivity of γ on ~XE: since feedback holds and ~XE⊆K, we have kerγ|~XE≤kerγ∘α|~XE. Because ~XE is α-invariant, the generalized feedback lemma applies, so infkerγ∘αn|~XE=kerγ|~XE. Finally, γ|~XE maps (xE,xE)↦xE, which is injective, so kerγ|~XE=⊥.
Since the theorem’s assumptions are all true, we already know there’s a unique αC determined by αC∘γ|K=γ∘α|K.
That is, for x=(j,j)∈K, xC=γ(x)=j and αC(xC)=γ(α(x))=γ((j+1modN,j+1modN))=j+1modN, so αC(xC)=xC+1modN: the agent’s modeled position moves one square to the right each time-step, just like the target.
Then the pair XM=γ|~XE(~XE) and αC|XM is our internal model.
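The whole grid example, assumptions and conclusions alike, fits in a short Python script. This is a direct transcription of the definitions above for a concrete grid size (N=5 is our arbitrary choice), checking regulation, compatibility, feedback, the injectivity of γ on ~XE, and the resulting internal model:

```python
# End-to-end sketch of the circular-grid pursuit example.
N = 5
X = [(i, j) for i in range(N) for j in range(N)]  # (target, agent) positions
XE = list(range(N))                               # target positions
K = [(i, j) for (i, j) in X if i == j]            # good states: agent on target

def alpha_E(i):
    # The target always moves one square to the right (mod N).
    return (i + 1) % N

def alpha(x):
    # World dynamics: the agent moves right only when it's on the target.
    i, j = x
    return ((i + 1) % N, (j + 1) % N) if i == j else ((i + 1) % N, j % N)

def j_E(x_E):
    # Insertion of the environment into the world.
    return (x_E, x_E)

def gamma(x):
    # Projection of a world state onto the agent's position.
    return x[1]

tilde_XE = [j_E(x) for x in XE]

# Regulation: every inserted environment state is a good state.
assert all(x in K for x in tilde_XE)
# Compatibility: j_E(alpha_E(x)) == alpha(j_E(x)).
assert all(j_E(alpha_E(x)) == alpha(j_E(x)) for x in XE)
# Feedback on K: equal gamma-values force equal gamma∘alpha-values.
assert all(gamma(alpha(x)) == gamma(alpha(y))
           for x in K for y in K if gamma(x) == gamma(y))
# Conclusion (3): gamma is injective on tilde_XE.
assert len({gamma(x) for x in tilde_XE}) == len(tilde_XE)

# The internal model's dynamics, built as in Conclusion (1):
alpha_C = {gamma(x): gamma(alpha(x)) for x in K}
# It is exactly x_C + 1 mod N: the agent autonomously mirrors the target.
assert all(alpha_C[c] == (c + 1) % N for c in range(N))
```

All assertions pass, so the example really does satisfy the theorem’s hypotheses, and the internal model αC it produces is the autonomous “move right” rule.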
Discussion and further work
We presented the Internal Model Theorem, which states that, in a setup where an external system passes signals to an internal system such that these signals satisfy observability, feedback structure and regulation imply that the controller necessarily has an internal model of the external system.
Some critiques:
As we discussed in the first post, the update rules involved are all deterministic. Thus, the theorem can’t represent non-deterministic scenarios such as making decisions under uncertainty.
Note, though, that the theorem is supposed to model regulation after some sort of equilibrium or convergence. Intuitively, I expect many systems in these situations to have at least approximately deterministic behavior.
It’s really hard to come up with non-trivial examples for the theorem. This difficulty is particularly present when trying to come up with examples that satisfy Feedback and Observability together
In fact, I’d be glad to see non-trivial examples of this theorem. If you know or construct one, please write in the comments or send it
IMP as a selection theorem
A selection theorem is a theorem that states something along the lines of “this is the type of agent we expect to find in environments with this specific type of selection pressure”. Currently, the theorem doesn’t say much, because the controller is not properly an agent: it’s an autonomous system, but it doesn’t act on the world. Phrased as a selection-theorem wannabe, the IMP states that in environments where the controller is autonomous on good states, the controller tracks the environment via an internal model.
We can think of extending the theorem in two directions:
“Systems that approximately regulate their internal state have approximate internal models.” With this, we would get a better version of a selection theorem.
I’d expect this extended theorem to work for a wider range of real world systems.
This still wouldn’t make the controller look more agentic because it still wouldn’t properly act on the world.
“The controller acts upon the environment but it has an internal model of the environment.”
This would make the controller more agentic.
The first point above would make the theorem more applicable to real systems, while the second would make the controller feel more like an agent. I think these two extensions together would provide a selection theorem.
Relevance to Agent Structure problem
The agent-like structure problem is the problem of determining whether, given a policy that robustly optimizes far-away regions of the state space into small chunks of the state space, this policy has agent structure (by agent structure we informally mean having an internal model and a search process). Another way to phrase this question is: “under which types of environments is the implication above true?”
Alex gave a loose formalism to answer this question, making some notions more precise:
This loose formalism consists of a policy and an environment. The policy receives an observation from the environment, updates its internal state and acts on the environment, changing the environment’s state. Then the environment sends a new observation to the policy, and so on, in discrete time-steps. The policy is thus a function sending each (policy state, observation) pair to a policy state. Analogously, the environment sends each (environment state, action) pair to an environment state.
We can define a class of different policies and different environments.
The idea of the formalism is to define a function that associates to each policy in the policy class a number in [0,1], thought of as its “degree of agent structure”. We expect policies that perform well in a wide range of environments to be highly “agentic”: based on the performance of a policy across a wide range of environments, we want to be able to tell what degree of agent structure that policy has.
In Alex’s words,
The important idea here is that we expect to be able to define a function ϵ, depending on the parameters, training, performance or other relevant variables, such that the structure of a given policy is always ≥1−ϵ. Since we want the structure function to be between 0 and 1, ϵ=1 would always make this statement true, but we want ϵ to go to zero in the limit, in some sense of limit.
More concretely, in the agent structure setup, we want to be able to:
Have a measure of performance of a given policy in a wide range of environments.
Filter the policy class (because a lookup table containing “optimal” moves for each environment in the environment class would have very good performance, but we would expect a lookup table to be the policy with the least agent structure amongst all policies).
Define ϵ appropriately such that its limit is zero in some sense of asymptotics.
The IMP setup fails these conditions because
It doesn’t encompass a measure of performance for different policies. Actually, we’ve already proved that the dynamics of the internal system is unique under the IMP conditions, so we would effectively have only one policy.
The IMP’s setup doesn’t seem to consider a policy acting on an environment. The internal system (the analogue of the policy here) can be influenced by the environment (via γ), but can’t influence the environment. In a self-driving car analogy, it’s as if the car being at position x=10 or x=20, moving or not, corresponded to the same environment state.
We wish to extend the IMP in two different ways, addressing those two problems:
Modify the IMP to show that systems which approximately regulate their internal state must have approximate models of their environments.
This could give us a notion of performance and different policies.
Rework the IMP so that it applies to controllers regulating the external environment (rather than regulating their internal state).
This would solve the fact that in current IMP, the internal system doesn’t interact with the environment.
We expect if one can extend the theorem to these two different situations, it might give some insight on the agent structure problem.
Supervisory Control of Discrete-Event Systems (2019), Cai & Wonham, section 1.5.