This is a direct response to comments made by Richard Ngo at the CMU agent foundations conference. Though he requested I comment here, the claims I want to focus on go beyond this post (and the previous one) and include the following:
1: redefining agency as coalitional (agent = cooperating subagents) as opposed to the normal belief/goal model.
2: justifying this model by arguing that subagents are required for robustness in hard domains (specifically those that require concept invention).
3: that therefore AIXI is irrelevant for understanding agency.
I will do my best to pass an ideological Turing test for Richard Ngo and (where uncertain) steelman his position below, but as far as I know the claims I am responding to do not appear in writing. I am actually more sympathetic to the claims in this post than I am to the picture painted by 1-3, despite owning about four t-shirts with Bayes’ rule on them.
I will argue that claims 1-3 together imply very strict constraints on agent structure and function, constraints that radically depart from both the standard intuitive picture of human agency and the established mathematics of rational agents as expected utility maximizers, and that this departure, while full of rich ideas, is not a reasonable solution to the problems Ngo is pointing at.
By attempting to redefine agency in terms of 1, Ngo seems to be making the very strong assertion that all agents must be composed of cooperating subagents. Taken literally, this is not tenable, because it seems to imply that agents are infinitely decomposable. Also, without a base case it’s questionable whether this characterization uniquely points at anything resembling agency (e.g. are all materials made up of cooperating molecules made up of cooperating atoms?). Ngo’s view seems to be that after some level of decomposition, the recursion bottoms out at agents that can be seen as expected utility maximizers (though it’s not totally clear to me where this point occurs on his model; he seems to think that these irreducible agents are more like rough heuristics than sophisticated planners, so that nothing AIXI-like ever really appears in his hierarchy). Combining such basic subagents through “cooperation” seems like it does have a reasonable chance of pointing at a distinguished agency concept. However, in its strongest form, claim 1 is clearly false, because it is rather easy to construct useful agents that do have neat belief/goal separations and do not have subagents: for example, basic chess engines using minimax tree search with hardcoded value functions. A more sophisticated example is MC-AIXI-CTW, a direct AIXI approximation that plans against a belief distribution and uses only a crisply separated reward signal to determine its goals; this weak approximation is already able to learn a wide variety of games from scratch with minimal hyperparameter tuning.
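To make the chess-engine example concrete, here is a minimal sketch (illustrative only, written against a hypothetical Game interface rather than any real engine’s code): the “beliefs” are exhausted by the game model, the “goal” is the hardcoded evaluation being maximized, and there are no subagents anywhere in the architecture.

```python
# Minimal minimax sketch. `game` is a hypothetical interface with legal_moves(state),
# apply(state, move), is_terminal(state), and a hardcoded evaluate(state) heuristic
# (e.g. material count); none of these names come from a real library.

def minimax(game, state, depth, maximizing):
    """Return the minimax value of `state`, searching `depth` plies ahead."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    child_values = [
        minimax(game, game.apply(state, move), depth - 1, not maximizing)
        for move in game.legal_moves(state)
    ]
    return max(child_values) if maximizing else min(child_values)

def best_move(game, state, depth=4):
    """Pick the move leading to the position with the highest minimax value."""
    return max(
        game.legal_moves(state),
        key=lambda move: minimax(game, game.apply(state, move), depth - 1, maximizing=False),
    )
```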
As I understand it, claim 2 defends the usefulness of claim 1 by asserting that all really interesting agents are made up of subagents. That’s because a robust agent, capable of innovatively discovering new concepts and adapting to a wide variety of domains, must be composed of subagents that cooperate to handle the tasks each of them is well suited for. These subagents seem to map roughly onto the concept of “models” introduced in this post, and (roughly speaking) negotiate to determine which is the best expert for each given situation—and attempting to integrate all of them into one framework (ontology) is costly and unnatural. I think there is some truth to this idea; though powerful reinforcement learning agents in the AlphaGo Zero, AlphaZero, and MuZero line rely on distinct world models and state evaluations for planning, there is probably a lot of repeated work going on between the networks. Policy networks for search guidance seem to blur the lines, but provide a clear benefit (based on ablation testing). I don’t remember precisely, but I think the later, stronger models in that series also used somewhat more integrated networks, all of which seems somewhat consistent with Ngo’s model. However, how does claim 2 explain the flexibility of MC-AIXI-CTW? Even if belief/goal separation is unnatural or costly, subagent structure seems very unlikely to be an inevitable feature of agent architecture—in which case, what is left of Ngo’s claims?
I think we can answer this question by turning to claim 3. By asserting that AIXI is irrelevant to understanding agency, Ngo makes it clear that he is not only attacking conventional beliefs about agent structure (e.g. that expected utility maximization is an effective core engine for cognition) but also denying that AIXI is an acceptable optimality standard. This seems like a necessary inference from 3, because AIXI is typically formulated as a gold standard for agency, not a description of its actual implementation or internal structure. By rejecting AIXI in favor of coalitional agency, Ngo seems to be implying that AIXI is a useless frame because interesting agents cannot be understood as approximating it (either because this is not normative or because it is not practical; I am not sure precisely which he believes, but I suspect the latter). Specifically, I think that Ngo expects interesting agents to look like cooperating subagents, and therefore importantly NOT like AIXI.
This is where I profoundly depart from Richard Ngo’s position. I am fine with the idea of agents usefully including subagents at some level, particularly if limited heuristics and models count as subagents. However, this seems to necessarily be in service of forming beliefs about the world and pursuing goals—insofar as a subagent architecture does not serve this purpose, it shouldn’t be adaptive, and it shouldn’t be selected through the types of optimization processes that usually arrive at agency, for example evolution and gradient descent. This is my model of what agency is for, and I think it is pretty overdetermined—without rehashing the entirety of the sequences and several coherence theorems for Bayesian decision theory, it’s just intuitive that intelligence is adaptive precisely because it is useful for surviving selection pressure, which requires learning enough about reality to achieve what is locally valuable (e.g. reproduction) in the face of unanticipated challenges.
I want to emphasize exactly how much has to be true for Ngo’s model to hold up. It’s not enough that the brain is made up of neurons which (say) turn out to have some low-level form of agency, if those neurons are ultimately just implementing a conventional agent. The subagents have to be “big” enough (relative to the agent hierarchically above them) that they can be meaningfully viewed as negotiators with distinct objectives, whose compromise ultimately decides the agent’s behavior. That is, his coalitional agency model is trying to make predictions about how agents will act (an extrinsic view) in a way that is not consistent with EU maximization.
As an example, Ngo has pointed to the fact that animals are multicellular organisms, and each cell can be viewed as a subagent. But the actual reason for this is not robustness but the square-cube law—you just can’t make a cell very big and have it still exchange resources and waste with the environment efficiently. This constraint doesn’t appear in the same way for computer engineers, and indeed computers do not have such cellular structure—transistors are not agentic, and though they form the basis for many levels of hierarchical organization, agency first appears at the highest level, when an LLM is trained on top of an entire computing cluster (though it is of course possible that LLMs include subagents). A more direct rebuttal is that the cooperative subagent structure does not even appear at all levels in biology—as Scott Garrabrant has pointed out, a kidney (or for that matter almost any organ) looks less like an independent agent than a cell does.
Now, there are examples of more sophisticated subagent structure in biology, appearing in the order Siphonophorae: consider for example the Portuguese man o’ war. According to this source: “Each man o’ war is actually a colony of several small individual organisms that each have a specialized job and are so closely intertwined that they cannot survive alone. In this manner, the larger colony consists of a float that keeps the colony at the sea surface, a series of long tentacles that are covered with stinging cells, a rudimentary digestive system, and a simple reproductive system.”
However, even in this extreme case, it still feels like a man o’ war is best viewed from a distance as a conventional agent, not as a coalition, precisely because the coalition is working so well that its subagent nature is screened off. This is why a man o’ war is easily mistaken for a jellyfish by a casual observer: by appearances, it acts very much like a unified multicellular organism. In my view, this is because (at least in biology) when agents form a sufficiently tight coalition and take on a subagent role, they naturally collapse into gears of a larger machine serving an increasingly unified purpose. Certainly cooperation sometimes arises in nature, through coevolution. But tight cooperation among equals is not stable there.
In my view, any subagents of a highly optimized agent are like the suborganisms of a man o’ war; the former serves to maximize EU, the latter to sail around and catch prey.
What would a truly coalitional agent actually look like? Perhaps Nash bargaining between subagents as in Scott Garrabrant’s geometric rationality sequence. This sort of coalition really is not VNM rational (rejecting the independence axiom), so can’t generally be viewed as an EU maximizer. But it also seems to be inherently unstable—subagents may simply choose to randomly select a leader, collapsing into the form of VNM rationality.
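To make the bargaining idea concrete, here is a toy sketch of my own (not Garrabrant’s formalism): two subagents pick among options by maximizing the product of their utility gains over a disagreement point. The option names and numbers are made up purely for illustration; the point is that the aggregation is a product rather than a weighted sum, which is why such a coalition’s preferences over lotteries can violate the independence axiom.

```python
# Toy Nash bargaining between two subagents. Each option maps to a pair
# (utility for subagent A, utility for subagent B); the disagreement point
# is what each gets if negotiation breaks down.
options = {
    "plan_favoring_A": (10.0, 1.0),
    "plan_favoring_B": (1.0, 10.0),
    "compromise": (6.0, 6.0),
}
disagreement = (0.0, 0.0)

def nash_product(utilities):
    """Product of each subagent's gain over its disagreement payoff."""
    ua, ub = utilities
    return (ua - disagreement[0]) * (ub - disagreement[1])

bargained_choice = max(options, key=lambda o: nash_product(options[o]))
favorite_of_A = max(options, key=lambda o: options[o][0])
favorite_of_B = max(options, key=lambda o: options[o][1])

print(bargained_choice)              # "compromise": Nash product 36 beats 10 for either extreme
print(favorite_of_A, favorite_of_B)  # each subagent alone would pick its own plan
```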
It seems far-fetched that coalitional agency should be the only robust architecture. I think the robustness benefits can be achieved without paying the price of anarchy. You might want to maintain multiple incompatible models, but I don’t think you want them to fight with each other. I strongly suspect that there exist good data structures, algorithms, and architectures for this. It’s just easy to stop at subagents because the idea of emergence is elegant and mysterious. Applying further optimization pressure to the architecture design problem will collapse most subagents into gears.
Ngo and I have some conflicting intuitions which I’ve tried to illuminate; since I suspect we won’t reach agreement this way, is there any more direct empirical evidence? I think so: the most performant machine learning algorithms don’t look very coalitional. Mixture-of-experts models don’t seem to be worth the cost. The top RL agents aren’t just ten different RL agents glued together. Random-forest-inspired methods do work well for some problems (better than single decision trees), but I don’t think they’re actually that coalitional—certainly the hierarchy is at most one level deep, and the individual stumps seem quite weak, more like neurons than value-laden ideologies. Overall, ML practice seems to fit better with my model.
Thank you Cole for the comment! Some quick thoughts in response (though I’ve skipped commenting on the biology examples and the ML examples since I think our intuitions here are a bit too different to usefully resolve via text):
Ngo’s view seems to be that after some level of decomposition, the recursion bottoms out at agents that can be seen as expected utility maximizers (though it’s not totally clear to me where this point occurs on his model; he seems to think that these irreducible agents are more like rough heuristics than sophisticated planners, so that nothing AIXI-like ever really appears in his hierarchy)
Yepp, this is a good rephrasing. I’d clarify a bit by saying: after some level of decomposition, the recursion reaches agents which are limited to simple enough domains (like recognizing shapes in your visual field) that they aren’t strongly bottlenecked on forming new concepts (like all higher-level agents are). In domains that simple, the difference between heuristics and planners is much less well-defined (e.g. a “pick up a cup” subagent has a scope of maybe 1 second, so there’s just not much planning to do). So I’m open to describing such subagents as utility-maximizers with bounded scope (e.g. utility 1 if they pick up the cup in the next second, 0 if they don’t, −10 if they knock it over). This is still different from “utility-maximizers” in the classic LessWrong sense (which are usually understood as not being bounded in terms of time or scope).
attempting to integrate all of them into one framework (ontology) is costly and unnatural
This feels crucial to me. There’s a level of optimality at which you no longer care about robustness, because you’re so good at planning that you can account for every consideration. Stockfish, for example, is willing to play moves that go against any standard chess intuition, because it has calculated out so many lines that it’s confident it works in that specific case. (Though even then, note that this leaves it vulnerable to neural chess engines!)
But for anything short of that, you want to be able to integrate subagents with non-overlapping ontologies into the same decision procedure. E.g. if your internally planning subagent has come up with some clever and convoluted plan, you want some other subagent to be able to say “I can’t critique this plan in its own ontology but my heuristics say it’s going to fail”. More generally, attempted unifications of ontologies have the same problem as attempted unifications of political factions—typically the unified entity will ignore some aspects which the old entities thought of as important.
And so I’d say that AIXI-like agents and coalitional agents converge in the limit of optimality, but before that coalitional agency will be a much better framework for understanding realistic agents, including significantly superhuman agents. The point at which AIXI is relevant again in my mind is the point at which we have agents who can plan about the real world as precisely as Stockfish can plan in chess games, which IMO is well past what I’d call “superintelligence”.
What would a truly coalitional agent actually look like? Perhaps Nash bargaining between subagents as in Scott Garrabrant’s geometric rationality sequence. This sort of coalition really is not VNM rational (rejecting the independence axiom), so can’t generally be viewed as an EU maximizer. But it also seems to be inherently unstable—subagents may simply choose to randomly select a leader, collapsing into the form of VNM rationality.
Randomly choosing a leader is very far from the Pareto frontier! I do agree that there’s a form of instability in anything else (in the same way that countries often fall into dictatorship) but I’d say there’s also a form of meta-stability which dictatorship lacks (the countries that fall into dictatorship tend to be overtaken by countries that don’t).
But I do like the rest of this paragraph. Ultimately coalitional agency is an interesting philosophical hypothesis but for it to be a meaningful theory of intelligent agency it needs much more mathematical structure and insight. “If subagents bargain over what coalitional structure to form, what would they converge to under which conditions?” feels like the sort of question that might lead to that type of insight.
Re democratic countries being overtaken by dictatorial countries: I think this will only last until we get AI that can automate at least all white-collar labor, and maybe even most blue-collar physical labor, well enough that human wages for those jobs decline below what a human needs to subsist on. By then, dictatorial/plutocratic countries will unfortunately come back as a viable governing option, and may even overtake democratic countries.
So to come back to the analogy, I think VNM rationality/dictatorship is unfortunately common and convergent over long timescales, and it’s democracies/coalition politics that are fragile over the sweep of history, because the latter only became dominant-ish starting in the 18th century and will end sometime in the 21st century.
What is this coalitional structure for if not to approximate an EU maximizing agent?
This quote from my comment above addresses this:
And so I’d say that AIXI-like agents and coalitional agents converge in the limit of optimality, but before that coalitional agency will be a much better framework for understanding realistic agents, including significantly superhuman agents.
So the thing that coalitional agents are robust at is acting approximately like belief/goal agents, and you’re only making a structural claim about agency?
If so, I find your model pretty plausible.
Oh, I see what you mean now. In that case, no, I disagree. Right now this notion of robustness is pre-theoretic. I suspect that we can characterize robustness as “acting like a belief/goal agent” in the limit, but part of my point is that we don’t even know what it means to act “approximately like belief/goal agents” in realistic regimes, because e.g. belief/goal agents as we currently characterize them can’t learn new concepts.
Relatedly, see the dialogue in this post.
Update: I am increasingly convinced that Bayesianism is not a complete theory of intelligence and may not be the best fundamental basis for agent foundations research, but I am still not convinced that coalitional agency is the right direction.
Interesting. Got a short summary of what’s changing your mind?
I now have a better understanding of coalitional agency, which I will be interested in your thoughts on when I write it up.
Mostly talking to you, talking to Abram, and reading Tom Sterkenburg’s thesis.
Briefly: I am now less confident that realizability assumptions are ever satisfied for embedded agents in our universe (Vanessa Kosoy and Diffractor argue this fairly convincingly). In fact this is probably similar to a standard observation about the scientific method (I read Alchin’s Theory of Knowledge; Hutter recommends avoiding the third and later editions). As an example intuition, with runtime restrictions it seems to be impossible to construct universal mixtures (Vladimir Vovk impressed this on me). In the unrealizable case, I now appreciate Bayesian learning as one specific expert-advice aggregator (albeit an abnormally principled one, equipped with now-standard analysis). I also appreciate the advantages of other approaches that use partial experts, with Garrabrant induction as an extreme case.
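To illustrate the “Bayes as one particular expert aggregator” framing, here is a minimal sketch (toy numbers, my own function names): sequential Bayesian model averaging is exactly multiplicative-weights aggregation in which each expert’s weight is multiplied by the probability it assigned to the observed outcome, i.e. exponential weights under log loss with learning rate 1. When no expert is the true environment, the guarantee that survives is of the regret flavor: the mixture predicts about as well as the best expert in hindsight.

```python
# Each "expert" (hypothesis) assigns a probability to the next binary outcome.
# Bayesian updating multiplies its weight by the likelihood of what was observed.

def bayes_mixture_step(weights, expert_probs, outcome):
    """Return the mixture's P(outcome = 1) and the updated posterior weights."""
    mixture_prob = sum(w * p for w, p in zip(weights, expert_probs))
    likelihoods = [p if outcome == 1 else 1.0 - p for p in expert_probs]
    new_weights = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(new_weights)
    return mixture_prob, [w / total for w in new_weights]

# Example: three hypotheses about a biased coin, uniform prior, then one observed head.
weights = [1 / 3, 1 / 3, 1 / 3]
expert_probs = [0.2, 0.5, 0.9]
prediction, weights = bayes_mixture_step(weights, expert_probs, outcome=1)
```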
I still endorse the Bayesian approach in many cases, in particular when it is at least possible to formulate a reasonable hypothesis class that contains the truth.
I saw the comment and thought I would drop some material that represents the beginnings of approaches to a more mathematical theory of iterated agency.
A general underlying idea is to decompose a system into its maximally predictive sub-agents, sort of like an argmax of Daniel Dennett’s intentional stance.
There are various reasons to believe that there are algorithms for discovering the most important nested sub-parts of systems, using things like Active Inference, especially where it has been applied in computational biology. Here are some related papers:
https://arxiv.org/abs/1412.2447 - We consider biological individuality in terms of information theoretic and graphical principles. Our purpose is to extract through an algorithmic decomposition system-environment boundaries supporting individuality. We infer or detect evolved individuals rather than assume that they exist. Given a set of consistent measurements over time, we discover a coarse-grained or quantized description on a system, inducing partitions (which can be nested)
https://arxiv.org/pdf/2209.01619 - Trying to relate Agency to POMDPs and the intentional stance.