Thank you Cole for the comment! Some quick thoughts in response (though I’ve skipped commenting on the biology examples and the ML examples since I think our intuitions here are a bit too different to usefully resolve via text):
Ngo’s view seems to be that after some level of decomposition, the recursion bottoms out at agents that can be seen as expected utility maximizers (though it’s not totally clear to me where this point occurs on his model; he seems to think that these irreducible agents are more like rough heuristics than sophisticated planners, so that nothing AIXI-like ever really appears in his hierarchy)
Yepp, this is a good rephrasing. I’d clarify a bit by saying: after some level of decomposition, the recursion reaches agents which are limited to simple enough domains (like recognizing shapes in your visual field) that they aren’t strongly bottlenecked on forming new concepts (like all higher-level agents are). In domains that simple, the difference between heuristics and planners is much less well-defined (e.g. a “pick up a cup” subagent has a scope of maybe 1 second, so there’s just not much planning to do). So I’m open to describing such subagents as utility-maximizers with bounded scope (e.g. utility 1 if they pick up the cup in the next second, 0 if they don’t, −10 if they knock it over). This is still different from “utility-maximizers” in the classic LessWrong sense (which are usually understood as not being bounded in terms of time or scope).
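To make the bounded-scope version concrete, here’s one way to write down that hypothetical subagent’s objective (the one-second horizon and the payoffs are just the illustrative numbers above, not a claim about how such subagents are actually implemented):
$$U_{\text{cup}}(\text{outcome}) = \begin{cases} 1 & \text{if the cup is picked up within the 1-second horizon} \\ -10 & \text{if the cup is knocked over} \\ 0 & \text{otherwise} \end{cases}$$
The subagent maximizes expected $U_{\text{cup}}$ only over that fixed one-second window, whereas a “utility-maximizer” in the classic LessWrong sense is maximizing over unbounded futures.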
attempting to integrate all of them into one framework (ontology) is costly and unnatural
This feels crucial to me. There’s a level of optimality at which you no longer care about robustness, because you’re so good at planning that you can account for every consideration. Stockfish, for example, is willing to play moves that go against any standard chess intuition, because it has calculated out so many lines that it’s confident it works in that specific case. (Though even then, note that this leaves it vulnerable to neural chess engines!)
But for anything short of that, you want to be able to integrate subagents with non-overlapping ontologies into the same decision procedure. E.g. if your internally planning subagent has come up with some clever and convoluted plan, you want some other subagent to be able to say “I can’t critique this plan in its own ontology but my heuristics say it’s going to fail”. More generally, attempted unifications of ontologies have the same problem as attempted unifications of political factions—typically the unified entity will ignore some aspects which the old entities thought of as important.
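As a toy illustration of that kind of integration (my own sketch, with made-up names and scores, not something from the post): a planner proposes plans in order of its own preference, and any other subagent can veto purely on the strength of its own heuristic score, without ever translating its objection into the planner’s ontology.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Subagent:
    name: str
    # Each subagent scores a proposed plan in its *own* terms; it never has to
    # express its objection in the planner's ontology, only return a number.
    evaluate: Callable[[str], float]          # higher = more acceptable
    veto_threshold: float

def coalition_decide(plans: List[str], subagents: List[Subagent]) -> Optional[str]:
    """Return the planner's most-preferred plan that no subagent vetoes, else None."""
    for plan in plans:                        # plans are ordered by the planner's preference
        if all(s.evaluate(plan) >= s.veto_threshold for s in subagents):
            return plan
    return None

# Hypothetical usage: a clever, convoluted plan gets blocked by a heuristic critic
# that can't say *why* in the planner's ontology, only that it pattern-matches to failure.
plans = ["convoluted_plan", "boring_plan"]
critic = Subagent(name="pattern-matching critic",
                  evaluate=lambda p: 0.1 if p == "convoluted_plan" else 0.8,
                  veto_threshold=0.5)
print(coalition_decide(plans, [critic]))      # -> "boring_plan"
```

The point of the sketch is just that aggregation happens at the level of verdicts, so the critic’s ontology never has to be unified with the planner’s.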
And so I’d say that AIXI-like agents and coalitional agents converge in the limit of optimality, but before that coalitional agency will be a much better framework for understanding realistic agents, including significantly superhuman agents. The point at which AIXI is relevant again in my mind is the point at which we have agents who can plan about the real world as precisely as Stockfish can plan in chess games, which IMO is well past what I’d call “superintelligence”.
What would a truly coalitional agent actually look like? Perhaps Nash bargaining between subagents as in Scott Garrabrant’s geometric rationality sequence. This sort of coalition really is not VNM rational (rejecting the independence axiom), so can’t generally be viewed as an EU maximizer. But it also seems to be inherently unstable—subagents may simply choose to randomly select a leader, collapsing into the form of VNM rationality.
Randomly choosing a leader is very far from the Pareto frontier! I do agree that there’s a form of instability in anything else (in the same way that countries often fall into dictatorship) but I’d say there’s also a form of meta-stability which dictatorship lacks (the countries that fall into dictatorship tend to be overtaken by countries that don’t).
But I do like the rest of this paragraph. Ultimately coalitional agency is an interesting philosophical hypothesis but for it to be a meaningful theory of intelligent agency it needs much more mathematical structure and insight. “If subagents bargain over what coalitional structure to form, what would they converge to under which conditions?” feels like the sort of question that might lead to that type of insight.
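For a toy numerical version of both points (that random dictatorship can sit well inside the Pareto frontier, and that the geometric-rationality alternative maximizes the product of subagents’ expected utilities), suppose two subagents choose between three outcomes with made-up utilities and a disagreement point of zero:

```python
# Two subagents, three pure outcomes; utilities are relative to a disagreement point of 0.
outcomes = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.9, 0.9)}

def expected_utils(lottery):
    """Expected utility pair for a lottery (p_A, p_B, p_C) over the outcomes."""
    return tuple(sum(p * u[i] for p, u in zip(lottery, outcomes.values())) for i in range(2))

# Random dictatorship: flip a coin, let the winner pick its favorite outcome (A or B).
random_dictator = expected_utils((0.5, 0.5, 0.0))

# Nash bargaining over lotteries: maximize the product of expected utilities
# (brute-forced over a coarse grid of lotteries).
nash = max(
    ((pa / 10, pb / 10, 1 - (pa + pb) / 10) for pa in range(11) for pb in range(11 - pa)),
    key=lambda lot: expected_utils(lot)[0] * expected_utils(lot)[1],
)

print("random dictator:", random_dictator)             # (0.5, 0.5)
print("Nash solution:  ", nash, expected_utils(nash))  # all weight on C -> (0.9, 0.9)
```

Random dictatorship yields expected utilities (0.5, 0.5), which is Pareto-dominated by the compromise outcome C at (0.9, 0.9); Nash bargaining puts all its weight on C. This only illustrates the gap, not an answer to the convergence question above.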
Re democratic countries being overtaken by dictatorial countries: I think this will only last until we get AI that can automate at least all white-collar labor, and maybe even enough blue-collar physical labor that human wages for those jobs fall below what a human needs to subsist. By then dictatorial/plutocratic countries will unfortunately become a viable governing option again, and may even overtake democratic countries.
So to come back to the analogy: I think VNM-rationality/dictatorship is unfortunately common and convergent over long timescales, and it’s democracies/coalition politics that are fragile over the sweep of history, because they only became dominant-ish starting in the 18th century and will end sometime in the 21st century.
What is this coalitional structure for if not to approximate an EU maximizing agent?
This quote from my comment above addresses this:
And so I’d say that AIXI-like agents and coalitional agents converge in the limit of optimality, but before that coalitional agency will be a much better framework for understanding realistic agents, including significantly superhuman agents.
So the thing that coalitional agents are robust at is acting approximately like belief/goal agents, and you’re only making a structural claim about agency?
If so, I find your model pretty plausible.
Oh, I see what you mean now. In that case, no, I disagree. Right now this notion of robustness is pre-theoretic. I suspect that we can characterize robustness as “acting like a belief/goal agent” in the limit, but part of my point is that we don’t even know what it means to act “approximately like belief/goal agents” in realistic regimes, because e.g. belief/goal agents as we currently characterize them can’t learn new concepts.
Relatedly, see the dialogue in this post.
Update: I am increasingly convinced that Bayesianism is not a complete theory of intelligence and may not be the best fundamental basis for agent foundations research, but I am still not convinced that coalitional agency is the right direction.
Interesting. Got a short summary of what’s changing your mind?
I now have a better understanding of coalitional agency, which I will be interested in your thoughts on when I write it up.
Mostly talking to you, talking to Abram, and reading Tom Sterkenburg’s thesis.
Briefly: I am now less confident that realizability assumptions are ever satisfied for embedded agents in our universe (Vanessa Kosoy / Diffractor argue this fairly convincingly). In fact this is probably similar to a standard observation about the scientific method (I read Alchin’s “Theory of Knowledge”; Hutter recommends avoiding the 3rd and later editions). As an example intuition: with runtime restrictions it seems to be impossible to construct universal mixtures (Vladimir Vovk impressed this on me). In the unrealizable case, I now appreciate Bayesian learning as one specific expert-advice aggregator (albeit an abnormally principled one, equipped with now-standard analysis). I appreciate the advantages of other approaches with partial experts, with Garrabrant induction as an extreme case.
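To gesture at the “Bayes as one specific aggregator” framing with a standard identity: under log loss, the Bayesian posterior weight on each expert is proportional to the prior times exp(−cumulative log loss), which is exactly the Hedge / multiplicative-weights update with learning rate 1; other learning rates give aggregators whose regret guarantees don’t require any expert to be “true”. A quick sketch (my own toy code, nobody’s library, with made-up losses):

```python
import numpy as np

def bayes_weights(log_losses, prior):
    """Bayesian posterior over experts under log loss:
    w_i is proportional to prior_i * exp(-cumulative log loss of expert i)."""
    w = prior * np.exp(-log_losses.sum(axis=1))
    return w / w.sum()

def hedge_weights(losses, prior, eta):
    """Hedge / multiplicative-weights aggregator: w_i proportional to prior_i * exp(-eta * cumulative loss).
    With eta = 1 and losses taken to be log losses, this coincides with the Bayesian posterior."""
    w = prior * np.exp(-eta * losses.sum(axis=1))
    return w / w.sum()

# Three "experts" (hypotheses / predictors), ten rounds of per-round log loss each.
rng = np.random.default_rng(0)
log_losses = rng.exponential(scale=1.0, size=(3, 10)) * np.array([[0.3], [0.6], [1.0]])
prior = np.ones(3) / 3

print(bayes_weights(log_losses, prior))
print(hedge_weights(log_losses, prior, eta=1.0))  # identical to the Bayesian weights
print(hedge_weights(log_losses, prior, eta=0.3))  # a more conservative aggregator, no realizability needed
```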
I still endorse the Bayesian approach in many cases, in particular when it is at least possible to formulate a reasonable hypothesis class that contains the truth.
I saw the comment and thought I would drop some things that are the beginnings of approaches to a more mathematical theory of iterated agency.
A general underlying idea is to decompose a system into its maximally predictive sub-agents, sort of like an arg-max of Daniel Dennett’s intentional stance.
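One crude way to make that arg-max operational (my own toy sketch, not from the papers below; all function names are made up): score each small subset of a system’s observed variables by how well its own past predicts its own next state, and treat the highest-scoring subsets as candidate sub-agent boundaries.

```python
import itertools
import numpy as np

def self_prediction_score(traj, subset):
    """Crude R^2 for linearly predicting the subset's next state from its current state.
    traj: (T, n) array of measurements over time; subset: tuple of variable indices."""
    idx = list(subset)
    X, Y = traj[:-1][:, idx], traj[1:][:, idx]
    X1 = np.hstack([X, np.ones((len(X), 1))])   # add an intercept column
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    resid = Y - X1 @ coef
    return 1.0 - resid.var() / Y.var()

def candidate_subagents(traj, max_size=3, top_k=5):
    """Rank small variable subsets by how self-predictive they are."""
    n = traj.shape[1]
    subsets = [s for r in range(1, max_size + 1) for s in itertools.combinations(range(n), r)]
    return sorted(subsets, key=lambda s: self_prediction_score(traj, s), reverse=True)[:top_k]

# Toy usage on a random-walk "system" with 5 observed variables.
traj = np.random.default_rng(1).normal(size=(200, 5)).cumsum(axis=0)
print(candidate_subagents(traj))
```

Real versions (e.g. the information-theoretic individuality work linked below) use information-theoretic criteria and allow nested, coarse-grained partitions rather than brute-force subsets; this is only meant to convey the flavor.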
There are various reasons to believe that there are algorithms for discovering the most important nested sub-parts of systems, using things like Active Inference, especially where it has been applied in computational biology. Here are some related papers:
https://arxiv.org/abs/1412.2447 - We consider biological individuality in terms of information theoretic and graphical principles. Our purpose is to extract through an algorithmic decomposition system-environment boundaries supporting individuality. We infer or detect evolved individuals rather than assume that they exist. Given a set of consistent measurements over time, we discover a coarse-grained or quantized description on a system, inducing partitions (which can be nested)
https://arxiv.org/pdf/2209.01619 - Trying to relate Agency to POMDPs and the intentional stance.