This was a solid explanation, thanks.
Some differences from what I imagine...
First and foremost, I imagine that the notion of “success” on which the agent conditions is not just a direct translation of “winning” in the decision problem. After all, a lot of the substance of tricky decision theory problems is exactly in that “direct” translation of what-it-means-to-win! Instead, I imagine that the notion of “success” has a lot more supporting infrastructure built into it, and the agent’s actions can directly interact with the supporting infrastructure as well as the nominal goal itself.
A prototypical example here would be an abstraction-based decision theory. There, the notion of “success” would not be “system achieves the maximum amount of utility”, but rather “system abstracts into a utility-maximizing agent”. The system’s “choices” will be used both to maximize utility and to make sure the abstraction holds. The “supporting infrastructure” part—i.e. making sure the abstraction holds—is what would handle things like e.g. acting as though the agent is deciding for simulations of itself (see the link for more explanation of that).
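To gesture at how the infrastructure part does real work, here's a throwaway Python sketch (the Newcomb-style payoffs and all the names are made up purely for illustration, not a real proposal): when the abstraction is enforced, the simulation's action and the real action are the same variable, so optimizing one optimizes the other.

```python
# Toy sketch only: if "success" requires the system to abstract into a single
# utility-maximizing agent, then a simulated copy of the agent runs the same
# abstract policy, so choosing "my" action is also choosing the simulation's.

ACTIONS = ["one_box", "two_box"]

def payoff(my_action: str, simulated_action: str) -> float:
    # Newcomb-style payoffs: the opaque box is filled iff the simulated copy
    # (which the predictor ran) one-boxes.
    opaque = 1_000_000.0 if simulated_action == "one_box" else 0.0
    transparent = 1_000.0 if my_action == "two_box" else 0.0
    return opaque + transparent

def best_action_if_abstraction_enforced() -> str:
    # Enforcing the abstraction means the simulated action equals the real
    # action, so we optimize over the one shared policy.
    return max(ACTIONS, key=lambda a: payoff(a, a))

def best_action_if_abstraction_ignored(simulated_action: str) -> str:
    # The "direct" optimization treats the simulation's action as fixed.
    return max(ACTIONS, key=lambda a: payoff(a, simulated_action))

print(best_action_if_abstraction_enforced())          # -> "one_box"
print(best_action_if_abstraction_ignored("one_box"))  # -> "two_box"
```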
More generally, two other notions of “success” which we could imagine:
“success” means “our model of the territory is accurate, and our modelled-choices maximize our modelled-utility” (though this allows some degrees of freedom in how the model handles counterfactuals)
“success” means “the physical process which outputs our choices is equivalent to program X” (where X itself would optimize for this notion of success, and probably some other conditions as well; the point here is to check that the computation is not corrupted)
(These are not mutually exclusive.) In both cases, the agent’s decisions would be used to support its internal infrastructure (accurate models, uncorrupted computation) as well as the actual utility-maximization.
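To make the combined picture concrete, here's a very rough sketch of what a “success” predicate like this might look like; everything in it is a stand-in rather than a real proposal.

```python
# Rough sketch only: "success" as a predicate bundling the nominal goal with
# the supporting infrastructure. The agent conditions on the whole conjunction
# rather than optimizing the goal term alone.

from dataclasses import dataclass

@dataclass
class Outcome:
    utility: float                 # how well the nominal goal went
    model_matches_territory: bool  # "our model of the territory is accurate"
    computation_uncorrupted: bool  # "the physical process is equivalent to program X"

def success(outcome: Outcome, utility_bar: float) -> bool:
    # Actions get "spent" on keeping the infrastructure intact as well as on
    # raw utility; a high-utility outcome with corrupted computation or a bad
    # model still fails the predicate.
    return (
        outcome.utility >= utility_bar
        and outcome.model_matches_territory
        and outcome.computation_uncorrupted
    )

print(success(Outcome(10.0, True, False), utility_bar=5.0))  # -> False
print(success(Outcome(6.0, True, True), utility_bar=5.0))    # -> True
```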
Having written that all out, it seems like it might be orthogonal to predictive processing. I had been thinking of these “success” notions more as part-of-the-world-model, mainly because the “success” notions are largely about parts of the world abstracting into specific things (models, program execution, agents). In that context, it made sense to view “enforcing the infrastructure” as part of “making the model and the territory match”. But if abstraction-enforcement is built into the utility function, rather than the model, then it looks less predictive-processing-specific.
“A prototypical example here would be an abstraction-based decision theory. There, the notion of “success” would not be “system achieves the maximum amount of utility”, but rather “system abstracts into a utility-maximizing agent”. The system’s “choices” will be used both to maximize utility and to make sure the abstraction holds. The “supporting infrastructure” part—i.e. making sure the abstraction holds—is what would handle things like e.g. acting as though the agent is deciding for simulations of itself (see the link for more explanation of that).”
isn’t this kind of like virtue ethics as opposed to utilitarianism?
Interesting analogy, I hadn’t thought of that.