I really like this model of computation and how naturally it deals with counterfactuals; I'm surprised it isn't talked about more often.
This raises the issue of abstraction—the core problem of embedded agency.
I’d like to understand this claim better—are you saying that the core problem of embedded agency is relating high-level agent models (represented as causal diagrams) to low-level physics models (also represented as causal diagrams)?
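For concreteness, here is a toy sketch of what I would mean by "relating" the two diagrams: an abstraction map from micro-states to macro-states that commutes with interventions (roughly the "exact transformation" idea from the causal-abstraction literature). Every name here (the toy models, `tau`) is hypothetical and purely illustrative:

```python
def low_level(do_x1=None, do_x2=None):
    """Low-level 'physics' model: two micro-variables cause a micro-output."""
    x1 = 1 if do_x1 is None else do_x1
    x2 = 2 if do_x2 is None else do_x2
    y = x1 + x2  # micro-level mechanism
    return x1, x2, y

def high_level(do_x=None):
    """High-level 'agent' model: one macro-variable causes a macro-output."""
    x = 3 if do_x is None else do_x  # macro-variable aggregating x1, x2
    y = x                            # macro-level mechanism
    return x, y

def tau(x1, x2, y):
    """Abstraction map from micro-states to macro-states."""
    return x1 + x2, y

# Consistency check: intervening at the micro level and then abstracting
# should agree with intervening directly at the macro level.
micro = tau(*low_level(do_x1=5, do_x2=1))  # micro intervention, then tau
macro = high_level(do_x=6)                 # corresponding macro intervention
assert micro == macro
```

If that's the right reading, is the claim that the core problem is finding (and justifying) such a map for realistic agent models and physics models?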
I'm quite confused about what a non-agentic approach actually looks like, and I agree that extending this to give a proper account would be really interesting. A possible argument, from this framework, for actively avoiding 'agentic' models is:
1. Models which generalize very competently also seem more likely to have malign failures, so we might want to avoid them.
2. If we believe H, then things which generalize very competently are likely to have agent-like internal architecture.
3. Having a selection criterion or model-space/prior which actively pushes away from such agent-like architectures could then help push away from things which generalize too broadly (a toy sketch of what I mean follows this list).
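Concretely, step 3 might look something like the sketch below. Both `agency_score` and `selection_criterion` are hypothetical names of my own, not an established proposal, and actually defining a robust agency detector is of course the hard, unsolved part:

```python
# Toy sketch of step 3, not a real proposal: `agency_score` and
# `selection_criterion` are hypothetical names of my own invention.

def agency_score(model) -> float:
    """Hypothetical detector scoring how 'agent-like' a model's internal
    architecture looks (e.g. evidence of explicit search or planning).
    Stubbed to zero here; defining this is the hard open problem."""
    return 0.0

def selection_criterion(model, task_loss: float, lam: float = 1.0) -> float:
    """Selection score: ordinary task loss plus a penalty term that
    actively pushes the search away from agent-like architectures."""
    return task_loss + lam * agency_score(model)
```

Of course, this just relocates the whole difficulty into defining `agency_score` well enough that optimizing against it doesn't misfire.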
I think my main problem with this argument is that step 3 might make step 2 invalid: if you actively penalize agent-like architectures in your search, you may break the conditions that made 'generalizes too broadly' imply 'agent-like architecture', and so end up with things that still generalize very broadly (with all the downsides that entails) but just look a lot weirder.
Thanks for the links; I definitely agree that I was drastically oversimplifying this problem. I still think this task might be much simpler than trying to understand the generalization of some strange model whose internal workings we don't even have a vocabulary to describe.