this would have to take the form of something like: first, build the agent as a slightly stateful pattern-response bot, maybe with a global “emotion” state that selects which pattern-response networks to use. then have it try to predict the world in parts, unsupervised. then give it preferences, which can be about other agents’ inferred mental states. then pull those preferences back through time with reinforcement learning. then add retribution and deservingness on top. power would be inferred from representations of other agents, something like trying to predict the other agents’ unobserved attributes.
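a rough sketch of just the first stage, to make it concrete: a pattern-response bot whose only state is a global “emotion” that picks which response table to use. everything here is made up for illustration (the percept names, the two emotions, the `PatternResponseAgent` class), and plain lookup tables stand in for the pattern-response networks.

```python
from dataclasses import dataclass

# hypothetical response tables, one per emotion; each maps a percept to an action.
# in the real thing these would be learned networks, not dicts.
RESPONSES = {
    "calm": {"food": "approach", "threat": "observe", "stranger": "greet"},
    "afraid": {"food": "ignore", "threat": "flee", "stranger": "avoid"},
}

# hypothetical percepts that switch the global emotion state.
EMOTION_TRIGGERS = {"threat": "afraid", "safe": "calm"}


@dataclass
class PatternResponseAgent:
    # the only state the agent carries between steps
    emotion: str = "calm"

    def step(self, percept: str) -> str:
        # update the global emotion first, if this percept is a trigger
        self.emotion = EMOTION_TRIGGERS.get(percept, self.emotion)
        # then respond using whichever table the current emotion selects;
        # unknown percepts fall back to doing nothing
        return RESPONSES[self.emotion].get(percept, "idle")


agent = PatternResponseAgent()
print(agent.step("food"))    # calm table
print(agent.step("threat"))  # switches to afraid, then responds
print(agent.step("food"))    # same percept as before, different table
```

the point of the toy is only that the same input gets a different response depending on the global state; world prediction, preferences, and the rest would be layered on top of this and aren’t shown.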
also, this doesn’t treat level 4 as some super high-level thing; it’s just a natural result of running the world prediction for a while.
the better version of this model probably takes the form of a list of the most important built-in input-action mappings.