something I realized bothers me about this model: I basically didn’t include TAP-style reasoning, aka classical conditioning; I started from operant conditioning.
also, this explanation fails miserably at the “tell a story of how you got there in order to convey the subtleties” thing that eg ben hoffman was talking about recently.
yeahhhhhh, missing TAP-type reasoning is a really critical failure here. I think a lot of important stuff happens around signaling whether you’ll be an agent that is level-1 valuable to be around, and I’ve thought before about how keeping your hidden TAP depth short, in ways that are recognizable to others, makes you more comfortable to be around because you’re more predictable. or something
this would have to take the form of something like: first, make the agent a slightly-stateful pattern-response bot, maybe with a global “emotion” state that selects which pattern-response networks to use. then try to predict the world in parts, unsupervised. then add preferences, which can be about other agents’ inferred mental states. then pull those preferences back through time via reinforcement learning. then add the retribution and deservingness things on top. power would be inferred from representations of other agents, something like trying to predict other agents’ unobserved attributes.
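the layered construction above can be sketched in code. this is a toy illustration only, assuming the simplest possible stand-ins: a dict-of-dicts for the emotion-gated pattern-response tables, transition counts for the unsupervised world model, a reward table for preferences, and a tabular TD-style update for pulling preferences back through time. every name here is hypothetical, and the retribution/deservingness and power-inference layers are deliberately left out since the note only gestures at them.

```python
class Agent:
    """Toy sketch of the layered agent described above (all names hypothetical)."""

    def __init__(self):
        # Layer 1: slightly-stateful pattern-response bot. A global "emotion"
        # state selects which pattern->response table is currently active.
        self.emotion = "calm"
        self.response_tables = {
            "calm":    {"greeting": "greet_back", "threat": "freeze"},
            "alarmed": {"greeting": "freeze",     "threat": "flee"},
        }
        # Layer 2: unsupervised world prediction, here just transition counts.
        self.transition_counts = {}
        # Layer 3: preferences over (predicted) world states.
        self.preferences = {"safe": 1.0, "hurt": -1.0}
        # Layer 4: preferences pulled back through time -- a learned value table.
        self.value = {}

    def react(self, stimulus):
        # Pattern-response lookup, gated by the current emotion state.
        return self.response_tables[self.emotion].get(stimulus, "idle")

    def observe(self, state, next_state):
        # Unsupervised world-model update: count observed transitions.
        counts = self.transition_counts.setdefault(state, {})
        counts[next_state] = counts.get(next_state, 0) + 1

    def predict(self, state):
        # Predict the most frequently observed successor state.
        counts = self.transition_counts.get(state, {})
        return max(counts, key=counts.get) if counts else state

    def update_value(self, state, next_state, alpha=0.5, gamma=0.9):
        # TD(0)-style backup: pull the preference signal earlier in time.
        reward = self.preferences.get(next_state, 0.0)
        v, v_next = self.value.get(state, 0.0), self.value.get(next_state, 0.0)
        self.value[state] = v + alpha * (reward + gamma * v_next - v)
```

usage-wise: switching `agent.emotion` changes which responses fire, repeated `observe` calls make `predict` prefer the common successor, and `update_value` makes states that lead to disliked outcomes acquire negative value.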
also, this doesn’t treat level 4 as some super-high-level thing; it’s just a natural result of running the world prediction for a while.
the better version of this model probably takes the form of a list of the most important built-in input-action mappings.
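concretely, such a list might look something like the following. the specific mappings here are made-up placeholders, not a claim about which input-action pairs actually matter; the point is just the shape of the data structure.

```python
# Hypothetical starter list of built-in input->action mappings (TAP-like).
# The entries are illustrative placeholders, not the actual "most important" ones.
BUILTIN_TAPS = [
    ("sudden_loud_noise", "startle"),
    ("looming_object",    "flinch"),
    ("smile_from_other",  "smile_back"),
    ("eye_contact",       "attend"),
]

def respond(stimulus, taps=BUILTIN_TAPS):
    """Return the first built-in action triggered by the stimulus, if any."""
    for trigger, action in taps:
        if trigger == stimulus:
            return action
    return "no_builtin_response"
```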