Wait I think the “indirect evolutionary game theory” mentioned in footnote 1, where agents optimize for subjective preferences (inner objectives) that are themselves subject to selection based on the fitness they induce cleared it up for me
There’s a temporal aspect here—spite might start as instrumental but become intrinsic through training dynamics.
Meditations on Moloch is among my all-time favourite ACX post. Anyone that wants to fanboy with me over it at any time is welcome to.