Wait, I think the “indirect evolutionary game theory” mentioned in footnote 1 cleared it up for me: agents optimize for subjective preferences (inner objectives), and those preferences are themselves subject to selection based on the fitness they induce.
There’s a temporal aspect here—spite might start as instrumental but become intrinsic through training dynamics.
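For anyone who wants the footnote-1 idea in concrete form, here is a toy sketch of how I read it. Everything specific is my own assumption rather than anything from the post: the game is a Cournot duopoly with zero costs, each agent's inner objective is `own_profit - spite * other_profit`, preferences are mutually observable, and the population is updated with a discrete replicator step. The function names and the parameter values (`spite=0.2`, starting share `0.1`) are hypothetical.

```python
# Toy sketch of indirect evolutionary game theory (all modelling choices are assumptions).
# Agents maximize a SUBJECTIVE utility; selection acts on the MATERIAL payoff (fitness).

def equilibrium_quantities(s_i, s_j, a=1.0):
    """Nash quantities when agent k maximizes pi_k - s_k * pi_other.

    With pi_k = q_k * (a - q_i - q_j), the first-order conditions give
    q_i = (a - (1 - s_i) * q_j) / 2, and solving the pair yields the
    closed form below. Assumes each agent observes the other's spite.
    """
    denom = 4.0 - (1.0 - s_i) * (1.0 - s_j)
    return a * (1.0 + s_i) / denom, a * (1.0 + s_j) / denom


def material_payoff(s_i, s_j, a=1.0):
    """Fitness of the s_i agent: its actual profit, ignoring its spite term."""
    q_i, q_j = equilibrium_quantities(s_i, s_j, a)
    return q_i * (a - q_i - q_j)


def evolve_spite_share(share=0.1, spite=0.2, steps=500):
    """Discrete replicator dynamics over two preference types: 0 and `spite`."""
    types = (0.0, spite)
    # f[r][c] = fitness of type r when matched against type c
    f = [[material_payoff(s_r, s_c) for s_c in types] for s_r in types]
    for _ in range(steps):
        fit_neutral = (1 - share) * f[0][0] + share * f[0][1]
        fit_spiteful = (1 - share) * f[1][0] + share * f[1][1]
        mean_fit = (1 - share) * fit_neutral + share * fit_spiteful
        share = share * fit_spiteful / mean_fit  # types grow in proportion to fitness
    return share


if __name__ == "__main__":
    print(evolve_spite_share())
```

In this toy setup no agent ever optimizes for fitness directly. The spiteful type spreads only because its preference shifts the equilibrium in a way that happens to raise its material payoff, so its share climbs toward 1 over the generations. Whether anything like this carries over to training dynamics is a separate question that the sketch does not answer.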