Cornelis Dirk Haupt comments on Making AIs less likely to be spiteful

Cornelis Dirk Haupt 27 Apr 2025 22:04 UTC
1 point
I am a bit confused about how you can coherently reconcile spite as a commitment device with spite as intrinsic rather than strategic. If spite is being employed as a commitment device, that seems necessarily to be evidence of spite being used for instrumental ends, not as an end in itself (i.e., a terminal preference).
- Cornelis Dirk Haupt 28 Apr 2025 0:54 UTC
  1 point
  Wait, I think the “indirect evolutionary game theory” mentioned in footnote 1 cleared it up for me: agents optimize for subjective preferences (inner objectives), and those preferences are themselves subject to selection based on the fitness they induce.
  There’s a temporal aspect here—spite might start as instrumental but become intrinsic through training dynamics.
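
To make the indirect-evolution point concrete for myself, here is a toy sketch. It is my own illustration, not something from the post: I am assuming a two-player Cournot game with price `a - q_i - q_j`, zero costs, and mutually observable preferences. Each agent sincerely maximizes a subjective utility `profit_i - s_i * profit_j`, where the spite weight `s_i` is a hypothetical parameter, while selection only "sees" material profit. All function names are made up for the example.

```python
def quantities(s_i, s_j, a=1.0):
    """Equilibrium outputs when each agent maximizes its own subjective
    utility profit_i - s_i * profit_j (assumed toy model, closed form)."""
    k_i, k_j = 1.0 - s_i, 1.0 - s_j
    denom = 4.0 - k_i * k_j
    return a * (2.0 - k_i) / denom, a * (2.0 - k_j) / denom


def fitness(s_i, s_j, a=1.0):
    """Material profit of agent i: this is what selection acts on,
    not the subjective utility the agent itself optimizes."""
    q_i, q_j = quantities(s_i, s_j, a)
    return q_i * (a - q_i - q_j)


def best_response(s_j, grid):
    """Spite level on the grid with the highest fitness against s_j."""
    return max(grid, key=lambda s: fitness(s, s_j))


# Candidate spite levels from -0.5 (mildly altruistic) to 1.0.
grid = [i / 1000 for i in range(-500, 1001)]

# A mildly spiteful mutant entering a population of pure profit maximizers.
mutant_vs_selfish = fitness(1 / 3, 0.0)
selfish_vs_selfish = fitness(0.0, 0.0)

# A pure profit maximizer entering a population with spite 1/3.
selfish_vs_spiteful = fitness(0.0, 1 / 3)
spiteful_vs_spiteful = fitness(1 / 3, 1 / 3)

# Spite level that does best against a population with spite 1/3.
stable_candidate = best_response(1 / 3, grid)

print(mutant_vs_selfish, selfish_vs_selfish)
print(selfish_vs_spiteful, spiteful_vs_spiteful)
print(stable_candidate)
```

Under these assumptions the spiteful mutant earns about 0.12 against profit maximizers who earn about 0.111 against each other, and a spite weight near 1/3 is its own best response. So the preference is terminal from the agent's point of view, yet it was selected for exactly because it works as a commitment device at the level of fitness.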