Instrumental prudent-bot game theory: cooperating with agents who would conditionally cooperate and defecting against others
This seems like the critical part. The notion of “cooperation” is straightforward in the abstracted setting of the prisoner’s dilemma, but in the real world it gets much more complicated, to the point of being close to a restatement, or generalization, of the alignment problem.
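For concreteness, here is how crisp the strategy is in the abstract setting: a minimal Python sketch of a prudent bot in a program-equilibrium prisoner’s dilemma, where each player is a function that can simulate its opponent. All names here are illustrative, and the modal-agents formulation of PrudentBot uses provability logic rather than the bounded simulation and identity check below; this is a toy stand-in, not that construction.

```python
# A minimal sketch, assuming agents are Python callables that take the
# opponent's policy and return "C" or "D". Bounded-depth simulation and an
# identity check stand in for the proof search of the modal formulation.

def cooperate_bot(opponent):
    """Cooperates with everyone unconditionally."""
    return "C"

def defect_bot(opponent):
    """Defects against everyone unconditionally."""
    return "D"

def prudent_bot(opponent, depth=3):
    """Cooperate only with conditional cooperators; defect against the rest."""
    if depth == 0:
        # Base case so two prudent bots simulating each other terminate:
        # assume cooperation, except against the known unconditional
        # defector (an identity check, as a crude proxy for proof search).
        return "D" if opponent is defect_bot else "C"
    me = lambda opp: prudent_bot(opp, depth - 1)
    punishes_defectors = opponent(defect_bot) == "D"  # screens out unconditional cooperators
    cooperates_with_me = opponent(me) == "C"          # screens out defectors
    return "C" if (punishes_defectors and cooperates_with_me) else "D"

if __name__ == "__main__":
    for name, bot in [("CooperateBot", cooperate_bot),
                      ("DefectBot", defect_bot),
                      ("PrudentBot", prudent_bot)]:
        print(f"PrudentBot vs {name}: {prudent_bot(bot)} / {bot(prudent_bot)}")
    # Prints: D / C (exploits the unconditional cooperator), D / D, C / C.
```

A couple of dozen lines suffice in the toy setting precisely because the opponent’s full policy is inspectable. Nothing like `opponent(defect_bot)` is available for real-world agents, which is the gap the comment above is pointing at.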
Maybe that’s why abstract approaches to real-world alignment seem so intractable.
If real alignment is necessarily messy, concrete, and changing, then abstract formality just wasn’t the right problem framing to begin with.