I’ve been thinking a lot about identity (as in pg’s “Keep Your Identity Small”).
Specifically, which identities might lead to safe development of AI? I’m trying to validate that by running these activities:
1. Role-playing games where participants are asked to take on specific identities and play through a scenario where AI has to be created.
2. Similar setups where LLMs are prompted to take on particular roles and given agency to play in the role-playing games too.
Has there been similar work before?
I’m particularly interested in cosmic identity, where you see humanity as a small part of a wider cosmos, including potentially hostile and potentially useful aliens. It has a number of properties that I think make it interesting, which I’ll discuss in a full post, if people think this is worth exploring.
Are there identities that people think should be explored too?
The cosmic identity and related issues have been considered, and I even used them to make a conjecture about alignment. As for role-playing games, I doubt that they are actually useful, unless, of course, you mean something like Cannell’s proposal.
As for “the idea of arms races and the treacherous turn”, the AI-2027 team isn’t worried about such a risk; they are more worried about the race itself causing humans to do worse safety checks.
But slightly irrational actors might not race (especially if they know that other actors are slightly irrational in the same or a compatible way).
I think there might be perverse incentives if identities or viewpoints get promoted in a legible fashion: people may try to hack that system rather than do useful work.
So it might be good for identity promotion to be done in a way that is obfuscated or ineffable.