Currently, due to worries about arms races and races to the bottom, people might not share information about the safest ways to develop AI. This makes it hard for the public to trust AI development by actors with secret knowledge. One possibility is shadow decision making: giving both the secret methods and the desires of an actor to a third party, who then makes go/no-go decisions. A second is building trust by building non-AI software in the public interest, so that the organisation becomes trusted to build AI with secret knowledge. Some mix of the two would probably be good.
If your view of the problem is very complex you might lose out, as it would get compressed easily: there would be lots of mutual information with other views.
Has there been any work on representation under extreme information asymmetry?
I’m thinking of something like AIs trained to make the same decisions as you would, which are then given the secret or info-hazardous material and make governance decisions on your behalf, to avoid info leakage.
I’ve been thinking about problems with mind copying and democracy with uploads.
In order to avoid Sybil attacks you might want to implement something like compressibility weightings: if a voter has lots of similarity to other voters, it is not weighted very much.
Otherwise you get a race to the bottom towards viewpoints that might not capture the richness of humanity (there is pressure to simplify the thing being copied, to get more copies of it out of a given amount of compute).
But slightly irrational actors might not race (especially if they know that other actors are slightly irrational in the same or a compatible way).
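Here is a minimal sketch of how such a compressibility weighting could work, using normalized compression distance with zlib as a stand-in compressor. The viewpoint strings and the averaging rule are illustrative assumptions on my part, not a worked-out mechanism:

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed representation of `data`."""
    return len(zlib.compress(data, 9))

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance: near 0 for near-identical inputs,
    near 1 for unrelated ones."""
    ca, cb, cab = compressed_size(a), compressed_size(b), compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def vote_weights(viewpoints: list[bytes]) -> list[float]:
    """Weight each voter by average compression distance to all other voters.

    A voter whose viewpoint shares lots of mutual information with the rest
    of the electorate (e.g. a near-copy) gets a weight near 0, so spinning up
    many copies of one mind adds little total voting power.
    """
    weights = []
    for i, v in enumerate(viewpoints):
        others = [ncd(v, w) for j, w in enumerate(viewpoints) if j != i]
        weights.append(sum(others) / len(others) if others else 1.0)
    return weights

# Three near-copies of one mind vs. one distinct mind: the copies are
# close to each other, so each of them gets a low weight.
minds = [b"maximize paperclip output" * 20,
         b"maximize paperclip output" * 20,
         b"maximize paperclip outputs" * 20,
         b"preserve the richness and diversity of human experience" * 20]
print(vote_weights(minds))
```

A real scheme would need a much better similarity measure than byte-level compression, but the shape of the incentive is the same: near-copies dilute each other's weight.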
Is the rational mindset an existential risk? It spreads the ideas of arms races and the treacherous turn. Should we be encouraging less-than-rational worldviews to spread, and if so, which ones? And should we be coding them into our AIs? You would probably want them to be hard to predict, so they cannot be exploited easily.
If it is, it would still be worth preserving as an example of an insidious threat that should be guarded against, perhaps in a simulation for people to interact with.
You might want the choice of which mindset to adopt to be as rational as possible, though. Decision making under deep uncertainty seems to allow you to deviate from the traditionally rational: you can evaluate plans under different worldviews and pick actions or plans that don’t seem too bad under any of them. This could give irrational worldviews a voice.
How irrational a worldview are you willing to accept under deep uncertainty? Perhaps you need to evaluate the outcomes predicted from that worldview and see whether there is something hidden that it is tapping into.
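One crude formalisation of the "not too bad under any worldview" rule above is maximin over worldviews. In this sketch the worldviews, plans, and payoff numbers are all invented for illustration:

```python
worldviews = ["rational game theorist", "cosmic humility", "techno-optimist"]
plans = {
    # Payoff of each plan as judged from each worldview, in worldview order.
    "race to build AI first":       [ 2, -10,  5],
    "international moratorium":     [-1,   3, -2],
    "cooperative slow development": [ 1,   2,  1],
}

def robust_choice(plans: dict[str, list[float]]) -> str:
    """Pick the plan with the best worst-case payoff across worldviews (maximin).

    Note this deliberately ignores how probable each worldview is, which is
    exactly what lets "irrational" worldviews have a voice: they constrain
    the choice even if a rational actor would assign them low credence.
    """
    return max(plans, key=lambda p: min(plans[p]))

print(robust_choice(plans))  # -> "cooperative slow development"
```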
I think there might be perverse incentives if identities or viewpoints get promoted in a legible fashion: an incentive to hack that system rather than to do useful work.
So it might be good for identity promotion to be done in a way that is obfuscated or ineffable.
I’ve been thinking a lot about identity (as in Paul Graham’s “Keep Your Identity Small”).
Specifically, which identities might lead to safe development of AI, and trying to validate that by running these activities:
1. Role-playing games where the participants are asked to take on specific identities and play through a scenario where AI has to be created.
2. The same, but where LLMs are prompted to take on particular roles and given agency to play in the role-playing games too (a minimal sketch follows below).
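For activity 2, here is a minimal sketch of the loop, assuming the OpenAI Python client; the model name, identities, and scenario text are illustrative placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

IDENTITIES = {
    "cosmic": "You see humanity as a small part of a wider cosmos, "
              "including potentially hostile and potentially useful aliens.",
    "nationalist": "You see AI primarily as a way for your nation to stay ahead.",
}

SCENARIO = ("Your lab can deploy a powerful new AI system this quarter. "
            "A rival lab may deploy a similar system within a year. "
            "Decide: deploy now, delay for safety work, or coordinate with the rival.")

def play_turn(identity_key: str, transcript: list[str]) -> str:
    """Ask the model to act in character and respond to the scenario so far."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": IDENTITIES[identity_key]},
            {"role": "user",
             "content": SCENARIO + "\n\nSo far:\n" + "\n".join(transcript)},
        ],
    )
    return response.choices[0].message.content

# One round where each identity responds in turn, seeing earlier moves.
transcript: list[str] = []
for identity in IDENTITIES:
    move = play_turn(identity, transcript)
    transcript.append(f"[{identity}] {move}")
print("\n\n".join(transcript))
```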
Has there been similar work before?
I’m particularly interested in cosmic identity, where you see humanity as a small part of a wider cosmos, including potentially hostile and potentially useful aliens. It has a number of properties that I think make it interesting, which I’ll discuss in a full post, if people think this is worth exploring.
Are there identities that people think should be explored too?
There is a reason why the arms race around more and bigger nuclear weapons stopped and hasn’t started again. Why do people think that is, and can we use that understanding to stop the arms race around AI (the race to be there first and control the future)?