OpenAI could help reduce X-risk by wagering itself

While brainstorming on the question “conditional on things going well with AI, how did it happen?”, I came up with the following idea. I think it is extremely unrealistic right now.[1] However, it might become relevant later if some key people at top AI companies have a change of heart. As a result, it seems useful to have the idea floating around.


OpenAI[2] is at the center of attention regarding recent AI progress. I expect that, in the view of both the public and politicians, OpenAI’s actions and abilities are representative of those of AI companies in general. As a result, if OpenAI genuinely tried to make their AI safe and failed in a very spectacular manner, this would cause a sudden and massive shift in public opinion on AI in general.

Now, I don’t expect OpenAI to sacrifice itself by faking incompetence. (Nor would I endorse being deceitful here.) However, there might be some testable claim that is something of a crux for both “AI-optimists” and, say, MIRI. OpenAI could make some very public statements about their ability to control AI. And they could really stick their neck out by making the claims falsifiable and making a possible failure impossible to deny (and memetically fit, etc.). This way, if they failed, this would create a risk-awareness moment, where enough people become mindful of AI risk that we can put in place extreme risk-reduction measures that wouldn’t be possible otherwise. Conversely, if they succeeded, that would be genuine evidence that such measures are not necessary.

I end with an open problem: Can we find claims, about the ability of AI companies to control their AI, that would simultaneously:

  1. be testable,

  2. serve as strong evidence on whether extreme measures against AI risk are needed,

  3. have the potential to create a risk-awareness moment (i.e., be salient to decision-makers and the public),

  4. be possible for an AI company to endorse, despite satisfying (1)-(3).

  1. ^

    Mostly, I don’t expect this to happen, because any given AI company has nothing to gain from it. However, some individual employees of an AI company might have sufficient uncertainty about AI risk to make this worth it to them. And it might be possible for a sub-group of people within an AI company to make a public commitment such as the one above somewhat unilaterally, in a way that wouldn’t allow the company to back out gracefully.

  2. ^

    I will talk about OpenAI for concreteness, but this applies equally well to some of the other major companies (Facebook, Microsoft, Google; with “your (grand)parents know the name” being a good heuristic).