The Plan B Approach to AI Safety

Central claim: Ambitious alignment of superintelligent AI will not be solved in time; therefore, different variants of a "plan B" should be explored and implemented.

Below is a short overview of, and links to, my works on AI safety. The central theme of all of them is "plan B", so the collection could be called the "plan B agenda".

Ambitious alignment of an arbitrary superintelligent AI cannot be solved before AGI is created: a) any proof of alignment would be too complex and unreliable; b) it would require at least a superintelligent AI safety team; c) dangerous AI may appear soon, perhaps by 2030. Many different types of AI-related x-risks are possible, not only the paperclipper.

The best available solution is to limit the power of AI: artificial stupidity, a limited number of use cases, internal surveillance, and multilevel boxing.

A safer AI architecture is needed: data-driven AI with a human-like functional architecture for interpretability. A human-like AI could be modelled on a real-world person: not an upload, but something very similar to a side-load, e.g. an AI trained on someone's texts.

Not only a local but a global solution to AI safety is needed: an AI Nanny based on global surveillance and empowered by an agreement between the nuclear powers. A human-like AI or the first upload could act as the "virtual tsar" of such a system.

The nature of human values depends on who counts as the "moral subject": society, the brain, the personality, or a subpersonality. Human values are not stable preferences. "Real choices" are more important than values, as "values" are not always a useful philosophical construction. Meta-values matter even more, such as life extension and personal universes, because they give everyone the freedom to have their own values.

An AI that is still learning human values is dangerous precisely because it does not yet know those values.

An unfriendly AI will not be very interested in killing humans for their atoms, as atoms have very small instrumental value, while living humans have larger instrumental value at every stage of the AI's evolution. But we have to help future AI, especially a "Young AI", understand this. We could strengthen our position by making certain precommitments.

My works on AI Safety:

AI safety defense levels, a roadmap

Message to Any Future AI: “There are several instrumental reasons why exterminating humanity is not in your interest”

Catching Treacherous Turn: A Model of the Multilevel AI Boxing

Catastrophically Dangerous AI is Possible Before 2030

Literature Review: What Artificial General Intelligence Safety Researchers Have Written About the Nature of Human Values

AI Alignment Problem: “Human Values” don’t Actually Exist

First human upload as AI Nanny

Narrow AI Nanny: Reaching Strategic Advantage via Narrow AI to Prevent Creation of the Dangerous Superintelligence

Artificial Intelligence in Life Extension: from Deep Learning to Superintelligence

Military AI as a Convergent Goal of Self-Improving AI

Classification of Global Catastrophic Risks Connected with Artificial Intelligence

The Global Catastrophic Risks Connected with Possibility of Finding Alien AI During SETI

Classification of the Global Solutions of the AI Safety Problem

Could slaughterbots wipe out humanity? Assessment of the global catastrophic risk posed by autonomous weapons.

Levels of Self-Improvement in AI and their Implications for AI Safety

“Possible Dangers of the Unrestricted Value Learners” – LW post