[Question] What Does AI Alignment Success Look Like?

Suppose you are put in stasis and wake up 10 years after the FOOM. You are trying to figure out whether the AI Alignment project succeeded. How can you tell? Not vaguely, but concretely: what metrics indicate success, and what metrics indicate failure? The following are potential examples, based on various sci-fi AI tropes. There is no need to discuss each one separately, since there are millions more; the idea is to distinguish failure from success in the general case.

  • Earth has been turned into a mega-brain with no visible humans around.

  • The Universe around the Solar System has disappeared (or turned into an apparently solid shell of unknown composition), but humans are still around, living in abundance of anything they want or need and seemingly content to stay that way.

  • Every human gets their own (real or simulated) universe to play with. Some end up creating trillions of creatures whom they torture for fun.

  • Humanity lives inside a giant simulation.

  • AI development is under a strictly enforced interdict.

  • Everything remotely alive-looking is a digimon-like creature that proclaims itself human.

  • The universe appears empty except for one black hole that encodes humanity (in some form) in its horizon microstates.

Any links discussing this would be appreciated, too.
