What if we Align the AI and nobody cares?
In the classic AI Foom scenario, a single unified AI passes the boundary where it is capable of recursive self-improvement. Thereafter it increases in intelligence hyper-exponentially until it rapidly gains control of the future.
We now know this is not going to happen. By the time a superintelligent (>all humans) AI is built, the world will already be filled with trillions of AIs spanning the entire range from tiny models running as microservices on low-power IoT devices to super-human (but not yet super-intelligent) models serving as agents for some of the most powerful organizations on Earth.
For ecological reasons, small models will dramatically outnumber large ones, both in sheer count and in the absolute compute at their disposal. Building larger, more powerful models will be seen primarily as an engineering problem, and at no point will a single new model be in a position to overpower the entire ecosystem that created it.
What does solving the Alignment Problem look like in this future?
It does not look like: We invent a clever machine-learning technique that allows us to create a single AI that understands our values and hand control of the future over to it.
Instead it looks like: At the single moment of maximum change in the transition from Human to Artificial Intelligence, we collectively agree that the outcome was “good”.
The thing is, we will have absolutely no clue when we pass the point of maximum change. We are nowhere near that point, but we have already passed the point where no single human can actually keep track of everything that’s happening. We can only say that the rate of change will be faster than it is now. How much faster? 10x? 100x? We can calculate an upper bound by assuming, for example, that the total energy used by AI cannot exceed the sun’s output: GPT-3’s training run used roughly 1 GWh (~10^12 J), while the sun produces on the order of 10^34 J per year. So the scale of the largest AI experiments can grow roughly 34 − 12 = 22 orders of magnitude before hitting fundamental limits.
Given that compute for the largest AI experiments currently doubles every 3.4 months, this means we will reach the point of maximum change at some point in the next two decades (sooner, assuming we are not currently near the maximum rate of change).
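The arithmetic behind that two-decade estimate can be sketched in a few lines, assuming the 3.4-month doubling rate holds and the 22-orders-of-magnitude headroom figure from above:

```python
import math

# Headroom between today's largest AI experiments and a fundamental
# physical limit (from the text): ~22 orders of magnitude of energy.
orders_of_magnitude = 22

# Observed doubling time for compute in the largest AI training runs
# (OpenAI's "AI and Compute" trend): 3.4 months.
doubling_time_months = 3.4

# Each doubling covers log10(2) orders of magnitude, so the number of
# doublings needed is 22 / log10(2) = 22 * log2(10).
doublings = orders_of_magnitude * math.log2(10)   # ~73 doublings

# Total time for the trend to hit the physical ceiling.
years = doublings * doubling_time_months / 12

print(f"{doublings:.1f} doublings -> {years:.1f} years")
```

Running this gives roughly 73 doublings, or about 21 years, which is where the "next two decades" figure comes from.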
So, currently the way alignment gets solved is: things continue to get crazier until they literally cannot get crazier any faster. When we reach that moment, we look back and ask: was it worth it? And if the answer is yes, congratulations, we solved the alignment problem.
So, if your plan is to keep freaking out until the alignment problem is solved, then I have good news for you. The alignment problem will be solved at precisely the moment you are maximally freaked out.
Alternative Title: “AI Alignment” Considered Harmful
Imagine a world with two types of people: Machine Learning Engineers and AI Alignment Experts.
Every day, the Machine Learning Engineer wakes up and asks himself: Given the current state of the art in Machine Learning, how can I use the tools and techniques of Machine Learning to achieve my goals?
Every day, the AI Alignment Expert wakes up and asks himself: How can I make sure that future, yet-to-be-invented tools and techniques of Machine Learning are not harmful?
One of these people is much more likely to succeed at building a future that reflects their ideals than the other.