So, currently the way alignment gets solved is: things continue to get crazier until they literally cannot get crazier any faster. When we reach that moment, we look back and ask: was it worth it? And if the answer is yes, congratulations, we solved the alignment problem.
I don’t know if I’m block-headedly missing the point, but I don’t know what this paragraph means.
How exactly does the world accelerating as much as possible mean that we solved the alignment problem?
At the single moment of maximum change in the transition from Human to Artificial Intelligence, we collectively agree that the outcome was “good”.
We never actually “solve” the alignment problem (in the EY sense of writing down a complete set of human values and teaching them to the AI). Instead, we solve the alignment problem by doing the hard work of engineering AI that does what we want and riding the wave all the way to the end of the S-curve.
edit:
I mean, maybe we DO that, but by the time we do no one cares.
No one cares because...there are other systems who are not operating on a complete set of human values (including many small, relatively dumb AI systems) that are steering the world instead?
No one cares for the same reason no one (in the present) cares that AI can now pass the Turing Test. By the time we get there, we are grappling with different questions.
“Can you define an AI model that preserves a particular definition of human values under iterative self-improvement” simply won’t be seen as the question of the day, because by the time we can do it, it will feel “obvious” or “unimpressive”.
Wait what? Getting an AI to do what you want is usually considered the hard part of the alignment problem, right?
Edit: I guess you’re talking about the outer alignment, societal, or collective alignment problem, getting solved much as it is now, by a collection of compromises among only semi-misaligned agents.
I’m claiming we never solve the problem of building AIs that “lase”, in the sense of being able to specify an agent that achieves a goal at some point in the far future. Instead, we “stumble through” by iteratively making more and more powerful agents that satisfy our immediate goals, while game-theoretic and ecological considerations mean that no single agent ever takes control of the far future.
Does that make more sense?
The idea makes sense. How long are you thinking we stumble through for?
Game theory says that humans need to work in coalitions and make allies because no individual human is that much more powerful than any other. With agents that can self-improve and self-replicate, I don’t think that holds.
And even if that balance of power were workable, my original objection stands. It seems to me that some misaligned troublemaker is bound to bring it all down with current or future tools of mass destruction.
Game theory says that humans need to work in coalitions and make allies because no individual human is that much more powerful than any other. With agents that can self-improve and self-replicate, I don’t think that holds.
Even if agents can self-replicate, it makes no sense to run GPT-5 on every single microprocessor on Earth. This implies we will have a wide variety of different agents operating across fundamentally different scales of “compute size”. For math reasons, the best way to coordinate a swarm of compute-limited agents is something that looks like free-market capitalism.
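To make that claim a bit more concrete, here is a toy sketch of price-based task allocation across agents at very different compute scales. Everything in it (the agent names, the tasks, the capacities, the prices) is invented purely for illustration; it is not a description of any real system.

```python
# Toy sketch: price-based coordination of compute-limited agents.
# All agents, tasks, and numbers below are made-up assumptions.

# (capacity, price per unit of compute): bigger agents can run more, but cost more per unit
agents = {
    "tiny_edge_model":      (10,     1),
    "mid_server_model":     (100,    5),
    "big_datacenter_model": (10_000, 50),
}

tasks = {"summarize_email": 1, "plan_logistics": 50, "prove_theorem": 5_000}  # compute cost

def quote(capacity, unit_price, cost):
    """An agent only quotes a price for tasks it is actually able to run."""
    return cost * unit_price if cost <= capacity else None

# Each task goes to the cheapest quote; no central planner needs to know
# anything about the agents beyond the prices they post.
for task, cost in tasks.items():
    quotes = {name: quote(cap, p, cost) for name, (cap, p) in agents.items()}
    quotes = {name: q for name, q in quotes.items() if q is not None}
    winner = min(quotes, key=quotes.get)
    print(f"{task}: allocated to {winner} at price {quotes[winner]}")
```

The only point of the sketch is that simple price signals are enough for a heterogeneous swarm to divide labor: small jobs land on small, cheap agents, and only the big jobs justify paying for the biggest model.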
One possible worry is that humans will be vastly out-competed by future life forms. But we have a huge advantage in terms of existing now. Compounding interest rates imply that anyone alive today will be fantastically wealthy in a post-singularity world. Sure, some people will immediately waste all of that, but as long as at least some humans are “frugal”, there should be more than enough money and charity to go around.
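To put a rough number on that compounding claim, here is a toy calculation. The starting amount, growth rate, and time horizon are all assumptions picked for illustration, not predictions.

```python
# Toy compound-growth calculation; every number here is an assumed illustration.
savings = 10_000   # assumed present-day savings
growth = 0.30      # assumed post-singularity real growth rate (30% per year)
years = 50         # assumed time horizon

final = savings * (1 + growth) ** years
print(f"${savings:,} at {growth:.0%}/year for {years} years -> ${final:,.0f}")
# roughly $5 billion: modest savings become enormous if high growth persists
```

None of these numbers are load-bearing; the sketch just shows how quickly (1 + r)^t explodes once r is large, which is the whole “advantage of existing now”.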
I don’t really have much to say about the “troublemaker” part, except that we should do the obvious things and not give AI command and control of nuclear weapons. I don’t really believe in gray-goo or false-vacuum or anything else that would allow a single agent to destroy the entire world without the rest of us collectively noticing and being able to stop them (assuming cooperative free-market supporting agents always continue to vastly [100x+] outnumber troublemakers).
Okay, I’m understanding your proposed future better. I still think that anything recursively self-improving (RSI) will be the end of us, if it’s not aligned to be long-term stable. And that even non-RSI self-replicating agents are a big problem for this scenario (since they can cooperate nearly perfectly). But their need for GPU space is an important limitation.
I think this is a possible way to get to the real intelligence explosion of RSI, and it’s the likely scenario we’re facing if language model cognitive architectures take off like I think they will. But I don’t think it helps with the need to get alignment right for the first real superintelligence. That will be capable of either stealing, buying, or building its own compute resources.