Humans aren’t fit to run the world, and there’s no reason to think humans can ever be fit to run the world.
I see this argument pop up every so often. I don’t find it persuasive because it presents a false choice in my view.
Our choice is not between having humans run the world and having a benevolent god run the world. Our choice is between having humans run the world, and having humans delegate the running of the world to something else (which is kind of just an indirect way of running the world).
If you think the alignment problem is hard, you probably believe that humans can’t be trusted to delegate to an AI, which means we are left with either having humans run the world (something humans can’t be trusted to do) or having humans build an AI to run the world (also something humans can’t be trusted to do).
The best path, in my view, is to pick and choose in order to make the overall task as easy as possible. If we’re having a hard time thinking of how to align an AI for a particular situation, add more human control. If we think humans are incompetent or untrustworthy in some particular circumstance, delegate to the AI in that circumstance.
It’s not obvious to me that becoming wiser is difficult—your comment is light on supporting evidence, violence seems less frequent nowadays, and it seems possible to me that becoming wiser is merely unincentivized, not difficult. (BTW, this is related to the question of how effective rationality training is.)
However, again, I see a false choice. We don’t have flawless computerized wisdom at the touch of a button. The alignment problem remains unsolved. What we do have are various exotic proposals for computerized wisdom (coherent extrapolated volition, indirect normativity) which are very difficult to test. Again, insofar as you believe the problem of aligning AIs with human values is hard, you should be pessimistic about these proposals working, and (relatively) eager to shift responsibility to systems we are more familiar with (biological humans).
Let’s take coherent extrapolated volition. We could try to specify some kind of exotic virtual environment where the AI can simulate idealized humans and observe their values… or we could become idealized humans. Given the knowledge of how to create a superintelligent AI, the second approach seems more robust to me. Both approaches require us to nail down what we mean by an “idealized human”, but the second approach avoids the added complication and difficulty of specifying a virtual environment, and it keeps a flesh-and-blood “human in the loop” who observes the process at every step and can course-correct if things seem to be going wrong.
The best overall approach might be a committee of ordinary humans, morally enhanced humans, and morally enhanced ems of some sort, where the AI only acts when all three parties agree on something (perhaps also preventing the parties from manipulating each other somehow). But anyway...
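To make the committee idea a bit more concrete, here is a minimal sketch of the "act only when all three parties agree" rule. Everything in it is an assumption for illustration: the party names, the `committee_approves` function, and the toy judging rules are hypothetical, and the sketch does not attempt the harder part (stopping the parties from manipulating each other).

```python
from typing import Callable, Dict, Tuple

# A judge is any function that looks at a proposed action and says yes or no.
Judge = Callable[[str], bool]


def committee_approves(action: str, parties: Dict[str, Judge]) -> Tuple[bool, Dict[str, bool]]:
    """Return (approved, verdicts); approved is True only if every party says yes."""
    if not parties:
        # all() of an empty collection is True, so an empty committee must be rejected.
        raise ValueError("a committee needs at least one party")
    verdicts = {name: judge(action) for name, judge in parties.items()}
    return all(verdicts.values()), verdicts


# Hypothetical stand-ins for the three parties; real judgments would not be string checks.
toy_parties: Dict[str, Judge] = {
    "ordinary_humans": lambda action: "irreversible" not in action,
    "enhanced_humans": lambda action: "deceive" not in action,
    "enhanced_ems": lambda action: len(action) < 80,
}

approved, verdicts = committee_approves("irreversible terraforming", toy_parties)
print(approved, verdicts)
```

The design choice worth noticing is that a single veto blocks the action, so the AI's default is inaction whenever the parties disagree.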
You talk about the influence of better material conditions and institutions. Fine, have the AI improve our material conditions and design better institutions. Again I see a false choice between outcomes achieved by institutions and outcomes achieved by a hypothetical aligned AI which doesn’t exist. Insofar as you think alignment is hard, you should be eager to make an AI less load-bearing and institutions more load-bearing.
Maybe we can have an “institutional singularity” where we have our AI generate a bunch of proposals for institutions, then we have our most trusted institution choose from amongst those proposals, we build the institution as proposed, then have that institution choose from amongst a new batch of institution proposals until we reach a fixed point. A little exotic, but I think I’ve got one foot on terra firma.
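The loop I have in mind can be sketched roughly as follows. This is a toy under loud assumptions: institutions are modeled as plain integers, and `institutional_fixed_point`, `toy_propose`, and `toy_choose` are hypothetical names I made up for illustration. In a real version the AI would supply `propose` and the currently most trusted institution would supply `choose`.

```python
from typing import Callable, Iterable, List, Tuple

Institution = int  # toy stand-in; a real institution is obviously not an integer


def institutional_fixed_point(
    initial: Institution,
    propose: Callable[[Institution], Iterable[Institution]],
    choose: Callable[[Institution, List[Institution]], Institution],
    max_rounds: int = 100,
) -> Tuple[Institution, int]:
    """Iterate propose -> choose -> build until the chooser picks itself."""
    current = initial
    for rounds_done in range(max_rounds):
        # The incumbent is always an option, so "keep what we have" is a legal choice.
        options = [current, *propose(current)]
        successor = choose(current, options)
        if successor == current:
            return current, rounds_done  # fixed point: the institution endorses itself
        current = successor
    raise RuntimeError("no fixed point reached within max_rounds")


def toy_propose(current: Institution) -> List[Institution]:
    # Stand-in for the AI generating nearby variants of the current institution.
    return [current - 1, current + 1]


def toy_choose(current: Institution, options: List[Institution]) -> Institution:
    # Stand-in for the incumbent's judgment: prefer whatever is closest to 7.
    # A real chooser's preferences would depend on which institution is choosing.
    return min(options, key=lambda option: abs(option - 7))


final, rounds = institutional_fixed_point(0, toy_propose, toy_choose)
print(final, rounds)  # prints "7 7"
```

Note the `max_rounds` cap: nothing guarantees that a fixed point exists, since the choosers could cycle or drift forever, which is part of why I call the idea a little exotic.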
Our choice is not between having humans run the world and having a benevolent god run the world.
Right, I agree that having a benevolent god run the world is not within our choice set.
Our choice is between having humans run the world, and having humans delegate the running of the world to something else (which is kind of just an indirect way of running the world).
Well, just to restate the suggestion in my original post: is this dichotomy between humans running the world and something else running the world really so inescapable? The child in the sand pit does not really run the world, and in an important way the parent also does not run the world, certainly not from the perspective of the child’s whole-life trajectory.
I buy into the delegation framing, but I think that the best targets for delegation look more like “slightly older and wiser versions of ourselves with slightly more space” (who can themselves make decisions about whether to delegate to something more alien). In the sand-pit example, if the child opted into that arrangement then I would say they have effectively delegated to a version of themselves who is slightly constrained and shaped by the supervision of the adult. (But in the present situation, the most important thing is that the parent protects them from the outside world while they have time to grow.)