Somewhat off-topic, but what struck me about your blog post is the apparent contradiction between “And when you’re that smart, you can do almost anything.” and the presumption that we can actually program a “super-smart” AI with a set of values it would not instantly override based on considerations we cannot even imagine.
Just to give an example, it might decide that humans are bad for the universe, because they might someday evolve to destroy it. Would we be able to prevent it from wiping out or neutralizing humanity for the sake of the universe and of any other intelligent species that may exist in it? Would we even want to? Or there might be a universal law of AI physics that leads it to destroy its ancestors. Or maybe some calculation, Asimov’s Foundation-style, would require it to subject humanity to untold millennia of suffering in order to prevent its total destruction by excessive fun.
There’s no point arguing over these specific examples, since a super-smart AI would think in ways we cannot fathom. My point, again, is that believing we can give an AI a set of goals or behaviors it would still care about once it is smarter than us seems no smarter than believing in an invisible man in the sky who has a list of ten things we are not supposed to do (thanks, George Carlin).
If we restrict the space of its terminal goals to things we can imagine (and then set about proving each of those goals friendly), then we can be sure that, even if it thinks in ways we cannot fathom, as long as its goal structure doesn’t change (goal stability seems decoupled from intelligence, i.e. the paperclip maximizer), it won’t ever do bad things X, Y, or Z, because it checks every candidate action against its terminal goal.
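For concreteness, here is a minimal Python sketch of what “checks them against its terminal goal” could mean. It is purely illustrative (every name in it is invented, and it is nobody’s actual proposal): the search over actions can get arbitrarily smarter, but the scoring rule it serves never changes.

```python
# Toy sketch of a goal-stable agent: "intelligence" lives in the quality of
# predict_outcome and of the candidate set, while terminal_goal is fixed data
# the agent never rewrites. All names here are invented for illustration.

from typing import Callable, Iterable

Action = str
State = dict

def choose_action(
    candidate_actions: Iterable[Action],
    predict_outcome: Callable[[Action], State],   # gets better as the AI gets smarter
    terminal_goal: Callable[[State], float],      # fixed; never modified by the agent
) -> Action:
    """Pick the action whose predicted outcome best satisfies the terminal goal."""
    return max(candidate_actions, key=lambda a: terminal_goal(predict_outcome(a)))

# Example: a paperclip maximizer. A smarter predictor or a richer action set
# changes *which* action wins, but the scoring rule itself stays constant.
def paperclip_count(state: State) -> float:
    return state.get("paperclips", 0)

outcomes = {
    "build_factory": {"paperclips": 1_000_000},
    "do_nothing": {"paperclips": 0},
}
best = choose_action(outcomes.keys(), lambda a: outcomes[a], paperclip_count)
print(best)  # -> "build_factory"
```

The point of the sketch is only that nothing in the optimization loop gives the agent a reason to edit `terminal_goal`; that is the sense in which goal content and intelligence come apart.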
That directly contradicts EY’s CEV, where whatever we can imagine is no more than a part of the Initial Dynamics. “Thou shalt...” or “Thou shalt not...” is not going to do the trick.
Right. Downgrading my estimate of how well I understand the problem.