I think a problem arises with conclusion 4: I can agree that humans imperfectly steering the world toward their own values has resulted in a world that is okay on average, but AI will possibly be much more powerful than humans. Insofar as corporations and sovereign states can be seen as super-human entities, we can already see that imperfect value optimization has created massive suffering: think of all the damage a ruthless corporation can inflict, e.g. by polluting the environment, or a state where political assassination is easy and widespread. Imperfectly aligned value optimization might produce a world that is okay on average, but that world could be split into a heaven and a hell, which I think is not an acceptable outcome.
This is a good point. Pretty much all the things we’re optimizing for that aren’t our values are due to coordination problems. (There are also akrasia/addiction sorts of things, but those are cases of optimizing for values we don’t endorse upon reflection, which is arguably not as bad as optimizing for a random part of value-space.)
So, Moloch might optimize for things like GDP instead of Gross National Happiness, and individuals might throw a thousand starving orphans under the bus for a slightly bigger yacht or whatever, but neither is fully detached from human values. Even if U(orphans)>>U(yacht), at least there’s an awesome yacht to counterbalance the mountain of suck.
I guess the question is precisely how diverse human values are in the grand scheme of things, and what the odds are of hitting a human value when picking a random or semi-random subset of value-space. If we get FAI slightly wrong, precisely how wrong does it have to be before it leaves our little island of value-space? Tiling the universe with smiley faces is obviously out, but what about hedonium, or wireheading everyone? Faced with an unwinnable AI arms race and no time for true FAI, I’d probably consider those better than nothing.
That’s a really, really tiny sliver of my values though, so I’m not sure I’d even endorse such a strategy if the odds were 100:1 against FAI. If that’s the best we could do by compromising, I’d still rate the expected utility of MIRI’s current approach higher, and hold out hope for FAI.