Sure, humans are effectively ruthless in wiping out individual ant colonies. We’ve even wiped out more than a few entire species of ant. But our ruthfulness about our ultimate goals — well, I guess it’s not exactly ruthfulness that I’m talking about...
...The fact that it’s not in our nature to simply define an easy-to-evaluate utility function and then optimize means it’s no mere coincidence that we don’t want anything radical enough to imply the elimination of all ant-kind. In fact, I’m pretty sure that for a large majority of people, there’s no utopian ideal you could pitch that they’d buy into which is radical enough that getting there would imply, or even suggest, actions that would kill all ants. Not because humanity wouldn’t be capable of doing that, but because we’re not capable of wanting it, and that fact may be related to our (residual) ruthfulness and to our intelligence itself. And metaphorically, from a superintelligence’s perspective, I think that humanity-as-a-whole is probably closer to being Formicidae than it is to being one species of ant.
...
This post, and its line of argument, is not about saying “AI alignment doesn’t matter”. Of fucking course it does. What I’m saying is: “it may not be the case that any tiny misalignment of a superintelligence is fatal or permanent”. Because yes, a superintelligence can and probably will change the world to suit its goals, but it won’t ruthlessly change the whole world to perfectly suit its goals, because those goals will not, themselves, be perfectly coherent. And in that gap, I believe there will probably still be room for some amount of humanity, or of a posthumanity still commensurate with extrapolated human values, to have some amount of say in their own fates.
The response I’m looking for is not at all “well, that’s all OK then, we can stop worrying about alignment”. Because there’s a huge difference between future (post)humans living meagerly, under sufferance, in some tiny remnant of the world that a superintelligence doesn’t happen to care about coherently enough to change, and them thriving as an integral part of the future that it does care about and is building, to say nothing of other possibilities better or worse than those. But what I am arguing is that the “win big or lose big are the only options” attitude I see as common in alignment circles (I know that Eliezer isn’t really cutting edge anymore, but look at his recent April Fools’ “joke” for an example) may be misguided. Not every superintelligence that isn’t perfectly friendly is terrifyingly unfriendly, and I think that admitting other possibilities (without being complacent about them) might enable useful progress in pursuing alignment.
...
As for your points about therapy: yes, of course, my off-the-cuff one-paragraph just-so-story was oversimplified. And yes, you seem to know a lot more about this than I do. But I’m not sure the metaphor is strong enough to make all that complexity matter here.