The step where you say that aligned ASI will want what humans want is, in my opinion, an unjustified leap. Any ASI, aligned or not, will naturally understand that humans don’t know what we want: not in detail, not in general, and not in out-of-distribution hypothetical scenarios. An aligned ASI would, as you clearly understand, have to grapple with that fact, but I don’t think it would just acquiesce to our currently stated values at each moment. I also wouldn’t want it to.
I don’t know how much this helps; the problem is still there. But I hope that if we align ASI well enough to avoid extinction in the short to medium term, we’ll have aligned it well enough to solve this problem in the medium to long term. Because if not, I would argue that the kind of weirdness you’re pointing toward is still a kind of extinction and replacement.
I wasn’t assuming the ASI was just taking our word for it:
The ASI are aligned, so they want whatever the humans want. Presumably they are using superintelligent Value Learning or AI Assisted Alignment or something to continuously improve their understanding of that. So they will presumably understand our Evolutionary Psychology, Neurology, Psychology, Anthropology, Sociology, etc. far better than we currently do.
So I was actually assuming they had a large, sophisticated ASI research project to figure out human values, i.e. what the humans want, in ever-increasing detail. That would obviously include surveying and incorporating recent changes in those values: humans being edited, changing their minds, or shifting with their culture. Failing to do that would be like a company still making the products that were in style 50 years ago and never doing any customer research. Why would we build ASI aligned to outdated values? Clearly we won’t.
But as you say, this only speeds the problem up.
Fair enough, and I agree that’s a plausible scenario.