While I’m inclined to agree with the conclusion of this essay, namely
...the biases induced by instrumental rationality at best weakly support Bostrom’s conclusion that machine superintelligence is likely to lead to existential catastrophe.
I don’t think the essay presents good arguments for the conclusion. This is in part because the third criterion,
...learning how well satisfied Sia’s desires are at some worlds won’t tell us how well satisfied her desires are at other worlds. That is, the degree to which her desires are satisfied at some worlds is independent of how well satisfied they are at any other worlds.
is likely to be very strongly violated in practice: we should expect “similar” worlds to be valued similarly.
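To make that concrete, here’s a toy sketch (Python; the linear “desires” and all the numbers are my own invention, not anything from the essay). Any structured utility like this makes Sia’s satisfaction at one world informative about her satisfaction at nearby worlds, which is exactly what the third criterion rules out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (my assumption, not the essay's): a "world" is a feature vector,
# and Sia's desires are a fixed linear function of those features.
n_features = 10
weights = rng.normal(size=n_features)              # Sia's (arbitrary) desires
utility = lambda world: float(weights @ world)     # how well satisfied she is at a world

base_world = rng.normal(size=n_features)
nearby_world = base_world + 0.01 * rng.normal(size=n_features)   # a "similar" world
distant_world = rng.normal(size=n_features)                      # an unrelated world

# With any structured utility, similar worlds get similar values, so learning
# how satisfied Sia is at one world tells you a lot about its neighbours,
# violating the independence assumed by the third criterion.
print(abs(utility(base_world) - utility(nearby_world)))    # small
print(abs(utility(base_world) - utility(distant_world)))   # typically much larger
```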
Perhaps more importantly, though, this essay seems not to criticize the notions of “randomly selected goal” or “most goals.” I think the best, and in my view decisive, criticisms of instrumental convergence arguments attack these concepts, because they are highly arbitrary at best and incoherent at worst. The only non-arbitrary measure over goals is the empirical probability measure induced by the actual training procedures we use to create AIs, and it’s far from obvious that this distribution is malign in the way instrumental convergence arguments assume.
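As a toy illustration of why the choice of measure matters (both “priors” over goals below are invented for the sake of the example): two equally “random” ways of sampling a goal over the same outcome space yield qualitatively different typical goals, so talk of “most goals” is empty until a measure is specified.

```python
import numpy as np

rng = np.random.default_rng(0)
n_outcomes = 1000

# Two candidate measures over "goals" (reward assignments to n_outcomes outcomes).
# Which one counts as sampling "a randomly selected goal"? That choice is arbitrary.
uniform_goal = rng.uniform(-1, 1, size=n_outcomes)       # i.i.d. reward on every outcome
sparse_goal = np.zeros(n_outcomes)                        # reward on only a few outcomes
hot = rng.choice(n_outcomes, size=5, replace=False)
sparse_goal[hot] = rng.uniform(-1, 1, size=5)

# A crude statistic: how concentrated is the goal on a few outcomes?
def concentration(goal):
    p = np.abs(goal) / np.abs(goal).sum()
    return float((p ** 2).sum())

print(concentration(uniform_goal))   # tiny: this "typical goal" cares a little about everything
print(concentration(sparse_goal))    # large: this one cares about a handful of outcomes
```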
I think this is one of the biggest issues in practice: at least some of the arguments for AI doom seem to essentially ignore structure, and I suspect they make a mistake similar to that of people who argue that the no free lunch theorem makes intelligence and optimization in general so expensive that AI can’t progress at all.
This is especially true for the orthogonality thesis.
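On the no free lunch analogy, a quick sketch (Python; the two objectives are invented): averaged over all objective functions no optimizer beats blind search, but real objectives have structure, and that structure is what makes optimization tractable in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024

# The no-free-lunch theorem averages over *all* objective functions, almost all of
# which are structureless noise; real objectives tend to be smooth, and that
# structure is exactly what local search exploits.
structured = -np.abs(np.arange(n) - 150)     # smooth objective: a single peak at index 150
unstructured = rng.permutation(n)            # a "typical" objective under NFL's uniform measure

def hill_climb(f, steps=300, x=0):
    # Move to the best of {x-1, x, x+1} each step.
    for _ in range(steps):
        x = max([max(0, x - 1), x, min(n - 1, x + 1)], key=lambda i: f[i])
    return f[x]

print(hill_climb(structured), structured.max())      # finds the peak
print(hill_climb(unstructured), unstructured.max())  # stuck near wherever it started
```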