It seems that if an agent with an implicit utility function indeed has an incentive to make its goals explicit, then that incentive must be presupposed in the very definition of a generally intelligent expected-utility maximizer capable of recursive self-improvement.
That is the general idea of universal instrumental values, yes.
I am aware of that argument but don’t perceive it to be particularly convincing.
Universal values are very similar to universal ethics, and for the same reasons that I don’t think an AGI will be friendly by default, I don’t think it will protect its goals or undergo recursive self-improvement by default. Maximizing expected utility is, just like friendliness, something that needs to be explicitly defined; otherwise there will be no incentive to do so.
I’m not really sure what you mean by “by default”. The idea is that a sufficiently smart goal-directed machine will tend to do these things (unless its utility function says otherwise), at least if you can set it up so that it doesn’t fall victim to the wirehead or pornography problems.
IMO, there’s a big difference between universal instrumental values and values to do with being nice to humans. The first type you get without asking; the second you have to deliberately build in. It doesn’t make much sense to lump these ideas together and reject both of them on the same grounds, as you seem to be doing.