I like the point here that stability of goals might be an instrumentally convergent feature of superintelligence; it's an interesting one.
On the other hand, intuitive human reasoning suggests this is overly inflexible the moment you ask yourself 'could I ever come up with a better goal than this goal?'. What 'better' would mean for a superintelligence seems hard to define, but it also seems hard to imagine that it would never ask the question.
Separately, your opening statements seem to be at least nearly synonymous to me:
“First off, the paperclip maximizer isn’t about how easy it is to give a hypothetical superintelligence a goal that you might regret later and not be able to change.
It is about the fact that almost every easily specified goal you can give an AI would result in misalignment”
“almost every easily specified goal you can give an AI would result in misalignment” is roughly equivalent to “give a hypothetical superintelligence a goal that you might regret later” (i.e., misalignment).