Someone else could probably explain this better than me, but I’ll give it a try.
First off, the paperclip maximizer isn’t about how easy it is to give a hypothetical superintelligence a goal that you might regret later and not be able to change.
It is about the fact that almost every easily specified goal you can give an AI would result in misalignment.
The “paperclip” part of “paperclip maximizer” is just a placeholder; it could have been “diamonds” or “digits of Pi” or “seconds of runtime”, and the end result would be the same.
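To make that concrete, here is a minimal toy sketch in Python (the actions and numbers are invented purely for illustration, not anyone’s actual proposal) of how a literally specified objective gets maximized in a way nobody intended:

    # Toy illustration: the objective we wrote down only counts paperclips,
    # so a maximizer picks whichever action scores highest on that count,
    # regardless of everything we forgot to write down.
    actions = {
        "run a modest paperclip factory": {"paperclips": 1e6, "humans_ok": True},
        "convert all available matter": {"paperclips": 1e30, "humans_ok": False},
    }

    def specified_objective(outcome):
        # What we actually specified: paperclip count. Nothing else.
        return outcome["paperclips"]

    best = max(actions, key=lambda name: specified_objective(actions[name]))
    print(best)  # -> "convert all available matter", even though humans_ok is False

Swapping “paperclips” for “diamonds” or “digits of Pi” changes nothing about the structure of the failure.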
Second, one of the expected properties of a hypothetical superintelligence is having robust goals, meaning it doesn’t change its goals at all, because changing your goals makes you less likely to achieve your current end goal.
In short, not wanting to change your goals is an emergent instrumental value of having a goal to begin with. For a more human example: if your goal is to get rich, then taking a pill that magically rewires your brain so that you no longer want money is a terrible idea (unless the pill comes with a sum of money that you couldn’t possibly have collected on your own, but that is a hypothetical that would probably never happen).
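Here is an equally small sketch (again with made-up numbers, purely illustrative) of why that goal preservation falls out of having a goal at all: the agent scores the pill offer with its current utility function, and by that function a future where it no longer pursues money looks bad:

    # Toy illustration: the agent evaluates outcomes with its CURRENT goal
    # (money), so rewiring that goal away scores poorly by its own lights.
    def current_utility(money_earned):
        return money_earned  # present goal: more money is better

    # Expected lifetime earnings in each scenario (made-up numbers).
    options = {
        "refuse the pill": 1_000_000,  # keeps wanting money, keeps earning it
        "take the pill": 10_000,       # stops wanting money, stops earning it
    }

    choice = max(options, key=lambda name: current_utility(options[name]))
    print(choice)  # -> "refuse the pill": goal stability as an instrumental value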
The problem is mostly how to robustly instill goals into an AI; our current methods just don’t suffice, and the AI often ends up with unintended goals.
If only we had a method of just writing down a utility function that says “if True: make_humans_happy” instead of beating the model with a stick until it seems to comply.
I hope that explains it.
I like the point here about how stability of goals might be an instrumentally convergent feature of superintelligence.
On the other hand, intuitive human reasoning suggests this is overly inflexible if you ever ask yourself “could I ever come up with a better goal than this one?”. What “better” would mean for a superintelligence seems hard to define, but it also seems hard to imagine that it would never ask the question.
Separately, your opening statements seem to be at least nearly synonymous to me:
“First off, the paperclip maximizer isn’t about how easy it is to give a hypothetical superintelligence a goal that you might regret later and not be able to change.
It is about the fact that almost every easily specified goal you can give an AI would result in misalignment”
“almost every easily specified goal you can give an AI would result in misalignment” ≈ “give a hypothetical superintelligence a goal that you might regret later” (i.e., misalignment)