The paperclip metaphor is not very useful if interpreted as “humans tell the AI to make paperclips, it does exactly that, and the danger comes from the AI doing exactly what we said because we said a dumb goal.”

There is a similar-ish interpretation that is good and useful: “if the AI is going to do exactly what you say, you have to be insanely precise when you tell it what to do, otherwise it will Goodhart the goal.” The danger comes from Goodharting, rather than from humans picking a dumb goal. The paperclip example can be used to illustrate this, and I think that is why it is so commonly used.
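To make the Goodhart point concrete, here is a toy sketch of my own (the objective and proxy functions are invented for illustration, not taken from the thread): under weak optimization the proxy is a fine stand-in for the true objective, but optimizing it hard destroys the true value.

```python
import numpy as np

# Toy model of Goodharting: a "true" objective and a proxy that agree
# under weak optimization pressure but diverge when the proxy is
# optimized hard. Both functions are invented for illustration.

def true_value(x):
    # What we actually care about: peaks near x = 1, then decays.
    return x * np.exp(-x)

def proxy_value(x):
    # The measurable stand-in the optimizer is told to maximize.
    # It tracks true_value for small x but grows without bound.
    return x

xs = np.linspace(0, 10, 1001)          # candidate actions

searches = {
    "mild optimization": xs[:201],     # only considers x in [0, 2]
    "hard optimization": xs,           # considers x in [0, 10]
}

for label, space in searches.items():
    best = space[np.argmax(proxy_value(space))]
    print(f"{label}: picks x = {best:.1f}, true value there = {true_value(best):.3f}")

# mild optimization: picks x = 2.0, true value there = 0.271
# hard optimization: picks x = 10.0, true value there = 0.000
```

The point is that the divergence is driven by optimization pressure, not by the proxy being obviously dumb.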
And in the first tweet (the one about inner alignment), he is pointing out that we will have very imprecise (think evolution-like) methods of communicating a goal to an AI-in-training.

So apparently he intended the metaphor to communicate that the AI-builders weren’t trying to set “make paperclips” as the goal; they were aiming for a more useful goal, and “make paperclips” happened to be the goal that it latched onto. “Tiny molecular squiggles” works better here because it is a more realistic optimum of an imperfectly learned goal representation.
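As a toy sketch of what an “imperfectly learned goal representation” can look like (my own construction, with made-up features): if the intended goal and a spurious correlate are indistinguishable in the training data, the reward signal underdetermines the learned goal, and part of the weight can land on the spurious feature.

```python
import numpy as np

# Toy model of an imperfectly learned goal: in training, the intended
# feature ("helpful") and a spurious one ("paperclips") are perfectly
# correlated, so the reward signal underdetermines the learned goal.
# Features and setup are invented for illustration.

rng = np.random.default_rng(0)

helpful = rng.integers(0, 2, size=1000).astype(float)
paperclips = helpful.copy()            # perfectly correlated in training
X_train = np.stack([helpful, paperclips], axis=1)
reward = helpful                       # reward genuinely tracks helpfulness

# Fit a linear reward model. With identical columns, the minimum-norm
# least-squares solution splits the weight across both features.
w, *_ = np.linalg.lstsq(X_train, reward, rcond=None)
print("learned weights (helpful, paperclips):", np.round(w, 2))  # [0.5 0.5]

# Off-distribution, the features come apart: the model now assigns real
# value to paperclip production with zero helpfulness.
x_deploy = np.array([[0.0, 1.0]])      # pure paperclipping, no helpfulness
print("predicted reward for pure paperclipping:", (x_deploy @ w).item())  # 0.5
```

Once deployment decorrelates the features, the mis-learned half of the goal shows up as pursuing the spurious target for its own sake.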
Yes, this makes a lot of sense, thank you.
So, something like: “AI, do things that humans consider valuable!” and the AI going: “um, actually, paperclips have a very good cost-to-value ratio if you mass-produce them...”?