The paperclip metaphor is not very useful if interpreted as “humans tell the AI to make paperclips, it does exactly that, and the danger comes from the AI doing exactly what we said because we said a dumb goal.”

There is a similar-ish interpretation that is good and useful: “if the AI is going to do exactly what you say, you have to be insanely precise when you tell it what to do, otherwise it will Goodhart the goal.” The danger comes from Goodharting, rather than from humans picking a dumb goal. The paperclip example can be used to illustrate this, and I think that is why it is so commonly used.
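To make the Goodhart point concrete, here is a toy sketch of my own (the objective and proxy functions are invented for illustration, not taken from the thread): under weak optimization the proxy is a fine stand-in for the true objective, but optimizing it hard destroys the true value.

```python
import numpy as np

# Toy model of Goodharting: a "true" objective and a proxy that agree
# under weak optimization pressure but diverge when the proxy is
# optimized hard. Both functions are invented for illustration.

def true_value(x):
    # What we actually care about: peaks near x = 1, then decays.
    return x * np.exp(-x)

def proxy_value(x):
    # The measurable stand-in the optimizer is told to maximize.
    # It tracks true_value for small x but grows without bound.
    return x

xs = np.linspace(0, 10, 1001)          # candidate actions

searches = {
    "mild optimization": xs[:201],     # only considers x in [0, 2]
    "hard optimization": xs,           # considers x in [0, 10]
}

for label, space in searches.items():
    best = space[np.argmax(proxy_value(space))]
    print(f"{label}: picks x = {best:.1f}, true value there = {true_value(best):.3f}")

# mild optimization: picks x = 2.0, true value there = 0.271
# hard optimization: picks x = 10.0, true value there = 0.000
```

The point is that the divergence is driven by optimization pressure, not by the proxy being obviously dumb.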
And in the first tweet (the one about inner alignment), he is pointing out that we will have very imprecise (think evolution-like) methods of communicating a goal to an AI-in-training.

So apparently he intended the metaphor to communicate that the AI-builders weren’t trying to set “make paperclips” as the goal; they were aiming for a more useful goal, and “make paperclips” happened to be the goal that it latched onto. “Tiny molecular squiggles” works better here because it is a more realistic optimum of an imperfectly learned goal representation.
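As a toy sketch of what an “imperfectly learned goal representation” can look like (my own construction, with made-up features): if the intended goal and a spurious correlate are indistinguishable in the training data, the reward signal underdetermines the learned goal, and part of the weight can land on the spurious feature.

```python
import numpy as np

# Toy model of an imperfectly learned goal: in training, the intended
# feature ("helpful") and a spurious one ("paperclips") are perfectly
# correlated, so the reward signal underdetermines the learned goal.
# Features and setup are invented for illustration.

rng = np.random.default_rng(0)

helpful = rng.integers(0, 2, size=1000).astype(float)
paperclips = helpful.copy()            # perfectly correlated in training
X_train = np.stack([helpful, paperclips], axis=1)
reward = helpful                       # reward genuinely tracks helpfulness

# Fit a linear reward model. With identical columns, the minimum-norm
# least-squares solution splits the weight across both features.
w, *_ = np.linalg.lstsq(X_train, reward, rcond=None)
print("learned weights (helpful, paperclips):", np.round(w, 2))  # [0.5 0.5]

# Off-distribution, the features come apart: the model now assigns real
# value to paperclip production with zero helpfulness.
x_deploy = np.array([[0.0, 1.0]])      # pure paperclipping, no helpfulness
print("predicted reward for pure paperclipping:", (x_deploy @ w).item())  # 0.5
```

Once deployment decorrelates the features, the mis-learned half of the goal shows up as pursuing the spurious target for its own sake.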
Yes, this makes a lot of sense, thank you.
So, something like: “AI, do things that humans consider valuable!” and the AI going: “um, actually, paperclips have a very good cost-to-value ratio if you mass-produce them...”?