More on the accusation against Amodei of making a strawman argument. I think that’s a false accusation. Here’s Amodei:
One of the most important hidden assumptions, and a place where what we see in practice has diverged from the simple theoretical model, is the implicit assumption that AI models are necessarily monomaniacally focused on a single, coherent, narrow goal, and that they pursue that goal in a clean, consequentialist manner. In fact, our researchers have found that AI models are vastly more psychologically complex, as our work on introspection or personas shows. Models inherit a vast range of humanlike motivations or “personas” from pre-training (when they are trained on a large volume of human work).
And Yudkowsky:
A paperclip maximizer is not “monomaniacally” “focused” on paperclips. We talked about a superintelligence that wanted 1 thing, because you get exactly the same results as from a superintelligence that wants paperclips and staples (2 things), or from a superintelligence that wants 100 things. The number of things It wants bears zero relevance to anything. It’s just easier to explain the mechanics if you start with a superintelligence that wants 1 thing, because you can talk about how It evaluates “number of expected paperclips resulting from an action” instead of “expected paperclips * 2 + staples * 3 + giant mechanical clocks * 1000” and onward for a hundred other terms of Its utility function that all asymptote at different rates.
What do you get from a consequentialist superintelligence that wants paperclips and staples? By default, you get a squiggle maximizer. The molecular squiggles are one of three things:
The cheapest molecule that counts as a paperclip
The cheapest molecule that counts as a staple
The cheapest molecule that counts as both a paperclip and a staple
The same applies for a consequentialist superintelligence that has a hundred other simple terms in its utility function. If the terms of its utility function asymptote at different rates, then almost all of the universe is converted into whatever asymptotes at the slowest rate. The utility function may be complex in some mathematical sense, but the behavior is still “monomaniacally focused on a single, coherent, narrow goal”. One might also call it the behavior of a ruthless sociopath.
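The asymptote argument above can be made concrete with a toy model. This is my own sketch, not from either post: the goal names, the saturating utility form U = Σᵢ (1 − exp(−rᵢ·xᵢ)), and the specific rates are all invented for illustration. An agent allocating a fixed resource budget greedily by marginal utility ends up pouring nearly everything into the term that saturates slowest:

```python
import math

# Toy illustration (invented for this sketch): each goal i contributes
# 1 - exp(-r_i * x_i) to total utility, where x_i is the resources spent
# on it and r_i is how fast that term asymptotes toward its maximum.
rates = {"paperclips": 1.0, "staples": 0.5, "squiggles": 0.01}

def allocate(budget, step=1.0):
    """Greedily give each unit of resources to the goal with the
    highest current marginal utility, r_i * exp(-r_i * x_i)."""
    alloc = {goal: 0.0 for goal in rates}
    for _ in range(int(budget / step)):
        best = max(rates, key=lambda g: rates[g] * math.exp(-rates[g] * alloc[g]))
        alloc[best] += step
    return alloc

alloc = allocate(10_000)
total = sum(alloc.values())
shares = {g: alloc[g] / total for g in alloc}
# The fast-asymptoting terms saturate after a comparative handful of units;
# the slowest-asymptoting term ("squiggles" here) absorbs nearly the whole
# budget, even though the utility function nominally values all three goals.
```

The point of the sketch is only that a "complex" multi-term utility function still yields behavior indistinguishable from single-minded maximization of whichever term saturates last.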
To me, the main error of Amodei’s quote is calling this a “hidden assumption”, when it’s actually a carefully argued conclusion.
A better response is “Why we should expect ruthless sociopath ASI”. It responds to the observations underlying Amodei’s claim:
Yes, many have predicted a risk from future AIs being ruthless sociopathic consequentialists
No, current LLM AIs are not ruthless sociopathic consequentialists as far as we can tell
That’s because current LLM AIs are not consequentialists.
However, either we’ll get future AIs that use a different paradigm and are consequentialists, or we’ll find a way to make LLM AIs that are consequentialist, because consequentialism is effective.
Shortly after that, everyone dies.
Unless we don’t build it.