I really appreciate your list of claims and unclear points. Your succinct summary is helping me think about these ideas.
There is no highly viral meme going around right now about producing tons of paperclips.
A few examples came to mind: sports paraphernalia, tabletop miniatures, and stuffed animals (which, for many species, likely outnumber their real counterparts by hundreds or thousands of times).
One might argue that these things give humans joy, so they don’t count; there is some validity to that, since AI paperclips are supposed to be useless to humans. On the other hand, one might also argue that it is unsurprising that subsystems repurposed to seek out paperclips derive some ‘enjoyment’ from the paperclips… but I don’t think that argument holds water for these examples. Looking at it another way, some number of paperclips is indeed useful.
No egregore has turned the entire world to paperclips just yet. But of course that hasn’t happened, else we would have already lost.
Even so: consider paperwork (like the tax forms mentioned in the post), skill certifications in the workplace, and things like slot machines and reality television. A lot of human effort is wasted on things humans don’t directly care about, for non-obvious reasons. Those things could be paperclips.
(And perhaps some humans derive genuine joy out of reality television, paperwork, or giant piles of paperclips. I don’t think that changes my point that there is evidence of egregores wasting resources.)
I think the point under contention isn’t whether current egregores are (in some sense) “optimizing” for things that would score poorly according to human values (they are), but whether the things they’re optimizing for have some (clear, substantive) relation to the things a misaligned AGI will end up optimizing for, such that an intervention on the whole egregore situation would have a substantial probability of impacting the eventual AGI.
To this question I think the answer is a fairly clear “no”, though of course this doesn’t invalidate the possibility that investigating how to deal with egregores may result in some non-trivial insights for the alignment problem.
I agree with you.
I also don’t think it matters whether the AGI will optimize for something current egregores care about.
What matters is whether current egregores will in fact create AGI.
The fear around AI risk is that the answer is “inevitably yes”.
The current egregores are actually no better at making AGI egregore-aligned than humans are at making it human-aligned.
But they’re a hell of a lot better at making AGI accidentally, and probably better at making one at all.
So if we don’t sort out how to align egregores, we’re fucked — and so are the egregores.
I think I see what you mean. A new AGI won’t be under the control of the egregores; it will be misaligned with them as well. That makes sense.