Clearly! I’m a little reluctant to rephrase it until I have a version that I know conveys what I actually meant, but one that would be very semantically close to the original would be:
“—Contra Bostrom 2014, it is possible to get high quality, nuanced representations of concepts like “happiness” at training initialization. The problem of representing happiness and similar ideas in a computer will not be first solved by the world model of a superintelligent or otherwise incorrigible AI, as in the example Bostrom gives on page 147 of the 2017 paperback, under the section “Malignant Failure Modes”: “But wait! This is not what we meant! Surely if the AI is superintelligent, it must understand that when we asked it to make us happy, we didn’t mean that it should reduce us to a perpetually repeating recording of a drugged-out digitized mental episode!”—The AI may indeed understand that this is not what we meant. However, its final goal is to make us happy, not to do what the programmers meant when they wrote the code that represents this goal.”
Part of why I didn’t write it that way in the first place is that it would have made it a lot bulkier than the other bullet points, so I trimmed it down.