But even if that’s the case, the central, consistently repeated version of the value loading problem in Bostrom 2014 centers on how it’s simply not rigorously imaginable how you would get the relevant representations in the first place.
I’m not so sure. Like, first of all, you must mean something like “get them before superintelligence” or “get them into the goal slot”, because there is obviously a method to just get the representations: build a superintelligence with a random goal, and it will have your representations. That distinction was explicitly stated then, and it is often explicitly stated now, in all the talk of “AI will understand but not care”. The focus on frameworks in which it is hard to translate from humans to programs is consistent with him trying to constrain the methods of generating representations to only the useful ones.
There is a reason it is called “the value loading problem” and not “the value understanding problem”. “The value translation problem” would sit somewhere in the middle: having an actual program encoding human utility would certainly solve some of Bostrom’s problems.
I don’t know whether Bostrom actually thought about a non-superintelligent AI that already understands but doesn’t care. But I don’t think this line of argumentation of yours is correct about why such a scenario contradicts his points. Even if he didn’t consider it, it’s not “contra” unless it actually contradicts him. What actually may contradict him is not “AI will understand values early” but “AI will understand values early, and training such an early AI will make it care about the right things”.