This is in principle a thing that Nick Bostrom could have believed while writing Superintelligence, but the rest of the book makes it kind of incompatible with Occam’s Razor. It’s possible he meant the issues with translating concepts into discrete program representations as the central difficulty, with whether we would be able to make use of such a representation as a noncentral difficulty. (It’s Bostrom, he’s a pretty smart dude, this wouldn’t surprise me; it might even be in the text somewhere, but I’m not reading the whole thing again.) But even if that’s the case, the central, consistently repeated version of the value loading problem in Bostrom 2014 centers on how it’s simply not rigorously imaginable how you would get the relevant representations in the first place.
It’s important to remember also that Bostrom’s primary hypothesis in Superintelligence is that AGI will be produced by recursive self-improvement, such that it’s genuinely not clear you would have a series of functional, non-superintelligent AIs with usable representations before you have a superintelligent one. The book very much takes the EY thesis that “human level is a weird threshold to expect AI progress to stop at” as the default.
> But even if that’s the case, the central, consistently repeated version of the value loading problem in Bostrom 2014 centers on how it’s simply not rigorously imaginable how you would get the relevant representations in the first place.
I’m not so sure. Like, first of all, you must mean something like “get before superintelligence” or “get into the goal slot”, because there is obviously a method to just get the representations: build a superintelligence with a random goal, and it will have your representations. That distinction was explicitly stated then, and it is often explicitly stated now — all that “AI will understand but not care”. The focus on frameworks where it gets hard to translate from humans to programs is consistent with him trying to constrain the methods of generating representations to only the useful ones.
There is a reason it is called “the value loading problem” and not “the value understanding problem”. “The value translation problem” would be somewhere in the middle: having an actual program encoding human utility would certainly solve some of Bostrom’s problems.
I don’t know whether Bostrom actually thought about a non-superintelligent AI that already understands but doesn’t care. But I don’t think this line of argumentation of yours is correct about why such a scenario contradicts his points. Even if he didn’t consider it, that’s not “contra” unless it actually contradicts him. What may actually contradict him is not “AI will understand values early” but “AI will understand values early, and training such an early AI will make it care about the right things”.