After this section, it feels like the “do what I mean”/“do what I want” (DWIM) instruction pretty much solves the problem of what we want the AI to value. If the creator of the AI doesn’t want outcomes that lead to a good future, then they seem unlikely to succeed in specifying a good future through any other means anyway. On the other hand, if the creator wants the right thing, then DWIM seems to avoid all perverse instantiations. Additionally, the only technical requirement seems to be that the AI be able to follow natural language instructions (perhaps supplemented with a simpler definition of value for the AI to use while it is still learning). Overall, my impression is that this area doesn’t require nearly as much work as other parts of superintelligence design (such as getting an AI to value goals described in natural language in the first place).