The Promethean Servant doesn’t have to be able to generate all those answers. Even if we could hardcode all of them and program it never to make decisions related to them, it would still be dangerous. For instance, it might reason: “Fetching coffee is easier when more coffee is nearby → coffee is most nearby when everything is coffee → convert all possible resources into coffee to maximize fetching.”
We have to imagine a system not specifically designed to fetch coffee that happens to be instructed to ‘fetch the coffee’. Everything to do with its understanding of any instruction it is given has to be generated by higher-level principles.
You should be able to see, before any coffee-fetching instruction was ever uttered, how the agent would approach other problems. There’s a sense in which understanding ‘fetch the coffee’ also entails excluding things which aren’t fetching the coffee, such as transforming the building into a cafetiere. But ‘don’t turn the building into a cafetiere’ is not a rule specified in any dictionary. It is, though, the kind of rule that could be generated on the fly by a kernel operating on the principle that the major effects of verbing a noun will tend to be on the noun. The installation of this principle would, to some extent, be visible from behaviour in other scenarios (did the robot use Jupiter to make a giant mechanical leg to kick the Earth when instructed to ‘kick the ball’?).
The very idea of an AGI must surely be more like a general solution to a family of problems than a family of solutions mapping onto a family of problems.
I think the second robot you’re talking about isn’t the candidate for the AGI-could-kill-us-all level of alignment concern. It’s more like a self-driving car that could hit someone due to inadequate testing.
I guess I’m not sure, though, how many answers to our questions you envisage the agent you’re describing generating from first principles. That’s the nub here, because both the agents I tried to describe above fit the bill for coffee fetching, but with clearly varying potential for world-ending generalisation.