Evan R. Murphy comments on Ngo and Yudkowsky on alignment difficulty

Evan R. Murphy 28 Nov 2021 0:50 UTC
Richard, summarized by Richard: “Consider an AI that, given a hypothetical scenario, tells us what the best plan to achieve a certain goal in that scenario is. Of course it needs to do consequentialist reasoning to figure out how to achieve the goal. But that’s different from an AI which chooses what to say as a means of achieving its goals. [...]”
Eliezer, summarized by Richard: “The former AI might be slightly safer than the latter if you could build it, but I think people are likely to dramatically overestimate how big the effect is. The difference could just be one line of code: if we give the former AI our current scenario as its input, then it becomes the latter.”
How does giving the former “planner” AI the current scenario as input turn it into the latter “acting” AI? It still only outputs a plan, which the operators can then review and decide whether or not to carry out.
Also, the planner AI that Richard put forth had two inputs, not one: 1) a scenario, and 2) a goal. So, a question for Eliezer (or anyone confident they understood this part of the discussion): which goal are you providing as input to the planner AI in this situation? Are you saying that the planner AI becomes dangerous when it’s given the current scenario and any goal at all as inputs?
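To make the question concrete, here is a minimal Python sketch of the two setups as I read them. Every name in it (plan_for, planner_mode, acting_mode, observe, execute) is a hypothetical stand-in invented for illustration, not something from the original discussion, and the "planner" is a trivial placeholder rather than a real system.

```python
from typing import Callable, List

Plan = List[str]


def plan_for(scenario: str, goal: str) -> Plan:
    """Hypothetical planner: takes a scenario AND a goal, returns a plan as text.

    It performs no actions; this body is only a placeholder.
    """
    return [f"step toward {goal!r} in {scenario!r}"]


def planner_mode(scenario: str, goal: str) -> Plan:
    # The plan is handed back to the operators, who review it and
    # decide whether or not to carry it out.
    return plan_for(scenario, goal)


def acting_mode(
    observe: Callable[[], str],
    goal: str,
    execute: Callable[[str], None],
) -> None:
    # Assumed reading of the "acting" AI: the scenario is the real,
    # current one, and each step is executed with no review in between.
    for step in plan_for(observe(), goal):
        execute(step)
```

In this sketch, passing the current scenario to planner_mode still only returns a plan. The behavior changes only once something like execute is wired in, and a goal argument has to be supplied in both cases.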