More abstractly, goal-directed planning is often an efficient way to leverage limited data [Sutton and Barto, 2018], and is important for humans in many domains. Insofar as goal-directed planning is a powerful way to accomplish many useful tasks, we expect that AI developers will increasingly design architectures expressive enough to support (explicit or implicit) planning, and that optimization over those architectures will push policies to develop internally-represented goals (especially when they’re trained on complex long-horizon tasks). So henceforth we assume that policies will learn internally-represented goals as they become more generally capable, and turn our attention to the question of which types of internally-represented goals they might learn.
Regarding internally represented goals:
This seems to me to be a foundational assumption for the rest of the paper, yet the passage feels a bit offhand, and its relationship to the previous examples isn't totally clear. Here's a suggestion for a slight modification (a toy code sketch of the reflex-agent/planner contrast follows the list):
1. Powerful AIs will be able to successfully carry out a wide range of difficult tasks.
2. It is in principle possible that they achieve this as "reflex agents", but just as lookup tables don't generalise well in traditional AI, it seems unlikely in practice that reflex agents can generalise well enough to successfully carry out a wide range of difficult tasks.
3. Given knowledge of how one's actions influence the environment, goal-directed planning does generalise well.
4. AIs in practice develop internal representations of "how their actions influence the environment" (your evidence review), and, by the above, goal-directed planning is therefore a generalisable technique for carrying out a wide range of difficult tasks.
5. So it seems that goal-directed planning is a more likely AGI structure than the reflex-agent one (noting that these two possibilities aren't necessarily mutually exclusive or exhaustive).
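To make the lookup-table analogy concrete, here is a minimal sketch of the contrast, assuming a made-up toy grid world (the grid, goal location, and both agent implementations are illustrative assumptions, not anything from the paper): a reflex agent memorises state-action pairs and has no answer for unseen states, while a planner that knows how actions influence the environment can reach the goal from states it has never encountered.

```python
# Minimal sketch (illustrative only): a lookup-table "reflex agent" versus
# a goal-directed planner on an assumed 3x3 grid world with goal (2, 2).
from collections import deque

GOAL = (2, 2)
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(state, action):
    """Known transition model: how actions influence the environment."""
    dx, dy = MOVES[action]
    x, y = state
    return (min(max(x + dx, 0), 2), min(max(y + dy, 0), 2))

# Reflex agent: a lookup table mapping states seen in "training" to actions.
# It has no answer for states outside the table, so it fails to generalise.
reflex_table = {(0, 0): "right", (1, 0): "right", (2, 0): "up", (2, 1): "up"}

def reflex_agent(state):
    return reflex_table.get(state)  # None on any unseen state

# Goal-directed planner: breadth-first search over the transition model.
# Because it uses knowledge of how actions influence the environment,
# it handles any reachable start state, including ones never seen before.
def planning_agent(state):
    frontier = deque([(state, [])])
    visited = {state}
    while frontier:
        s, plan = frontier.popleft()
        if s == GOAL:
            return plan
        for action in MOVES:
            nxt = step(s, action)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None

novel_state = (0, 2)  # never appeared in the reflex agent's table
print(reflex_agent(novel_state))    # None: the lookup table has no entry
print(planning_agent(novel_state))  # e.g. ['right', 'right']: a valid plan
```

The point of the sketch is only that the planner's generality comes from the transition model plus search, not from having seen the right state-action pair beforehand, which is the contrast the suggested passage is drawing.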