I’d be pretty scared of an oracle AI that could do novel science, because it might still want things internally. If the oracle can truly design a fusion power plant well, it can anticipate obstacles and revise its plans just as well as an agent can, if not better, since it isn’t allowed to observe and adapt. My worry is that it performs cognition similar to the agent’s, just with all interactions with the environment replaced by some kind of efficient internal simulation, or something more loosely equivalent.
It’s not clear to me that this is as dangerous as an agent with a generalized skill for routing around obstacles, but I suspect “wants in the behaviorist sense” isn’t quite the right property to focus on, because it depends on the exact interface between your AI and the world rather than on the underlying cognition.