In Superintelligence, Nick Bostrom talks about various “AI superpowers”. One of these is “social manipulation”, which he summarizes as

And Eliezer Yudkowsky writes:

There’s a popular concept of “intelligence” as book smarts, like calculus or chess, as opposed to, say, social skills. So people say that “it takes more than intelligence to succeed in human society”. But social skills reside in the brain, not the kidneys. When you think of intelligence, don’t think of a college professor; think of human beings, as opposed to chimpanzees. If you don’t have human intelligence, you’re not even in the game.

In order to have elite social skills, you need to be able to form accurate models of the thoughts and intentions of others. But forming accurate models of an overseer’s thoughts and intentions is exactly the ability we’d like to see in a corrigible AI.

If we can build AI systems that form those models without being goal-driven agents, maybe we can have the benefits of elite social skills without the costs. I’m optimistic that this is the case—many of our most powerful model-building techniques don’t really behave as though they have some kind of goal they are trying to achieve in the world.