[Question] What’s state-of-the-art in AI understanding of theory of mind?

Sparked by Eric Topol, I’ve been thinking lately about biological complexity, psychology, and AI safety.

A prominent concern in the AI safety community is the problem of instrumental convergence – for almost any terminal goal, agents will converge on instrumental goals that are helpful for furthering the terminal goal, e.g. self-preservation.

The story goes something like this:

  • AGI is given (or arrives at) a terminal goal

  • AGI learns that self-preservation is important for increasing its chances of achieving its terminal goal

  • AGI learns enough about the world to realize that humans are a substantial threat to its self-preservation

  • AGI finds a way to address this threat (e.g. by killing all humans)

It occurred to me that to be really effective at finding & deploying a way to kill all humans, the AGI would probably need to know a lot about human biology (and also markets, bureaucracies, supply chains, etc.).

We humans don’t yet have a clean understanding of human biology, and it doesn’t seem like an AGI could get to a superhuman understanding of biology without running many more empirical tests (on humans), which would be pretty easy to observe.

Then it occurred to me that maybe the AGI doesn’t actually need to know a lot about human biology to develop a way to kill all humans. But it seems like it would still need to have a worked-out theory of mind, just to get to the point of understanding that humans are agent-like things that could bear on the AGI’s self-preservation.

So now I’m curious about where the state of the art is for this. From my (lay) understanding, it doesn’t seem like GPT-2 has anything approximating a theory of mind. Perhaps OpenAI’s Dota system or DeepMind’s AlphaStar is the state of the art here, theory-of-mind-wise? (To be successful at Dota or Starcraft, you need to understand that there are other things in your environment that are agent-y & will work against you in some circumstances.)
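For concreteness, one way researchers probe for a theory of mind is a false-belief test, like the classic Sally-Anne task from developmental psychology: does the system track what an agent *believes*, as distinct from what is actually true? Here’s a minimal sketch of such a probe; the vignette and scoring heuristic are my own toy construction, and you’d plug in whatever model you want to test:

```python
# Sketch of a Sally-Anne-style false-belief probe for a text model.
# The scoring rule is a crude keyword check, just for illustration.

FALSE_BELIEF_VIGNETTE = (
    "Sally puts her ball in the basket and leaves the room. "
    "While she is away, Anne moves the ball to the box. "
    "When Sally returns, where will she look for the ball?"
)

def passes_false_belief_test(answer: str) -> bool:
    """A system with a theory of mind should answer with Sally's
    *belief* (the basket), not the ball's actual location (the box)."""
    answer = answer.lower()
    return "basket" in answer and "box" not in answer

# A belief-tracking answer passes; a reality-tracking answer fails.
print(passes_false_belief_test("She will look in the basket."))  # True
print(passes_false_belief_test("She will look in the box."))     # False
```

A Dota or Starcraft agent never faces anything like this explicitly – whatever opponent modeling it has is implicit in its policy – which is partly why it’s hard to say whether these systems “have” a theory of mind at all.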

Curious what else is in the literature about this, and also about how important it seems to others.