Minimizing Empowerment for Safety

I haven’t put much thought into this post; it’s off the cuff.

DeepMind has published a couple of papers on maximizing empowerment as a form of intrinsic motivation for Unsupervised RL / Intelligent Exploration.

I never looked at either paper in detail, but the basic idea is to maximize the mutual information between actions (or policies/options) and future outcomes. An agent that does so knows which strategy to follow to bring about any given outcome.
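For concreteness, the standard formalization (which may differ in details from whatever those papers use) defines the $n$-step empowerment of a state $s$ as the channel capacity from action sequences to the resulting state:

$$
\mathcal{E}_n(s) \;=\; \max_{p(a_{t:t+n-1})} I\!\left(A_{t:t+n-1};\, S_{t+n} \,\middle|\, S_t = s\right),
$$

i.e. a high-empowerment state is one from which the agent's choice of actions can reliably bring about many distinguishable futures.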

It seems plausible that minimizing empowerment instead, in the setting where there is an explicit reward function, could help steer an agent away from pursuing instrumental goals that have large effects.
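Very roughly (this is just my sketch, not anything from those papers), one way to operationalize this is to subtract an empowerment estimate from the task reward. The Python below computes exact one-step empowerment for a single discrete state via the Blahut-Arimoto algorithm for channel capacity and uses it as a penalty; `one_step_empowerment`, `penalized_reward`, and the `beta` coefficient are illustrative names, and in anything beyond toy MDPs the empowerment term would presumably have to be approximated (e.g. with a variational bound) rather than computed exactly.

```python
import numpy as np


def _kl_rows(P, q):
    """Row-wise KL divergence KL(P[a, :] || q), with the 0*log(0) = 0 convention."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P / q), 0.0)
    return terms.sum(axis=1)


def one_step_empowerment(P, n_iters=200, tol=1e-10):
    """One-step empowerment at a state: the channel capacity max_{p(a)} I(A; S'),
    computed with the Blahut-Arimoto algorithm.

    P: array of shape (n_actions, n_next_states) with rows Pr(s' | s, a).
    Returns the capacity in nats.
    """
    P = np.asarray(P, dtype=float)
    p_a = np.full(P.shape[0], 1.0 / P.shape[0])  # start from a uniform action distribution
    for _ in range(n_iters):
        q = p_a @ P  # marginal over next states under the current p(a)
        new_p_a = p_a * np.exp(_kl_rows(P, q))
        new_p_a /= new_p_a.sum()
        if np.abs(new_p_a - p_a).max() < tol:
            p_a = new_p_a
            break
        p_a = new_p_a
    return float(p_a @ _kl_rows(P, p_a @ P))


def penalized_reward(task_reward, empowerment, beta=0.1):
    """Task reward minus an empowerment penalty; beta is a free trade-off knob."""
    return task_reward - beta * empowerment


# Toy comparison: a "passive" state where every action leads to the same next state
# has zero empowerment, while a state whose three actions have distinguishable
# outcomes has empowerment log(3) ~ 1.10 nats, so the penalty favors the former.
passive = np.array([[1.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0]])
active = np.eye(3)
print(one_step_empowerment(passive))                        # ~ 0.0
print(penalized_reward(1.0, one_step_empowerment(active)))  # ~ 1.0 - 0.1 * log(3)
```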

So this might be useful for “taskification”, “limited impact”, and similar proposals.