I think using external empowerment as a simpler bound on the long-term component of humanity’s utility function is one of the more exciting developments in AI alignment. Your article has a number of interesting links to prior work that I had completely missed.
Part of my reason for optimism is that I have a model of how empowerment naturally emerges in efficient long-term planning from compounding model uncertainty, and there is already fairly strong indirect evidence that the brain uses empowerment. This lets us sidestep the problem of modeling human utility function(s) precisely enough to avoid inevitable long-term drift, because the long-term component just converges to empowerment.
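For concreteness, here is a minimal sketch of the standard information-theoretic notion of n-step empowerment (the channel capacity from an agent’s action sequences to its resulting states); for deterministic dynamics this reduces to the log of the number of distinct reachable states within the horizon. The gridworld, names, and numbers here are purely my own illustration, not anything from your post:

```python
import itertools
import math

# Illustrative toy only (my own example, not from the post):
# a deterministic gridworld where state = (x, y) and actions move one cell.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
GRID = 5  # 5x5 grid

def step(state, action):
    """Apply one action; moves that would leave the grid are clipped at the wall."""
    dx, dy = ACTIONS[action]
    x, y = state
    return (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))

def empowerment(state, horizon):
    """n-step empowerment for deterministic dynamics: log2 of the number of
    distinct states reachable in `horizon` steps, i.e. the channel capacity
    from action sequences to final states."""
    reachable = set()
    for plan in itertools.product(ACTIONS, repeat=horizon):
        s = state
        for a in plan:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))

# A corner state offers fewer reachable futures (less optionality) than the center.
print(empowerment((0, 0), horizon=3))  # ~3.32 bits (10 reachable states)
print(empowerment((2, 2), horizon=3))  # 4.0 bits (16 reachable states)
```

The point of the toy is just that empowerment assigns higher value to states with more distinct reachable futures, which is the ‘optionality’ being discussed here.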
There are two large caveats. The first, as you discuss here, is that it seems hard to avoid also explicitly modeling the short-term reward and the tradeoff between the two (humans do want to periodically ‘spend’ some of their optionality). I’m reasonably optimistic about that part, as most of the ways we spend optionality are really conversions between different forms of optionality.
The second major issue is identifying the correct agent(s) to empower, which clearly should not just be individual human brains. Humans are somewhat altruistic, so optimizing for the empowerment of a single human brain is incorrect even for the long-term component. I think we can handle this by (1) optimizing for humanity or agency more broadly, and/or (2) using some more complex distributed software mind theory.
That link doesn’t seem correct?
Not sure what you mean here. As a heuristic present in biases? Or in split-second reactions?