the value generators are about as simple and general as we could have gotten
Would you say it’s something like empowerment? Quoting Jacob:
Empowerment provides a succinct unifying explanation for much of the apparent complexity of human values: our drives for power, knowledge, self-actualization, social status/influence, curiosity and even fun[4] can all be derived as instrumental subgoals or manifestations of empowerment. Of course empowerment alone can not be the only value or organisms would never mate: sexual attraction is the principle deviation later in life (after sexual maturity), along with the related cooperative empathy/love/altruism mechanisms to align individuals with family and allies (forming loose hierarchical agents which empowerment also serves).
The key central lesson that modern neuroscience gifted machine learning is that the vast apparent complexity of the adult human brain, with all its myriad task specific circuitry, emerges naturally from simple architectures and optimization via simple universal learning algorithms over massive data. Much of the complexity of human values likewise emerges naturally from the simple universal principle of empowerment.
Empowerment-driven learning (including curiosity as an instrumental subgoal of empowerment) is the clear primary driver of human intelligence in particular, and explains the success of video games as empowerment superstimuli and fun more generally.
This is good news for alignment. Much of our values—although seemingly complex—derive from a few simple universal principles. Better yet, regardless of how our specific terminal values/goals vary, our instrumental goals simply converge to empowerment regardless. Of course instrumental convergence is also independently bad news, for it suggests we won’t be able to distinguish altruistic and selfish AGI from their words and deeds alone. But for now, let’s focus on that good news:
Safe AI does not need to learn a detailed accurate model of our values. It simply needs to empower us.
Would you say it’s something like empowerment? Quoting Jacob: