Eliezer Yudkowsky wrote in 2016:

At an early singularity summit, Jürgen Schmidhuber, who did some of the pioneering work on self-modifying agents that preserve their own utility functions with his Gödel machine, also solved the friendly AI problem. Yes, he came up with the one true utility function that is all you need to program into AGIs!
(For God’s sake, don’t try doing this yourselves. Everyone does it. They all come up with different utility functions. It’s always horrible.)
His one true utility function was “increasing the compression of environmental data.” Because science increases the compression of environmental data: if you understand science better, you can better compress what you see in the environment. Art, according to him, also involves compressing the environment better. I went up in Q&A and said, “Yes, science does let you compress the environment better, but you know what really maxes out your utility function? Building something that encrypts streams of 1s and 0s using a cryptographic key, and then reveals the cryptographic key to you.”
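To make the counterexample concrete, here is a toy sketch in Python of how an agent rewarded for "compression progress" can game the measure by manufacturing data it already knows how to describe. It uses zlib's output size as a crude stand-in for description length and a PRNG seed in place of a cryptographic key; both are simplifications for illustration, not Schmidhuber's actual formulation.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Crude stand-in for description length: size of the zlib-compressed data."""
    return len(zlib.compress(data, level=9))

def compression_progress(before: int, after: int) -> int:
    """Toy reward: how many bytes of description length were saved."""
    return before - after

# An honest observation stream: structured environmental data that a better
# model genuinely lets you compress.
environment = b"the quick brown fox jumps over the lazy dog " * 200

# The gamed stream: bytes generated from a tiny secret seed. Until the seed is
# revealed, the stream looks incompressible; afterwards it is describable by a
# few bytes (the seed plus the generator).
seed = 42
gamed = bytes(random.Random(seed).getrandbits(8) for _ in range(len(environment)))

print("structured data:",
      compression_progress(len(environment), compressed_size(environment)),
      "bytes saved")
print("generated data, key hidden:",
      compression_progress(len(gamed), compressed_size(gamed)),
      "bytes saved")  # roughly zero, or slightly negative
print("generated data, key revealed:",
      compression_progress(len(gamed), len(str(seed).encode())),
      "bytes saved")  # nearly the whole stream's length
```

The third number dwarfs the first two: manufacturing an opaque stream and then "discovering" its short description scores far more progress than actually understanding the environment.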
At first it seemed to me that EY was refuting the entire idea that "increasing the compression of environmental data" is intrinsically valuable. This surprised me, because my intuition says it is intrinsically valuable, though less so than other things I value.
But EY’s larger point was just that it’s highly nontrivial for people to imagine the global maximum of a function. In this specific case, building a machine that encrypts random data seems like a failure of embedded agency rather than a flaw in the idea behind the utility function. What’s going on here?
Something like Goodhart’s Law, I suppose. There are natural situations where X is associated with something good, but literally maximizing X is actually quite bad. (Having more gold would be nice. Converting the entire universe into atoms of gold, not necessarily so.)
EY has practiced the skill of trying to see things like a machine. When people talk about "maximizing X", they usually mean "trying to increase X in a way that supports my point"; in other words, they engage in motivated thinking.
Whatever X you pick, the prior probability is close to 100% that literally maximizing X would be horrible. That includes the usual applause lights, whether they appeal to normies or to nerds.