Okay, but doesn’t this look like the original inner misalignment problem? Either the model has the wrong representation for “human values”, or we fail to recognize the proper representation and make it optimize for something else?
On the other hand, a world properly optimized for human values should look very weird. It likely includes a lot of aliens having a lot of weird alien fun, and weird qualia factories and...
Nah, I don’t think so. Take the diamond maximizer problem—one problem is finding the function that physically maximizes diamond, e.g. as Julia code. The other one is getting your maximizer/neural network to point to that reliably maximizable function.
As for the “properly optimized human values”, yes. Our world looks quite DeepDream-dogs-like compared to the ancestral environment (and, now that I think of it, maybe the degrowth/retvrn/conservative people can be thought of as claiming that our world is already “human value slop” in a number of ways: if you take a look at YouTube Shorts and Times Square in New York, they’re not that different).