Thanks for doing this!
I was trying to work out how the alignment problem could be framed as a game design problem and I got stuck on this idea of rewards being of different ‘types’. Like, when considering reward hacking, how would one hack the reward of reading a book or exploring a world in a video game? Is there such a thing as ‘types’ of reward in how reward functions are currently created? Or is it that I’m failing to introspect on reward types and they are essentially all the same pain/pleasure axis attached to different items?
That last explanation seems hard to resolve with the huge difference in qualia between different motivational sources (like reading a book versus eating food versus hugging a friend… These are not all the same ‘type’ of good, are they?)
Sorry if my question is a little confused. I was trying to convey my thought process. The core question is really:
Is there any material on why ‘types’ of reward signals can or can’t exist for AI and what that looks like?
I disagree that humans don’t optimize for inclusive genetic fitness (IGF):
1. We seem to have different observational data. I do know some people who make all their major life decisions based on the quality and quantity of their offspring. Most of them are female, but this might be a bias in my sample. Specifically, quality trades off against quantity: waiting to find a fitter partner, and thus losing part of your reproductive window, is a common trade-off. Another is making sure your children have much better lives than you by improving your own material circumstances (or health!). To be fair, they seem to be a small minority currently, but I think that is due to point 3 and would be rectified in a more constant environment.
2. A lot of our drives do indirectly help IGF. Your aesthetic sense may be somewhat wired to your ability to recognize and enjoy the visual appearance of healthy mates, and similarly of healthy environments to grow up in, etc. Sure, it gets hijacked for 20 other things, but how big is the loss in IGF from keeping it around? I would argue it’s generally not an issue for the subset of humans who are directly driven to have big families.
3. Many of us have badly optimized drives because our environments have changed too fast. It will take a few generations of constant environment (not gonna happen at our current level of technological progress) to catch up. The obvious example is birth control: sex drive used to be a great proxy signal for offspring. It no longer is, but we still love sex. But in a few generations the only people alive will be the descendants of people who wanted kids regardless of their sex drive. ‘Evolution’ will now select directly on the desire for kids, but it takes a while to catch up.
I’m not saying evolution optimized us very well, but I don’t think it’s accurate to say that we are not IGF maximizers. The environment has just changed much too quickly, and selection pressure has been low over the last few generations, but things like birth control actually introduce a new selection pressure on the drive to reproduce. Humans are mediocre IGF maximizers in an environment that is changing unusually fast.
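To make the birth-control argument above concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the trait is treated as perfectly heritable and passed on asexually, the offspring counts are made-up numbers, and the function names (`next_share`, `simulate`) are hypothetical. Real heritability of the desire for kids is only partial, so actual change would be slower than this.

```python
def next_share(share, kids_direct, kids_proxy):
    """One generation of simple (haploid) selection.

    share       -- fraction of the population that directly wants kids
    kids_direct -- assumed average offspring for that group
    kids_proxy  -- assumed average offspring for people driven only by sex drive
    """
    direct = share * kids_direct
    proxy = (1.0 - share) * kids_proxy
    return direct / (direct + proxy)


def simulate(share, kids_direct, kids_proxy, generations):
    """Return the share of 'direct desire' people for each generation."""
    history = [share]
    for _ in range(generations):
        share = next_share(share, kids_direct, kids_proxy)
        history.append(share)
    return history


# Hypothetical numbers. Before birth control, sex drive is a good proxy,
# so both groups have the same number of kids and the share does not move.
before = simulate(0.1, 3.0, 3.0, 10)

# After birth control, assume the proxy-only group has half as many kids.
after = simulate(0.1, 3.0, 1.5, 10)

for generation, share in enumerate(after):
    print(f"generation {generation}: {share:.2f} directly want kids")
```

Under these made-up numbers the group that directly wants kids goes from 10% to a majority in four generations, while it stays flat as long as the proxy still works. That is the sense in which birth control creates a new selection pressure, even though the model is far too simple to say how fast it would play out in reality.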