It seems to occur mostly without RL. People start wanting to have sex before they have actually had sex.
This doesn’t mean that it isn’t a byproduct of RL. Something needs to be hardcoded, but a simple reward circuit might lead to a highly complex set of desires and cognitive machinery. I think the things you are pointing to in this post sound extremely related to what Shard Theory is trying to tackle. https://www.lesswrong.com/posts/iCfdcxiyr2Kj8m8mT/the-shard-theory-of-human-values
Indeed, this is exactly the kind of thing I am gesturing at. Certainly, all our repertoires of sexual behaviour are significantly shaped by RL. My point is that, in this case, evolution has somehow mostly solved a pointers-like problem: it gets the reward model to suddenly include rewards for sexual behaviour, it does so robustly, and it does so long after birth, once a decade or so of unsupervised learning and RL has already occurred. Moreover, this reward model leads people to robustly pursue this goal even fairly off-distribution from the ancestral environment.