I think this is roughly the idea behind a satisficer: build something that never tries too hard, so it never reaches for the "conquer the world" class of solutions, since those are far too extreme and "good enough" can be had for much less (see the sketch below). That said, I'm not sure satisficers have actually been shown to be fully safe either.
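As a toy sketch of that idea (my own illustration, not from any of the linked posts; the actions, utilities, and threshold are all made up), a satisficer scans options from mild to extreme and stops at the first one that is good enough, rather than maximizing:

```python
def satisficing_choice(actions, utility, threshold):
    """Scan options from least to most extreme and take the first one
    whose utility is good enough, instead of maximizing."""
    for action in actions:  # assumed ordered from mild to extreme
        if utility(action) >= threshold:
            return action
    return actions[-1]  # nothing satisficed; fall back to the last option

# "conquer the world" scores highest, but the satisficer stops at the
# first good-enough plan and never even considers the extreme one.
utilities = {"do nothing": 0.1, "modest plan": 0.8, "conquer the world": 1.0}
actions = ["do nothing", "modest plan", "conquer the world"]
print(satisficing_choice(actions, utilities.get, threshold=0.7))  # "modest plan"
```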
Something like this is argued to be why humans are remarkably well aligned to our basic homeostatic drives: the only real failure modes are obesity, drugs, and maybe alcohol, which misalign us with those basic needs. Hedonic treadmills/loops essentially tame the RL part of us and ensure that reward isn't the optimization target in practice, as in TurnTrout's post below:
https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target
Similarly, the two posts by beren below explain how a PID-style control loop may be helpful for alignment:
https://www.lesswrong.com/posts/3mwfyLpnYqhqvprbb/hedonic-loops-and-taming-rl
https://www.beren.io/2022-11-29-Preventing-Goodheart-with-homeostatic-rewards/
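To make the homeostatic-reward idea concrete, here's a minimal sketch of my own (not code from either post; the gains, setpoint, and toy dynamics are assumptions): reward peaks at a setpoint and falls off with deviation, so there's nothing to gain by pushing the underlying variable to extremes, and a textbook PID controller just steers the variable back toward the setpoint:

```python
def homeostatic_reward(x, setpoint):
    """Reward is maximal at the setpoint and decreases with deviation,
    so 'more' of the underlying variable stops being better."""
    return -abs(x - setpoint)

class PID:
    """Textbook PID controller: proportional, integral, derivative terms."""
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, measurement, dt=1.0):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy loop: the controller nudges x toward the setpoint rather than
# maximizing it without bound; reward peaks at the setpoint.
pid = PID(kp=0.5, ki=0.01, kd=0.1, setpoint=100.0)
x = 70.0
for _ in range(20):
    x += pid.step(x)
print(f"x = {x:.1f}, reward = {homeostatic_reward(x, 100.0):.1f}")
```

The point of the sketch is that the "reward function" here is anti-Goodhart by construction: overshooting the setpoint is penalized exactly like undershooting it, so the control loop has no incentive to run the variable to an extreme.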