How do you know that humans and LLMs/current RL agents actually optimize the reward? Are there any known theorems or papers on this? The claim seems at least a little bit important.
You may answer here:
https://www.lesswrong.com/posts/GDnRrSTvFkcpShm78/when-is-reward-ever-the-optimization-target