My updated thoughts: still a great post, though not as polished as it should be. That’s OK. The important thing is that it compiles a big list of problems and alleged problems for RLHF, with links.
Here is the polished version from our team led by Stephen Casper and Xander Davies: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback :)