Thanks for writing the paper! I think it will be really impactful and I think it fills a big gap in the literature.
I’ve always wondered what problems RLHF has, and mostly I’ve seen only short, informal answers about how it incentivizes deception or how humans can’t provide a scalable signal for superhuman tasks. That’s odd, given that it’s one of the most commonly used AI alignment methods.
Before your paper, I think this post was the most in-depth analysis of problems with RLHF I’d seen, so your paper is now probably the best resource on the topic. Apart from that post, the List of Lethalities post has a few related sections, and this post by John Wentworth has a section on RLHF.
I’m sure your paper will spark future research on improving RLHF because it lists several specific discrete problems that could be tackled!
Thanks, and +1 to adding the resources. Also, Charbel-Raphael, who authored the in-depth post, is one of the authors of this paper! That post in particular was something we paid attention to while designing the paper.