Ran W

Karma: 13

Why do we need RLHF? Imitation, Inverse RL, and the role of reward

Ran W, 3 Feb 2024 4:00 UTC
12 points
0 comments, 5 min read