Updated version of the paper with results from DeepSeek, and cool discussion about reward models here
https://x.com/OwainEvans_UK/status/1891889067228528954
Updated version of the paper with results from DeepSeek, and cool discussion about reward models here
https://x.com/OwainEvans_UK/status/1891889067228528954