cubefox comments on Why not train reasoning models with RLHF?

cubefox 30 Jan 2025 18:02 UTC
2 points
Actually the paper doesn’t have any more on this topic than the paragraph above.
- Caleb Biddulph 30 Jan 2025 21:12 UTC
  1 point
  Yeah, but there are probably other interesting takeaways.