ryan_greenblatt comments on Shortform

ryan_greenblatt 1 Feb 2025 16:02 UTC
2 points
- I think it’s much easier to do RL on huge numbers of math problems, both because answers are easier to verify and because it is easier to get many problems. Also, for somewhat incidental reasons, single-turn RL is substantially less complex, and maybe faster, than multi-turn RL on agentic tasks (due to the variable number of steps and the variable delay from environments).
- OpenAI probably hasn’t gotten around to doing as much computer-use RL, partly due to prioritization.
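
To make the single-turn vs. multi-turn contrast concrete, here is a rough toy sketch in Python. Everything in it is hypothetical and for illustration only (`verify_math_answer`, `ToyEnv`, and the rollout helpers are made-up names, not anyone's actual training code). It only shows the shape of the difference: a single-turn math rollout is one generation plus one cheap check, while an agentic rollout loops for an episode-dependent number of steps. Real environment latency is not simulated.

```python
import random


def verify_math_answer(model_answer: str, reference: str) -> float:
    """Single-turn reward: one exact-match check against a known answer."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def single_turn_rollout(policy, problem: str, reference: str) -> float:
    # One generation and one verification: fixed shape per sample.
    return verify_math_answer(policy(problem), reference)


class ToyEnv:
    """Hypothetical agentic environment whose episode length varies by seed."""

    def __init__(self, seed: int, max_steps: int = 10):
        rng = random.Random(seed)
        self.steps_needed = rng.randint(1, max_steps)
        self.steps_taken = 0

    def step(self, action: str):
        # Each step would also incur environment latency in practice;
        # that delay is an assumption here and is not simulated.
        self.steps_taken += 1
        done = self.steps_taken >= self.steps_needed
        reward = 1.0 if done else 0.0
        return f"obs-{self.steps_taken}", reward, done


def multi_turn_rollout(policy, env: ToyEnv):
    # The loop length is not known in advance, which complicates batching.
    obs, total, done = "start", 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total, env.steps_taken
```

In the single-turn case every sample in a batch has the same structure, so batching is straightforward; in the multi-turn case, rollouts finish at different times, which is one plausible source of the extra complexity mentioned above.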