J Bostock comments on Jemist’s Shortform

J Bostock 28 Jul 2025 21:15 UTC
2 points
The constant hazard rate model probably predicts exponentially growing training-inference compute requirements (i.e. the inference done during guess-and-check RL) for agentic RL with a given model: as the hazard rate decreases exponentially, we need to sample exponentially more tokens to see an error, and we need to see an error to get any signal.
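
A rough sketch of that arithmetic, in case it helps. It assumes a constant per-token error probability (so tokens-until-first-error is geometric, with mean 1/hazard) and a hazard rate that shrinks by a fixed factor each training round. The starting rate, the decay constant, and the function names are made up for illustration; they are not fitted to anything.

```python
import math

def expected_tokens_per_error(hazard_rate: float) -> float:
    """Mean number of tokens sampled before the first error,
    assuming a constant per-token error probability (geometric distribution)."""
    return 1.0 / hazard_rate

def hazard_schedule(initial_rate: float, decay: float, rounds: int) -> list[float]:
    """Hypothetical schedule: hazard rate decays exponentially with training round."""
    return [initial_rate * math.exp(-decay * k) for k in range(rounds)]

# Illustrative numbers only (assumptions, not measurements).
rates = hazard_schedule(initial_rate=1e-2, decay=0.5, rounds=6)
costs = [expected_tokens_per_error(r) for r in rates]

# Ratio of consecutive costs: constant, so cost grows exponentially in rounds.
ratios = [costs[i + 1] / costs[i] for i in range(len(costs) - 1)]

for k, (r, c) in enumerate(zip(rates, costs)):
    print(f"round {k}: hazard={r:.2e}, expected tokens per error={c:.0f}")
```

Under these assumptions the tokens needed per observed error multiply by the same factor (`exp(decay)`) every round, which is the exponential compute growth described above.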