The constant hazard rate model probably predicts exponential training inference (i.e. the inference done during guess and check RL) compute requirements agentic RL with a given model, because as hazard rate decreases exponentially, we’ll need to sample exponentially more tokens to see an error, and we need to see an error to get any signal.
The constant hazard rate model probably predicts exponential training inference (i.e. the inference done during guess and check RL) compute requirements agentic RL with a given model, because as hazard rate decreases exponentially, we’ll need to sample exponentially more tokens to see an error, and we need to see an error to get any signal.