Looks like you meant to write something more here? Or is the bottleneck what you wrote in point 3?
But also, like, a lot of capability gains are coming from “take a long time to figure this out, train to figure it out more quickly than that” as a core part of what’s going on inside RLVR. So I don’t think it’s a total wash either.
This is true. But also, in order for this to work in the case of a freaking big [space of things to be figured out], you need to start with some [good search heuristics]/[capacity to predict which search paths are promising to pursue]. It seems to me that the tons of [easily checkable-for-validity examples] of math, programming, etc. available on the internet suffice to give you such heuristics/prediction/planning skills in the human range, and to somewhat extrapolate/enhance/refine them (via the “how could I have thought that faster?” method, as you say), but that this approach has its limits.