On the contrary, it seems like this is the area where we should be able to best apply RL, since there is a clear reward signal.
Is there? It’s one thing to verify whether a proof is correct; whether an expression (posed by a human!) is tautologous to a different expression (also posed by a human!). But what’s the ground-truth signal for “the framework of Bayesian probability/category theory is genuinely practically useful”?
This is the reason I’m bearish on the reasoning models even for math. The realistic benefits of them seem to be:
1. Much faster feedback loops on mathematical conjectures.
2. Solving long-standing mathematical challenges such as the Riemann hypothesis or P vs. NP.
3. Mathematicians might be able to find hints of whole new math paradigms in the proofs the models generate for those long-standing challenges.
Of those:
(1) still requires mathematicians to figure out which conjectures are useful. It compresses hours, days, weeks, or months (depending on how well it scales) of a very specific and niche type of work into minutes, which is cool, but not Singularity-tier.
(2) is very speculative. It’s basically “compresses decades of work into minutes”, while the current crop of reasoning models can barely solve problems that ought to be pretty “shallow” from their perspective. Maybe Altman is right, the paradigm is in its GPT-2 stage, and we’re all about to be blown away by what they’re capable of. Or maybe it doesn’t scale past the frontier of human mathematical knowledge very well at all, and the parallels with AlphaZero are overstated. We’ll see.
(3) is dependent on (2) working out.
(The reasoning-model hype is so confusing for me. Superficially there’s a ton of potential, but I don’t think there’s been any real indication they’re up to the real challenges still ahead.)