When you go through a textbook, there are confusions you can notice but can't yet resolve, and these could plausibly become RLVR tasks. To choose and formulate some puzzle as an RLVR task, the AI would need to already understand the context of that puzzle, but then training on that task makes it ready to understand more. Setting priorities for learning seems like a general skill that adapts to various situations as you come to understand them better. As with human learning, the progression from more familiar lessons to deeper expertise would emerge naturally for AI instances as they engage in active learning about their situations.
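To make that loop concrete, here is a minimal toy sketch in Python. Everything in it (the Task and Model classes, notice_confusions, rlvr_step, the skill number) is hypothetical illustration rather than any real training API; it only shows the shape of noticing a confusion, formulating it as a verifiable task, and reinforcing against the verifier.

```python
from dataclasses import dataclass
import random

@dataclass
class Task:
    prompt: str    # the puzzle, phrased so an answer can be checked cheaply
    check: object  # verifier: answer -> bool (cheaper than solving the puzzle)

@dataclass
class Model:
    skill: float = 0.3  # toy proxy for how much of the "textbook" is understood

    def notice_confusions(self, corpus: list) -> list:
        # The model can already formulate these puzzles and write a checker,
        # even though it can't yet reliably solve them.
        return [Task(prompt=f"resolve: {c}", check=lambda a: a == "resolved")
                for c in corpus if random.random() > self.skill]

    def attempt(self, task: Task) -> str:
        return "resolved" if random.random() < self.skill else "stuck"

def rlvr_step(model: Model, corpus: list, rollouts: int = 8) -> None:
    for task in model.notice_confusions(corpus):
        rewards = [task.check(model.attempt(task)) for _ in range(rollouts)]
        if any(rewards):  # a verified success exists, so there is signal to reinforce
            model.skill = min(1.0, model.skill + 0.005)

model = Model()
corpus = [f"confusion {i}" for i in range(20)]
for _ in range(30):
    rlvr_step(model, corpus)  # the frontier of understanding moves step by step
print(f"skill after active learning: {model.skill:.2f}")
```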
I think the schleppy path of “learn skills by intentionally training on those specific skills” will be the main way AIs get better in the next few years.
So my point is that automating just this thing might be sufficient, and the perceived schleppiness is exactly what the claim of generalizability amounts to. You need expertise sufficient to choose and formulate the puzzles, but not yet sufficient to solve them, and this generation-verification gap keeps moving the frontier of understanding forward, step by step, but potentially indefinitely.
Seems plausible. I note that:
- That world is bottlenecked on compute resources you can pour into training, particularly if AIs remain much less sample-efficient than humans when learning new tasks.
- Training up the first AI on a skill by doing the generation-verification-gap-shuffle is much more expensive than training up later AIs once you can cheaply run inference on an AI that already has the skill, and training a later AI to delegate to one specialized in this skill might be cheaper still (a toy cost comparison follows after this list).
- This world still sees an explosion of recursively improving AI capabilities, but those capability gains are not localized to a single AI agent.
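A toy back-of-the-envelope version of the cost comparison in the second point, for concreteness. Every number below is an arbitrary placeholder chosen only to exhibit the claimed ordering (exploration from scratch costs far more than distilling from an existing expert, which costs more than simply delegating), not an estimate of real training costs.

```python
# All constants are made-up illustrative placeholders, not real figures.
ROLLOUTS_PER_TASK = 64          # RL needs many sampled attempts per puzzle
TASKS_TO_MASTER_SKILL = 10_000  # puzzles needed before the skill is reliable
COST_PER_ROLLOUT = 1.0          # one arbitrary unit of inference compute

def rl_from_scratch() -> float:
    # First AI: pays for exploration on every task (low sample efficiency).
    return TASKS_TO_MASTER_SKILL * ROLLOUTS_PER_TASK * COST_PER_ROLLOUT

def distill_from_expert() -> float:
    # Later AIs: roughly one cheap inference pass from the existing expert
    # per task to use as a training target.
    return TASKS_TO_MASTER_SKILL * 1 * COST_PER_ROLLOUT

def delegate_to_expert(queries: int = 1_000) -> float:
    # Never learn the skill at all; just call the specialist when needed.
    return queries * COST_PER_ROLLOUT

for name, cost in [("RL from scratch", rl_from_scratch()),
                   ("distill from expert", distill_from_expert()),
                   ("delegate to expert", delegate_to_expert())]:
    print(f"{name:>20}: {cost:>9,.0f} compute units")
```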