Agency and reflectivity are phenomena with really broad applicability, and I think it’s unlikely that models will pick them up by memorizing a few facts. Those traits are more concentrated in places like LessWrong, but they’re almost everywhere. I think to go from “fits the vibe of internet text and absorbs some of the reasoning” to “actually creates convincing internet text,” you need more agency and reflectivity.
My impression is that, for generating algorithms this widespread, “memorize more random facts and overfit” is less efficient for reducing perplexity than “learn something that generalizes.” There’s a reason we see “approximate addition” instead of “memorize every addition problem,” and “learn webdev” instead of “memorize every website.”
The RE-bench numbers for task time horizon just keep going up, and I expect them to keep rising as models continue to gain the bits and pieces of complex machinery required for operating coherently over long time horizons.
As for when we run out of data, I encourage you to look at this piece from Epoch. We run out of RL signal for R&D tasks even later than that.