Yeah, I get that the actual parameter count isn't known, but I think the general argument is that bigger pre-trains remember more facts, and we can use that to try to predict the model size.
Is there a reliable way to distinguish between [remembers more facts] and [infers more correct facts from remembered ones]? If there isn’t, then using remembered facts as an estimate of base model size would be even more noisy than you’d already expect.
I know I get far more questions right on exams than chance would predict, even when I have zero direct knowledge or memory of the correct answer. I assume reasoning models have at least some of this kind of capability.
Not a reliable source, but I’m open to the possibility (footnote 1)
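For what it's worth, the estimation idea upthread could be sketched as a simple calibration fit: take models whose parameter counts are public, measure their closed-book fact-recall accuracy, fit accuracy against log(parameters), then invert the fit for the unknown model. Everything below is illustrative, the calibration numbers are made up, and (as the reply points out) the estimate inherits all the noise from inference masquerading as memory.

```python
import math

# Hypothetical calibration data: (parameter count, closed-book fact-recall
# accuracy) for models with publicly known sizes. All numbers are invented
# purely for illustration.
calibration = [
    (7e9, 0.42),
    (13e9, 0.48),
    (70e9, 0.61),
    (180e9, 0.68),
]

def fit_log_linear(points):
    """Least-squares fit of accuracy = a * log10(params) + b."""
    xs = [math.log10(p) for p, _ in points]
    ys = [acc for _, acc in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    b = my - a * mx
    return a, b

def estimate_params(accuracy, a, b):
    """Invert the fit to guess a parameter count from observed accuracy."""
    return 10 ** ((accuracy - b) / a)

a, b = fit_log_linear(calibration)
# A mystery model scoring 0.55 on the same benchmark would land between
# the 13B and 70B calibration points under this (made-up) fit.
guess = estimate_params(0.55, a, b)
print(f"estimated parameters: {guess:.2e}")
```

Note this only works to the extent that recall accuracy is actually a clean function of size, which is exactly what the memorization-vs-inference objection calls into question.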