Oh yes I do know math lol. Yeah the summary above hits most of the main ideas if you’re not too familiar with pure math.
Thanks, interesting! I had not read this paper before.
Some initial thoughts:
Very cool and satisfying that all these scaling laws might emerge from metric space geometry (i.e. dimensionality).
The main differences seem to be: they tackle model scaling; their data manifold is a product of the model, while our latent space is a property of the data and its generating process itself; and they provide empirical evidence.
They note that model scaling seems to be pretty independent of architecture. I wonder whether, in most cases, the relevant model scaling law is closer to our picture, where it’s a property of the data before it is ever processed by the model.
I might get around to running empirical experiments for this, though I’m pretty busy trying out all my other ideas heh. Would definitely welcome work from others on this! The way I was thinking about testing it was to set up a synthetic regression dataset where you explicitly generate data from a latent space and see how the loss scales as you increase the amount of data.
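Here’s a rough sketch of the kind of experiment I mean, with purely illustrative choices for the latent dimension, the embedding map, the target function, and the learner (1-nearest-neighbour, just because it needs no tuning):

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT_DIM = 4    # intrinsic dimension of the latent space (illustrative)
AMBIENT_DIM = 32  # dimension of the observed inputs (illustrative)

# Fixed random smooth embedding from the latent space into the observed space.
A = rng.normal(size=(LATENT_DIM, AMBIENT_DIM))

def sample(n):
    """Generate n (x, y) pairs whose structure lives on a LATENT_DIM-dimensional manifold."""
    z = rng.uniform(-1.0, 1.0, size=(n, LATENT_DIM))
    x = np.tanh(z @ A)          # observed inputs: smooth image of the latent z
    y = np.sin(z.sum(axis=1))   # targets: a smooth function of the latent alone
    return x, y

def one_nn_mse(x_train, y_train, x_test, y_test):
    """Test MSE of 1-nearest-neighbour regression."""
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b (avoids a huge intermediate).
    d2 = ((x_test ** 2).sum(1)[:, None]
          + (x_train ** 2).sum(1)[None, :]
          - 2.0 * x_test @ x_train.T)
    pred = y_train[d2.argmin(axis=1)]
    return float(((pred - y_test) ** 2).mean())

x_test, y_test = sample(1000)
for n in [100, 400, 1600, 6400]:
    x_tr, y_tr = sample(n)
    print(f"n={n:5d}  test MSE={one_nn_mse(x_tr, y_tr, x_test, y_test):.5f}")

# If the latent-space picture is right, log(MSE) against log(n) should be roughly
# linear with a slope governed by LATENT_DIM rather than AMBIENT_DIM.
```

Swapping in a small MLP for the nearest-neighbour learner, and varying LATENT_DIM, would be the obvious next steps.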
Perhaps! I’m not familiar with extended norms. But when you say “let’s put the uniform norm on $C^1$” warning bells start going off in my head 😅
Okay I took the nerd bait and signed up for LW to say:
For your example to work you need to restrict the domain of your functions to some compact set (e.g. a closed interval), because the uniform norm requires the functions to be bounded.
Also note this example works because you’re not using the “usual” topology on $C^1$, which also includes the uniform norm of the derivative and makes the space complete. It is much more difficult if the space is complete!
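To spell out the completeness point (assuming the space in question is $C^1$ on an interval; the particular sequence below is just a standard illustration, not the example from the thread):

```latex
% C^1([0,1]) with the sup norm alone is NOT complete: a sup-norm Cauchy sequence
% of C^1 functions can converge uniformly to something non-differentiable.
\[
  f_n(x) = \sqrt{\bigl(x - \tfrac12\bigr)^2 + \tfrac1n} \in C^1([0,1]),
  \qquad
  \lim_{n\to\infty} \|f_n - f\|_\infty = 0
  \quad\text{where}\quad f(x) = \bigl|x - \tfrac12\bigr| \notin C^1([0,1]).
\]
% With the full C^1 norm  \|g\|_{C^1} = \|g\|_\infty + \|g'\|_\infty  the space
% C^1([0,1]) is complete (a Banach space), so no such Cauchy-without-limit
% sequence can exist there.
```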
Yeah, I was originally envisioning this as an ML theory paper, which is why it’s math-heavy and doesn’t have experiments. Tbh, as far as I understand, my paper is far more useful than most ML theory papers because it actually engages with empirical phenomena people care about and provides reasonable, testable explanations.
Ha, I think some rando saying “hey I have plausible explanations for two mysterious regularities in ML via this theoretical framework but I could be wrong” is way more attention-worthy than another “I proved RH in 1 page!” or “I built ASI in my garage!”
Mmm, I know how to do “good” research. I just don’t think it’s a “good” use of my time. I honestly don’t think adding citations and a lit review will help anybody nearly as much as working on other ideas.
PS: Just because someone doesn’t flash their credentials doesn’t mean they don’t have stellar credentials ;)