Just tested and doesn’t seem to happen with Gemini 3.1 Flash Lite Preview.
Jeffrey Liang
Has anyone else noticed Gemini 3.1 Pro Preview excessively using math? I’ve asked quite a few models to comment on my Hamlet pastiche, and I think 3.1 Pro Preview is the only one that uses a decent amount of math in its analysis (and it does so consistently). While not entirely inappropriate, it seems like a signal that the RL is getting to the model...
(Reposting because it seems kinda important for the “RL doom” hypothesis!)
Edit: Here’s an example screenshot:
Hard agree. This is roughly where I’ve ended up too. My current model of AI safety is:
1. Technical AI safety to prevent monsters
2. Cultural/relational work to build productive relationships with AIs and promote everyone’s flourishing
Basically, I don’t buy into the Eliezer-esque argument that a really intelligent AI agent will be extremely eager to wipe out humanity.
Where I would push back a little is on the implicit assumption that there will be One AI To Rule Them All. So far, that seems unclear, at least to me.
Mostly the prospect of leaks/whistleblowers, akin to how we first heard about the Maduro op using Claude. It’s probably a case where too many people have to be involved and general sentiment is negative enough that it would get out pretty quickly.
I don’t think you understood me. I totally hate the system and wish it were different.
I was responding to the OP stating that it’s not immoral to deceive on college applications!
I would disagree that it’s okay to treat college applications like games of deception. Yes, the system is incredibly stupid, BUT it has a large real impact on your life, and that’s what makes the difference. Deception might make your life better or more comfortable, but that’s exactly what makes it a relevant moral problem. And if you get tempted by that, you’ll probably be tempted by those high-paying zero-sum/exploitative careers post-graduation, and then congrats, you’ve sold out.
Besides, a country that lets a system which selects new elites based on vice rather than merit persist deserves the decay that comes with that.
Finally, I’m not sure if I’m extrapolating too much from my own experience, but I feel like if you’re really competent you can do great regardless of elite uni acceptance. Most of the demand is from those seeking high-paying and/or comfortable zero-sum/exploitative careers, which you shouldn’t want anyway.
Yeah I was originally envisioning this as an ML theory paper which is why it’s math-heavy and doesn’t have experiments. Tbh, as far as I understand, my paper is far more useful than most ML theory papers because it actually engages with empirical phenomena people care about and provides reasonable testable explanations.
Ha, I think some rando saying “hey I have plausible explanations for two mysterious regularities in ML via this theoretical framework but I could be wrong” is way more attention-worthy than another “I proved RH in 1 page!” or “I built ASI in my garage!”
Mmm, I know how to do “good” research. I just don’t think it’s a “good” use of my time. I honestly don’t think adding citations and a lit review will help anybody nearly as much as working on other ideas.
PS: Just because someone doesn’t flash their credentials, doesn’t mean they don’t have stellar credentials ;)
Oh yes I do know math lol. Yeah the summary above hits most of the main ideas if you’re not too familiar with pure math.
Thanks, this is interesting! I hadn’t read this paper before.
Some initial thoughts:
1. Very cool and satisfying that all these scaling laws might emerge from metric-space geometry (i.e. dimensionality).
2. The main differences seem to be: they tackle model scaling, their data manifold is a product of the model while our latent space is a property of the data and its generating process itself, and they provide empirical evidence.
3. They note that model scaling seems to be pretty independent of architecture. I wonder if the relevant model scaling law in most cases is more similar to our model, where it’s a property of the data before being processed by the model.
I might get around to running empirical experiments for this, though I’m pretty busy trying out all my other ideas heh. Would definitely welcome work from others on this! The way I was thinking about testing this was to set up a synthetic regression dataset where you explicitly generate data from a latent space and see how loss scales as you increase data.
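In case it helps anyone who wants to pick this up, here’s a minimal sketch of the experiment I have in mind. Everything here is my own illustrative choice (the latent dimension, the smooth target function, and the 1-nearest-neighbour regressor standing in for a trained model), not anything from the paper: draw latents from a d-dimensional latent space, regress on a smooth function of them, and watch test loss fall as the training set grows.

```python
import numpy as np

def make_dataset(n, d, rng):
    # Latents drawn from a d-dimensional latent space (illustrative choice:
    # uniform on the unit cube).
    z = rng.uniform(size=(n, d))
    # Targets are a smooth function of the latents (illustrative choice).
    y = np.sin(2 * np.pi * z).sum(axis=1)
    return z, y

def one_nn_test_loss(n_train, d, n_test=200, seed=0):
    rng = np.random.default_rng(seed)
    z_tr, y_tr = make_dataset(n_train, d, rng)
    z_te, y_te = make_dataset(n_test, d, rng)
    # 1-nearest-neighbour regression: predict the target of the closest
    # training latent point (a stand-in for a trained model).
    dists = ((z_te[:, None, :] - z_tr[None, :, :]) ** 2).sum(-1)
    preds = y_tr[dists.argmin(axis=1)]
    return float(((preds - y_te) ** 2).mean())

# Test loss at increasing training-set sizes; the slope of log(loss) vs.
# log(n) is what you'd compare against the predicted exponent for a given d.
losses = {n: one_nn_test_loss(n, d=2) for n in (100, 1000, 10000)}
```

From there you could vary `d` and fit the loss-vs.-n exponent on a log-log plot to see whether it tracks the latent dimension the way the theory predicts.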
The Croissant Principle: A Theory of AI Generalization
Perhaps! I’m not familiar with extended norms. But when you say “let’s put the uniform norm on ” warning bells start going off in my head 😅
Okay I took the nerd bait and signed up for LW to say:
For your example to work you need to restrict the domain of your functions to some compact e.g. because the uniform norm requires the functions to be bounded.
Also note this example works because you’re not using the “usual” topology on , which also includes the uniform norm of the derivative and makes the space complete. It is much more difficult if the space is complete!
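For concreteness, here’s the standard example I have in mind (I’m guessing the space in question is $C^1$ on a compact interval, say $[-1,1]$):

```latex
% Under the uniform norm alone, C^1[-1,1] is not complete:
f_n(x) = \sqrt{x^2 + 1/n} \in C^1[-1,1], \qquad
\sup_{x \in [-1,1]} \bigl| f_n(x) - |x| \bigr| \le \frac{1}{\sqrt{n}} \to 0,
```

so $(f_n)$ is uniformly Cauchy but its limit $|x|$ is not $C^1$. Under the full $C^1$ norm $\|f\|_{C^1} = \|f\|_\infty + \|f'\|_\infty$ the sequence is no longer Cauchy (the derivatives $f_n'(x) = x/\sqrt{x^2 + 1/n}$ converge pointwise to $\operatorname{sgn}(x)$, which is discontinuous, so they can’t converge uniformly), and $C^1[-1,1]$ with that norm is complete.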
Hmm I think this is a bit different. I added a screenshot.