Thanks, interesting! I had not read this paper before.
Some initial thoughts:
Very cool and satisfying that all these scaling laws might emerge from metric space geometry (i.e. dimensionality).
Main differences seem to be: (1) they tackle model scaling; (2) their data manifold is a product of the model, while our latent space is a property of the data and its generating process itself; and (3) they provide empirical evidence.
They note that model scaling seems to be pretty independent of architecture. I wonder if, in most cases, the relevant model scaling law is closer to our model, where the dimensionality is a property of the data before any processing by the model.
I might get around to running empirical experiments on this, though I'm pretty busy trying out all my other ideas, heh. I'd definitely welcome work from others on this! The way I was thinking about testing it: set up a synthetic regression dataset where you explicitly generate data from a latent space, then see how loss scales as you increase the amount of data (rough sketch below).
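For concreteness, here's a minimal sketch of the kind of experiment I mean. All the specifics are placeholder assumptions of mine, not a definitive setup: a random tanh embedding of the latents into a higher-dimensional input space, a smooth sin target, and a 1-nearest-neighbor regressor so there's no training loop to tune.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

d = 4   # intrinsic latent dimension (what the scaling exponent should depend on)
D = 64  # ambient input dimension

# Fixed random maps: latent -> input embedding, latent -> target.
# (Arbitrary illustrative choices, not from the paper.)
W_embed = rng.normal(size=(d, D))
w_target = rng.normal(size=d)

def sample(n):
    z = rng.uniform(-1, 1, size=(n, d))  # latent variables
    x = np.tanh(z @ W_embed)             # inputs lie on a d-dim manifold in R^D
    y = np.sin(z @ w_target)             # smooth target, a function of the latents only
    return x, y

x_test, y_test = sample(10_000)

# Sweep dataset size and watch how test loss falls.
for n in [100, 1_000, 10_000, 100_000]:
    x_train, y_train = sample(n)
    model = KNeighborsRegressor(n_neighbors=1).fit(x_train, y_train)
    mse = np.mean((model.predict(x_test) - y_test) ** 2)
    print(f"n={n:>7}  test MSE={mse:.5f}")
```

If the latent-space picture is right, the slope of log-MSE vs. log-n should track the latent dimension d rather than the ambient dimension D; for a 1-NN regressor of a Lipschitz target on a d-dimensional manifold you'd expect test MSE to fall off roughly like n^(-2/d), so rerunning the sweep at different d and fitting the exponent would be the actual test.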