Actually we don’t even need to train a new byte latent transformer. We can just generate patches by thresholding the next-token entropy of GPT-2 small.
Do the patches correspond to atomic concepts?
If we turn this into an embedding scheme, and train a larger LM on the patches it produces, do we get a better LM?
Can’t we do this recursively to get better and better patches?
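The patching idea above can be sketched concretely. This is a minimal, hypothetical illustration of entropy-threshold patching (the rule used by the Byte Latent Transformer): a new patch starts wherever the small model's next-token entropy spikes. The hard-coded entropy values stand in for what GPT-2 small would actually report; `token_entropy`, `segment_into_patches`, and the threshold of 2.0 are illustrative choices, not a fixed recipe.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def segment_into_patches(tokens, entropies, threshold):
    """Start a new patch whenever the model's next-token entropy
    exceeds the threshold (entropy spikes mark hard-to-predict
    positions, i.e. likely concept boundaries)."""
    patches, current = [], []
    for tok, h in zip(tokens, entropies):
        if current and h > threshold:
            patches.append(current)
            current = []
        current.append(tok)
    if current:
        patches.append(current)
    return patches

# Toy stand-in: pretend these entropies came from GPT-2 small.
# High entropy at word starts, low entropy mid-word.
tokens = list("the cat sat")
entropies = [3.0, 0.5, 0.4, 2.8, 0.6, 0.5, 0.4, 2.9, 0.5, 0.4, 0.3]
patches = segment_into_patches(tokens, entropies, threshold=2.0)
print(["".join(p) for p in patches])  # → ['the', ' cat', ' sat']
```

In practice the entropies would come from running GPT-2 small over the byte or token stream; the toy values just show how low-entropy runs (predictable continuations) get grouped into a single patch while entropy spikes open a new one.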