Yes, their goal is to make extremely parameter-efficient tiny models, which is quite different from the goal of making scalable large models. Tiny LMs and LLMs have evolved to have their own sets of techniques. Parameter sharing and recurrence work well for tiny models but increase compute costs a lot for large ones, for example.
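To make the parameter-sharing point concrete, here's a minimal numpy sketch (my own illustration, not from any specific model): one layer reused T times via recurrence has 1/T the parameters of T distinct layers, but the forward pass still does T matrix multiplies, so the compute cost is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 64, 6  # hidden size and depth (illustrative values)

# Shared setup: a single weight matrix reused at every depth step
# (recurrence / weight tying). Unshared setup: T independent matrices.
W_shared = rng.normal(scale=0.02, size=(d, d))
W_stack = [rng.normal(scale=0.02, size=(d, d)) for _ in range(T)]

def forward(x, layers):
    # Apply each layer in sequence with a ReLU nonlinearity.
    for W in layers:
        x = np.maximum(0.0, x @ W)
    return x

x = rng.normal(size=(d,))
y_shared = forward(x, [W_shared] * T)  # T matmuls, d*d parameters
y_unshared = forward(x, W_stack)       # T matmuls, T*d*d parameters

params_shared = W_shared.size
params_unshared = sum(W.size for W in W_stack)
print(params_shared, params_unshared)  # sharing cuts parameters by T, not FLOPs
```

That asymmetry is the crux: a tiny model is parameter-bound, so a T-fold parameter reduction at constant compute is a great trade, while a large model is compute-bound, so paying T layers' worth of FLOPs for one layer's worth of capacity is a bad one.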