Training that would normally use full weights usually works almost as well with LoRA when you don't need too many training steps. As with the KV-cache, you can maintain separate LoRA weights for each request or user. The next level of difficulty is getting task-specific training to work over so many steps that LoRA (as opposed to full weights) starts becoming a limitation. Recurrent state could probably also play this role, and, like the KV-cache and LoRA weights, it doesn't need to be nearly as large as the main model.
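To make the KV-cache analogy concrete, here is a minimal sketch (not from the original text) of what per-user LoRA state could look like: the large base weights are shared and frozen, and only the tiny low-rank factors are stored and swapped per user. The `LoRALinear` class and the `user_adapters` store are illustrative names, not a real library API.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen shared base layer plus a small trainable low-rank delta."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # shared weights stay frozen
        # Low-rank update: delta_W = (alpha / rank) * B @ A
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)


layer = LoRALinear(nn.Linear(512, 512))

# Per-user adapter state: only the small A/B matrices live here, while the
# base weights are shared -- analogous to keeping a KV-cache per request.
user_adapters: dict[str, dict[str, torch.Tensor]] = {}


def save_adapter(user_id: str) -> None:
    user_adapters[user_id] = {
        "lora_A": layer.lora_A.detach().clone(),
        "lora_B": layer.lora_B.detach().clone(),
    }


def load_adapter(user_id: str) -> None:
    state = user_adapters[user_id]
    with torch.no_grad():
        layer.lora_A.copy_(state["lora_A"])
        layer.lora_B.copy_(state["lora_B"])
```

The point of the sketch is the size asymmetry: for rank 8 on a 512x512 layer, the per-user state is two matrices of shape (8, 512) and (512, 8), a small fraction of the base weights, which is what makes per-request LoRA (like per-request KV-cache, or a compact recurrent state) cheap to keep around.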