Training that would normally use full weights usually works almost as well with LoRA when you don't need too many training steps. As with the KV-cache, you can maintain separate LoRA weights for each request or user. The next level of difficulty is getting task-specific training to work over so many steps that LoRA (as opposed to full weights) starts becoming a limitation. Recurrent state could probably also play this role, and, like the KV-cache and LoRA weights, it doesn't need to be nearly as large as the main model.
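To make the KV-cache analogy concrete, here is a minimal sketch (not from the original text) of what per-user LoRA state could look like: the large base weights are shared and frozen, and only the tiny low-rank factors are stored and swapped per user. The `LoRALinear` class and the `user_adapters` store are illustrative names, not a real library API.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen shared base layer plus a small trainable low-rank delta."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # shared weights stay frozen
        # Low-rank update: delta_W = (alpha / rank) * B @ A
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)


layer = LoRALinear(nn.Linear(512, 512))

# Per-user adapter state: only the small A/B matrices live here, while the
# base weights are shared -- analogous to keeping a KV-cache per request.
user_adapters: dict[str, dict[str, torch.Tensor]] = {}


def save_adapter(user_id: str) -> None:
    user_adapters[user_id] = {
        "lora_A": layer.lora_A.detach().clone(),
        "lora_B": layer.lora_B.detach().clone(),
    }


def load_adapter(user_id: str) -> None:
    state = user_adapters[user_id]
    with torch.no_grad():
        layer.lora_A.copy_(state["lora_A"])
        layer.lora_B.copy_(state["lora_B"])
```

The point of the sketch is the size asymmetry: for rank 8 on a 512x512 layer, the per-user state is two matrices of shape (8, 512) and (512, 8), a small fraction of the base weights, which is what makes per-request LoRA (like per-request KV-cache, or a compact recurrent state) cheap to keep around.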