Fabien Roger comments on Questions for labs

Fabien Roger 2 May 2024 14:49 UTC
2 points
Isn’t that only ~10x more expensive than running the forward passes (even if you don’t do LoRA)? Or is it much more because of communication bottlenecks and the infra being taken up by the next pretraining run (without the possibility of swapping the model in and out)?