We fine-tune 8B models with Unsloth because it supports high-throughput training on a single GPU. For 70B models we use the Together AI fine-tuning API. For 405B models we use Axolotl, since it supports Fully Sharded Data Parallel (FSDP) training; Together AI does not offer 405B fine-tuning, and Unsloth lacks distributed training capabilities (its multi-GPU Pro version is still in development).
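For the 8B case, a minimal single-GPU Unsloth LoRA run looks roughly like the sketch below. The base model name, dataset path, and hyperparameters are illustrative placeholders, not the values used in our runs.

```python
# Sketch: single-GPU LoRA fine-tune of an 8B model with Unsloth.
# Model name, data file, and hyperparameters are assumptions for illustration.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit so it fits on a single GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # placeholder base model
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder training data: one "text" field per example.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```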
Thank you for the details on your setup. Could you give a rough idea of how long each training run took on these systems?