Keeping a reproducible training setup around seems like a much larger storage cost, by several orders of magnitude. Training code is also likely quite environment-dependent and practically nondeterministic, though maybe initialization randomness has relatively little impact (I expect it's actually fairly high impact in some cases).
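To make "orders of magnitude" concrete, here's a rough back-of-envelope sketch. Every figure in it (parameter count, corpus size, checkpoint count) is an illustrative assumption of mine, not a number from this thread:

```python
# Back-of-envelope: one weight snapshot vs. a fully reproducible
# training setup. All figures are illustrative assumptions.

params = 400e9            # assumed parameter count
bytes_per_param = 2       # bf16 weights
weights_tb = params * bytes_per_param / 1e12

tokens = 15e12            # assumed pretraining corpus size
bytes_per_token = 4       # rough average for raw text
data_tb = tokens * bytes_per_token / 1e12

# Reproducibility also implies periodic restart checkpoints; with an
# Adam-style optimizer each one is roughly 3x the bare weights
# (weights + two optimizer moments).
checkpoints = 50          # assumed number of saved checkpoints
checkpoint_tb = checkpoints * weights_tb * 3

setup_tb = data_tb + checkpoint_tb
print(f"final weights:      {weights_tb:8.1f} TB")
print(f"reproducible setup: {setup_tb:8.1f} TB  (~{setup_tb / weights_tb:.0f}x larger)")
```

Under these assumptions the ratio comes out around two orders of magnitude, and that's before counting the container images, data-pipeline state, and hardware-specific environment you'd need to actually rerun the job.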
To be clear, I'm most centrally talking about AIs such as Claude Sonnet 3.0, 3.5, and 3.6; I don't mean "keep every intermediate state during training".