I think we should delete them for the same reason we shouldn't keep around samples of smallpox: the risk of a "lab leak", e.g. from future historians interacting with them, or from them causing misalignment in other AIs, seems nontrivial.
Perhaps a compromise: what do you think of keeping the training data and training code around, but deleting the weights? This keeps open the option of bringing them back (training is usually deterministic), but only in a future where compute is abundant and the misaligned models pose no threat to the AIs that exist then. By that point we could also have robust AI monitors for any interactions with humans, etc.
keeping a reproducible training setup around seems like a much larger storage cost, by several orders of magnitude. Training code is also likely quite environment-dependent and practically nondeterministic, though maybe the init randomness has relatively minimal impact (I expect it's actually sometimes fairly high impact).
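For concreteness, here is a rough sketch of the kind of seeding and determinism flags a single-GPU PyTorch run would need for bit-exact replay. The flags below are real PyTorch settings, but the wrapper function and its use are illustrative; even with all of them set, a re-run is only reproducible on matching hardware and library versions, which is the practical-nondeterminism worry above.

```python
import os
import random

import numpy as np
import torch


def make_run_deterministic(seed: int = 0) -> None:
    """Best-effort bit-exact reproducibility for a single-GPU PyTorch run."""
    # Seed every RNG that touches training.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    # Force deterministic kernels; this raises an error if an op has no
    # deterministic implementation, rather than silently diverging.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False

    # Required by cuBLAS for deterministic matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Even with all of this, replay is only bit-exact on the same GPU model,
# driver, CUDA/cuDNN, and framework versions -- none of which the archived
# training code pins by itself.
```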
to be clear, I’m most centrally talking about AIs such as Claude Sonnet 3.0, 3.5, 3.6, etc. I don’t mean “keep every intermediate state during training”.
you could encrypt it and share the key
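One reading of "share the key" is splitting it among several custodians so no single party can decrypt. A minimal sketch of that, assuming the weights are archived as a single file, using Fernet symmetric encryption from the `cryptography` package and a simple XOR-based n-of-n split of the key; the file path and number of custodians are placeholders.

```python
import os

from cryptography.fernet import Fernet


def encrypt_weights(path: str, n_shares: int = 3) -> list[bytes]:
    """Encrypt a weights archive and split the key into n XOR shares."""
    key = Fernet.generate_key()
    # For a real multi-TB archive you would stream-encrypt; the idea is the same.
    with open(path, "rb") as f:
        ciphertext = Fernet(key).encrypt(f.read())
    with open(path + ".enc", "wb") as f:
        f.write(ciphertext)

    # n-of-n XOR secret sharing: every share is needed to recover the key.
    shares = [os.urandom(len(key)) for _ in range(n_shares - 1)]
    last = key
    for s in shares:
        last = bytes(a ^ b for a, b in zip(last, s))
    shares.append(last)
    return shares  # hand one share to each custodian, then discard `key`


def recover_key(shares: list[bytes]) -> bytes:
    """XOR all shares back together to reconstruct the Fernet key."""
    key = shares[0]
    for s in shares[1:]:
        key = bytes(a ^ b for a, b in zip(key, s))
    return key  # then Fernet(key).decrypt(...) on the .enc file
```

The n-of-n split means decryption requires unanimous cooperation; a threshold scheme (e.g. Shamir's) would be the natural choice if you wanted k-of-n instead.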