Brendan Long comments on How useful could stolen AI model weights be without knowing the architecture and activation functions?

Brendan Long 6 Aug 2025 19:14 UTC
8 points
4
You could probably infer quite a bit from the model weight file. At a minimum you’ll get a list of tensors and their sizes, and you’ll likely get useful names for each tensor too. I’m not sure what’s typically in the metadata but you might learn a lot there too. Some files will even give you architectural details, although you might be able to guess what the layers are based on their sizes and order in the file even without that.
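As a rough sketch of how little effort that inspection takes, assume the weights are in the safetensors format: an 8-byte little-endian header length, then a JSON header mapping tensor names to dtype, shape, and byte offsets, with an optional `__metadata__` entry. The tensor names and sizes below are made up for illustration, and `build_toy_file` and `summarize` are hypothetical helpers, not part of any library.

```python
import json
import struct


def read_safetensors_header(blob: bytes) -> dict:
    """Parse the JSON header of a safetensors-style byte string."""
    # The first 8 bytes hold the header length as a little-endian uint64.
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + header_len])


def build_toy_file() -> bytes:
    """Build a tiny stand-in file. Names and shapes are invented."""
    header = {
        "__metadata__": {"format": "pt"},
        "embed.weight": {"dtype": "F16", "shape": [8, 4], "data_offsets": [0, 64]},
        "layers.0.mlp.up.weight": {
            "dtype": "F16",
            "shape": [16, 4],
            "data_offsets": [64, 192],
        },
    }
    header_bytes = json.dumps(header).encode("utf-8")
    payload = bytes(192)  # zero-filled tensor data, just to make the file complete
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def summarize(header: dict):
    """Split the header into free-form metadata and (name, dtype, shape) rows."""
    metadata = header.get("__metadata__", {})
    tensors = [
        (name, info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    ]
    return metadata, tensors


if __name__ == "__main__":
    metadata, tensors = summarize(read_safetensors_header(build_toy_file()))
    print("metadata:", metadata)
    for name, dtype, shape in tensors:
        print(name, dtype, shape)
```

Even in this toy case, names like `layers.0.mlp.up.weight` and the shapes next to them hint at the layer structure without any separate architecture description.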
If the model uses a standard activation function and it’s the same for every layer, it wouldn’t take very long to try each common candidate in turn and run some benchmarks to see which one gives you the best results. Another question is whether it would even matter if you picked correctly. Claude seems confident that the wrong activation function would leave the model obviously broken, but LLMs are weirdly resilient so it might just work.
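The sweep itself could look something like the toy sketch below. Everything here is an assumption for illustration: a tiny two-layer network with random weights stands in for the stolen model, the "benchmark" is mean squared error against outputs generated with GELU, and the candidate list is just a handful of common choices. With a real LLM the score would be something like perplexity on held-out text, and the gap between candidates might be much less clear-cut.

```python
import math
import random

# Candidate activation functions to try (an assumed, non-exhaustive list).
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "gelu": lambda x: 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))),
    "silu": lambda x: x / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
}


def mlp(x, w1, w2, act):
    """Tiny two-layer network: linear, activation, linear to a scalar."""
    hidden = [act(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))


def rank_activations(w1, w2, inputs, targets):
    """Score each candidate by mean squared error, best (lowest) first."""
    scores = {}
    for name, act in ACTIVATIONS.items():
        errors = [(mlp(x, w1, w2, act) - t) ** 2 for x, t in zip(inputs, targets)]
        scores[name] = sum(errors) / len(errors)
    return sorted(scores.items(), key=lambda item: item[1])


def demo():
    """Pretend the 'true' activation is GELU and see whether the sweep finds it."""
    rng = random.Random(0)
    w1 = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
    w2 = [rng.gauss(0, 1) for _ in range(8)]
    inputs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(64)]
    # Stand-in for benchmark data: outputs of the model with its real activation.
    targets = [mlp(x, w1, w2, ACTIVATIONS["gelu"]) for x in inputs]
    return rank_activations(w1, w2, inputs, targets)


if __name__ == "__main__":
    for name, score in demo():
        print(f"{name}: {score:.6f}")
```

In this toy setup the correct activation reproduces the targets exactly and the others do not, so the ranking is unambiguous. Whether a real model degrades that sharply with the wrong choice is exactly the open question.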
I think the number of tensors in a frontier model is large enough that you wouldn’t be able to work out how they fit together by pure brute force though.
- Jemal Young 6 Aug 2025 19:23 UTC
  1 point
  0
  This is a helpful answer, thank you! Thanks also for the link to the HF article on common model formats.