The scenario seems unrealistic because the thieves would likely be able to steal important parts of the codebase as well. But assuming you literally just get a big list of named tensors, my guess is that it could be anywhere between many person-months and millions of person-years (mostly spent re-inventing the algorithms), depending on how many architectural novelties went into the model. For context, it took a decently sized effort to actually get Gemma working properly, given a paper describing how it works and a faulty implementation.
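For concreteness, here is roughly what that raw artifact amounts to — a minimal sketch assuming the stolen weights were dumped in safetensors format (the file name and tensor names below are invented for illustration, not from any real model):

```python
# Inspect a checkpoint as nothing more than a list of named tensors.
# Assumes a hypothetical file "stolen_checkpoint.safetensors" and that
# torch is installed (framework="pt").
from safetensors import safe_open

with safe_open("stolen_checkpoint.safetensors", framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        print(name, tuple(tensor.shape), tensor.dtype)

# Output might look like:
#   layers.0.attention.wq.weight (4096, 4096) torch.bfloat16
#   layers.0.mlp.up_proj.weight (11008, 4096) torch.bfloat16
# The names and shapes hint at the architecture, but the code that
# actually wires these tensors together is exactly what's missing.
```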
Thanks for this. So I guess when knowledgeable people talk about stealing a model’s weights as being equivalent to stealing the model itself, “steal the weights” is shorthand that implies also stealing other elements you’d need to replicate the model. [Edit: changed “the minimal” to “other”]