I think the infrastructure changes required are pretty straightforward, at least within the reference class of changes on large ML codebases in general. like it would take me at most a few days to implement. if done in a reasonable way, it also seems very low risk for breaking other things if done right (the probe uses trivial memory and compute, so you wouldn’t expect it to substantially interact systems wise). regularly retraining the probe is also not really that bad imo.
I think the infrastructure changes required are pretty straightforward, at least within the reference class of changes on large ML codebases in general. like it would take me at most a few days to implement. if done in a reasonable way, it also seems very low risk for breaking other things if done right (the probe uses trivial memory and compute, so you wouldn’t expect it to substantially interact systems wise). regularly retraining the probe is also not really that bad imo.