Does the 1/sqrt(N) error for SGD assume single-pass? It seems like if we’re bottlenecked on few data points we can use multi-pass and do nearly as well as bayesian (at least for half-spaces).
Hi Mike:
Training for multiple epochs on the datapoints should totally be fair game, although I wouldn’t have guessed it to be necessary. I’ll try that.
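As a concrete sketch of what multi-pass training on a small fixed dataset could look like, here is a perceptron-style learner on a synthetic halfspace, revisiting the same points for many epochs (all names and sizes here are illustrative, not from the thread):

```python
import numpy as np

# Illustrative multi-pass SGD: perceptron updates on a small fixed
# dataset, reused across many epochs.
rng = np.random.default_rng(0)
d, n = 20, 100
w = rng.standard_normal(d)             # true halfspace normal
X = rng.standard_normal((n, d))        # the few data points we have
y = np.sign(X @ w)                     # labels from the true halfspace

v = np.zeros(d)
for epoch in range(50):                # multi-pass: revisit the same n points
    for i in rng.permutation(n):
        if np.sign(X[i] @ v) != y[i]:  # update only on mistakes
            v += y[i] * X[i]

train_acc = np.mean(np.sign(X @ v) == y)
```

Since the data is linearly separable by construction, repeated passes drive the training error down; the question in the thread is whether this also closes the gap to the Bayesian error on fresh samples.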
Does the 1/√N error for SGD assume single-pass?
I’m not quite sure how to apply the SGD bound here. The SGD bound is only about optimizing convex functions. The function we care about is:
L(v) = E_{x ~ N(0, I)} δ(sign(x·v), sign(x·w))
which is not convex.
(Here · is the dot product, and δ(a, b) = 1 if the arguments are equal, 0 otherwise.)
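For reference, L(v) can be estimated by Monte Carlo sampling from N(0, I). A minimal sketch (the function name `agreement` is ours); for halfspaces the expectation also has the closed form 1 − θ/π, where θ is the angle between v and w:

```python
import numpy as np

def agreement(v, w, n_samples=200_000, seed=0):
    """Monte Carlo estimate of L(v) = E_{x~N(0,I)} delta(sign(x.v), sign(x.w)),
    i.e. the probability that the halfspaces defined by v and w agree on a
    random Gaussian point. Closed form for halfspaces: 1 - theta/pi."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, len(v)))
    return np.mean(np.sign(x @ v) == np.sign(x @ w))
```

The non-convexity is visible here: L(v) depends on v only through its direction, so it is flat along rays and cannot be convex in v.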