DanielFilan comments on The other paper that killed deep learning theory

DanielFilan 30 Apr 2026 20:26 UTC
LW: 2 AF: 2
0
AF
Re: uniform convergence bounds, you say

The “high level properties of h” part is how they introduce data-dependency into their bound, in order to escape the Zhang et al. 2016 result.

I’m confused—aren’t properties of h different from properties of the data?
- LawrenceC 1 May 2026 0:04 UTC
  LW: 2 AF: 2
  0
  AF Parent
  Yeah, good question. I think the word “data-dependent” has different connotations (even if it is standard terminology).
  Using the sketch definition
  With high probability over possible training sets S, for all h in the hypothesis class, we have |expected test error of hypothesis h − empirical error of h on S| ≤ (Some bound involving the size of the training data and high level properties of h).^[2]
  You’re right that properties of h are, in general, different from properties of the data. The “data-dependent” part enters this inequality when the right-hand side depends on properties of the learned hypothesis, which in turn depend on the training set S you sampled. In classical bounds, the RHS depends only on properties of the class H (VC dimension, Rademacher complexity of the whole class), not on any particular h, so they give the same number for every S. Meanwhile, the spectral-norm bounds described in that section of the post depend on the weights of the learned network (and are, as a rule, higher on memorizing solutions than on generalizing ones).
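  As a rough sketch of the contrast (the notation here is assumed for illustration and is not taken from the post: R(h) for expected test error, \hat{R}_S(h) for empirical error on S, B for an unspecified bound function, c(h) for some complexity measure of h such as a spectral-norm quantity, and h_S for the hypothesis learned from S), the two kinds of statement might be written as:

  ```latex
  % Class-level (classical) form: the right-hand side is a function of the
  % class H only, so it is the same number for every training set S.
  \Pr_{S \sim D^n}\Big[\ \forall h \in H:\
      \big| R(h) - \hat{R}_S(h) \big| \le B\big(n, \delta, H\big) \Big] \ge 1 - \delta

  % Hypothesis-dependent form: still quantified over all h in H, but the
  % right-hand side is now a function of h through c(h).
  \Pr_{S \sim D^n}\Big[\ \forall h \in H:\
      \big| R(h) - \hat{R}_S(h) \big| \le B\big(n, \delta, c(h)\big) \Big] \ge 1 - \delta

  % Instantiating the second statement at the learned hypothesis h_S:
  % the bound B(n, \delta, c(h_S)) now varies with S, because h_S does.
  \big| R(h_S) - \hat{R}_S(h_S) \big| \le B\big(n, \delta, c(h_S)\big)
  ```

  Under these assumptions, the universal quantifier over h is unchanged between the first two lines; the data-dependence only shows up in the third line, where the bound is evaluated at the particular h that training on S produced.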
  (Of course, a sufficiently nitpicky person might argue that the data-dependent bounds are uniform-convergence bounds over an implicit, S-indexed sub-class — “all h’ with ‖W’‖_spec ≤ ‖W(S)‖_spec”. But given this sub-class is S-indexed, I think it’s still fair to call the bound data-dependent.)
  I think this is a reasonable confusion, and I’ll expand the footnote to clarify.
  - DanielFilan 2 May 2026 16:00 UTC
    LW: 2 AF: 2
    0
    AF Parent
    So is that h not part of the universal quantification over h in H?
    - DanielFilan 2 May 2026 20:44 UTC
      LW: 2 AF: 2
      2
      AF Parent
      Oh I currently think the thing that’s going on is that it’s a hypothesis-dependent bound that you then apply to the hypothesis learned from the data.
      - LawrenceC 2 May 2026 21:30 UTC
        LW: 2 AF: 2
        0
        AF Parent
        Yep