[Question] What ML gears do you like?

Ulisse Mini 11 Nov 2023 19:10 UTC

25 points

In John’s recent post he mentions that many people in ML lack good gears-level models of what’s going on.

So: what gears-level models do you know for ML? How much support is there for them? Are there “settled science”-type models with tons of empirical support?

What gears-level models informed the people who made major AI advancements? Is there a list, or writing about this somewhere?

4 comments, 1 min read

Ulisse Mini 12 Nov 2023 19:26 UTC
2 points
0
Answering my own question, here is a list of theories I have yet to study that may yield significant insight:
- Theory of Heavy-Tailed Self-Regularization (https://weightwatcher.ai/)
- Singular learning theory
- Neural tangent kernels et al. (deep learning theory book)
- Information theory of deep learning

Thomas Kwa 12 Nov 2023 21:01 UTC
6 points
4
Remember that a gears-level model is an explanation of some particular phenomenon that is solid enough to causally intervene on, not an understanding of everything to do with ML. I feel like you don’t need to have the latter to make useful alignment progress. John gives the example of Bengio and vanishing gradients; Bengio didn’t need to understand every important phenomenon relevant to ML to form the gears-level model, nor did he go beyond this narrow gears-level model when writing the unitary evolution paper. With this in mind, I think the gears-level models required to make alignment progress can be very specific to the area and maybe not very enlightening to write in a big list. With 1000 papers trying to solve 100 different problems, my guess is you’d have 10 different theories of the dynamics of machine learning, and 300 different models of the problems, and the latter would be at least as important to the success of the papers.

[deactivated] 11 Nov 2023 19:32 UTC
5 points
4
I’m confused by the question. It seems incredibly broad and general. Are you asking about neural network architectures like convolutional neural networks or transformers?
- tailcalled 12 Nov 2023 9:05 UTC
  2 points
  0
  It is broad. The OP’s link mentions gradient explosion/death, for instance.
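
To make the gradient explosion/death example concrete, here is a minimal sketch (not taken from the linked post) of one narrow gears-level model. It assumes a toy linear recurrence h_{t+1} = W h_t with W equal to a 2x2 rotation multiplied by a scalar `scale`; the function name `backprop_norm` and all parameter values are hypothetical and chosen only for illustration. Backpropagating through `steps` time steps multiplies the gradient by W^T each time, so its norm changes by a factor of `scale` per step.

```python
import math

def backprop_norm(scale, steps, theta=0.3):
    """Norm of a gradient after backpropagating through `steps` steps of the
    toy linear recurrence h_{t+1} = W h_t, with W = scale * rotation(theta).
    (Illustrative assumption: a real RNN also has nonlinearities and inputs.)"""
    c, s = math.cos(theta), math.sin(theta)
    # Transpose of W = scale * [[c, -s], [s, c]]
    wt = [[scale * c, scale * s],
          [-scale * s, scale * c]]
    g = [1.0, 0.0]  # gradient of the loss w.r.t. the final hidden state
    for _ in range(steps):
        # One step of backpropagation: g <- W^T g
        g = [wt[0][0] * g[0] + wt[0][1] * g[1],
             wt[1][0] * g[0] + wt[1][1] * g[1]]
    return math.hypot(g[0], g[1])

if __name__ == "__main__":
    # scale < 1: gradient dies; scale == 1 (orthogonal W): norm preserved;
    # scale > 1: gradient explodes.
    for scale in (0.9, 1.0, 1.1):
        print(scale, backprop_norm(scale, steps=100))
```

In this toy setting the intervention is clear: keeping the recurrent matrix norm-preserving (`scale == 1`) keeps the gradient norm constant across time steps, which is the kind of narrow, causally actionable model the thread is describing.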