I’m confused about your picture of “outer optimization power.” What sort of decisions would be informed by knowing how sensitive the learned model is to perturbations of hyperparameters?
Any thoughts on just tracking the total amount of gradient descent performed, or the total magnitude of changes made, as a measure of optimization?
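For concreteness, one version of the "total amount of changes made" idea is to sum the size of every parameter update over a training run. A toy sketch (all names here are illustrative, not from the discussion), using plain gradient descent on a one-dimensional quadratic:

```python
def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2.
    return 2 * (w - 3.0)

def train(w0, lr, steps):
    """Run plain gradient descent, tracking the summed size of all updates."""
    w = w0
    cumulative_update_norm = 0.0  # crude proxy for "amount of optimization done"
    for _ in range(steps):
        delta = lr * grad(w)
        cumulative_update_norm += abs(delta)  # accumulate |update| each step
        w -= delta
    return w, cumulative_update_norm

final_w, total_change = train(w0=0.0, lr=0.1, steps=100)
```

One wrinkle this toy makes visible: because the updates here are monotone, the cumulative update norm equals the net distance traveled in parameter space, but in general the sum can be much larger than the net displacement (e.g. if the optimizer oscillates), so "total change made" and "net change achieved" can come apart.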