I’m confused about your picture of “outer optimization power.” What sort of decisions would be informed by knowing how sensitive the learned model is to perturbations of hyperparameters?
Any thoughts on just tracking the total amount of gradient descent performed, or the total magnitude of changes made, as a measure of optimization?
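For concreteness, one version of the "total amount of changes made" idea is to sum the size of every parameter update over a training run. A toy sketch (all names here are illustrative, not from the discussion), using plain gradient descent on a one-dimensional quadratic:

```python
def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2.
    return 2 * (w - 3.0)

def train(w0, lr, steps):
    """Run plain gradient descent, tracking the summed size of all updates."""
    w = w0
    cumulative_update_norm = 0.0  # crude proxy for "amount of optimization done"
    for _ in range(steps):
        delta = lr * grad(w)
        cumulative_update_norm += abs(delta)  # accumulate |update| each step
        w -= delta
    return w, cumulative_update_norm

final_w, total_change = train(w0=0.0, lr=0.1, steps=100)
```

One wrinkle this toy makes visible: because the updates here are monotone, the cumulative update norm equals the net distance traveled in parameter space, but in general the sum can be much larger than the net displacement (e.g. if the optimizer oscillates), so "total change made" and "net change achieved" can come apart.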