Currently, a lot of meta-learning research tries to find a parameter initialization that leads to quick learning across many tasks. The idea is to treat post-SGD performance as the optimization objective and backpropagate meta-gradients through the inner SGD steps, yielding model parameters that the base optimizer can adapt quickly.
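To make that concrete, here is a minimal MAML-style sketch of meta-gradients through an inner SGD step (assuming PyTorch 2.x for `torch.func.functional_call`; the sine-regression task family, `sample_task`, and all hyperparameters are illustrative, not anything specific from the literature):

```python
import torch

def sample_task():
    # Hypothetical task family: regress y = a * sin(x + b) for random a, b.
    a = torch.rand(1) * 4 + 1
    b = torch.rand(1) * 3.1416
    def sample(n=16):
        x = torch.rand(n, 1) * 10 - 5
        return x, a * torch.sin(x + b)
    return sample

model = torch.nn.Sequential(
    torch.nn.Linear(1, 40), torch.nn.ReLU(), torch.nn.Linear(40, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_lr = 0.01

for step in range(1000):
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):  # a small batch of tasks per meta-step
        sample = sample_task()
        xs, ys = sample()  # support set: used for the inner SGD step
        xq, yq = sample()  # query set: used for the outer objective
        params = dict(model.named_parameters())
        inner_loss = torch.nn.functional.mse_loss(
            torch.func.functional_call(model, params, (xs,)), ys)
        # create_graph=True keeps the inner update differentiable, so the
        # meta-gradient can flow back through it to the initialization.
        grads = torch.autograd.grad(
            inner_loss, params.values(), create_graph=True)
        adapted = {k: p - inner_lr * g
                   for (k, p), g in zip(params.items(), grads)}
        # Outer objective: how well the *adapted* parameters do on new data.
        meta_loss = meta_loss + torch.nn.functional.mse_loss(
            torch.func.functional_call(model, adapted, (xq,)), yq)
    meta_loss.backward()  # meta-gradient w.r.t. the shared initialization
    meta_opt.step()
```

The key move is the functional inner update: instead of calling `optimizer.step()`, the adapted parameters are computed as an explicit function of the initialization, so the outer `backward()` differentiates through the inner SGD step itself.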
Just a small nitpick: what you describe (meta-learning a parameter initialization) is certainly a meta-learning technique, but the term is broader and also encompasses actually learning better optimizers (though I suppose you could make the two roughly equivalent by baking the SGD/update circuitry into the model itself).