jacob_cannell comments on How I’m thinking about GPT-N

jacob_cannell 18 Jan 2022 0:57 UTC
So actually, L1/L2 regularization does allow you to compress the model by reducing the entropy of its weights. This is evidenced by the fact that any effective pruning/quantization system necessarily involves some strong regularizer, applied either during or after training.

The model itself can’t possibly know or care whether you later actually compress its weights. So what matters is never the compression itself but the inherent compressibility of the weights, which comes from the regularization.
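A minimal sketch of the point. The assumptions here are mine, not the comment's. It uses a toy linear regression with 20 weights, only 3 of which matter. The model is fit once with plain gradient descent and once with an L1 penalty, using proximal gradient descent (ISTA, i.e. soft-thresholding). "Compressibility" is measured as the empirical Shannon entropy of the weights after uniform quantization. That number estimates the bits per weight an entropy coder would need, but nothing is actually compressed. The data, the penalty strength (0.3), and the quantization step (0.01) are arbitrary illustrative choices.

```python
import math
import random
from collections import Counter

# Hypothetical toy setup (not from the comment): 20 weights, only 3 inputs matter.
N_SAMPLES, N_FEATURES = 100, 20
TRUE_W = [2.0, -1.5, 1.0] + [0.0] * (N_FEATURES - 3)

def make_data(seed=0):
    """Gaussian inputs, linear targets plus Gaussian noise (std 0.5)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(N_SAMPLES)]
    y = [sum(w * x for w, x in zip(TRUE_W, row)) + rng.gauss(0.0, 0.5) for row in X]
    return X, y

def gradient(X, y, w):
    """Gradient of the squared-error loss (1/2n) * ||Xw - y||^2."""
    n = len(X)
    g = [0.0] * len(w)
    for row, target in zip(X, y):
        err = sum(wj * xj for wj, xj in zip(w, row)) - target
        for j, xj in enumerate(row):
            g[j] += err * xj / n
    return g

def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward 0, snapping small values to exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def train(X, y, l1=0.0, step=0.3, iters=500):
    """Plain gradient descent when l1 == 0, proximal gradient (ISTA) when l1 > 0."""
    w = [0.0] * N_FEATURES
    for _ in range(iters):
        g = gradient(X, y, w)
        # With l1 == 0 the threshold is 0 and soft_threshold is the identity.
        w = [soft_threshold(wj - step * gj, step * l1) for wj, gj in zip(w, g)]
    return w

def entropy_bits(w, delta=0.01):
    """Empirical entropy (bits per weight) of w after rounding to a grid of step delta."""
    counts = Counter(round(wj / delta) for wj in w)
    n = len(w)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

X, y = make_data()
w_plain = train(X, y)          # unregularized fit
w_l1 = train(X, y, l1=0.3)     # L1-regularized fit

print("exact zeros:", w_plain.count(0.0), "vs", w_l1.count(0.0))
print(f"bits/weight: {entropy_bits(w_plain):.2f} vs {entropy_bits(w_l1):.2f}")
```

Both fits produce an ordinary list of floats. The entropy gap exists whether or not anyone ever runs a compressor on them, which is the comment's point about compressibility versus compression. Only the L1 case is shown. An L2 penalty shrinks weights without setting them exactly to zero, so its effect shows up only relative to the chosen quantization step.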