So actually L1/L2 regularization does make the model more compressible, by reducing the entropy of its weights. The evidence is that any effective pruning/quantization system involves some strong regularizer, applied either during training or after.
The model itself can’t possibly know or care whether you later actually compress those weights, so what matters is never the compression itself but the inherent compressibility, which comes from the regularization.
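To make the entropy point concrete, here is a rough sketch rather than a real training run. It assumes synthetic Gaussian "weights", uses soft-thresholding (the proximal step for an L1 penalty) as a stand-in for training with L1 regularization, and picks an arbitrary threshold and quantization step. All names and constants are made up for illustration.

```python
import math
import random
from collections import Counter


def soft_threshold(w, lam):
    """Proximal operator of lam * |w|: shrink toward zero, zeroing small values."""
    return math.copysign(max(abs(w) - lam, 0.0), w)


def quantize(w, step):
    """Map a weight to the integer index of its nearest quantization bin."""
    return round(w / step)


def entropy_bits(symbols):
    """Empirical Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Synthetic stand-in for a weight tensor (assumption: i.i.d. Gaussian values).
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Stand-in for L1 regularization: one soft-thresholding pass with lam = 0.5.
sparse = [soft_threshold(w, 0.5) for w in weights]

# Same quantization grid for both, so only the weight distribution differs.
step = 0.1
h_dense = entropy_bits([quantize(w, step) for w in weights])
h_sparse = entropy_bits([quantize(w, step) for w in sparse])

print(f"unregularized: {h_dense:.2f} bits/weight")
print(f"L1-shrunk:     {h_sparse:.2f} bits/weight")
```

The second number comes out lower because a large share of the weights collapses into the zero bin. Nothing in the script actually compresses anything: the lower entropy is a property of the weights themselves, and an entropy coder would only cash it in afterwards. The sketch says nothing about accuracy, which a real experiment would have to check.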