“If I quantize my network with however many bits, how bad is that?” I don’t know, maybe this is one of these things where if I sat down and tried to do it, I’d realize the issue, but it seems doable to me. It seems like there’s possibly something here.
I think the reason this doesn’t work (i.e. why you can only get a Pareto frontier) is that you can only lower bound the description length of the network / components, such that a direct comparison to “loss bits” doesn’t make sense
I think the reason this doesn’t work (i.e. why you can only get a Pareto frontier) is that you can only lower bound the description length of the network / components, such that a direct comparison to “loss bits” doesn’t make sense