Also I agree with using loss/time as the measure of performance, since it’s fairly straightforward to interpret (loss recovered per unit time). If I were reviewing this, I’d look for that.
For efficiency in practice, I think most ML papers report FLOPs, since the raw operation count is hardware-agnostic (FLOP/s, by contrast, depends on the hardware). Maybe a good measure of efficiency here would be loss recovered per FLOP? I haven’t seen that used, but it might reflect how performance scales with available compute.
Edit: Actually, thinking about it more, test-time efficiency might be a better comparison, assuming the two methods scale within roughly the same complexity class. From a product perspective, speed for users is extremely (maybe the most) valuable.
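Just to make the two metrics concrete, here's a minimal sketch of how I'd compute them. The function names and numbers are purely illustrative, not from any actual experiment:

```python
def loss_per_second(loss_recovered: float, wall_time_s: float) -> float:
    """Loss recovered per unit wall-clock time (hardware-dependent)."""
    return loss_recovered / wall_time_s


def loss_per_flop(loss_recovered: float, total_flops: float) -> float:
    """Loss recovered per floating-point operation (hardware-agnostic)."""
    return loss_recovered / total_flops


# Hypothetical run: 0.5 nats of loss recovered in 120 s using 3e15 FLOPs
print(loss_per_second(0.5, 120.0))  # loss/s
print(loss_per_flop(0.5, 3e15))     # loss/FLOP
```

The point is that the first metric answers the product question (how fast do users see results on this hardware), while the second answers the scaling question independently of which accelerator ran the job.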
Ah, that makes more sense, thanks!