Moksh Nirvaan comments on How Fast is Algorithmic Progress in AI Inference?

Moksh Nirvaan 14 Jul 2025 3:45 UTC
2 points
0
Your key identification strategy is that the lowest open-source price ≈ marginal cost, hence ≈ underlying algorithmic efficiency. Two places I could imagine this breaking:
- Latency tiers. A provider might post a single blended price that is still optimized for low-latency interactive traffic. If so, the true “slow-batch” marginal cost could be lower, making the efficiency trend steeper.
- Cross-subsidy / reputation games. Smaller hosts sometimes keep prices above marginal cost to signal quality or to funnel users onto higher-margin add-ons. Has anyone tried to benchmark private FLOP/token on consumer GPUs for the same checkpoints to triangulate?
- Hans Gundlach 17 Jul 2025 5:35 UTC
  1 point
  0
  Thanks so much for the feedback! 1. I agree; I think many providers post a blended price that is still optimized for low latency. We hope that selecting the lowest-price provider among all providers of a given model mitigates this (or at least selects a consistent set of providers that make consistent choices). 2. I’d be really interested in such data. I don’t know of any such empirical results, but Erdil et al.’s Inference Economics has a good theoretical model.
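
The triangulation asked about in this thread could be sketched roughly as follows: convert a measured throughput and a GPU rental rate into a hardware-only cost per million tokens, then compare that with a posted price. This is a minimal sketch, not anything from the post or the replies. The function names are hypothetical, every number in the example is a made-up placeholder rather than a measurement, and the estimate ignores power, networking, idle capacity, and staff costs.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Hardware-only cost in USD per one million generated tokens.

    Assumes the GPU is fully utilized at the given throughput; real
    deployments have idle time, so this is a lower bound on cost.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600.0
    return gpu_hourly_usd / tokens_per_hour * 1_000_000.0


def implied_markup(posted_price_usd: float, estimated_cost_usd: float) -> float:
    """Ratio of posted price to estimated cost; 1.0 means price equals cost."""
    if estimated_cost_usd <= 0:
        raise ValueError("estimated_cost_usd must be positive")
    return posted_price_usd / estimated_cost_usd


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not measurements).
    gpu_hourly_usd = 2.0          # hypothetical rental rate, USD per GPU-hour
    interactive_tok_per_s = 500   # hypothetical low-latency serving throughput
    batch_tok_per_s = 2000        # hypothetical large-batch serving throughput
    posted_price_usd = 1.5        # hypothetical posted USD per million tokens

    for label, tps in [("interactive", interactive_tok_per_s),
                       ("slow-batch", batch_tok_per_s)]:
        cost = cost_per_million_tokens(gpu_hourly_usd, tps)
        markup = implied_markup(posted_price_usd, cost)
        print(f"{label}: ${cost:.3f} per 1M tokens, markup {markup:.2f}x")
```

If the same checkpoint shows a much lower cost at batch throughput than at interactive throughput, that gap is one rough indication of how far a blended, latency-optimized price might sit above the slow-batch marginal cost.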