Your key identification strategy is that the lowest open-source price ≈ marginal cost, hence ≈ underlying algorithmic efficiency. Two places I could imagine this breaking:
Latency tiers. A provider might post a single blended price that is still optimized for low-latency interactive traffic. If so, the true “slow-batch” marginal cost could be lower, making the efficiency trend steeper.
Cross-subsidy / reputation games. Smaller hosts sometimes keep prices above marginal cost to signal quality or to funnel users onto higher-margin add-ons. Has anyone tried to benchmark private FLOP/token on consumer GPUs for the same checkpoints to triangulate?
Thanks so much for the feedback! 1. I agree; I think many providers post a blended price that is still optimized for low latency. We hope that selecting the lowest-price provider among all providers of a given model mitigates this (or at least selects a consistent set of providers that make consistent choices). 2. I’d be really interested in such data. I don’t know of any such empirical results, but Erdil et al.’s Inference Economics has a good theoretical model.
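For concreteness, the "lowest-price provider per model" selection mentioned above could be sketched roughly as follows. This is only an illustration: the model names, provider names, and prices are made-up placeholders, and `lowest_price_per_model` is a hypothetical helper, not the actual pipeline.

```python
# Hypothetical listings: (model, provider, USD per million tokens).
# All names and prices are illustrative placeholders, not real data.
listings = [
    ("model-a", "host-1", 0.60),
    ("model-a", "host-2", 0.45),
    ("model-b", "host-1", 1.20),
    ("model-b", "host-3", 1.50),
]


def lowest_price_per_model(rows):
    """Return {model: (provider, price)}, keeping the cheapest listing per model."""
    best = {}
    for model, provider, price in rows:
        # Keep this listing if the model is new or the price beats the current best.
        if model not in best or price < best[model][1]:
            best[model] = (provider, price)
    return best


cheapest = lowest_price_per_model(listings)
for model, (provider, price) in sorted(cheapest.items()):
    print(f"{model}: {provider} at ${price:.2f} per million tokens")
```

Note that the minimum only bounds marginal cost from above: if every host of a given model prices for low-latency traffic, the cheapest listing still carries that premium.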