I’ve done a fair bit of work on scaling (and EXTREMELY related but often overlooked cost-measurement and pricing) for a high-volume mid-latency message transformation and routing system for a giant cloud infrastructure provider.
I like that you mentioned hiring, as there really are only two ways to get more done: increase resources or be more efficient in handling the work. Both are generally much longer-term choices, and don’t solve a short-term problem.
In the short term, the only real option is load-shedding (dropping, or much-longer-than-normal queuing). If you’re overloaded, someone is going to be disappointed, and you should have a policy about who it will be: either spread the dropped work evenly, hurting many stakeholders a little, or drop in a targeted way, annoying a few stakeholders a lot. Or invert the question: decide what work is most important and drop everything else. You also need a policy for intake: whether to reject, tentatively accept, or just accept and then possibly have to shed work you’ve already taken on.
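The targeted variant of that policy can be sketched in a few lines: keep a bounded queue ordered by importance, and when it overflows, shed from the bottom. This is a minimal illustration, not any real system’s implementation; the names (`Shedder`, the capacity, the example work items) are all hypothetical.

```python
import heapq

class Shedder:
    """Bounded work queue that sheds the least important items under load."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.heap = []      # min-heap on importance: least important on top
        self.dropped = []   # work we shed; in reality you'd notify the requestor

    def submit(self, importance, item):
        heapq.heappush(self.heap, (importance, item))
        while len(self.heap) > self.capacity:
            # Targeted shedding: drop the least valuable work first,
            # concentrating the pain on a few low-priority stakeholders.
            _, victim = heapq.heappop(self.heap)
            self.dropped.append(victim)

q = Shedder(capacity=3)
for importance, name in [(5, "checkout"), (1, "batch-report"),
                         (3, "sync"), (2, "search")]:
    q.submit(importance, name)
print(q.dropped)  # ['batch-report']
```

The even-distribution policy would instead drop some fraction from every priority bucket; which one you want depends entirely on the stakeholder policy above.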
Which gets back to the value of the work units you’re performing. How upset a stakeholder will be if their work is dropped/delayed had better be related to how much they’ll pay for you to have sufficient slack to do that work in time for it to still be useful.
And all of this is made MUCH harder by work that is variable in resources required, illegible in value, and often more time-consuming to analyze than to actually perform. At scale, this requires a somewhat complicated acceptance mechanism: submission is tentative (meaning acceptance of a request does not imply any guarantee of action) and includes a bid for how much value the work has to the requestor (this could be just a QOS level, with a contractual rate for that bucket). A request then has multiple possible statuses: most commonly “complete” with no intermediate state, but also “queued” to indicate a longer-than-normal delay, and “dropped”. Notably, don’t include “in-progress” or anything like it; it doesn’t mean anything different from “queued”, because there’s no actual guarantee until the work is done. For human-granularity systems, you STILL need the policy, though the API is just the expected granularity of Slack communication.
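A toy version of that acceptance mechanism might look like the following. The status names, the `bid` field, and the `Broker` are all illustrative assumptions, not a real API; the point is only that the status set is deliberately small and contains no “in-progress”.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Status(Enum):
    QUEUED = auto()    # tentatively accepted; no guarantee of action
    COMPLETE = auto()  # the only status that constitutes a guarantee
    DROPPED = auto()   # shed under load
    # Deliberately no IN_PROGRESS: until the work is done, it means
    # nothing more than QUEUED.

@dataclass
class Request:
    payload: str
    bid: float                      # requestor's stated value, or a QOS-tier rate
    status: Status = Status.QUEUED  # submission is tentative by default

class Broker:
    def __init__(self, min_bid):
        self.min_bid = min_bid  # under load, shed anything bid below this

    def process(self, req: Request) -> Request:
        if req.bid < self.min_bid:
            req.status = Status.DROPPED
        else:
            # ... perform the actual transformation/routing work here ...
            req.status = Status.COMPLETE
        return req

broker = Broker(min_bid=1.0)
print(broker.process(Request("transform msg", bid=2.5)).status)  # Status.COMPLETE
print(broker.process(Request("bulk reindex", bid=0.2)).status)   # Status.DROPPED
```

In a real system the shedding threshold would move with load rather than being fixed, but the contract with the requestor is the same: a bid on the way in, and one of three statuses on the way out.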
As you note, slack (the concept, not the product) doesn’t need to be idle. It can be easily pre-empted work, which is fairly cheap to drop or pause. For retail establishments, cleaning the store is a very common use of slack. For software companies, low-priority bugfixing, refactors, or prototyping can be slack (they can also be important work, which can get very confusing).
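The “slack as pre-emptible work” idea reduces to a simple scheduling rule: always prefer demand work, and fall back to slack tasks only when the demand queue is empty. A minimal sketch, with entirely hypothetical task names:

```python
from collections import deque

demand = deque()  # paid/committed work; always takes priority
slack = deque(["clean store", "triage low-pri bugs", "prototype idea"])

def next_task():
    """Return (kind, task): demand work if any, else pre-emptible slack work."""
    if demand:
        return ("demand", demand.popleft())
    if slack:
        # Slack work is cheap to pause: if demand arrives mid-task, you
        # simply stop and put the slack task back on its queue later.
        return ("slack", slack.popleft())
    return ("idle", None)

demand.append("customer order")
print(next_task())  # ('demand', 'customer order')
print(next_task())  # ('slack', 'clean store')
```

The confusing case mentioned above shows up here too: once “triage low-pri bugs” becomes genuinely important, it belongs on the demand queue, and you’ve quietly lost that much slack.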