There is another factor that is often relevant here, which is queuing theory. There’s a famous chart (scroll down here), plotting “Percent of capacity used” on the x-axis, and “Expected wait time” on the y-axis. Wait time is approximately flat until the system reaches 70-80% load, at which point the wait time soars. This is used to decide things like, “How many ticket windows do we need to open at the amusement park?” But it also matters a lot for web servers and for many batch-processing systems.
A system approaching 70% capacity may see dramatic performance degradation as a result of a small load increase. Not all systems behave like this, but many systems do meet the necessary criteria to degrade in this way.
When designing monitoring systems, I often switch the load graph’s color to yellow around 60%, and to red by 80%. If your database’s CPU or disks are hitting 70% utilization, your system may be about to fall off a cliff.
Running systems near 100% utilization with acceptable wait times often requires prioritization, fairness constraints, reserving the last bit of capacity for latency-sensitive requests, etc. This can involve years of engineering effort to tune in the hardest cases. This is one reason the cloud is popular: When you hit 70% utilization, or when your wait times start to climb, you can just spin up another server instead of doing heavy engineering to squeeze out the last 15-20% of capacity without degrading latency.
There is another factor that is often relevant here, which is queuing theory. There’s a famous chart (scroll down here), plotting “Percent of capacity used” on the x-axis, and “Expected wait time” on the y-axis. Wait time is approximately flat until the system reaches 70-80% load, at which point the wait time soars. This is used to decide things like, “How many ticket windows do we need to open at the amusement park?” But it also matters a lot for web servers and for many batch-processing systems.
A system approaching 70% capacity may see dramatic performance degradation as a result of a small load increase. Not all systems behave like this, but many systems do meet the necessary criteria to degrade in this way.
When designing monitoring systems, I often switch the load graph’s color to yellow around 60%, and to red by 80%. If your database’s CPU or disks are hitting 70% utilization, your system may be about to fall off a cliff.
Running systems near 100% utilization with acceptable wait times often requires prioritization, fairness constraints, reserving the last bit of capacity for latency-sensitive requests, etc. This can involve years of engineering effort to tune in the hardest cases. This is one reason the cloud is popular: When you hit 70% utilization, or when your wait times start to climb, you can just spin up another server instead of doing heavy engineering to squeeze out the last 15-20% of capacity without degrading latency.