A colleague asked me why I said to use a ratio of response time to service time of 2:1 in
Sizing to Fail. Was it just magic, or was there any science behind it? It turns out to be a range, found by observation, rather like the number of things you can keep in your mind at once: "five, plus or minus two".
If you draw a graph of both utilization and response time, you'll find that at twice the service time you're pretty close to the point at which the response time curve turns upwards and the utilization curve levels off, as shown here.
In this particular case, the program has a service time of one tenth of a second. If I run it on a uniprocessor, the most requests per second it will ever deliver is ten, as one tenth of a second goes into one second exactly ten times.
In theory, the utilization should be a straight line from zero to 100% at 10 requests (transactions) per second, then a horizontal line. This is the blue dotted line labeled "Bound" in the top diagram.
In practice, you'll never get a square corner because some requests come in while the machine is still working on a previous request. The next request has to wait (in a queue) for the processor to be free. That's why the real utilization curve starts to level off a bit below 10 TPS and then bends gently to the right until it's horizontal.
In the lower diagram, the response time should theoretically be a horizontal line at one tenth of a second, then shoot upward as the program "hits the wall", as is shown in blue once more.
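Those two ideal "Bound" lines can be written down directly. Here's a minimal sketch, using the article's one-tenth-second service time; the function names are mine:

```python
service = 0.1  # seconds per request, the article's example

def utilization_bound(tps):
    """Ideal utilization: a straight line reaching 100% at
    1/service TPS, then flat -- the blue line in the top diagram."""
    return min(tps * service, 1.0)

def response_bound(tps):
    """Ideal response time: flat at the service time until the wall.
    At or past 1/service TPS the queue grows without limit."""
    return service if tps < 1 / service else float("inf")

for tps in (2, 5, 8, 10, 12):
    print(f"{tps:2d} TPS: {utilization_bound(tps):4.0%} busy, "
          f"R = {response_bound(tps)}s")
```

The square corners in both functions are exactly what a real system never shows, for the queueing reason described above.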
In theory, then, at two tenths of a second one would be just past the 10 TPS line. In practice, the response time curve, like the utilization curve, has a gentle bend instead of a square corner, and so reaches two tenths of a second just a fraction before the 10 TPS line. Between them, theory and practice bracket the correct value, which makes twice the service time a good engineering approximation. A better mathematician than I could probably demonstrate that twice the service time sits in the center of the inflection points of the family of curves from real systems, and is therefore a good number mathematically as well. A worse one might try to use it to recommend a 70% utilization threshold, which Michael Ley disproved in Why a 70% Utilization Threshold is just ROT.
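The gently bending curves themselves can be reproduced with a standard closed-system model, which matches how load generators like JMeter drive a server. Below is a sketch using exact mean value analysis for one queueing station (the processor) plus a think-time delay; the 0.1-second service time is the article's, while the exponential service assumption and the 0.9-second think time are mine:

```python
def mva(service, think, max_clients):
    """Exact Mean Value Analysis for a closed system: one queueing
    server plus a per-client think time. Returns one row per
    population size: (clients, throughput, utilization, response)."""
    q = 0.0  # mean queue length with n-1 clients in the system
    rows = []
    for n in range(1, max_clients + 1):
        r = service * (1 + q)   # response time seen by the nth client
        x = n / (r + think)     # throughput in requests per second
        q = x * r               # Little's law: mean queue length
        rows.append((n, x, x * service, r))
    return rows

for n, x, u, r in mva(service=0.1, think=0.9, max_clients=20):
    print(f"{n:2d} clients: {x:5.2f} TPS, {u:4.0%} busy, R = {r:.3f}s")
```

As the client count rises, the throughput bends over toward the 10 TPS bound instead of hitting a square corner, and the response time bends upward past twice the service time, just as in the diagrams.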
In practice, it's a good number because it's reasonably easy to hit by fiddling with the parameters to JMeter or LoadRunner. The curve is only just beginning to take off skywards at this point, so a small change in load causes a similarly small change in response time.
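A toy version of that fiddling can even be automated: step the offered load upward until the mean response time passes twice the service time. The `measured_response` helper below is a hypothetical stand-in for a JMeter run, simulating Poisson arrivals to a single server with the article's one-tenth-second service time:

```python
import random

def measured_response(load_tps, service=0.1, seconds=30, seed=1):
    """Hypothetical stand-in for a load-test run: simulate Poisson
    arrivals at load_tps against one server with a fixed service
    time, and return the mean response (wait + service) time."""
    rng = random.Random(seed)
    t = busy_until = 0.0
    total = count = 0
    while t < seconds:
        t += rng.expovariate(load_tps)   # next arrival time
        start = max(t, busy_until)       # wait if the server is busy
        busy_until = start + service
        total += busy_until - t          # this request's response time
        count += 1
    return total / count

# Step the offered load up until response time passes 2x service time.
for tps in range(1, 10):
    r = measured_response(tps)
    print(f"{tps} TPS -> mean R = {r:.3f}s")
    if r >= 0.2:
        print(f"2:1 ratio reached near {tps} TPS")
        break
```

Because the curve is still shallow here, the crossing point is stable: rerunning with a different seed moves it by at most a step or so.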
As for me, I just remember that a smart mathematician at Teamquest suggested 2:1. Along with roughly four other things at any one time...