Since Richard Foote has started to encroach on my territory (by writing about the CBO), I’ve decided to respond by moving briefly into Cary Millsap’s speciality (by writing about queueing). I don’t intend to get very technical, though; I just want to give an example of how queueing theory relates to Oracle by answering a question I got from a client a few weeks ago:
“How can they be complaining that response times are worse when the throughput is up by 5%?”
The unfortunate answer, of course, is that the response time might be worse because the throughput is better; and all I want to do in this note is give you a carefully constructed example to show how this can happen:
- Assume you have a machine with one CPU.
- Assume you have two processes that wake up periodically to do some work.
- Process 1 runs a job that uses 0.1 CPU seconds (and no other resources), produces N units of output per execution, and wakes up once per second to run.
- Process 2 runs a job that uses 0.5 CPU seconds (and no other resources), produces 5N units of output per execution, and wakes up once every five seconds to run.
In any ten seconds, each process (running alone on the machine) uses 10% of the CPU and produces 10N units of output, and response time matches CPU time. But what happens when both processes are started up at roughly the same time?
If you’re lucky, process 2 will always start running its job shortly after process 1 has just finished a run, and finish its job shortly before process 1 starts its next run.
If you’re unlucky, both processes will start a run simultaneously – and only one of them will get the CPU. At this point, a typical machine will be using time-slicing to make it appear that the two processes are actually running concurrently, so the two jobs will start switching on and off the CPU every 0.01 seconds (say). The (approximate) effect of this is that process 1 completes its job after 0.2 seconds, having spent 0.1 seconds working and 0.1 seconds waiting, and process 2 completes its job after 0.6 seconds, having spent 0.5 seconds working and 0.1 seconds waiting.
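You can see the arithmetic of that unlucky collision with a tiny round-robin simulation (a sketch in Python; the function name, millisecond units, and 10ms quantum are my own choices, not anything from the client system):

```python
def round_robin(jobs_ms, quantum_ms=10):
    """Round-robin time-slicing on one CPU: every unfinished job gets
    one quantum in turn.  jobs_ms lists each job's CPU demand in
    milliseconds; all jobs arrive at time zero.  Returns each job's
    completion time in milliseconds."""
    remaining = list(jobs_ms)
    done = [None] * len(jobs_ms)
    clock = 0
    while any(d is None for d in done):
        for i in range(len(remaining)):
            if done[i] is None:
                s = min(quantum_ms, remaining[i])
                remaining[i] -= s
                clock += s
                if remaining[i] == 0:
                    done[i] = clock
    return done

# Running alone, the 100ms job finishes in 100ms:
print(round_robin([100]))        # [100]
# Colliding with the 500ms job, it takes roughly 200ms to finish,
# and the 500ms job takes roughly 600ms:
print(round_robin([100, 500]))   # [190, 600]
```

The short job suffers most: it spends almost as long waiting as working, exactly as described above.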
Response times are worse – dramatically so in the case of process 1. We have plenty of spare capacity (80%, in fact) on the CPU, but the timing of the arrival of jobs makes a big difference to the response time for each job.
In the case of my client, we had a lot more processes like process 1 running, and used some of the spare capacity on the machine to push through a lot more of the 0.1 second tasks – so our throughput went up; but as we increased the number of tasks, we increased the chances of them colliding with the 0.5 second job (and with each other) so individual response times got worse.
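A crude sketch of the client’s situation: a single CPU serving jobs first-come first-served, with jobs arriving at random (Poisson arrivals). Everything here – the function, its parameters, and the arrival rates – is my own illustrative assumption, not a model of the client’s actual workload:

```python
import random

def simulate(arrival_rate, service_s, horizon_s=50000.0, seed=42):
    """FIFO queue on one CPU: jobs arrive at random (exponential
    inter-arrival times) and each needs service_s seconds of CPU.
    Returns (completed jobs per second, mean response time)."""
    random.seed(seed)
    t = 0.0            # arrival clock
    busy_until = 0.0   # when the CPU next falls idle
    total_resp = 0.0
    n = 0
    while t < horizon_s:
        t += random.expovariate(arrival_rate)
        start = max(t, busy_until)     # wait if the CPU is busy
        busy_until = start + service_s
        total_resp += busy_until - t   # queueing delay + service time
        n += 1
    return n / t, total_resp / n

tp1, r1 = simulate(arrival_rate=1.0, service_s=0.1)   # CPU ~10% busy
tp2, r2 = simulate(arrival_rate=2.0, service_s=0.1)   # CPU ~20% busy
print(tp1, r1)
print(tp2, r2)
# Doubling the arrival rate roughly doubles throughput, but the mean
# response time also rises - more jobs means more collisions.
```

Even at 20% utilisation the extra collisions are measurable; the effect snowballs as you push the arrival rate higher.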
To move from my trivial example to a more realistic model of the world you need queueing theory. I made my example as simple as possible, with two tasks of fixed length arriving at fixed, regular intervals. To model the real world you need to think about tasks of variable length arriving at randomly distributed intervals – and the mathematics gets a bit harder.
But you don’t need to follow the details of the mathematics to understand the critical consequences: response times can vary significantly because of arrival time even when the machine is far from fully loaded, and response time can get worse even when (or possibly because) throughput is improving.
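For the curious, the simplest textbook result makes the point. For an M/M/1 queue (random arrivals, exponentially distributed service times, one server – a standard idealisation, not a claim about any particular system) the mean response time is R = S / (1 − ρ), where S is the mean service time and ρ the utilisation:

```python
def mm1_response(service_s, utilisation):
    """Mean response time for an M/M/1 queue: R = S / (1 - rho)."""
    return service_s / (1.0 - utilisation)

for rho in (0.1, 0.5, 0.8, 0.9):
    print(f"{rho:.0%} busy -> {mm1_response(0.1, rho):.3f}s")
# 10% busy -> 0.111s
# 50% busy -> 0.200s
# 80% busy -> 0.500s
# 90% busy -> 1.000s
```

Response time for a 0.1-second task has degraded tenfold by the time the server is 90% busy – well before the machine looks “full”.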
For more comments on the response time/throughput dilemma, see this item by Doug Burns.