I’m still struggling to catch up after my holiday – it always amazes me how much time it takes to prepare for, and then recover from, a break, but if the client needs something done it has to be done – so the weeks before and after a holiday are always a little frantic.
Still, just to keep things moving on, I’ve posted on my old website a short article I wrote for the last issue of the UKOUG magazine on the topic of trouble-shooting: the high-level view.
Looking forward to reading the article, Jonathan.
Comment by Steve, September 4, 2008 @ 9:30 am UTC
That’s a very worthwhile article, and the distinction between resource wait time and competitive wait time is well made. I think of them as “wait time” and “inherited wait time”, the latter being the case of “I’m waiting because someone else is waiting” or “I’m waiting to wait”. On reflection, it’s probably more technically accurate to call the latter “queue wait time” in the case of resources with serialised access, such as the disk read example.
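To make the “waiting to wait” idea concrete, here is a minimal sketch (my own illustration, not taken from the article) of a single serialised resource served first-come, first-served, such as one disk. The `Request` type and `fifo_breakdown` function are hypothetical names, and the 6 ms per read is an illustrative figure only. Each request’s response time splits into the time it spends queueing behind earlier requests and its own service time.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: float  # ms at which the request reaches the resource
    service: float  # ms the resource needs once it starts on the request


def fifo_breakdown(requests):
    """Return (queue_wait, service, response) for each request on one
    serialised, first-come-first-served resource (e.g. a single disk)."""
    results = []
    free_at = 0.0  # time at which the resource next becomes idle
    for r in sorted(requests, key=lambda req: req.arrival):
        start = max(r.arrival, free_at)
        queue_wait = start - r.arrival  # "waiting because someone else is waiting"
        free_at = start + r.service
        results.append((queue_wait, r.service, free_at - r.arrival))
    return results


# Three reads arriving together at a disk that takes 6 ms per read.
for qw, s, resp in fifo_breakdown([Request(0, 6), Request(0, 6), Request(0, 6)]):
    print(f"queue wait {qw:4.1f} ms + service {s:4.1f} ms = response {resp:4.1f} ms")
```

With three reads arriving together, the first sees no queue wait, while the second and third wait 6 ms and 12 ms before their own 6 ms of service, so two-thirds of the third read’s 18 ms response is queue wait.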
I also suppose that some Oracle wait events are clearly examples of queue wait time: “buffer busy waits” and “free buffer waits” spring immediately to mind. For others that could be either a resource wait or a queue wait, such as those relating to I/O, v$event_histogram should help to discriminate between occurrences of the two situations.
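V$EVENT_HISTOGRAM reports, per wait event, how many waits fell into each time bucket (columns include EVENT, WAIT_TIME_MILLI and WAIT_COUNT). One crude way to act on the suggestion above, sketched here under assumptions of my own, is to choose a cut-off near the device’s unloaded service time and see what fraction of waits lands above it: a tight cluster at the low end looks like plain resource wait, while a long tail suggests queueing. The sketch assumes the rows have already been fetched into Python as `(wait_time_milli, wait_count)` pairs and that each bucket counts waits shorter than its WAIT_TIME_MILLI value; `split_by_threshold` is a hypothetical helper, the threshold is yours to choose, and the bucket numbers are invented for illustration.

```python
def split_by_threshold(buckets, threshold_ms):
    """Split histogram wait counts into 'fast' and 'slow' groups.

    buckets: iterable of (wait_time_milli, wait_count) pairs; each bucket is
        assumed to count waits shorter than its wait_time_milli value.
    threshold_ms: hypothetical cut-off, roughly the device's unloaded service
        time; waits in buckets above it are treated as likely queueing.
    Returns (fast_count, slow_count, slow_fraction).
    """
    fast = slow = 0
    for upper_ms, count in buckets:
        if upper_ms <= threshold_ms:
            fast += count  # whole bucket lies below the cut-off
        else:
            slow += count  # bucket may contain waits longer than the cut-off
    total = fast + slow
    return fast, slow, (slow / total if total else 0.0)


# Illustrative numbers only, not taken from a real system.
db_file_sequential_read = [(1, 120), (2, 40), (4, 300), (8, 900),
                           (16, 250), (32, 90), (64, 30)]
fast, slow, frac = split_by_threshold(db_file_sequential_read, threshold_ms=8)
print(f"{fast} waits under 8 ms, {slow} slower ({frac:.0%})")
```

The choice of threshold does all the work here, so it is only a rough screen; a histogram that shifts to the right as load rises is stronger evidence of queueing than any single snapshot.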
Comment by David Aldridge, September 4, 2008 @ 11:40 am UTC
Interesting way of looking at wait time and service time. I hadn’t thought of breaking it down like this. I have tended to focus on different perspectives (points of view) and different levels of granularity. It’s probably easiest to explain with an example.
Imagine a serial, synchronous system consisting of an application server and a database. The app server places a call to the db and waits for a response. The time waited is the residence time of the database. From the POV of the db, we can break this residence time into various service centers (e.g. CPU, disk) and wait times. As you mentioned, the call to disk can also be broken into additional service centers (from the POV of the disk) and associated waits (queues). I usually think of the service time as the minimum possible residence time, i.e. the residence time at low load (zero queueing). Using that definition, the 6 msec I/O wait in your example would be labeled service time, not wait time (even though Oracle accounts for that time as a wait event). I guess you are right about the fluid boundary between what we label wait time and what we label service time.
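Under the definition just given (service time as the minimum possible residence time), a minimal sketch of the split might take the smallest observed residence time as the service time and treat anything above it as queue wait. This assumes at least one observation was taken at effectively zero load; `split_residence` is a hypothetical helper and the sample times are illustrative, not measurements.

```python
def split_residence(residence_times_ms):
    """Split observed residence times into a service time and queue waits.

    Service time is approximated as the smallest observed residence time,
    which assumes at least one call ran with no queueing at all.
    Returns (service_ms, list_of_queue_waits_ms).
    """
    if not residence_times_ms:
        raise ValueError("need at least one observation")
    service = min(residence_times_ms)
    return service, [r - service for r in residence_times_ms]


# Hypothetical single-block read times (ms) for one disk.
service, waits = split_residence([6.0, 6.0, 9.0, 14.0, 6.0, 21.0])
print(f"service ~{service} ms; queue waits {waits}")
```

Oracle would report each of these reads in full as an I/O wait event, but by this definition the first 6 ms of each is service time and only the excess (0, 3, 8 or 15 ms) is wait.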
One consequence of this interpretation is that the database service time (remember, this is measured from the app server’s point of view) is not equivalent to CPU time. I have seen service time defined as CPU time in some earlier papers.
Having work occur in parallel confuses the issue, as the total residence time is no longer a simple sum of wait times and service times. (I don’t mean concurrent sessions; I mean different services being performed in overlapping time frames.) This difficulty is even present, though usually not very important, in some 10046 trace files. I have had talks about this at a Hotsos Symposium, so it is a known effect, but I haven’t seen anything written about it. For example, say a client submits a query that returns 1000 records in 10 arrays of 100 rows each. The client asks for the first 100, gets them, and then requests the next 100. The db knows the next request is coming, so it does the work to gather those records before the next client request is received. The actual client wait time is the time between the request and the receipt of data, but the db residence time as seen in the 10046 trace also includes work done between one client receipt and the next request (this can be measured from SQL*Net trace files). This usually isn’t a major problem, but it exists.
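The overlap described above can be sketched with a toy timeline. The model is a hypothetical restatement of the example in the comment, not a description of Oracle internals: the database starts preparing the next array as soon as it has sent the current one, each array costs a fixed amount of database work, and the client spends a fixed time processing rows before asking again. `simulate_fetches` and its parameters are invented for illustration.

```python
def simulate_fetches(n_batches, db_work_ms, client_think_ms):
    """Toy model of array fetches with the database working ahead.

    Assumes the database prepares the first array when it is requested,
    then starts on the next array as soon as it has sent the current one.
    Returns (total_client_wait_ms, total_db_work_ms).
    """
    client_wait = 0.0
    t = 0.0                   # time at which the client issues its next request
    ready_at = db_work_ms     # time at which the requested array is ready
    for _ in range(n_batches):
        recv = max(t, ready_at)          # client gets data once it is ready
        client_wait += recv - t          # what the client actually waited
        ready_at = recv + db_work_ms     # db starts on the next array at once
        t = recv + client_think_ms       # client processes rows, then asks again
    return client_wait, n_batches * db_work_ms


wait, work = simulate_fetches(n_batches=10, db_work_ms=5, client_think_ms=3)
print(f"client waited {wait} ms; database worked {work} ms")
```

With 10 arrays, 5 ms of database work per array and 3 ms of client processing, the client waits 23 ms in total while the database does 50 ms of work: 27 ms of that work overlaps time the client spent processing rows, so adding up the database’s time overstates what the client actually waited.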
Comment by Henry Poras, September 16, 2008 @ 6:54 pm UTC