There’s a note on the OTN database forum that’s worth reading if you’re puzzled by some of the Statspack or AWR results in 11g. The author is talking about 220.127.116.11 and walks through a couple of inconsistent reports from Statspack – but this prompted me to take a look at some underlying details using version 18.104.22.168 – and I think there’s a bug (or change) in the way that enqueues (locks) are handled that can make both statspack and AWR reports “lose time”.
Essentially the following sequence of events is possible:
Session A starts waiting on a TM lock
after a few seconds v$system_event records a wait for “enq: TM – contention”.
AWR (or statspack) takes a snapshot (which includes v$system_event).
Some time later session A acquires the TM lock – the total_waits for this event does not get incremented in v$system_event.
AWR (or statspack) takes the next snapshot
When you run the AWR/Statspack report across the interval, the “Top N Timed Events” and the other sections on wait events will show time waited only if the total_waits for the event has changed in the interval. But we incremented the count before the first snapshot, and haven’t incremented it since – so the time we spent waiting in this interval disappears. (The time we spent waiting in the previous interval will be in the report for the previous interval, of course.)
Note – this is not the same as the inconsistencies that used to appear in statspack with statement-level errors, where the total resource usage across multiple periods would be reported against the snapshot where the statement completed. In those cases we finally saw the total resource usage even though it was summed against one interval; in this case we lose track of any time spent waiting after the first snapshot.
Of course the problem is not a total whitewash. The time may appear in other parts of the report – for example it will show up in the “DB Time” total – and the results will only be seriously deceptive for waits that are (a) long and (b) are so unluckily synchronised that an interval with a lot of wait time in it doesn’t have a wait event start in the interval and increment the total_waits. The problem may, therefore, only appear as a slight inconsistency in most systems.
Footnote: I haven’t spent much time looking at this, so I don’t know which versions are affected, nor which enqueues the new mechanism applies to (it may only be “TM” enqueue, as other waits tend to time out and restart every few seconds). This note is just an early warning to help you recognise the situation if you ever come across it.