I was on a site recently that had a standard strategy for installing Statspack on every single one of its databases – including the 10g databases. They also happened to have the licences that allowed them to run the AWR Report – which was rather nice because there are little bits of the Statspack report that don’t appear in the AWR report, and vice versa.
One little anomaly did hit me in the eye though. Here are the ‘Top 5’ sections from matching Statspack and AWR reports from one of their 10.2.0.2 databases. Spot the difference:
Statspack:

Snapshot       Snap Id     Snap Time
~~~~~~~~    ~~~~~~~~~~  ~~~~~~~~~~~~~~~~~~
Begin Snap:      22048  31-Oct-07 04:59:26
  End Snap:      22052  31-Oct-07 05:59:27
   Elapsed:             60.02 (mins)

Top 5 Timed Events                                            Avg %Total
~~~~~~~~~~~~~~~~~~                                           wait   Call
Event                                 Waits     Time (s)     (ms)   Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~  ~~~~~~~~~~~   ~~~~~~ ~~~~~~
CPU time                                             1,738          85.4
db file sequential read             219,767            192      1    9.4
SQL*Net message from dblink           1,549             65     42    3.2
db file scattered read               40,632             12      0     .6
log file parallel write               4,537             10      2     .5
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

AWR:

               Snap Id      Snap Time
             ~~~~~~~~~  ~~~~~~~~~~~~~~~~~~~
Begin Snap:       5533  31-Oct-07 05:00:22
  End Snap:       5534  31-Oct-07 06:00:35
   Elapsed:             60.20 (mins)
   DB Time:             59.37 (mins)

Top 5 Timed Events                                     Avg %Total
~~~~~~~~~~~~~~~~~~                                    wait   Call
Event                            Waits  Time (s)      (ms)   Time  Wait Class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~ ~~~~~~~~~    ~~~~~  ~~~~~~ ~~~~~~~~~~
CPU time                                    3,211           90.1
db file sequential read        219,977        192       1    5.4  User I/O
db file scattered read          40,632         12       0    0.3  User I/O
log file parallel write          4,599         10       2    0.3  System I/O
db file parallel write           6,617          8       1    0.2  System I/O
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You’ll notice that the reports are out of synch by about one minute. That’s not the reason for the disappearance of the “SQL*Net message from dblink”, by the way – that wait has been demoted (and I’m not convinced that’s right) to the “idle” events in 10g.
The most significant difference, though, is in the CPU time. There were only eight CPUs in the box, and CPU recording is very fine-grained in 10g – so the one minute slippage isn’t enough to account for a 1,500 CPU second difference in usage.
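A quick back-of-envelope check makes the point. The CPU count, the one-minute slippage and the two headline CPU figures all come from the reports above; only the arithmetic is mine:

```python
# Eight CPUs can supply at most 8 * 60 = 480 CPU seconds in the one
# minute by which the two snapshot windows are out of synch.
cpus = 8
slippage_seconds = 60
max_extra_cpu = cpus * slippage_seconds      # 480 CPU seconds

# Headline CPU figures from the two "Top 5" sections, in seconds.
awr_cpu = 3_211
statspack_cpu = 1_738
difference = awr_cpu - statspack_cpu         # 1,473 CPU seconds

# The slippage is roughly a third of the gap - it can't be the explanation.
print(max_extra_cpu, difference)             # 480 1473
```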
So where has that difference come from? The answer lies in the system statistics:
[Edited: 8th Nov 2007: No, it’s not in the system statistics – the figures reported in the next few lines were just an unlucky coincidence that pointed me in completely the wrong direction.]
AWR:

Statistic                          Total     per Second    per Trans
~~~~~~~~~~~~~~~~~~~~~~~~~~~   ~~~~~~~~~~  ~~~~~~~~~~~~~ ~~~~~~~~~~~~
CPU used by this session         173,859           48.1         33.0
recursive cpu usage              157,926           43.7         29.9
                                 ~~~~~~~
                                 331,785

Statspack:

Statistic                          Total     per Second    per Trans
~~~~~~~~~~~~~~~~~~~~~~~~~~~   ~~~~~~~~~~  ~~~~~~~~~~~~~ ~~~~~~~~~~~~
CPU used by this session         173,807           48.3         32.9
recursive cpu usage              157,926           43.9         29.9
Notice that the AWR report of CPU seems to be derived by adding “CPU used by this session” to “recursive cpu usage” – nearly.
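Working the arithmetic on those figures shows why I say “nearly”. These statistics are reported in centiseconds, so divide by 100 to get seconds; the numbers are taken straight from the two reports above:

```python
# Figures from the AWR statistics section, in centiseconds.
cpu_used_by_session = 173_859
recursive_cpu_usage = 157_926

total_cs = cpu_used_by_session + recursive_cpu_usage   # 331,785 centiseconds
total_seconds = total_cs / 100                         # 3,317.85 seconds

# AWR's "Top 5" reported 3,211 CPU seconds - close to that sum, but not
# exact. Statspack's 1,738 seconds matches "CPU used by this session"
# alone (173,807 / 100 from the Statspack figures).
statspack_cpu = 173_807 / 100                          # 1,738.07 seconds
print(total_seconds, statspack_cpu)
```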
Of course AWR and Statspack can’t both be right – and it’s a bit alarming that when you switch from one tool to the other you can get a much larger figure for CPU usage suddenly appearing in the headline stats.
The problem is this – which is “right”, Statspack or AWR ?
It could be that AWR (in its MMON guise) is just being a little more clever with accounting for CPU and the way it decides what is or isn’t recursive CPU time.
The results above may even reflect a bug that has been fixed by 10.2.0.3, because the first couple of experiments I put together didn’t behave the way the client’s 10.2.0.2 system did.
Whatever the result, watch out for changes to CPU reporting when you switch from Statspack to AWR. The figures may change dramatically, and you may find it hard to work out whether it’s the new figures or the old that are correct.