[Further Reading on Statspack]
A few days ago someone emailed me a Statspack report (10g format): one of their developers was complaining that “the system” was slow, which they thought a little odd because the Statspack report for the period didn’t seem to show any sign that “the system” was suffering from any type of overload.
If you look at the headline figures, and throw in the fact that the machine has 4 CPUs, the following 30-minute snapshot makes the system look pretty idle.
Load Profile                            Per Second       Per Transaction
~~~~~~~~~~~~                       ---------------       ---------------
                  Redo size:             11,495.26             11,857.65
              Logical reads:             55,989.88             57,754.94
              Block changes:                 54.24                 55.95
             Physical reads:                 23.23                 23.96
            Physical writes:                  3.49                  3.60
                 User calls:                 27.22                 28.08
                     Parses:                  8.98                  9.27
                Hard parses:                  0.22                  0.22
                      Sorts:                  2.16                  2.22
                     Logons:                  0.01                  0.01
                   Executes:                 23.50                 24.24
               Transactions:                  0.97

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
CPU time                                                       806          52.8
db file sequential read                         31,538         312     10   20.4
read by other session                            6,909         279     40   18.3
db file scattered read                             950          38     40    2.5
log file parallel write                          1,689          19     11    1.2
          -------------------------------------------------------------
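A quick sanity check on the headline CPU figure supports the “pretty idle” impression: a 30-minute snapshot on a 4-CPU machine offers 7,200 CPU seconds, and the report accounts for only 806 of them. A minimal sketch of the arithmetic, using the figures from the report above:

```python
# Rough CPU-utilisation check from the Statspack headline figures.
snapshot_seconds = 30 * 60          # 30-minute snapshot interval
cpu_count = 4                       # the machine has 4 CPUs
cpu_time_used = 806                 # "CPU time" from Top 5 Timed Events (seconds)

cpu_seconds_available = snapshot_seconds * cpu_count
utilisation = cpu_time_used / cpu_seconds_available

print(f"Available CPU seconds: {cpu_seconds_available}")   # 7200
print(f"Utilisation: {utilisation:.1%}")                   # roughly 11%
```

So even with every wait event added on top, the machine is nowhere near saturated over the interval.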
To paraphrase Cary Millsap: you can’t infer details from a summary. So what you can do in cases like this is go to the developer and say:
“You think the system is slow – give me a demonstration”
and trace what they are doing.
On the other hand, there are cases where a detail has such a massive impact that it stands out from the summary – at least to the extent that there’s a fairly good chance that when you look at the summary, you can make a pretty good guess about the critical detail.
Take a look at the following. The first bit is from the “Top SQL by CPU” section (relevant because the largest timed component was CPU time); the second bit is from the “Instance Activity” section:
    CPU                  CPU per             Elapsd                     Old
  Time (s)   Executions  Exec (s)  %Total   Time (s)    Buffer Gets  Hash Value
---------- ------------ ---------- ------ ---------- --------------- ----------
    666.37            1     666.37   82.2     673.45      91,309,041 3893695821
Module: SQL*Plus
BEGIN delete_something; END;

    163.43        2,147       0.08   20.2     167.92      22,829,998 3889042855
Module: SQL*Plus
DELETE FROM target_table WHERE identification_code =:B1 AND target_type='A'
AND ROWNUM = 1

    163.18        2,147       0.08   20.1     168.45      22,829,612 2403581060
Module: SQL*Plus
DELETE FROM target_table WHERE identification_code =:B1 AND target_type='B'
AND ROWNUM = 1

    163.00        2,147       0.08   20.1     168.06      22,817,444 2353247669
Module: SQL*Plus
DELETE FROM target_table WHERE identification_code =:B1 AND target_type='C'
AND ROWNUM = 1

    161.93        2,147       0.08   20.0     167.77      22,831,939 1017873455
Module: SQL*Plus
DELETE FROM target_table WHERE identification_code =:B1 AND target_type='D'
AND ROWNUM = 1

     78.37        3,358       0.02    9.7      97.76      18,826,838 1226295128
Module: SQL*Plus
DELETE FROM another_table WHERE identification_code =:B1 AND ROWNUM = 1


Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
physical reads                                46,369           23.2         24.0
...
table scan blocks gotten                 110,609,808       55,415.7     57,162.7
table scan rows gotten                 8,630,217,805    4,323,756.4  4,460,060.9
table scans (long tables)                          1            0.0          0.0
table scans (short tables)                    13,368            6.7          6.9
Two things stand out. First, the four similar delete statements are probably called (one after the other, possibly driven by a cursor for-loop) by the delete_something procedure – the numbers add up very convincingly.
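How convincingly the numbers add up is easy to check; here is a minimal verification, with the CPU times and buffer gets copied straight from the SQL extract:

```python
# CPU time and buffer gets for the four single-row DELETE statements,
# copied from the "Top SQL by CPU" extract.
child_cpu = [163.43, 163.18, 163.00, 161.93]
child_gets = [22_829_998, 22_829_612, 22_817_444, 22_831_939]

# Totals reported for the delete_something procedure call.
parent_cpu = 666.37          # CPU seconds
parent_gets = 91_309_041     # buffer gets

print(round(sum(child_cpu), 2))   # 651.54 of the parent's 666.37 CPU seconds
print(sum(child_gets))            # 91308993 of the parent's 91,309,041 gets
```

The four statements account for almost all of the procedure’s CPU and buffer gets, which is consistent with the procedure doing little other than driving those deletes.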
Second, look at the tablescans and physical reads: 13,368 (short) tablescans, and 110M blocks scanned. Since we have just one long tablescan and a total of only 46,000 physical reads, it’s fairly safe to assume that most of those “table scan blocks gotten” are for the short tablescans – which means about 8,000 blocks per tablescan. (In fact, the “SQL by Buffer Gets” section confirmed this estimate with 10,600 gets per execution.) We could do similar arithmetic to decide that each short tablescan must cover around 640,000 rows.
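The per-scan arithmetic in full, using the instance activity figures above (a rough sketch – it attributes virtually all of the scan counters to the short scans, which the single long tablescan and low physical-read count justify):

```python
# Attribute (almost) all "table scan blocks/rows gotten" to the short scans,
# since there was only one long tablescan and only ~46,000 physical reads.
short_scans = 13_368            # table scans (short tables)
scan_blocks = 110_609_808      # table scan blocks gotten
scan_rows = 8_630_217_805      # table scan rows gotten

print(scan_blocks // short_scans)   # 8274 blocks per short tablescan
print(scan_rows // short_scans)     # 645587 rows per short tablescan
```

Roughly 8,000 blocks and 640,000 rows examined per scan – a remarkably expensive definition of “short”.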
So we have a total of 13,368 “short” – but painful and expensive – tablescans, and the delete_something procedure is responsible for at least 8,588 of them (4 × 2,147). Given that each execution of a delete statement deletes just one row, that’s a lot of work for very little return.
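To put the procedure’s share of the scan workload into perspective (execution counts from the SQL extract, scan count from the instance activity):

```python
# Short tablescans attributable to the four DELETE statements driven by
# delete_something, as a fraction of all short tablescans in the interval.
executions_per_statement = 2_147
statements = 4
total_short_scans = 13_368      # table scans (short tables)

procedure_scans = statements * executions_per_statement
print(procedure_scans)                                           # 8588
print(f"{procedure_scans / total_short_scans:.0%} of short tablescans")  # 64%
```

Close to two-thirds of all the short tablescans in the half-hour interval come from this one procedure.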
It’s a fair bet that when the developer says: “the system is slow”, he (or she) really means “this procedure is slow” – after all, it seems to be about the only thing that’s happening on the system.
It looks like there’s a structural problem with target_table. Possibly something as simple as a missing index on (identification_code, target_type), or perhaps the index exists but the statistics are misleading, or maybe there’s a type mismatch problem with a bind variable that is causing an implicit coercion that makes the index irrelevant.
Whatever it is, this summary seems to have highlighted a detail that is worth fixing – and could be fixed very quickly and easily.
Warning:
Despite the fact that this cursory inspection of a Statspack report shows an extremely obvious resource hog that almost certainly ought to be addressed, there were a couple of other bits of work that seemed unreasonably expensive (the last statement in the SQL extract, for example). It is possible that my analysis is correct – but irrelevant to the specific developer making the original complaint.
On the other hand, it didn’t take long to spot this problem, and it’s pretty easy to go back to the developer with an extract from the Statspack report and ask if this is the bit of code that is performing badly.
Footnote: Any advice about reading Statspack reports is almost always relevant when reading AWR reports (and vice versa).
Comments and related questions are welcome.