Analysing Statspack (9)

December 18, 2007

Analysing Statspack (9)

Filed under: Statspack,Troubleshooting — Jonathan Lewis @ 7:00 pm GMT Dec 18,2007

A recent item on the Oracle forums asked for help with a performance problem. It started with the suggestion that a recent increase in the size of the buffer cache may have made the performance seem worse to the end-users, and asked if this was likely to be true. Ultimately, the poster supplied a couple of Statspack reports, with a request for advice.

It takes a little time to read through a Statspack report and do a thorough analysis – especially if you try to type up your thoughts as you go – but I thought it would be a nice idea to take a Statspack report from cold and give you a “stream of consciousness” run-down on it. I’ve only spent a bit over an hour on it, including typing time, so I’ve only picked up a few points. There’s a lot more I could say, but in “real life” there’s rarely any point in doing anything past the first couple of obvious issues before taking a pause and starting again.

So, get the report up in a separate window, and start with questions about where the time is going – let’s start with the Top 5 timed events.

The report lasts 24.33 minutes – that’s 1,460 seconds. The CPU usage is 2,485 seconds. If we have two CPUs (cores) then that’s 85% utilisation on average – which is likely to lead to erratic performance. (The OP confirmed that he was running on a dual-core processor in a follow-up post).

The I/O times are very informative. We are seeing lots of waits for single and multiblock reads, and some for direct path reads (which are often quite large). But checking the average wait times, the single block reads are averaging less than one millisecond, and the multiblock reads are averaging less than 2 milliseconds. Correction (see note 9) – single block and multi block reads are taking about 0.535 milliseconds and 0.4 milliseconds per request.

The “direct path read” waits are the worst at an average of 5 milliseconds. We also have a few “db file parallel read” waits running at 8 milliseconds – but since each of these is a multi-block read for a batch of randomly scattered block you expect them to be a little slow.

Most of the I/O is coming out of a cache layer somewhere, so it’s not really likely that we are running into performance problems because of swapping. (The poster may have been worried that the increase the buffer cache may have overloaded the machine’s memory – the I/O speed suggest otherwise).

Funnily enough, when you have an I/O profile like this, it’s not a surprise to see that the CPU load is the most time-consuming item on the system.

Since we’re looking at I/O, let’s drift down to the ‘Instance Activity Stats’ section and check on what type of physical reads we are doing:

physical reads                9,368,260 -- this really means 'number of blocks read'
physical reads direct           150,731 -- in 20,000 requests according to the waits
physical writes direct          156,341 -- close match to reads, maybe these are to TEMP.       

prefetched blocks             7,571,725 -- most of the blocks reads are probably tablescans (or index fast full scans)       

table scans (long tables)         1,401 -- ouch:  'short' is 2% of the (2.56Gb cache)
table scans (short tables)      399,408       
index fast full scans (full)     39,169

table scan blocks gotten     21,721,946       

table scan rows gotten      962,368,767

Not only are we doing a lot of tablescans by physical read, we are probably hitting twice as many blocks on in-memory short tablescans. As a crude approximation, let’s pretend that the prefetch blocks are all from the long tablescans, and that it’s just one particular table. 7,571,725/1,401 = 5,357 blocks. Can we find lots of activity in this order of magnitude ?

How about looking for some of the heavy hitters. Queries with large physical reads. (We’ll try to find the small tablescans in a minute)

Look at the SQL ordered by Reads – here are two funny ones at the top of the list:


                                                     CPU      Elapsd
 Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
      6,278,029          602       10,428.6   67.0   345.41   1150.66 1698697300
Module: COMUTL.EXE
SELECT DISTINCT PAY_CALENDAR_ID FROM LV_POSTEDREC
WHERE TXN_ID = :B2 AND ITEMCAT_ID = :B1 AND CANCELLED = 'N'       

      6,263,849            1    6,263,849.0   66.9   347.29   1173.55 1930794498
Module: COMUTL.EXE
  SELECT bit_empname(bmt.employee_id)|| :"SYS_B_00" ||bmt.employee_id as full_name,
    bmt.TXN_DATE,
    bmd.receipt_date as receipt_dt,
    TO_CHAR(bmd.receipt_date,:"SYS_B_01") receipt_date,
    bmt.itemcat_id as itemcat_id,
    bmd.itemcat_id as detail_itemcat_id,
    bmd.txn_id,
    bmd.txn_id||:"SY

Notice how these two exceed the total number of physical I/Os for the period – and yet do roughly the same amount of physical I/O in about the same amount of CPU and elapsed time ?

Notice how the top one has its text in upper case with the bind variables as “:Bnnn” (that’s typically a PL/SQL thing).

Notice the bit_empname() function in the second statement. I suspect there may be other calls to PL/SQL functions in the second statement, and one of them executes the SQL shown in the first statement. (Yes, I know the numbers don’t match perfectly, but there may be other queries which also call the function). I think we have to find that function, and find out why the SQL it’s calling does something so brutal when a couple of the inputs have suggestive names like txn_id and itemcat_id. It mights simply be a missing index, it might be a problem with statistics, it might be a coercion problem.

While we’re on that SQL, let’s do a quick search for that bit_empname() function:


                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     11,689,686            1   11,689,686.0   20.9   347.29   1173.55 1930794498
Module: COMUTL.EXE
  SELECT bit_empname(bmt.employee_id)|| :"SYS_B_00" ||bmt.employee_id as full_name,
    bmt.TXN_DATE,         

      8,449,083           28      301,753.0   15.1   100.83    352.69  879397727
Module: HRM.EXE
SELECT employee_id,     BiT_EmpName(employee_id) Full_Name, cessation_date
FROM hrm_employee   ORDER BY BiT_EmpName(employee_id) ASC

And while we’re there, have a look at this from the ‘SQL by Executions’ report, and the corresponding entry in ‘SQL by Gets’

                                                CPU per    Elap per
 Executions   Rows Processed   Rows per Exec    Exec (s)   Exec (s)  Hash Value
------------ --------------- ---------------- ----------- ---------- ----------
     960,195         959,661              1.0       0.00        0.00  736330208
Module: COMUTL.EXE
SELECT LTRIM(FAMILY_NAME || ' ' || GIVEN_NAME)
FROM ADMIN.HRM_EMPLOYEE
WHERE EMPLOYEE_ID = :B1       

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
      2,889,753      960,195            3.0    5.2    29.54     85.03  736330208
SELECT LTRIM(FAMILY_NAME || ' ' || GIVEN_NAME)
FROM ADMIN.HRM_EMPLOYEE
WHERE EMPLOYEE_ID = :B1

I don’t think we need to ask why we managed to do 960,600 executions of a single query in just 24 minutes when we see these two together. That’s not nice on the CPU, and could have quite an impact on the library cache latch and shared pool latch in 9i. You have to be a little careful about how you call PL/SQL functions from SQL, especially if they are functions that execute SQL statements. (We’ll have a look at the parse statistics in a minute).

Thinking about executions, we could carry on down that section to see the following:


                                                CPU per    Elap per
 Executions   Rows Processed   Rows per Exec    Exec (s)   Exec (s)  Hash Value
------------ --------------- ---------------- ----------- ---------- ----------
     322,432         322,432              1.0       0.00        0.00  808880058
Module: PL/SQL Developer
SELECT NVL(T.USER_GROUP,'EMPTY') FROM SA_USERPROFILE T WHERE T.USER_ID = USER       

     322,432         322,432              1.0       0.00        0.00 2403185938
Module: PL/SQL Developer
SELECT USER FROM SYS.DUAL       

     322,432         322,432              1.0       0.00        0.00 2484732973
Module: im70.exe
Select count(*) from admin.PRHQ_hrm_employee where employee_id = :"SYS_B_0"       

     322,432         322,432              1.0       0.00        0.00 3127527770
Module: PL/SQL Developer
SELECT COUNT(*) FROM ADMIN.SA_COLSECURITY T, ADMIN.SA_USERPROFILE SU
WHERE SU.USER_ID = USER AND T.TAB_NAME = 'PR_RECORD_DTL'
AND SU.USER_GROUP = T.USER_GROUP AND T.ATTRIBUTE = 0       

     322,432         322,432              1.0       0.00        0.00 3189117890
Module: im70.exe
SELECT SUM(DECODE(A.ITEM_ID,'ADL_WAGES',A.CALCULATED_AMT, A.FIN_BASECUR_AMT))
FROM PR_RECORD_DTL A WHERE EMPLOYEE_ID = :B3
AND ((A.PAY_CALENDAR_ID = :B2 ) OR (A.PAY_CALENDAR_ID = 'SUP_' || :B2))
AND A.PAY_CALENDAR_ID NOT LIKE 'RET%'
AND (A.RECORD_TYPE  'B' OR A.ITEM_ID LIKE 'GROSS_%' OR A.ITEM_ID LIKE '%WAGES')
AND ...

Five statements with exactly the same number of executions. Most of them are in capitals with bind variables in the “:Bnnn” format, and the text of the first two look like a classic bit of ‘privilege checking’ code at the start of a function call.

The one occurrence with the ‘forced’ bind variable of :”SYS_B_0″ may be driving the whole set, or it might be constructed and called through native dynamic SQL somewhere.
The whole group looks like it might be another case of a PL/SQL function that is being called from a SQL statement – it might be worth looking for a statement that has processed something in the region of 322,000 rows, for example this one:

                                                CPU per    Elap per
 Executions   Rows Processed   Rows per Exec    Exec (s)   Exec (s)  Hash Value
------------ --------------- ---------------- ----------- ---------- ----------
      12,490         294,626             23.6       0.00        0.00 4175707856
Module: JDBC Thin Client
SELECT ACTUAL_DAY_TYPE, ACTUAL_SHIFT_ID, ACTUAL_WORK_HOUR, APP_OVERTIME,
ATTEN_PROCESS_TIME, CALC_OVERTIME, CLOCK_DISTRIB_TIME, DB_USER_ID, EARLY_OUT,
EMPCALENDAR_ID, EMPLOYEE_ID, END_DATE, LAST_UPDATE_BY, LAST_UPDATE_TIME,
LATE_IN, LV_APPLICATION_ID, MESSAGE, NORMAL_WORK_HOUR, OVERTIME,
OVERTIME_REASON, PLANNED_WORK_H ...

We could go on looking for large amounts of work, and making guesses about which statements are linked to which resource drains – but I think we’ve already found enough bits and pieces to sort out for the first pass.

Clearly the system is doing some things which are resource intensive, both on I/O and on CPU – and the nature of the I/O is adding to the CPU load. The biggest I/O looks like something that might be fixed relatively easily; at least one of the CPU intensive items is very easy to fix; the hypothetical 322,000 executions of a function may be harder to address, but even that may be the result of a bad execution plan if it’s a function in a where clause.

I said near the top of my notes that I’d pick up on the parse calls. This was when I noticed the large number of executions of a few statements.

If you look at the “System statistics” section of the report for the statistics relating to parses, you’ll notice the following:

parse count (failures)                            48
parse count (hard)                               602
parse count (total)                          682,369
parse time cpu                                 3,998
parse time elapsed                            18,243

The total is pretty high and could lead to some stress on the library cache and shared pool latches, as well as excess CPU usage; and you’ll notice that the elapsed time for parsing is significantly higher than the CPU time for parsing – which is often taken as an indication for contention on these latches.

In fact we’re not really seeing a huge amount of stress on these latches. The difference in the times is more a side effect of the general CPU overload, rather than a direct effect of excessive parsing – whenever we miss a latch we probably have to wait quite a long time before we get back onto the CPU. If you see a lot of ‘soft parses’ though (i.e. parse calls that don’t result in an optimisation stage), your first thought should be to check the effects of the session cursor cache.

Looking further down the report we find that (a) we have seen no “session cursor cache hits” and (b) the parameter session_cached_cursors is not set. As a quick fix, we could set the parameter to a value around 100; it won’t save us much CPU, but it should trim a little off the load. Unfortunately it’s a parameter that can’t be set at the system level without bouncing the database – but if you don’t want to bounce the database you could try creating a logon trigger so that it gets set at the session level over a period of time as users log off and log back on.

As a closing thought, let’s get back to the original question.

Is it possible that the system really did slow down (from the users’ perspective) when the cache size was increased. Yes, possibly, although you would have to be a little unlucky for this to be the case.

Maybe some of the CPU-intensive workload we can see comes from a background task and not from end-user activity. If that’s the case, then the extra cache may have helped the background task to run a lot faster because it no longer has to stop and wait for I/O so frequently.

If the background task doesn’t have to stop until it’s forcibly descheduled by the operating system then the users have to wait longer before they can get their share of the CPU.

It’s surprising how an ‘obvious fix’ can give counter-intuitive results.

[Further Reading on Statspack]

Footnote: Any advice about reading statspack reports is almost always relevant when reading AWR reports (and vice versa).

Comments (9)

9 Comments »

Thank you for posting this “stream of consciousness”.
It really allows to learn from your deep experience.

Comment by Martin — December 20, 2007 @ 10:51 am GMT Dec 20,2007 | Reply
All good Jonathan, very instructive series.

Comment by SeánMacGC — December 20, 2007 @ 11:32 am GMT Dec 20,2007 | Reply
It’s very well explained, as always.
Thank you for your work.

Comment by Davide — December 24, 2007 @ 10:59 am GMT Dec 24,2007 | Reply
Hi Jonathan,

how simple looks when explained by you. Where are these physical reads comming from ? I have identified:
– datafiles reads( system,data,idx, undo…) ( most of phyrds)
– temp reads
– controlfile reads – few
– redologs reads – few
I would expect also for every read to have corresponding wait- so reads close to readwaits.
Next is a delta from 2 snapshots:

select name,value from v$sysstat where name =’physical reads’
union all
select ‘READS’,(select sum(phyrds) from v$filestat)+(select sum(phyrds) from v$tempstat) from dual
union all
select ‘READWAITS’,sum(total_waits) from v$system_event
where event in (‘db file sequential read’,’db file scattered read’,’db file parallel read’,’direct path read’,’direct path read (lob) ‘)

physical reads 402769 – ???
READS 263594
READWAITS 263663

Do I forget something?
Thank you.

Vasile

Comment by Vasile — February 13, 2008 @ 10:33 am GMT Feb 13,2008 | Reply
Vasile:

Physical reads = number of blocks read.
v$filestat.phyrds = number of read requests.

So it’s perfectly reasonable to see the first larger than the second.

Direct path reads can be asynchronous (and therefore not always reported with waits), so I would not have been suprised to see your read waits smaller than your reads; but yours are the other way round.

I can’t think of any obvious “real” reason for the difference, but these things are updated from different pieces of code, so I’m never too suprised to see small differences – especially since the dynamic performance views do not give you point-in-time read-consistency.

Comment by Jonathan Lewis — February 13, 2008 @ 11:55 am GMT Feb 13,2008 | Reply
[…] Anschließend den Anweisungen des Skripts folgen. Unter folgender URL ist eine Beispielanalyse eines derartigen Reports zu finden: https://jonathanlewis.wordpress.com/2007/12/18/analysing-statspack-9/ […]

Pingback by Oracle Statspack — July 9, 2008 @ 9:05 am BST Jul 9,2008 | Reply
Hi Jonathan,

I have few doubts and queries..hope you will find some time to clarify those?

–> The I/O times are very informative. We are seeing lots of waits for single and multiblock reads, and some for direct path reads (which are often quite large).
But checking the average wait times, the single block reads are averaging less than one millisecond, and the multiblock reads are averaging less than 2 milliseconds.

Where do you see average wait times and the single block reads are averaging less than one millisecond, and the multiblock reads are averaging less than 2 milliseconds.?

–>The direct path reads are the worst at an average of 5 milliseconds

how should we conclude that average of 5 milliseconds for direct path reads is bad??

–> when you have an I/O profile like this
How do we say that this is I/O profile??

Apologies..this questions may sound naive..I have started perf tuning a year ago and just learning stuffs!!

your suggestions will be helpful!

Thanks,
Kartik

Comment by kartik — July 29, 2010 @ 10:35 am BST Jul 29,2010 | Reply
- Kartik,
  
  Sometimes you just have to do the arithmetic because the answer isn’t in the report – and then sometimes you do the arithmetic wrong. Look at the Top 5:
```
db file scattered read                            882,872         472    12.58
db file sequential read                           762,365         307     8.19
```
  882,872 reads in 472 seconds (or 472,000 milliseconds) is 0.535 milliseconds per read, and 762,365 reads in 307,000 milliseconds is 0.40 milliseconds per read. I don’t know how I managed to get these wrong when I wrote the note. (Both my statements are true, of course, but not accurate – and I think I must have done one of the calculations upside down to get the 2 millisecond figure.
  
  Your other two comments are really about use of language – I said that the direct path reads were “worst”, this doesn’t mean that they were absolutely bad. But as an “absolute” guideline – a physical read request that really goes to disc is behaving reasonably if it takes about six milliseconds to complete. When I said “I/O profile” it’s not intended as a highly technical expression – I’m just identifying the fact that there’s a pattern or anomaly that probably has some meaning; in this case the pattern is the very fast I/O response time that indicates the use of local filesystem cache.
  
  Thanks for asking the question – it highlighted an error that I shouldn’t have made.
  
  Comment by Jonathan Lewis — July 30, 2010 @ 8:17 pm BST Jul 30,2010 | Reply
  - Hi Jonathan,
    
    May it be that you took the average wait times from the “Avg wait (ms)” column of the “Wait events” table just below the “Top 5 timed events” section? There they are 1 and 0 – the down-rounded of the less than 2 and 1 millisecond you talked about?
    Anyway, if this “Avg wait(ms)” column from the “Wait events” table means what I think it means – it is strange how they came up to this figures – as the number of “Waits” and “Total Wait Time” is the same as in the “Top 5 timed events” section – and by doing the arithmetic – the results are around 0.5 millisecond.
    
    What do you think?
    
    Comment by myron — November 20, 2010 @ 11:21 am GMT Nov 20,2010 | Reply