A recent item on the Oracle forums asked for help with a performance problem. It started with the suggestion that a recent increase in the size of the buffer cache may have made the performance seem worse to the end-users, and asked if this was likely to be true. Ultimately, the poster supplied a couple of Statspack reports, with a request for advice.
It takes a little time to read through a Statspack report and do a thorough analysis – especially if you try to type up your thoughts as you go – but I thought it would be a nice idea to take a Statspack report from cold and give you a “stream of consciousness” run-down on it. I’ve only spent a bit over an hour on it, including typing time, so I’ve only picked up a few points. There’s a lot more I could say, but in “real life” there’s rarely any point in doing anything past the first couple of obvious issues before taking a pause and starting again.
So, get the report up in a separate window, and start with questions about where the time is going – let’s start with the Top 5 timed events.
The report lasts 24.33 minutes – that’s 1,460 seconds. The CPU usage is 2,485 seconds. If we have two CPUs (cores) then that’s 85% utilisation on average – which is likely to lead to erratic performance. (The OP confirmed that he was running on a dual-core processor in a follow-up post).
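That arithmetic is worth a quick sanity check. A minimal sketch (assuming the two-core figure, which the OP only confirmed later):

```python
# Rough CPU utilisation check for the snapshot period.
# Assumption: 2 CPU cores (confirmed by the OP in a follow-up post).
snapshot_seconds = 24.33 * 60      # report duration: ~1,460 seconds
cpu_seconds_used = 2485            # total CPU time reported for the period
cores = 2

capacity = snapshot_seconds * cores          # ~2,920 CPU-seconds available
utilisation = cpu_seconds_used / capacity    # ~0.85, i.e. 85% average

print(f"{utilisation:.0%}")
```

At a sustained 85% average you have to expect queueing for the CPU, which is why response times start to look erratic to end-users.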
The I/O times are very informative. We are seeing lots of waits for single and multiblock reads, and some for direct path reads (which are often quite large). But checking the average wait times, the single block reads are averaging less than one millisecond, and the multiblock reads are averaging less than 2 milliseconds. Correction (see note 9) – single block and multi block reads are taking about 0.535 milliseconds and 0.4 milliseconds per request.
The “direct path read” waits are the worst at an average of 5 milliseconds. We also have a few “db file parallel read” waits running at 8 milliseconds – but since each of these is a multiblock read for a batch of randomly scattered blocks you expect them to be a little slow.
Most of the I/O is coming out of a cache layer somewhere, so it’s not really likely that we are running into performance problems because of swapping. (The poster may have been worried that the increase in the buffer cache had overloaded the machine’s memory – the I/O speeds suggest otherwise.)
Funnily enough, when you have an I/O profile like this, it’s not a surprise to see that the CPU load is the most time-consuming item on the system.
Since we’re looking at I/O, let’s drift down to the ‘Instance Activity Stats’ section and check on what type of physical reads we are doing:
physical reads                       9,368,260  -- this really means 'number of blocks read'
physical reads direct                  150,731  -- in 20,000 requests according to the waits
physical writes direct                 156,341  -- close match to reads, maybe these are to TEMP
prefetched blocks                    7,571,725  -- most of the block reads are probably tablescans
                                                --  (or index fast full scans)
table scans (long tables)                1,401  -- ouch: 'short' means up to 2% of the (2.56GB) cache
table scans (short tables)             399,408
index fast full scans (full)            39,169
table scan blocks gotten            21,721,946
table scan rows gotten             962,368,767
Not only are we doing a lot of tablescans by physical read, we are probably hitting about twice as many blocks again in in-memory (short) tablescans. As a crude approximation, let’s pretend that the prefetched blocks all come from the long tablescans, and that it’s just one particular table: 7,571,725 / 1,401 is roughly 5,400 blocks per scan. Can we find lots of activity in this order of magnitude?
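The crude approximation, spelled out (remember the assumption: every prefetched block is attributed to the long tablescans):

```python
# Crude approximation: assume all the prefetched blocks came from the
# "long" tablescans, and that one table accounts for most of them.
prefetched_blocks = 7_571_725     # 'prefetched blocks' statistic
long_tablescans = 1_401           # 'table scans (long tables)' statistic

blocks_per_scan = prefetched_blocks / long_tablescans
print(round(blocks_per_scan))     # roughly 5,400 blocks per long tablescan
```

So we’re looking for objects (or SQL statements) doing physical reads in the few-thousand-blocks-per-execution range.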
So let’s look for some of the heavy hitters – queries with large numbers of physical reads. (We’ll try to find the small tablescans in a minute.)
Look at the SQL ordered by Reads – here are two funny ones at the top of the list:
                                                     CPU      Elapsd
 Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
      6,278,029          602       10,428.6   67.0   345.41   1150.66 1698697300
Module: COMUTL.EXE
SELECT DISTINCT PAY_CALENDAR_ID FROM LV_POSTEDREC
WHERE TXN_ID = :B2 AND ITEMCAT_ID = :B1 AND CANCELLED = 'N'

      6,263,849            1    6,263,849.0   66.9   347.29   1173.55 1930794498
Module: COMUTL.EXE
SELECT bit_empname(bmt.employee_id)|| :"SYS_B_00" ||bmt.employee_id
as full_name, bmt.TXN_DATE, bmd.receipt_date as receipt_dt,
TO_CHAR(bmd.receipt_date,:"SYS_B_01") receipt_date, bmt.itemcat_id
as itemcat_id, bmd.itemcat_id as detail_itemcat_id, bmd.txn_id,
bmd.txn_id||:"SY
Notice how these two together exceed the total number of physical I/Os for the period – and yet each does roughly the same amount of physical I/O in about the same amount of CPU and elapsed time?
Notice how the top one has its text in upper case with the bind variables as “:Bnnn” (that’s typically a pl/sql thing).
Notice the bit_empname() function in the second statement. I suspect there may be other calls to pl/sql functions in the second statement, and one of them executes the SQL shown in the first statement. (Yes, I know the numbers don’t match perfectly, but there may be other queries which also call the function.) I think we have to find that function, and find out why the SQL it’s calling does something so brutal when a couple of the inputs have suggestive names like txn_id and itemcat_id. It might simply be a missing index, it might be a problem with statistics, it might be a coercion problem.
While we’re on that SQL, let’s do a quick search for that bit_empname() function:
                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     11,689,686            1   11,689,686.0   20.9   347.29   1173.55 1930794498
Module: COMUTL.EXE
SELECT bit_empname(bmt.employee_id)|| :"SYS_B_00" ||bmt.employee_id
as full_name, bmt.TXN_DATE,

      8,449,083           28      301,753.0   15.1   100.83    352.69  879397727
Module: HRM.EXE
SELECT employee_id, BiT_EmpName(employee_id) Full_Name, cessation_date
FROM hrm_employee ORDER BY BiT_EmpName(employee_id) ASC
And while we’re there, have a look at this from the ‘SQL by Executions’ report,
and the corresponding entry in ‘SQL by Gets’
                                                CPU per    Elap per
 Executions   Rows Processed   Rows per Exec   Exec (s)    Exec (s)  Hash Value
------------ --------------- ---------------- ----------- ---------- ----------
     960,195         959,661              1.0        0.00       0.00  736330208
Module: COMUTL.EXE
SELECT LTRIM(FAMILY_NAME || ' ' || GIVEN_NAME) FROM ADMIN.HRM_EMPLOYEE
WHERE EMPLOYEE_ID = :B1

                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
      2,889,753      960,195            3.0    5.2    29.54     85.03  736330208
SELECT LTRIM(FAMILY_NAME || ' ' || GIVEN_NAME) FROM ADMIN.HRM_EMPLOYEE
WHERE EMPLOYEE_ID = :B1
I don’t think we need to ask why we managed to do 960,195 executions of a single query in just 24 minutes when we see these two together. That’s not nice on the CPU, and could have quite an impact on the library cache latch and shared pool latch in 9i. You have to be a little careful about how you call pl/sql functions from SQL, especially if they are functions that execute SQL statements. (We’ll have a look at the parse statistics in a minute.)
Thinking about executions, we could carry on down that section to see the following:
                                                CPU per    Elap per
 Executions   Rows Processed   Rows per Exec   Exec (s)    Exec (s)  Hash Value
------------ --------------- ---------------- ----------- ---------- ----------
     322,432         322,432              1.0        0.00       0.00  808880058
Module: PL/SQL Developer
SELECT NVL(T.USER_GROUP,'EMPTY') FROM SA_USERPROFILE T WHERE T.USER_ID = USER

     322,432         322,432              1.0        0.00       0.00 2403185938
Module: PL/SQL Developer
SELECT USER FROM SYS.DUAL

     322,432         322,432              1.0        0.00       0.00 2484732973
Module: im70.exe
Select count(*) from admin.PRHQ_hrm_employee where employee_id = :"SYS_B_0"

     322,432         322,432              1.0        0.00       0.00 3127527770
Module: PL/SQL Developer
SELECT COUNT(*) FROM ADMIN.SA_COLSECURITY T, ADMIN.SA_USERPROFILE SU
WHERE SU.USER_ID = USER AND T.TAB_NAME = 'PR_RECORD_DTL' AND
SU.USER_GROUP = T.USER_GROUP AND T.ATTRIBUTE = 0

     322,432         322,432              1.0        0.00       0.00 3189117890
Module: im70.exe
SELECT SUM(DECODE(A.ITEM_ID,'ADL_WAGES',A.CALCULATED_AMT,
A.FIN_BASECUR_AMT)) FROM PR_RECORD_DTL A WHERE EMPLOYEE_ID = :B3 AND
((A.PAY_CALENDAR_ID = :B2 ) OR (A.PAY_CALENDAR_ID = 'SUP_' || :B2)) AND
A.PAY_CALENDAR_ID NOT LIKE 'RET%' AND (A.RECORD_TYPE <> 'B' OR
A.ITEM_ID LIKE 'GROSS_%' OR A.ITEM_ID LIKE '%WAGES') AND ...
Five statements with exactly the same number of executions. Most of them are in capitals with bind variables in the “:Bnnn” format, and the text of the first two looks like a classic bit of ‘privilege checking’ code at the start of a function call.
The one occurrence with the ‘forced’ bind variable of :"SYS_B_0" may be driving the whole set, or it might be constructed and called through native dynamic SQL somewhere.
The whole group looks like it might be another case of a pl/sql function that is being called from a SQL statement – it might be worth looking for a statement that has processed something in the region of 322,000 rows, for example this one:
                                                CPU per    Elap per
 Executions   Rows Processed   Rows per Exec   Exec (s)    Exec (s)  Hash Value
------------ --------------- ---------------- ----------- ---------- ----------
      12,490         294,626             23.6        0.00       0.00 4175707856
Module: JDBC Thin Client
SELECT ACTUAL_DAY_TYPE, ACTUAL_SHIFT_ID, ACTUAL_WORK_HOUR, APP_OVERTIME,
ATTEN_PROCESS_TIME, CALC_OVERTIME, CLOCK_DISTRIB_TIME, DB_USER_ID,
EARLY_OUT, EMPCALENDAR_ID, EMPLOYEE_ID, END_DATE, LAST_UPDATE_BY,
LAST_UPDATE_TIME, LATE_IN, LV_APPLICATION_ID, MESSAGE, NORMAL_WORK_HOUR,
OVERTIME, OVERTIME_REASON, PLANNED_WORK_H ...
We could go on looking for large amounts of work, and making guesses about which statements are linked to which resource drains – but I think we’ve already found enough bits and pieces to sort out for the first pass.
Clearly the system is doing some things which are resource intensive, both on I/O and on CPU – and the nature of the I/O is adding to the CPU load. The biggest I/O looks like something that might be fixed relatively easily; at least one of the CPU intensive items is very easy to fix; the hypothetical 322,000 executions of a function may be harder to address, but even that may be the result of a bad execution plan if it’s a function in a where clause.
I said near the top of my notes that I’d pick up on the parse calls. This was when I noticed the large number of executions of a few statements.
If you look at the “System statistics” section of the report for the statistics relating to parses, you’ll notice the following:
parse count (failures)              48
parse count (hard)                 602
parse count (total)            682,369
parse time cpu                   3,998
parse time elapsed              18,243
The total is pretty high and could lead to some stress on the library cache and shared pool latches, as well as excess CPU usage; and you’ll notice that the elapsed time for parsing is significantly higher than the CPU time for parsing – which is often taken as an indication for contention on these latches.
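Turning those raw figures into rates makes the scale clearer (the Statspack time statistics are reported in centiseconds):

```python
# Parse statistics from the report (times are in centiseconds).
parse_total = 682_369
parse_cpu_cs = 3_998          # ~40 CPU seconds spent parsing
parse_elapsed_cs = 18_243     # ~182 elapsed seconds spent parsing
snapshot_seconds = 24.33 * 60

parses_per_second = parse_total / snapshot_seconds   # ~467 calls/sec
elapsed_to_cpu = parse_elapsed_cs / parse_cpu_cs     # ~4.6x

print(f"{parses_per_second:.0f} parse calls/sec")
print(f"{elapsed_to_cpu:.1f}x elapsed vs CPU time parsing")
```

Nearly 470 parse calls per second, with elapsed parse time running at over four times the CPU parse time.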
In fact we’re not really seeing a huge amount of stress on these latches. The difference in the times is more a side effect of the general CPU overload than a direct effect of excessive parsing – whenever we miss a latch we probably have to wait quite a long time before we get back onto the CPU. If you see a lot of ‘soft parses’ though (i.e. parse calls that don’t result in an optimisation stage), your first thought should be to check the effects of the session cursor cache.
Looking further down the report we find that (a) we have seen no “session cursor cache hits” and (b) the parameter session_cached_cursors is not set. As a quick fix, we could set the parameter to a value around 100; it won’t save us much CPU, but it should trim a little off the load. Unfortunately it’s a parameter that can’t be set at the system level without bouncing the database – but if you don’t want to bounce the database you could try creating a logon trigger so that it gets set at the session level over a period of time as users log off and log back on.
As a closing thought, let’s get back to the original question.
Is it possible that the system really did slow down (from the users’ perspective) when the cache size was increased? Yes, possibly, although you would have to be a little unlucky for this to be the case.
Maybe some of the CPU-intensive workload we can see comes from a background task and not from end-user activity. If that’s the case, then the extra cache may have helped the background task to run a lot faster because it no longer has to stop and wait for I/O so frequently.
If the background task doesn’t have to stop until it’s forcibly descheduled by the operating system then the users have to wait longer before they can get their share of the CPU.
It’s surprising how an ‘obvious fix’ can give counter-intuitive results.
Footnote: Any advice about reading statspack reports is almost always relevant when reading AWR reports (and vice versa).