This is another of the blog notes that have been sitting around for several years – in this case since May 2014, based on a script I wrote a year earlier. It makes an important point about “inconsistency” of timing in the way that Oracle records statistics of work done. As a consequence of being first drafted in May 2014 the original examples showed AWR results from 10.2.0.5 and 11.2.0.4 – I’ve just run the same test on 19.3.0.0 to see if anything has changed.
[Originally drafted May 2014]: I had to post this as a reminder of how easy it is to forget things – especially when there are small but significant changes between versions. It’s based loosely on a conversation from Oracle-L, but I’m going to work the issue in the opposite order by running some code and showing you the ongoing performance statistics rather than the usual AWR approach of reading the performance stats and trying to guess what happened.
The demonstration needs two sessions to run; it’s based on one session running some CPU-intensive SQL inside an anonymous PL/SQL block with a second another session launching AWR snapshots at carefully timed moments. Here’s the code for the working session:
rem rem Script: awr_timing.sql rem Author: Jonathan Lewis rem Dated: May 2013 rem alter session set "_old_connect_by_enabled"=true'; create table kill_cpu(n, primary key(n)) organization index as select rownum n from all_objects where rownum <= 26 -- > comment to avoid wordpress format issue ; execute dbms_stats.gather_table_stats(user,'kill_cpu') pause Take an AWR snapshot from another session and when it has completed press return declare m_ct number; begin select count(*) X into m_ct from kill_cpu connect by n > prior n start with n = 1 ; dbms_lock.sleep(30); end; /
You may recognise an old piece of SQL that I’ve often used as a way of stressing a CPU and seeing how fast Oracle can run. The “alter session” at the top of the code is necessary to use the pre-10g method of running a “connect by” query so that the SQL does a huge number of buffer gets (and “buffer is pinned count” visits). On my current laptop the query takes about 45 seconds (all CPU) to complete. I’ve wrapped this query inside a pl/sql block that then sleeps for 30 seconds.
From the second session you need to launch an AWR snapshot 4 times – once in the pause shown above, then (approximately) every 30 seconds thereafter. The second one should execute while the SQL statement is still running, the third one should execute while the dbms_lock.sleep(30) is taking place, and the fourth one should execute after the pl/sql block has ended and the SQL*Plus prompt is visible.
Once you’ve got 4 snapshots you can generate 3 AWR reports. The question to ask then, is “what do the reports say about CPU usage?” Here are a few (paraphrased) numbers – starting with 10.2.0.5 comparing the “Top 5 timed events”, “Time Model”, “Instance Activity” and “Top SQL by CPU” There are three sets of figures, the first reported while the SQL was still running, the second reported after the SQL statement had completed and the dbms_lock.sleep() call is executing, the last reported after the PL/SQL block has completed. There are some little oddities in the numbers due to backgorund “noise” – but the key points are still clearly visible:
While the SQL was executing
Top 5 Timed events ------------------ CPU Time 26 seconds Time Model Time (s) % of DB Time ------------------------------------------------- ------------ sql execute elapsed time 26.9 100.0 DB CPU 26.2 97.6 Instance Activity ----------------- CPU used by this session 0.65 seconds recursive cpu usage 0.67 seconds SQL ordered by CPU ------------------ 31 seconds reported for both the SQL and PLSQL
During the sleep()
Top 5 Timed events ------------------ CPU Time 19 seconds Time Model Time (s) % of DB Time ------------------------------------------------- ------------ sql execute elapsed time 19.0 100.0 DB CPU 18.6 98.1 Instance Activity ----------------- CPU used by this session 0.66 seconds recursive cpu usage 44.82 seconds SQL ordered by CPU ------------------ 14 seconds reported for both the SQL and PLSQL
After the PL/SQL block ended
Top 5 Timed events ------------------ CPU Time 1 second Time Model Time (s) % of DB Time ------------------------------------------------- ------------ sql execute elapsed time 1.4 99.9 DB CPU 1.4 99.7 Instance Activity ----------------- CPU used by this session 44.68 seconds recursive cpu usage 0.50 seconds SQL ordered by CPU ------------------ 1 second reported for the PLSQL, but the SQL was not reported
Points to notice:
While the SQL was excecuting (and had been executing for about 26 seconds, the Time Model mechanism was recording the work done by the SQL, and the Top N information echoed the Time model CPU figure. At the same time the “CPU used …” Instance Activity Statistics have not recorded any CPU time for the session – and they won’t until the SQL statement completes. Despite this, the “SQL ordered by …” reports double-count in real-time, show the SQL and the PL/SQL cursors as consuming (with rounding errors, presumably) 31 seconds each.
After the SQL execution was over and the session was sleeping the Time model (hence the Top 5) had recorded a further 19 seconds of work. The instance activity, however, has now accumulated 44 seconds of CPU, but only as “recursive CPU usage” (recursive because our SQL was called from with a PL/SQL block), with no “CPU used by this session”. The “SQL ordered by …” figures have recorded the amount of CPU used by both the SQL and PL/SQL in the interval (i.e. 14 seconds – which is a little off) recorded against both.)
After the PL/SQL block has completed the Time Model and the Top 5 report both say that nothing much happened in the interval, but the Instance Activity suddenly reports 44.68 seconds of CPU used by this session – which (roughly speaking) is truish as the PL/SQL block ended and assigned the accumulated recursive CPU usage to the session CPU figure. Finally, when we get down to the “SQL ordered by CPU” the SQL was not reported – it did no work in the interval – but the PL/SQL block was still reported but only with a generous 1 second of CPU since all it did in the interval was finish the sleep call and tidy up the stack before ending.
Now the same sets of figures for 11.2.0.4 – there’s a lot of similarity, but one significant difference:
While the SQL was executing
Top 5 Timed events ------------------ CPU Time 26.6 seconds Time Model Time (s) % of DB Time ------------------------------------------------- ------------ sql execute elapsed time 27.0 100.0 DB CPU 26.6 98.5 Instance Activity ----------------- CPU used by this session 1.09 seconds recursive cpu usage 1.07 seconds SQL ordered by CPU ------------------ 25.6 seconds reported for both the SQL and PLSQL
During the sleep()
Top 5 Timed events ------------------ CPU Time 15.1 seconds Time Model Time (s) % of DB Time ------------------------------------------------- ------------ sql execute elapsed time 15.3 99.8 DB CPU 15.1 98.2 Instance Activity ----------------- CPU used by this session 41.09 seconds recursive cpu usage 41.03 seconds SQL ordered by CPU ------------------ 14.3 seconds reported for the SQL 13.9 seconds reported for the PLSQL
After the PL/SQL block ended
Top 5 Timed events ------------------ CPU Time 1.4 seconds Time Model Time (s) % of DB Time ------------------------------------------------- ------------ sql execute elapsed time 1.5 99.6 DB CPU 1.4 95.4 Instance Activity ----------------- CPU used by this session 1.02 seconds recursive cpu usage 0.95 seconds SQL ordered by CPU ------------------ 0.5 seconds reported for the PLSQL, and no sign of the SQL
Spot the one difference in the pattern – during the dbms_lock.sleep() the Instance Activity Statistic “CPU used by this session” is recording the full CPU time for the complete query, whereas the time for the query appeared only in the “recursive cpu” in the 10.2.0.5 report.
I frequently point out that for proper understanding of the content of an AWR report you need to cross-check different ways in which Oracle reports “the same” information. This is often to warn you about checking underlying figures before jumping to conclusions about “hit ratios”, sometimes it’s to remind you that while the Top 5 Timed Events might say some average looks okay the Event Histogram may say that what you’re looking at is mostly excellent with an occasional disaster thrown in. In this blog note I just want to remind you that if you only ever look at one set of figures about CPU usage there are a few special effects (particularly relating to long running PL/SQL / Java / SQL) where you may have to work out a pattern of behaviour to explain unexpectedly large (or small) figures and contradictory figures, The key to the problem is recognising that different statistics may be updated at different stages in a complex process.
Footnote
I doubt if many people still run 11.2.0.4, so I also re-ran the test on 19.3.0.0 before publishing. The behaviour hasn’t changed since 11.2.0.4 although the query ran a little faster, perhaps due to changes in the mechanisms for this type of “connect by pump”.
11.2.0.4 stats
Name Value ---- ----- session logical reads 33,554,435 consistent gets 33,554,435 consistent gets from cache 33,554,435 consistent gets from cache (fastpath) 33,554,431 no work - consistent read gets 33,554,431 index scans kdiixs1 33,554,433 buffer is not pinned count 16,777,219
19.3.0.0 stats
Name Value ---- ----- session logical reads 16,843,299 consistent gets 16,843,299 consistent gets from cache 16,843,299 consistent gets pin 16,843,298 consistent gets pin (fastpath) 16,843,298 no work - consistent read gets 16,790,166 index range scans 33,554,433 buffer is not pinned count 16,790,169
Some changes are trivial (like the change of name for “index scans kdiixs1”) some are interesting (like some gets not being labelled as “pin” and “pin (fastpath)”), some are baffling (like how you can manage 33M index range scans while doing only 16M buffer gets!)