Oracle Scratchpad

October 9, 2019


Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 12:03 pm BST Oct 9,2019

Here’s a little note that’s been hanging around as a draft for more than eight years according to the OTN (as it was) posting that prompted me to start writing it. At the time there were still plenty of people using Oracle 10g, so the question didn’t seem entirely inappropriate:

On 10g R2 when I open a sqlplus session how can I know my session SID ? I’m not DBA then can not open as sysdba and query v$session.

In all fairly recent versions of Oracle, of course, we have the option to use the sys_context() function to get the SID, but this specific option didn’t appear until some time in the 10g timeline. So you might have spent years “knowing” that you could get the audsid through sys_context('userenv','sessionid') but that there was no equivalent way to get the sid. Now, of course, and even in the timeline of the original posting, the simplest solution to the requirement is to execute:

select sys_context('userenv','sid') from dual;

But there are a number of alternatives – which may occasionally do a better job (and sometimes are just plain silly). It’s also worth noting that even in 19c Oracle still doesn’t have access to v$session.serial# through sys_context() and, anyway, sys_context() behaves like an unpeekable bind variable – which can be a problem.

So here’s the first of several options:

select sid from V$mystat where rownum = 1;

You’ll need SYS to grant you select on v_$mystat to use this one, of course, but v$mystat is a very convenient view giving you the session activity stats since logon for your own session – so there ought to be some mechanism that allows you to see some form of it in place anyway (ideally including the join to v$statname).

One of the oldest ways of getting access to your session ID without having access to any of the dynamic performance views was through the dbms_support package:

set serveroutput on
variable v1 varchar2(32)
execute :v1 := dbms_support.mysid
execute dbms_output.put_line(:v1)

Again you’ll need SYS to grant you extra privileges, in this case execute on the dbms_support package – worse still, the package is not installed by default. In fact (after installing it) if you call dbms_support.package_version it returns the value: “DBMS_SUPPORT Version 1.0 (17-Aug-1998) – Requires Oracle 7.2 – 8.0.5” – which gives you some idea of how old it is. It used to be useful for the start_trace_in_session() procedure it contains, but that procedure has been superseded by many newer mechanisms. If you enable SQL tracing to see what’s happening under the covers when you call dbms_support.mysid, you’ll see that the function actually runs the query I showed above against v$mystat.

Unlike dbms_support, the dbms_session package is installed automatically with the privilege to execute granted to public, and this gives you a function to generate a “unique session id”. The notes in the script $ORACLE_HOME/rdbms/admin/dbmssess.sql that creates the package say that the return value can be up to 24 bytes long, but so far the maximum I’ve seen is 12.

select dbms_session.unique_session_id from dual;

select
        to_number(substr(dbms_session.unique_session_id,1,4),'XXXX') sid,
        to_number(substr(dbms_session.unique_session_id,5,4),'XXXX') serial#,
        to_number(substr(dbms_session.unique_session_id,9,4),'XXXX') instance
from
        dual
;

       SID    SERIAL# INSTANCE
---------- ---------- --------
       255      24472        1

As you can see, the unique_session_id can be decoded to produce three useful bits of information (each is a 4-character hexadecimal field, hence the 'XXXX' format mask), and the nice thing about this call is that it gives you the session serial# at the same time as the SID. It’s possible, of course, that this query is as efficient as it could be, but there’s some scope for writing a query that uses a non-mergeable in-line view to call the function once, then splits the result into three pieces.
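As a cross-check on the decoding, here’s a small Python sketch that splits the string the same way the query does: three 4-character hexadecimal fields. The input value is hypothetical; I’ve built it by re-encoding the 255 / 24472 / 1 shown above rather than copying real output, and the fixed 12-character layout is an assumption based only on the substr() calls (the package notes allow for up to 24 bytes).

```python
def decode_unique_session_id(usid: str) -> tuple[int, int, int]:
    """Split a unique_session_id string into (sid, serial#, instance).

    Assumption: three 4-hex-digit fields at offsets 0, 4 and 8, mirroring the
    substr(...,1,4) / substr(...,5,4) / substr(...,9,4) calls with 'XXXX'.
    """
    fields = [usid[i:i + 4] for i in (0, 4, 8)]
    sid, serial, instance = (int(field, 16) for field in fields)
    return sid, serial, instance


# Hypothetical value: 0x00FF = 255, 0x5F98 = 24472, 0x0001 = 1
print(decode_unique_session_id("00FF5F980001"))   # (255, 24472, 1)
```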

While we’re on the unique_session_id, the dbms_pipe package also has a “unique identifier” function, unique_session_name():

SQL> select dbms_pipe.unique_session_name from dual;


It doesn’t take a lot of effort to spot that the “unique session name” is the “unique session id” of dbms_session prefixed with the text “ORA$PIPE$”. It’s convenient for the dbms_pipe package to be able to generate a unique name so that one session can create a safely named pipe and tell another session about it. Anyone using pipes should take advantage of this function for its original purpose. Unlike dbms_session, you’ll need to be granted the privilege to execute this package; it’s not available to public. Interestingly, the script that creates dbms_pipe says that this function could return 30 bytes. Since the result appears to be 9 bytes prepended to the (“could be 24 bytes”) dbms_session.unique_session_id, which would allow for 33 bytes, you have to wonder whether there’s something more subtle that could happen.
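A minimal Python sketch of the relationship described above, assuming the pipe name is always exactly “ORA$PIPE$” followed by the dbms_session value (that’s what the observation suggests, not something I’ve seen documented). It also spells out the length arithmetic that makes the 30-byte comment look odd. The sample name is hypothetical.

```python
PIPE_PREFIX = "ORA$PIPE$"   # prefix observed on dbms_pipe.unique_session_name


def session_id_from_pipe_name(name: str) -> str:
    """Strip the ORA$PIPE$ prefix to recover the dbms_session-style identifier."""
    if not name.startswith(PIPE_PREFIX):
        raise ValueError(f"not a dbms_pipe unique session name: {name!r}")
    return name[len(PIPE_PREFIX):]


# Hypothetical pipe name built from a hypothetical unique_session_id value
print(session_id_from_pipe_name("ORA$PIPE$00FF5F980001"))   # 00FF5F980001

# Maximum lengths quoted from the package creation scripts
max_unique_session_id = 24
max_pipe_name = 30
worst_case = len(PIPE_PREFIX) + max_unique_session_id
print(len(PIPE_PREFIX), worst_case, worst_case > max_pipe_name)   # 9 33 True
```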

There may be many more mechanisms available as built-ins, but the last one I know of is in the dbms_debug_jdwp package (another package with execute privilege already granted to public and the ability to supply both the sid and serial#):

SQL> select
  2          dbms_debug_jdwp.current_session_id     sid,
  3          dbms_debug_jdwp.current_session_serial serial#
  4  from dual
  5  /

       SID    SERIAL#
---------- ----------
       255      24472

There is a reason why I’ve decided to resurrect this list of ways of getting at a session’s SID, but that’s the topic of another blog note.



October 3, 2019

Trace Files

Filed under: 12c,Infrastructure,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 1:38 pm BST Oct 3,2019

A recent blog note by Martin Berger about reading trace files in 12.2 popped up in my Twitter timeline yesterday and reminded me of a script I wrote a while ago to create a simple view I could query to read the trace file generated by the current session while the session was still connected. You either have to create the view and a public synonym through the SYS schema, or you have to use the SYS schema to grant select privileges on several dynamic performance views to the user to allow the user to create the view in the user’s schema. For my scratch database I tend to create the view in the SYS schema.

Script to be run by SYS:

rem     Script: read_trace_122.sql
rem     Author: Jonathan Lewis
rem     Dated:  Sept 2018
rem     Last tested

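create or replace view my_trace_file as
select  *
from    v$diag_trace_file_contents
where
        (adr_home, trace_filename) = (
                select
                --      substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                --      adr_home: the trace file's directory minus its trailing '/trace'
                        substr(
                                substr(tracefile, 1, instr(tracefile,'/',-1)-1),
                                1,
                                instr(
                                        substr(tracefile, 1, instr(tracefile,'/',-1)),
                                        'trace'
                                ) - 2
                        ),
                --      trace_filename: everything after the last '/'
                        substr(tracefile, instr(tracefile,'/',-1)+1) trace_filename
                from    v$process
                where   addr = (
                                select  paddr
                                from    v$session
                                where   sid = (
                                        sys_context('userenv','sid')
                                        -- select sid from v$mystat where rownum = 1
                                        -- select dbms_support.mysid from dual
                                )
                        )
        )
;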

create public synonym my_trace_file for sys.my_trace_file;
grant select on my_trace_file to {some role};

Alternatively, here are the privileges you could grant to a user from SYS so that they could create their own view:

grant select on v_$process to some_user;
grant select on v_$session to some_user;
grant select on v_$diag_trace_file_contents to some_user;

rem     and optionally one of:
rem             grant select on v_$mystat to some_user;
rem             grant execute on dbms_support to some_user;
rem     but dbms_support is no longer installed by default.

The references to package dbms_support and view v$mystat are historic ones I have lurking in various scripts from the days when the session id (SID) wasn’t available in any simpler way.

Once the view exists and is available, you can enable some sort of tracing from your session then query the view to read back the trace file. For example, here’s a simple “self-reporting” (it’s going to report the trace file that it causes) script that I’ve run from as a demo:

alter system flush shared_pool;
alter session set sql_trace true;

set linesize 180
set trimspool on
set pagesize 60

column line_number      format  999,999
column piece            format  a150    
column plan             noprint
column cursor#          noprint

break on plan skip 1 on cursor# skip 1

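select
        line_number,
        -- tabibitosan: constant for each run of consecutive line numbers
        line_number - row_number() over (order by line_number) plan,
        substr(payload,1,instr(payload,' id=')) cursor#,
        substr(payload, 1,150) piece
from
        my_trace_file
where
        file_name = 'xpl.c'
order by
        line_number
;
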
alter session set sql_trace false;

The script flushes the shared pool to make sure that it’s going to trigger some recursive SQL, then enables a simple SQL trace. The query then picks out all the lines in the trace file generated by code in the Oracle source file xpl.c (execution plans seem like a likely guess), which happens to pick out all the STAT lines in the trace (i.e. the ones showing the execution plans).

I’ve used the “tabibitosan” method to identify all the lines that belong to a single execution plan by assuming that they will be consecutive lines in the output starting from a line which includes the text “ id=1 ” (the surrounding spaces are important), but I’ve also extracted the bit of the line which includes the cursor number (STAT #nnnnnnnnnnnnnnn) because two plans may be dumped one after the other if multiple cursors close at the same time. There is still a little flaw in the script because sometimes Oracle will run a sys-recursive statement in the middle of dumping a plan to turn an object_id into an object_name, and this will cause a break in the output.
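The tabibitosan trick is easy to check outside the database. Here’s a Python sketch (an illustration of the technique, not part of the original script) that groups a sorted list of line numbers by line_number minus row_number(); the sample values are a subset of the line numbers in the output that follows. It ignores the cursor# split, so two plans dumped back to back would land in one group, which is why the SQL also breaks on cursor#.

```python
from itertools import groupby


def tabibitosan_groups(line_numbers):
    """Split line numbers into runs of consecutive values.

    Within a run, line_number - row_number() is constant, which is the
    same key the SQL uses for its (noprint) 'plan' column.
    """
    keyed = [(ln - rn, ln) for rn, ln in enumerate(sorted(line_numbers), start=1)]
    return [[ln for _, ln in run] for _, run in groupby(keyed, key=lambda pair: pair[0])]


sample = [38, 39, 53, 54, 55, 56, 57, 58, 84, 85, 116, 117, 118]
print(tabibitosan_groups(sample))
# [[38, 39], [53, 54, 55, 56, 57, 58], [84, 85], [116, 117, 118]]
```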

The result of the query is to extract all the execution plans in the trace file and print them in the order they appear – here’s a sample of the output:

LINE_NUMBER PIECE
----------- ------------------------------------------------------------------------------------------------------------------------------------------------------
         38 STAT #140392790549064 id=1 cnt=0 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=53 us cost=4 size=113 card
         39 STAT #140392790549064 id=2 cnt=0 pid=1 pos=1 obj=37 op='INDEX RANGE SCAN I_OBJ2 (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=0 card=1)'

         53 STAT #140392790535800 id=1 cnt=1 pid=0 pos=1 obj=0 op='MERGE JOIN OUTER (cr=5 pr=0 pw=0 str=1 time=95 us cost=2 size=178 card=1)'
         54 STAT #140392790535800 id=2 cnt=1 pid=1 pos=1 obj=4 op='TABLE ACCESS CLUSTER TAB$ (cr=3 pr=0 pw=0 str=1 time=57 us cost=2 size=138 card=1)'
         55 STAT #140392790535800 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'
         56 STAT #140392790535800 id=4 cnt=0 pid=1 pos=2 obj=0 op='BUFFER SORT (cr=2 pr=0 pw=0 str=1 time=29 us cost=0 size=40 card=1)'
         57 STAT #140392790535800 id=5 cnt=0 pid=4 pos=1 obj=73 op='TABLE ACCESS BY INDEX ROWID TAB_STATS$ (cr=2 pr=0 pw=0 str=1 time=10 us cost=0 size=40 card=1)
         58 STAT #140392790535800 id=6 cnt=0 pid=5 pos=1 obj=74 op='INDEX UNIQUE SCAN I_TAB_STATS$_OBJ# (cr=2 pr=0 pw=0 str=1 time=8 us cost=0 size=0 card=1)'

         84 STAT #140392791412824 id=1 cnt=1 pid=0 pos=1 obj=20 op='TABLE ACCESS BY INDEX ROWID BATCHED ICOL$ (cr=4 pr=0 pw=0 str=1 time=25 us cost=2 size=54 card
         85 STAT #140392791412824 id=2 cnt=1 pid=1 pos=1 obj=42 op='INDEX RANGE SCAN I_ICOL1 (cr=3 pr=0 pw=0 str=1 time=23 us cost=1 size=0 card=2)'

         94 STAT #140392790504512 id=1 cnt=2 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=7 pr=0 pw=0 str=1 time=432 us cost=6 size=374 card=2)'
         95 STAT #140392790504512 id=2 cnt=2 pid=1 pos=1 obj=0 op='HASH JOIN OUTER (cr=7 pr=0 pw=0 str=1 time=375 us cost=5 size=374 card=2)'
         96 STAT #140392790504512 id=3 cnt=2 pid=2 pos=1 obj=0 op='NESTED LOOPS OUTER (cr=4 pr=0 pw=0 str=1 time=115 us cost=2 size=288 card=2)'
         97 STAT #140392790504512 id=4 cnt=2 pid=3 pos=1 obj=19 op='TABLE ACCESS CLUSTER IND$ (cr=3 pr=0 pw=0 str=1 time=100 us cost=2 size=184 card=2)'
         98 STAT #140392790504512 id=5 cnt=1 pid=4 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=85 us cost=1 size=0 card=1)'
         99 STAT #140392790504512 id=6 cnt=0 pid=3 pos=2 obj=75 op='TABLE ACCESS BY INDEX ROWID IND_STATS$ (cr=1 pr=0 pw=0 str=2 time=8 us cost=0 size=52 card=1)'
        100 STAT #140392790504512 id=7 cnt=0 pid=6 pos=1 obj=76 op='INDEX UNIQUE SCAN I_IND_STATS$_OBJ# (cr=1 pr=0 pw=0 str=2 time=7 us cost=0 size=0 card=1)'
        101 STAT #140392790504512 id=8 cnt=0 pid=2 pos=2 obj=0 op='VIEW  (cr=3 pr=0 pw=0 str=1 time=47 us cost=3 size=43 card=1)'
        102 STAT #140392790504512 id=9 cnt=0 pid=8 pos=1 obj=0 op='SORT GROUP BY (cr=3 pr=0 pw=0 str=1 time=44 us cost=3 size=15 card=1)'
        103 STAT #140392790504512 id=10 cnt=0 pid=9 pos=1 obj=31 op='TABLE ACCESS CLUSTER CDEF$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=15 card=1)'
        104 STAT #140392790504512 id=11 cnt=1 pid=10 pos=1 obj=30 op='INDEX UNIQUE SCAN I_COBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'

        116 STAT #140392791480168 id=1 cnt=4 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3 pr=0 pw=0 str=1 time=62 us cost=3 size=858 card=13)'
        117 STAT #140392791480168 id=2 cnt=4 pid=1 pos=1 obj=21 op='TABLE ACCESS CLUSTER COL$ (cr=3 pr=0 pw=0 str=1 time=24 us cost=2 size=858 card=13)'
        118 STAT #140392791480168 id=3 cnt=1 pid=2 pos=1 obj=3 op='INDEX UNIQUE SCAN I_OBJ# (cr=2 pr=0 pw=0 str=1 time=11 us cost=1 size=0 card=1)'

        126 STAT #140392789565328 id=1 cnt=1 pid=0 pos=1 obj=14 op='TABLE ACCESS CLUSTER SEG$ (cr=3 pr=0 pw=0 str=1 time=21 us cost=2 size=68 card=1)'
        127 STAT #140392789565328 id=2 cnt=1 pid=1 pos=1 obj=9 op='INDEX UNIQUE SCAN I_FILE#_BLOCK# (cr=2 pr=0 pw=0 str=1 time=12 us cost=1 size=0 card=1)'

        135 STAT #140392789722208 id=1 cnt=1 pid=0 pos=1 obj=18 op='TABLE ACCESS BY INDEX ROWID BATCHED OBJ$ (cr=3 pr=0 pw=0 str=1 time=22 us cost=3 size=51 card=
        136 STAT #140392789722208 id=2 cnt=1 pid=1 pos=1 obj=36 op='INDEX RANGE SCAN I_OBJ1 (cr=2 pr=0 pw=0 str=1 time=16 us cost=2 size=0 card=1)'

        153 STAT #140392792055264 id=1 cnt=1 pid=0 pos=1 obj=68 op='TABLE ACCESS BY INDEX ROWID HIST_HEAD$ (cr=3 pr=0 pw=0 str=1 time=25 us)'
        154 STAT #140392792055264 id=2 cnt=1 pid=1 pos=1 obj=70 op='INDEX RANGE SCAN I_HH_OBJ#_INTCOL# (cr=2 pr=0 pw=0 str=1 time=19 us)'

If you want to investigate further, the “interesting” columns in the underlying view are probably: section_name, component_name, operation_name, file_name, and function_name. The possible names of functions, files, etc. vary with the trace event you’ve enabled.


September 28, 2019

Negative Offload

Filed under: Exadata,Execution plans,HCC,Oracle,Troubleshooting — Jonathan Lewis @ 5:38 pm BST Sep 28,2019

At the Trivadis Performance Days 2019 I did a presentation on using execution plans to understand what a query was doing. One of the examples I showed was a plan from an Exadata system (using …) that needed to go faster. The plan was from the SQL Monitor report and all I want to show you is one line that’s reporting a tablescan. To fit the screen comfortably I’ve removed a number of columns from the output.

The report had been generated while the statement was still running (hence the “->” at the left hand edge) and the query had scanned 166 segments (with no partition elimination) of a table with 4,500 data segments (450 range partitions and 10 hash sub-partitions – note the design error, by the way: hash partitioning in Oracle should always use a power of 2 for the number of (sub)partitions).
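To see why 10 hash sub-partitions is a design error, here’s a Python sketch of a simplified linear-hashing style of bucket assignment, the kind of scheme usually described for Oracle hash partitioning. It’s a model, not Oracle’s actual code: evenly spread integers stand in for real ORA_HASH values, and the function names are mine. With 10 buckets, six of them collect twice as many values as the other four; with 16 buckets everything is even.

```python
def linear_hash_bucket(h: int, n_buckets: int) -> int:
    """Map a non-negative hash value to a bucket using a linear-hashing fold."""
    upper = 1
    while upper < n_buckets:
        upper *= 2                     # smallest power of 2 >= n_buckets
    bucket = h % upper
    if bucket >= n_buckets:            # bucket doesn't exist: fold into the lower half
        bucket = h % (upper // 2)
    return bucket


def bucket_counts(n_buckets: int, n_values: int = 16_000) -> list[int]:
    """Count how many evenly spread hash values land in each bucket."""
    counts = [0] * n_buckets
    for h in range(n_values):
        counts[linear_hash_bucket(h, n_buckets)] += 1
    return counts


print(bucket_counts(10))   # [1000, 1000, 2000, 2000, 2000, 2000, 2000, 2000, 1000, 1000]
print(bucket_counts(16))   # sixteen buckets of 1000
```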

SQL Plan Monitoring Details (Plan Hash Value=3764612084)  
| Id   |           Operation            | Name  | Read  | Read  | Write | Write |   Cell   | Mem  | Activity |       Activity Detail       |  
|      |                                |       | Reqs  | Bytes | Reqs  | Bytes | Offload  |      |   (%)    |         (# samples)         |   
| -> 5 |      TABLE ACCESS STORAGE FULL | TXN   |  972K | 235GB |       |       | -203.03% |   7M |    63.43 | Cpu (1303)                  | 
|      |                                |       |       |       |       |       |          |      |          | cell smart table scan (175) | 

In the presentation I pointed out that for a “cell smart table scan” (note the Activity Detail column) this line was using a surprisingly large amount of CPU.

We had been told that the table was using hybrid columnar compression (HCC) and had been given some figures that showed the compression factor was slightly better than 4. I had also pointed out that the typical size of a read request was 256KB (compare Read Reqs with Read Bytes: 235GB / 972K requests is roughly 250KB per request).

To explain the excessive CPU I claimed that we were seeing “double decompression” – the cell was decompressing (uncompressing) compression units (CUs), finding that the resulting decompressed data was larger than the 1MB unit that Exadata allows and sending the original compressed CU to the database server where it was decompressed again – and the server side decompression was burning up the CPU.

This claim is (almost certainly) true – but the justification I gave for the claim was at best incomplete (though, to be brutally honest, I have to admit that I’d made a mistake): I pointed out that the Cell Offload was negative 200% and that this was what told us about the double decompression. While double decompression was probably happening, the implication I had made was that a negative offload automatically indicated double decompression – and that was an incorrect assumption on my part. Fortunately Maurice Müller caught up with me after the session was over, pointed out the error, then emailed me a link to a relevant article by Ahmed Aangour.

The Cell Offload is a measure of the difference between the volume of data read and the volume of data returned to the server. If the cell reads 256KB from disc, but the column and row selection means the cell returns 128KB, the Cell Offload would be 50%; if the cell returns 64KB the Cell Offload would be 75% (100 * (1 - 64KB/256KB)). But what if you select all the rows and columns from a compressed table? The volume of data after decompression would be larger than the compressed volume the cell had read from disc. In this case we knew that we were reading 256KB at a time and the compression factor was slightly greater than 4, so the uncompressed data would probably be around 1MB, giving us a Cell Offload of 100 * (1 - 1024KB / 256KB) = negative 300%.
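The Cell Offload arithmetic is simple enough to write down as a function. This Python sketch just encodes the formula as described above (I haven’t checked how Oracle computes the figure internally) and reproduces the three worked values.

```python
KB = 1024


def cell_offload_pct(bytes_read: float, bytes_returned: float) -> float:
    """Cell Offload as described in the text: 100 * (1 - returned / read)."""
    return 100 * (1 - bytes_returned / bytes_read)


print(cell_offload_pct(256 * KB, 128 * KB))    # 50.0
print(cell_offload_pct(256 * KB, 64 * KB))     # 75.0
print(cell_offload_pct(256 * KB, 1024 * KB))   # -300.0  (decompressed data bigger than the read)
```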

Key Point: Any time that decompression, combined with the row and column selection, produces more data than the volume of data read from disc the Cell Offload will go negative. A negative Cell Offload is not inherently a problem (though it might hint at a suboptimal use of compression).

Follow-up Analysis

Despite the error in my initial understanding, the claim that we were seeing double decompression was still (almost certainly) true – but we need to be a little more sophisticated in the analysis. The clue is in the arithmetic a few lines further up the page. We can see that we are basically reading 256KB chunks of the table, and we know that 256KB will expand to roughly 1MB, so we ought to see a Cell Offload of about -300%; but the Cell Offload is -200%. This suggests fairly strongly that on some of the reads the decompressed data is slightly less than 1MB, which allows the cell to return the decompressed data to the database server, while some of the time the decompressed data is greater than 1MB, forcing the cell to send the original (compressed) CU to the database server.

We may even be able to work the arithmetic backwards to estimate the number of times double decompression appeared. Assume that two-thirds of the time the cell decompressed the data and successfully sent (just less than) 1MB back to the database server, and one-third of the time the cell decompressed the data, found that the result was too large, and sent 256KB of compressed data back to the server. Let’s work with the 972,000 read requests reported to see what drops out of the arithmetic:

  • Total data read: 972,000 * 256KB = 243,000 MB
  • Data sent to db server:  648,000 * 1MB + 324,000 * 256KB = 729,000 MB
  • Cell Offload = 100 * (1 – 729/243) = -200%   Q.E.D.
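The same reasoning can be run backwards: if a fraction f of the reads fail the 1MB test, the returned-to-read ratio is (1 - f) * 4 + f, so the offload is 100 * (1 - 4 + 3f). The Python sketch below solves for f and then replays the bullet-point arithmetic. The expansion factor of 4 and the “just under 1MB” success case are the same simplifying assumptions made in the text; the function name is mine.

```python
def failed_fraction(offload_pct: float, expansion: float) -> float:
    """Fraction of reads sent back compressed, given the observed Cell Offload.

    Simplified model: a successful read returns `expansion` times the bytes
    read; a failed read returns the original compressed bytes.
      returned/read = (1 - f) * expansion + f
      offload       = 100 * (1 - returned/read)
    """
    return (offload_pct / 100 - 1 + expansion) / (expansion - 1)


reads = 972_000
f = failed_fraction(-200, expansion=4)      # one third
failed = round(reads * f)
succeeded = reads - failed

read_mb = reads * 256 / 1024                # 256KB per read request
sent_mb = succeeded * 1 + failed * 256 / 1024
offload = 100 * (1 - sent_mb / read_mb)

print(failed, succeeded, read_mb, sent_mb, offload)
# 324000 648000 243000.0 729000.0 -200.0
```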

Of course it would be nice to avoid guessing – and if we were able to check the session activity stats (v$sessstat) while the query was running (or after it had completed) we could pick up several numbers that confirmed our suspicion. For example, we would keep an eye on:

	cell CUs sent uncompressed
	cell CUs processed for uncompressed
	EHCC {class} CUs Decompressed

Differences between these stats allow you to work out the number of compression units that failed the 1MB test on the cell server and were sent to the database server to be decompressed. There is actually another statistic named “cell CUs sent compressed” which would make life easy for us, but I’ve not seen it populated in my tests – so maybe it doesn’t mean what it seems to say.

Here’s an example from a system that I presented a few years ago showing some sample numbers.

cell CUs sent uncompressed              5,601
cell CUs processed for uncompressed     5,601

EHCC CUs Decompressed                  17,903
EHCC Query High CUs Decompressed       12,302 

This reveals an annoying feature of 11g (continued in 12.1) that results in double counting of the statistics, confusing the issue when you’re trying to analyze what’s going on. In this case the table consisted of 12,302 compression units, and the query was engineered to cause the performance problem to appear. The first two statistics show us how many CUs were decompressed successfully (we’ll see a change appearing there in 12.1). We then see that all 12,302 of the table’s “query high” compression units were decompressed – but the “total” of all CUs decompressed was 17,903.

It’s not a coincidence that 12,302 + 5,601 = 17,903; there’s some double counting going on. I don’t know how many of the statistics are affected in this way, but Oracle has counted the CUs that passed decompression once as they were processed at the cell server and again as they arrived at the database server. In this example we can infer that 12,302 - 5,601 = 6,701 compression units failed decompression at the cell server and were sent to the database server in compressed form to be decompressed again.
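The 11g inference can be written as a small check-and-subtract. Here’s a Python sketch, assuming the counting pattern described above (successful CUs counted twice in the EHCC total) holds; if the figures don’t fit that pattern it refuses to guess rather than return a misleading answer. The function name is mine.

```python
def failed_cus_11g(cus_sent_uncompressed: int,
                   ehcc_cus_decompressed: int,
                   ehcc_level_cus_decompressed: int) -> int:
    """Infer CUs that failed the 1MB test at the cell (11g counting pattern).

    Assumed pattern: successful CUs are counted once at the cell and again at
    the database server, so the EHCC total equals the table's CUs (the
    per-level figure) plus the successes.
    """
    if ehcc_cus_decompressed != ehcc_level_cus_decompressed + cus_sent_uncompressed:
        raise ValueError("figures don't match the 11g double-counting pattern")
    return ehcc_level_cus_decompressed - cus_sent_uncompressed


print(failed_cus_11g(5_601, 17_903, 12_302))   # 6701
```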

Here are a couple of sets of figures from some similar tests run on – one with a table compressed to query high, another compressed to query low. There is one critical difference from the 11g figures, but the same double-counting seems to have happened. In both cases the “EHCC Query [Low|High] CUs Decompressed” figures show the correct number of CUs in each table. Note, though, that the “cell CUs processed for uncompressed” figures in 12.1 appear to report the number of attempted decompressions rather than 11g’s number of successful decompressions.


cell CUs sent uncompressed                     19,561	-- successful decompressions at cell server
cell CUs processed for uncompressed            19,564	=> 3 failures

EHCC CUs Decompressed                          39,125	=  2 * 19,561 successes + 3 db server decompression
EHCC Query High CUs Decompressed               19,564


cell CUs sent uncompressed                     80,037	-- successful decompressions at cell server
cell CUs processed for uncompressed            82,178	=> 2,141 failures

EHCC CUs Decompressed                         162,215	=  2 * 80,037 successes + 2,141 db server decompressions
EHCC Query Low CUs Decompressed                82,178


I’ve annotated the figures to explain the arithmetic.
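For the 12.1 figures the subtraction changes because “cell CUs processed for uncompressed” now appears to count attempts. Another Python sketch under that assumption, checking the annotated arithmetic for both tables:

```python
def failed_cus_12_1(cus_sent_uncompressed: int,
                    cus_processed_for_uncompressed: int,
                    ehcc_cus_decompressed: int) -> int:
    """Infer CUs that failed the 1MB test at the cell (12.1 counting pattern).

    Assumed pattern: 'processed' counts attempts, 'sent uncompressed' counts
    successes, and the EHCC total counts each success twice plus each failure
    once (when it is finally decompressed at the database server).
    """
    failures = cus_processed_for_uncompressed - cus_sent_uncompressed
    if ehcc_cus_decompressed != 2 * cus_sent_uncompressed + failures:
        raise ValueError("figures don't match the 12.1 double-counting pattern")
    return failures


print(failed_cus_12_1(19_561, 19_564, 39_125))    # 3      (query high)
print(failed_cus_12_1(80_037, 82_178, 162_215))   # 2141   (query low)
```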

There has been some significant renaming and separation of statistics in 12.2, as described in this post by Roger MacNicol, and the problems of double-counting should have disappeared. I haven’t yet tested my old models in the latest versions of Oracle, though, so can’t show you any figures to demonstrate the change.


There are 4 key points to note in this posting.

  • Hash (sub)partitioning should be based on powers of 2, otherwise some partitions will be twice the size of others.
  • There is a 1MB limit on the “data packet” sent between the cell server and database server in Exadata.
  • If you select a large fraction of the rows and columns from an HCC compressed table you may end up decompressing a lot of your data twice if the decompressed data for a read request is larger than the 1MB unit (and the cost will be highly visible at the database server as CPU usage).
  • The Cell Offload figure for a tablescan (in particular) will go negative if the volume of data sent from the cell server to the database server is larger than the volume of data read from the disk, even if double decompression hasn’t been happening.

A little corollary to the third point: if you are writing to a staging table with the expectation of doing an unfiltered tablescan (or a select *), then you probably don’t want to use hybrid columnar compression on the table, as you will probably end up using a lot of CPU at the database server to compress it, then doing double decompression using even more CPU on the database server. It’s only if you really need to minimise disk usage and have lots of CPU capacity to spare that you have a case for using hybrid columnar compression for the table (and Oracle In-Memory features may also change the degree of desirability).


I haven’t said anything about accessing table data by index when the table is subject to HCC compression. I haven’t tested the mechanism in recent versions of Oracle but it used to be the case that the cell server would supply the whole compression unit (CU) to the database server which would decompress it to construct the relevant row. One side effect of this was that the same CU could be decompressed (with a high CPU load) many times in the course of a single query.


August 27, 2019


Filed under: AWR,Oracle,Troubleshooting,Upgrades — Jonathan Lewis @ 3:59 pm BST Aug 27,2019

This is another of the blog notes that have been sitting around for several years – in this case since May 2014, based on a script I wrote a year earlier. It makes an important point about “inconsistency” of timing in the way that Oracle records statistics of work done. As a consequence of being first drafted in May 2014 the original examples showed AWR results from and – I’ve just run the same test on to see if anything has changed.


[Originally drafted May 2014]: I had to post this as a reminder of how easy it is to forget things – especially when there are small but significant changes between versions. It’s based loosely on a conversation from Oracle-L, but I’m going to work the issue in the opposite order by running some code and showing you the ongoing performance statistics rather than the usual AWR approach of reading the performance stats and trying to guess what happened.

The demonstration needs two sessions to run; it’s based on one session running some CPU-intensive SQL inside an anonymous PL/SQL block with a second session launching AWR snapshots at carefully timed moments. Here’s the code for the working session:

rem     Script:         awr_timing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2013

alter session set "_old_connect_by_enabled"=true;

create table kill_cpu(n, primary key(n))
organization index
as
select  rownum n
from    all_objects
where   rownum <= 26 -- > comment to avoid wordpress format issue
;

execute dbms_stats.gather_table_stats(user,'kill_cpu')

pause Take an AWR snapshot from another session and when it has completed, press return

declare
        m_ct    number;
begin
        -- CPU-intensive hierarchical query (about 45 seconds on my laptop)
        select  count(*) X
        into    m_ct
        from    kill_cpu
        connect by
                n > prior n
        start with
                n = 1
        ;

        -- then sleep, so that one AWR snapshot can land in the gap
        dbms_lock.sleep(30);
end;
/

You may recognise an old piece of SQL that I’ve often used as a way of stressing a CPU and seeing how fast Oracle can run. The “alter session” at the top of the code is necessary to use the pre-10g method of running a “connect by” query so that the SQL does a huge number of buffer gets (and “buffer is pinned count” visits). On my current laptop the query takes about 45 seconds (all CPU) to complete. I’ve wrapped this query inside a pl/sql block that then sleeps for 30 seconds.
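Why does a 26-row table burn so much CPU? Assuming the usual semantics of a hierarchical query (no cycles are possible because n must increase), every row generated is an increasing chain 1 < a2 < … < ak, i.e. one row per subset of the values 2 to 26, which gives 2^25 (about 33.5 million) rows for count(*) to count. The Python sketch below just checks that counting argument; it says nothing about buffer gets or how the old connect-by code path works internally.

```python
from itertools import combinations


def connect_by_rows(n_rows: int) -> int:
    """Rows from 'start with n = 1 connect by n > prior n' over the values 1..n_rows.

    paths(k) = 1 (the row for k itself) + sum of paths(j) for every j > k.
    """
    suffix = 0      # running sum of paths(j) for all j greater than the current k
    count = 0
    for k in range(n_rows, 0, -1):
        count = 1 + suffix
        suffix += count
    return count    # paths(1)


def brute_force_rows(n_rows: int) -> int:
    """Count chains directly: one chain per subset of {2, ..., n_rows}."""
    others = range(2, n_rows + 1)
    return sum(1 for r in range(len(others) + 1) for _ in combinations(others, r))


print(connect_by_rows(26))   # 33554432
```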

From the second session you need to launch an AWR snapshot 4 times – once in the pause shown above, then (approximately) every 30 seconds thereafter. The second one should execute while the SQL statement is still running, the third one should execute while the sleep(30) is taking place, and the fourth one should execute after the pl/sql block has ended and the SQL*Plus prompt is visible.

Once you’ve got 4 snapshots you can generate 3 AWR reports. The question to ask then is “what do the reports say about CPU usage?” Here are a few (paraphrased) numbers, starting with a comparison of the “Top 5 timed events”, “Time Model”, and “Instance Activity” sections. There are three sets of figures: the first reported while the SQL was still running, the second reported after the SQL statement had completed and while the dbms_lock.sleep() call was executing, the last reported after the PL/SQL block had completed. There are some little oddities in the numbers due to background “noise”, but the key points are still clearly visible:

While the SQL was executing

Top 5
CPU Time                       26 seconds

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                     26.9        100.0
DB CPU                                       26.2         97.6

Instance Activity
CPU used by this session         0.65 seconds
recursive cpu usage              0.67 seconds

SQL ordered by CPU
31 seconds reported for both the SQL and PLSQL

During the sleep()

Top 5
CPU Time                        19 seconds

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                     19.0        100.0
DB CPU                                       18.6         98.1

Instance Activity
CPU used by this session         0.66 seconds
recursive cpu usage             44.82 seconds

SQL ordered by CPU
14 seconds reported for both the SQL and PLSQL

After the PL/SQL block ended

Top 5
CPU Time                         1 second

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                      1.4         99.9
DB CPU                                        1.4         99.7

Instance Activity
CPU used by this session        44.68 seconds
recursive cpu usage              0.50 seconds

SQL ordered by CPU
1 second reported for the PLSQL, but the SQL was not reported

Points to notice:

While the SQL was executing (and had been executing for about 26 seconds), the Time Model mechanism was recording the work done by the SQL, and the Top N information echoed the Time Model CPU figure. At the same time the “CPU used …” Instance Activity Statistics have not recorded any CPU time for the session – and they won’t until the SQL statement completes. Despite this, the “SQL ordered by …” reports double-count in real time, showing the SQL and the PL/SQL cursors as consuming (with rounding errors, presumably) 31 seconds each.

After the SQL execution was over and the session was sleeping, the Time Model (hence the Top 5) had recorded a further 19 seconds of work. The instance activity, however, has now accumulated 44 seconds of CPU, but only as “recursive CPU usage” (recursive because our SQL was called from within a PL/SQL block), with no “CPU used by this session”. The “SQL ordered by …” figures have recorded the amount of CPU used by both the SQL and PL/SQL in the interval (i.e. 14 seconds, which is a little off) against both.

After the PL/SQL block has completed, the Time Model and the Top 5 report both say that nothing much happened in the interval, but the Instance Activity suddenly reports 44.68 seconds of CPU used by this session – which (roughly speaking) is truish, as the PL/SQL block ended and assigned the accumulated recursive CPU usage to the session CPU figure. Finally, when we get down to the “SQL ordered by CPU”, the SQL was not reported – it did no work in the interval – but the PL/SQL block was still reported, though only with a generous 1 second of CPU, since all it did in the interval was finish the sleep call and tidy up the stack before ending.

Now the same sets of figures for – there’s a lot of similarity, but one significant difference:

While the SQL was executing

Top 5
CPU Time                        26.6 seconds

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                     27.0        100.0
DB CPU                                       26.6         98.5

Instance Activity
CPU used by this session         1.09 seconds
recursive cpu usage              1.07 seconds

SQL ordered by CPU
25.6 seconds reported for both the SQL and PLSQL

During the sleep()

Top 5
CPU Time                        15.1 seconds

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                     15.3         99.8
DB CPU                                       15.1         98.2

Instance Activity
CPU used by this session        41.09 seconds
recursive cpu usage             41.03 seconds

SQL ordered by CPU
14.3 seconds reported for the SQL
13.9 seconds reported for the PLSQL

After the PL/SQL block ended

Top 5
CPU Time                         1.4 seconds

Time Model                               Time (s) % of DB Time
------------------------------------------------- ------------
sql execute elapsed time                      1.5         99.6
DB CPU                                        1.4         95.4

Instance Activity
CPU used by this session         1.02 seconds
recursive cpu usage              0.95 seconds

SQL ordered by CPU
0.5 seconds reported for the PLSQL, and no sign of the SQL

Spot the one difference in the pattern – during the sleep() the Instance Activity Statistic “CPU used by this session” is recording the full CPU time for the complete query, whereas in the first set of figures the time for the query appeared only in the “recursive cpu usage” statistic.

I frequently point out that for proper understanding of the content of an AWR report you need to cross-check different ways in which Oracle reports “the same” information. This is often to warn you about checking underlying figures before jumping to conclusions about “hit ratios”; sometimes it’s to remind you that while the Top 5 might say some average looks okay, the event histogram may say that what you’re looking at is mostly excellent with an occasional disaster thrown in. In this blog note I just want to remind you that if you only ever look at one set of figures about CPU usage there are a few special effects (particularly relating to long running PL/SQL / Java / SQL) where you may have to work out a pattern of behaviour to explain unexpectedly large (or small) and contradictory figures. The key to the problem is recognising that different statistics may be updated at different stages in a complex process.


I doubt if many people still run, so I also re-ran the test on before publishing. The behaviour hasn’t changed since although the query ran a little faster, perhaps due to changes in the mechanisms for this type of “connect by pump”.

stats

Name                                            Value
----                                            -----
session logical reads                      33,554,435
consistent gets                            33,554,435
consistent gets from cache                 33,554,435
consistent gets from cache (fastpath)      33,554,431
no work - consistent read gets             33,554,431
index scans kdiixs1                        33,554,433
buffer is not pinned count                 16,777,219

stats

Name                                            Value
----                                            -----
session logical reads                      16,843,299
consistent gets                            16,843,299
consistent gets from cache                 16,843,299
consistent gets pin                        16,843,298
consistent gets pin (fastpath)             16,843,298
no work - consistent read gets             16,790,166
index range scans                          33,554,433
buffer is not pinned count                 16,790,169

Some changes are trivial (like the change of name for “index scans kdiixs1”); some are interesting (like some gets now being labelled as “pin” and “pin (fastpath)”); some are baffling (like how you can manage 33M index range scans while doing only 16M buffer gets!)

August 26, 2019


Filed under: CBO,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 12:19 pm BST Aug 26,2019

A recent thread on the Oracle Developer Community starts with the statement that a query is taking a very long time (with the question “how do I make it go faster?” implied rather than asked). It’s (not that that’s particularly relevant to this blog note), and we have been given a number that quantifies “very long time” (again not particularly relevant to this blog note – but worth mentioning because your “slow” might be my “wow! that was fast” and far too many people use qualitative adjectives when the important detail is quantitative). The query had already been running for 15 hours – and here it is:

                        DISTINCT OWNER, TABLE_NAME 
                        FROM     DBA_LOGSTDBY_UNSUPPORTED

There are many obvious suggestions anyone could make for things to do to investigate the problem – start with the execution plan, check whether the object statistics are reasonably representative, run a trace with wait state tracing enabled to see where the time goes; but sometimes there are a couple of very simple observations you can make that point you to simple solutions.

Looking at this query we can recognise that it’s (almost certainly) about a couple of Oracle data dictionary views (which means it’s probably very messy under the covers with a horrendous execution plan) and, as I’ve commented from time to time in the past, Oracle Corp. developers create views for their own purposes so you should take great care when you re-purpose them. This query also has the very convenient feature that it looks like two simpler queries stitched together – so a very simple step in trouble-shooting, before going into any fine detail, is to unstitch the query and run the two parts separately to see how much data they return and how long they take to complete:



It’s quite possible that the worst case scenario for the total run time of the original query could be reduced to the sum of the run time of these two queries. One strategy to achieve this would be a rewrite of the form:

select  * 
from    (

Unfortunately the immediately obvious alternative may be illegal thanks to things like duplicates (which disappear in MINUS operations) or NULLs (which can make ALL the data “disappear” in some cases). In this case the original query might be capable of returning duplicates of (owner, table_name) from dba_logstdby_not_unique which would collapse to a single occurrence each in my rewrite – so my version of the query is not logically equivalent (unless the definition of the view enforces uniqueness); on the other hand, tracking back through the original thread to the MoS article where this query comes from, we can see that even if the query could return duplicates we don’t actually need to see them.
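The MINUS rewrite implies that the original query filtered with a NOT IN subquery. Assuming that, here is a small Python sketch (a toy model of SQL semantics, not Oracle code; the function names are hypothetical) that shows the two traps just described: NOT IN keeps duplicate outer rows but loses every row once the subquery returns a row of NULLs, while MINUS collapses duplicates and treats NULLs as equal when comparing rows.

```python
# None stands in for both SQL NULL and the logical value UNKNOWN.
def sql_ne(a, b):
    """SQL a <> b: UNKNOWN if either side is NULL."""
    if a is None or b is None:
        return None
    return a != b

def sql_or(x, y):
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

def sql_and_all(values):
    """AND across a list: FALSE wins, otherwise any UNKNOWN makes the result UNKNOWN."""
    result = True
    for v in values:
        if v is False:
            return False
        if v is None:
            result = None
    return result

def not_in(outer_rows, sub_rows):
    """(owner, table_name) NOT IN (subquery): a row survives only if every
    comparison is TRUE. Duplicates in outer_rows are kept."""
    kept = []
    for owner, table in outer_rows:
        checks = [sql_or(sql_ne(owner, s_owner), sql_ne(table, s_table))
                  for s_owner, s_table in sub_rows]
        if sql_and_all(checks) is True:
            kept.append((owner, table))
    return kept

def minus(outer_rows, sub_rows):
    """MINUS: set semantics, so duplicates collapse and NULLs match NULLs."""
    return set(outer_rows) - set(sub_rows)

outer = [("APP", "T1"), ("APP", "T1"), ("APP", "T2")]
print(not_in(outer, [("APP", "T2")]))                 # duplicates survive
print(minus(outer, [("APP", "T2")]))                  # duplicates collapse
print(not_in(outer, [("APP", "T2"), (None, None)]))   # every row disappears
```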

And this is the point of the blog note – it’s a general principle (that happens to be a very obvious strategy in this case): if a query takes too long, how does it compare with a simplified version of the query that might be a couple of steps short of the final target? If it’s easy to spot the options for simplification, and if the simplified version operates efficiently, then isolate it (using a no_merge hint if necessary), and work forwards from there. Just be careful that your rewrite remains logically equivalent to the original (if it really needs to).

In the case of this query, the two parts took 5 seconds and 9 seconds to complete, returning 209 rows and 815 rows respectively. Combining the two queries with a minus really should get the required result in no more than 14 seconds.


The “distinct” in the second query is technically redundant as the minus operation applies a sort unique operation to both intermediate result sets before comparing them. Similarly the “distinct” was also redundant when the second query was used for the “in subquery” construction – again there would be an implied uniqueness operation if the optimizer decided to do a simple unnest of the subquery.





August 13, 2019


Filed under: Bugs,Oracle,Troubleshooting — Jonathan Lewis @ 1:34 pm BST Aug 13,2019

Here’s a problem (and I think it should be called a bug) that I first came across about 6 years ago, then forgot for a few years until it reappeared some time last year and then again a few days ago. The problem has been around for years (getting on for decades), and the first mention of it that I’ve found is MoS Bug 2891576, created in 2003, referring back to Oracle, The problem still exists in Oracle 19.2 (tested on LiveSQL).

Here’s the problem: assume you have a pair of tables (call them parent and child) with a referential integrity constraint connecting them. If the constraint is enabled and not deferred then the following code may fail, and if you’re really unlucky it may only fail on rare random occasions:

insert all
        into parent({list of parent columns}) values({list of source columns})
        into child ({list of child columns})  values({list of source columns})
select
        {list of columns}
from    {source}

The surprising Oracle error is “ORA-02291: integrity constraint ({owner.constraint_name}) violated – parent key not found”, and the reason is simple (and documented in MoS note 265826.1, “Multi-table Insert Can Cause ORA-02291: Integrity Constraint Violated for Master-Detail tables”): the order in which the insert operations take place is “indeterminate” so that child rows may be inserted before their parent rows (and for the multi-table insert the constraint checks are not postponed until the statement completes as they are, for instance, for updates to a table with a self-referencing RI constraint).

Two possible workarounds are suggested in Doc ID 265826.1:

  • drop the foreign key constraint and recreate it after the load,
  • make the foreign key constraint deferrable and defer it before the insert so that it is checked only on commit (or following an explicit call to make it immediate)

The second option would probably be preferable to the first but it’s still not a very nice thing to do and could leave your database temporarily exposed to errors that are hard to clean up. There are some details of the implementation of deferrable constraints in the comments of this note on index rebuilds if you’re interested in the technicalities.

A further option which seems to work is to create a (null) “before row insert” trigger on the parent table – this appears to force the parent into a pattern of single row inserts and the table order of insertion then seems to behave. Of course you do pay the price of an increase in the volume of undo and redo. On the down-side, Bug 2891576 MULTITABLE INSERT FAILS WITH ORA-02291 WHEN FK & TRIGGER ARE PRESENT can also be found on MoS, leading 265826.1 to suggest disabling triggers if their correctness is in some way dependent on the order in which your tables are populated. That dependency threat should be irrelevant if the trigger is a “do nothing” trigger. Sadly there’s a final note that I should mention: Bug 16133798 : INSERT ALL FAILS WITH ORA-2291 reports the issue as “Closed: not a bug”.

There is a very simple example in the original bug note demonstrating the problem, but it didn’t work on the version of Oracle where I first tested it, so I’ve modified it slightly to get it working on a fairly standard install. (I suspect the original was executed on a database with a 4KB block size.)

drop table child purge;
drop table parent purge;

create table parent (id number primary key);

create table child  (id number, v1 varchar2(4000),v2 varchar2(3920));
alter table child add constraint fk1 foreign key (id) references parent (id);

-- Optional "do nothing" trigger: with it in place the insert succeeded.
-- Remove the "-- " prefixes to create it.
-- create or replace trigger par_bri
-- before insert on parent
-- for each row
-- begin
--         null;
-- end;
-- /

insert all
        into parent ( id ) values ( id )
        into child  ( id ) values ( id )
select  100 id from dual
/

In the model above, and using an 8KB block in ASSM, the code as is resulted in an ORA-02291 error. Changing the varchar2(3920) to varchar2(3919), the insert succeeded; and when I kept the varchar2(3920) but created the trigger, the insert succeeded.

Fiddling around in various ways and taking some slightly more realistic table definitions here’s an initial setup to demonstrate the “randomness” of the failure (tested on various versions up to

rem     Script:         insert_all_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2018
rem     Last tested 

create table t1
as
with generator as (
        select
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             small_vc,
        lpad(rownum,100,'0')            medium_vc,
        lpad(rownum,200,'0')            big_vc
from
        generator       v1
/

create table parent(
        id              number,
        small_vc        varchar2(10),
        medium_vc       varchar2(100),
        big_vc          varchar2(200),
        constraint par_pk primary key(id)
)
segment creation immediate
/

create table child(
        id              number,
        small_vc        varchar2(10),
        medium_vc       varchar2(100),
        big_vc          varchar2(200),
        constraint chi_pk primary key(id),
        constraint chi_fk_par foreign key (id) references parent(id)
)
segment creation immediate
/

create table child2(
        id              number,
        small_vc        varchar2(10),
        medium_vc       varchar2(100),
        big_vc          varchar2(200),
        constraint ch2_pk primary key(id),
        constraint ch2_fk_par foreign key (id) references parent(id)
)
segment creation immediate
/

I’ve created a “source” table t1, and three “target” tables – parent, child and child2. Table parent has a declared primary key and both child and child2 have a referential integrity constraint to parent. I’m going to do a multi-table insert selecting from t1 and spreading different columns across the three tables.

Historical note: When I first saw the “insert all” option of multi-table inserts I was delighted with the idea that it would let me query a de-normalised source data set just once and insert the data into a normalised set of tables in a single statement – so (a) this is a realistic test from my perspective and (b) it has come as a terrible disappointment to discover that I should have been concerned about referential integrity constraints (luckily very few systems had them at the time I last used this feature in this way).

The multi-table insert I’ve done is as follows:

insert all
        into parent(id, small_vc)  values(id, small_vc)
        into child (id, medium_vc) values(id, medium_vc)
        into child2(id, medium_vc) values(id, medium_vc)
--      into child2(id, big_vc)    values(id, big_vc)
select
        id, small_vc, medium_vc, big_vc
from
        t1
where
        rownum <= &m_rows_to_insert
/

You’ll notice that I’ve allowed user input to dictate the number of rows selected for insertion and I’ve also allowed for an edit to change the column that gets copied from t1 to child2. Although it’s not visible in the create table statements I’ve also tested the effect of varying the size of the big_vc column in t1.

Starting with the CTAS and multi-table insert as shown, the insert runs to completion if I select 75 rows from t1, but if I select 76 rows the insert fails with “ORA-02291: integrity constraint (TEST_USER.CHI_FK_PAR) violated – parent key not found”. If I change the order of the inserts into child and child2 the violated constraint is TEST_USER.CH2_FK_PAR – so Oracle appears to be applying the inserts in the order they appear in the statement in some circumstances.

Go back to the original order of inserts for child and child2, but use the big_vc option for child2 instead of the medium_vc. In this case the insert succeeds for 39 rows selected from t1, but fails reporting constraint TEST_USER.CH2_FK_PAR when selecting 40 rows. Change the CTAS and define big_vc as lpad(rownum,195) and the insert succeeds with 40 rows selected and fails on 41 (still on the CH2_FK_PAR constraint); change big_vc to lpad(rownum,190) and the insert succeeds on 41 rows selected, fails on 42.

My hypothesis on what’s happening is this: each table in the multitable insert list gets a buffer of 8KB (maybe matching one Oracle block if we were to try different block sizes). As the statement executes the buffers will fill and, critically, when the buffer is deemed to be full (or full enough) it is applied to the table – so if a child buffer fills before the parent buffer is full you can get child rows inserted before their parent, and it looks like Oracle isn’t postponing foreign key checking to the end of statement execution as it does with other DML – it’s checking as each array is inserted.

Of course there’s a special boundary condition, and that’s why the very first test with 75 rows succeeds – neither of the child arrays gets filled before we reach the end of the t1 selection, so Oracle safely inserts the arrays for parent, child and child2 in that order. The same boundary condition applies in the first of every other pair of tests that I’ve commented on.

When we select 76 rows from t1 in the first test the child and child2 arrays hit their limit and Oracle attempts to insert the child rows first – but the parent buffer is far from full so its rows are not inserted and the attempted insert results in the ORA-02291 error. Doing a bit of rough arithmetic, the insert was for 76 rows totalling something like: 2 bytes for the id, plus a length byte, plus 100 bytes for the medium_vc plus a length byte, totalling 76 * 104 = 7,904 bytes.

When we switch to using the big_vc for child2 the first array to fill is the child2 array, and we have 3 sets of results as we shorten big_vc:

  • 40 * ((1 + 2) + (1 + 200)) = 8160
  • 41 * ((1 + 2) + (1 + 195)) = 8159
  • 42 * ((1 + 2) + (1 + 190)) = 8148
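The arithmetic in these figures can be reproduced with a couple of lines of Python. This is only a sketch of the rough sizing used above: one length byte plus the data bytes per column, a 2-byte id, and no row overhead; `array_bytes` is a hypothetical helper name.

```python
def array_bytes(rows, col_lengths):
    """Bytes for `rows` rows: one length byte plus data bytes per column,
    with no per-row overhead (the rough model used in the text)."""
    return rows * sum(1 + length for length in col_lengths)

# Failing cases: (rows selected, column lengths of the array that filled first).
print(array_bytes(76, [2, 100]))   # child, medium_vc      -> 7904
print(array_bytes(40, [2, 200]))   # child2, big_vc of 200 -> 8160
print(array_bytes(41, [2, 195]))   # child2, big_vc of 195 -> 8159
print(array_bytes(42, [2, 190]))   # child2, big_vc of 190 -> 8148
```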

While I’m fairly confident that my “8KB array” hypothesis is in the right ballpark, I know I’ve still got some gaps to explain – I don’t like the fact that I’ve got a break point around 7,900 in the first example and something much closer to 8,192 in the other three examples. I could try to get extra precision by running up a lot more examples with different numbers and lengths of columns to get a better idea of where the error is appearing – but I’m sufficiently confident that the idea is about right that I can’t persuade myself to make the effort to refine it. An example of an alternative algorithm (which is actually a better fit, though a little unexpected) is to assume that the normal 5-byte row overhead (column count, lock byte, flags and 2-byte row directory entry) has been included in the array sizing code, and the insert takes place at the point where the incoming row breaks, or just touches, the limit. In this case our 4 results would suggest the following figures:

  • 75 * 109 = 8175
  • 39 * 209 = 8151
  • 40 * 204 = 8160
  • 41 * 199 = 8159

With these numbers we can see 8KB (8,192 bytes) very clearly, and appreciate that the one extra row would take us over the critical limit, hence triggering the insert and making the array space free to hold the row.
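To check that the buffer hypothesis and the 5-byte overhead figures hang together, here is a small Python simulation. It models the hypothesis only, not Oracle’s code, and every rule in it is an assumption: one 8,192-byte array per target table, a 5-byte per-row overhead, a 2-byte id, an array is applied when the next row would take it over the limit, rows applied to a child table are checked immediately against the parent rows already inserted, and any remaining arrays are applied in statement order when the statement ends.

```python
BUFFER_BYTES = 8192   # assumed: one 8KB working array per target table
ROW_OVERHEAD = 5      # assumed: column count, lock byte, flags, 2-byte row directory entry

def row_bytes(col_lengths):
    """Bytes charged for one row: overhead plus length byte + data per column."""
    return ROW_OVERHEAD + sum(1 + n for n in col_lengths)

def simulate_insert_all(n_rows, tables):
    """tables: (name, col_lengths) in statement order, parent first.
    Returns "ok", or the name of the first child table whose array was
    applied while holding a row whose parent had not yet been inserted."""
    parent = tables[0][0]
    arrays = {name: [] for name, _ in tables}
    used = {name: 0 for name, _ in tables}
    parents_inserted = set()

    def apply_array(name):
        if name == parent:
            parents_inserted.update(arrays[name])
        elif any(r not in parents_inserted for r in arrays[name]):
            return name                      # foreign key check fails here
        arrays[name].clear()
        used[name] = 0
        return None

    for row_id in range(1, n_rows + 1):
        for name, cols in tables:
            size = row_bytes(cols)
            if used[name] + size > BUFFER_BYTES:   # next row would break the limit
                failed = apply_array(name)
                if failed:
                    return failed
            arrays[name].append(row_id)
            used[name] += size
    for name, _ in tables:                         # end of statement, in order
        failed = apply_array(name)
        if failed:
            return failed
    return "ok"

P = ("parent", [2, 10])
print(simulate_insert_all(75, [P, ("child", [2, 100]), ("child2", [2, 100])]))
print(simulate_insert_all(76, [P, ("child", [2, 100]), ("child2", [2, 100])]))
```

Under these assumptions the model reproduces the 75/76, 39/40, 40/41 and 41/42 break points described above, and swapping the order of child and child2 moves the failure to child2, matching the CH2_FK_PAR result.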

Bottom Line

If you’re using the multi-table “insert all” syntax and have referential integrity declared between the various target tables then you almost certainly need to ensure that the foreign key constraints are declared as deferrable and then deferred as the insert takes place, otherwise you may get random (and, until now, surprisingly inexplicable) ORA-02291 foreign key errors.

A possible alternative workaround is to declare a “do nothing” before row insert trigger on the top-level parent table, as this seems to switch the process into single row inserts on the top-most parent that force the other array inserts to take place with their parent row, using small array sizes and protecting against the foreign key error. This is not an officially sanctioned workaround, though, and may only have worked by accident in the examples I tried.

It is possible, if the 8KB working array hypothesis is correct, that you will never see the ORA-02291 if the volume of data (number of rows * row length) for the child rows of any given parent row is always less than the size of the parent row – but that might be a fairly risky thing to hope for in a production system. It might be much better to pay the overhead of deferred foreign key checking than to have a rare, unpredictable error appearing.


August 11, 2019


Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 9:28 pm BST Aug 11,2019

Here’s a question to provoke a little thought if you’ve got nothing more entertaining to do on a Sunday evening.  What threats do you think of when you see a statement like the following in (say) an AWR report, or in a query against v$sql ?

update tableX set
        col001 = :1, col002 = :2, col003 = :3, ...
        -- etc. 
        -- the names are supposed to indicate that the statement updates 302 columns
        -- etc.
        col301 = :301, col302 = :302
where
        pk_col = :303

I’ll be writing up some suggestions tomorrow (Monday, UK BST), possibly linking to a few other articles for background reading.


The first three comments have already hit the high points, but I’m going to jot down a few notes anyway.

The first two things that really (should) make an impact are:

  • There’s a table in the application with (at least) 303 columns – anything over 255 is a potential disaster area
  • An update statement that updates 302 columns is probably machine generated by a non-scalable application

A secondary detail that might be useful is recognising the pattern of the text – lower case for column names, simple “:nnn” for bind variables. As it stands I don’t recognise the possible source for this style, but I know it’s not SQL embedded in PL/SQL (which would be all capitals with “:Bnnn” as bind variable names), it’s not part of a distributed query from a remote database (which would be in capitals with quoted names, and table aliases like “A1, A2, …”), and it’s not “raw” Hibernate code (which produces meaningless table and column aliases based on numbers with a “t” for table and “c” for column).

So let’s think about possible problems and symptoms relating to the two obvious issues:

Wide tables

Once you have more than 255 (real) columns in a table – even if that count includes columns that have been marked unused – Oracle will have to split rows into “rowpieces” that do not exceed 255 columns and chain those pieces together. Oracle will try to be as efficient as possible – with various damage-limiting code changes appearing across versions – attempting to store these row pieces together and keeping the number to a minimum, but there are a number of anomalies that can appear that have a significant impact on performance.

Simply having to visit two row pieces to pick up a column in the 2nd row piece (even if it is in the same block) adds to the cost of processing; but when you have to visit a second block to acquire a 2nd (or 3rd, or 4th) rowpiece the costs can be significant. As a quirky little extra, Oracle’s initial generation of row-pieces creates them from the end backwards – so a row with 256 columns starts with a row-piece of one column followed by a rowpiece of 255 columns: so you may find that you have to fetch multiple row pieces for virtually every row you access.

It’s worth noting that row splitting is based only on columns that have been used in the row. If your data is restricted to the first 255 columns of a row then the entire row can be stored as a single row piece (following the basic rule that “trailing nulls take no space”); but as soon as you start to update such a row by populating columns past the 255 boundary Oracle will start splitting from the end – and it may create a new trailing row-piece each time you populate a column past the current “highest” column. In an extreme case I’ve managed to show an example of a single row consisting of 746 row pieces, each in a different block (though that’s a bug/feature that’s been addressed in very recent versions of Oracle).
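The “split from the end backwards” rule for the initial generation of row-pieces can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it only covers the initial split of a row whose highest non-null column is known (trailing nulls take no space), and it ignores the extra trailing pieces that later updates can create; `initial_rowpieces` is a hypothetical name.

```python
MAX_COLS_PER_PIECE = 255

def initial_rowpieces(highest_used_column):
    """Column counts of the row-pieces, head piece first. Pieces are cut
    from the end of the row backwards, so any remainder lands in the head."""
    pieces = []
    remaining = highest_used_column
    while remaining > 0:
        size = min(MAX_COLS_PER_PIECE, remaining)
        pieces.append(size)        # pieces built from the end of the row
        remaining -= size
    return pieces[::-1]            # report in row order, head piece first

print(initial_rowpieces(255))   # [255]
print(initial_rowpieces(256))   # [1, 255]
print(initial_rowpieces(751))   # [241, 255, 255]
```

The last call corresponds to populating one new column after 750 existing ones: the row needs at least 3 pieces, and the head piece is the short one.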

If rows have been split across multiple blocks then one of the nastier performance problems appears with direct path read tablescans. As Oracle follows the pointer to a secondary row piece it will do a physical read of the target block then immediately forget the target block so, for example, if you have inserted 20 (short) rows into a block then updated all of them in a way that makes them split and all their 2nd row pieces go to the same block further down the table you can find that Oracle does a single direct path read that picks up the head pieces, then 20 “db file sequential read” calls to the same block to pick up the follow-on pieces. (The same effect appears with simple migrated rows.) Contrarily, if you did the same tablescan using “db file scattered read” requests then Oracle might record a single, highly deceptive “table fetch continued row” because it managed to pin the target block and revisit it 20 times.

Often a very wide row (large number of columns) means the table is trying to hold data for multiple types of object. So a table of 750 columns may use the first 20 columns for generic data, columns 21 to 180 for data for object type A, 181 to 395 for data for object type B, and so on. This can lead to rows with a couple of hundred used columns and several hundred null columns in the middle of each row – taking one byte per null column and burning up lots of CPU as Oracle walks a row to find a relevant column. A particularly nasty impact can appear from this type of arrangement when you upgrade an application: imagine you have millions of rows of the “type A” above which use only the first 180 columns. For some reason the application adds one new “generic” column that (eventually) has to be populated for each row – as the column is populated for a type A row the row grows by 520 (null counting) bytes and splits into at least 3 pieces. The effect could be catastrophic for anyone who had been happy with their queries reporting type A data.

One of the difficulties of handling rows that are chained due to very high column counts is that the statistics can be quite confusing (and subject to change across versions). The most important clue comes from “table fetch continued row”; but this can’t tell you whether your “continued” rows are migrated or chained (or both), which table they come from, and whether you’ve been fetching the same small number multiple times or many different rows. Unfortunately the segment statistics (v$segstat / v$segment_statistics) don’t capture the number of continued fetches by segment – it would be nice if they did since it ought to be a rare (and therefore low-cost) event. The best you can do, probably, is to look at the v$sql_monitor report for queries that report tablescans against large tables but report large numbers of single block reads in the tablescan – and for repeatable cases enable SQL trace with wait tracing enabled against suspect queries to see if they show the characteristic mix of direct path reads and repeated db file sequential reads.

Update every column

The key performance threat in statements that update every column – including the ones that didn’t change – is that Oracle doesn’t compare before and after values when doing the update. Oracle’s code path assumes you know what you’re doing so it saves every “old” value to an undo record (which gets copied to the redo) and writes every “new” value to a redo change vector. (Fortunately Oracle does check index definitions to see which index entries really have suffered changes, so it doesn’t visit index leaf blocks unnecessarily.) It’s possible that some front-end tool that adopts this approach has a configuration option that switches from “one SQL statement for all updates” to “construct minimal statement based on screen changes”.

The simple trade-off between these two options is the undo/redo overhead vs. parsing and optimisation overheads as the tool creates custom statements on demand. In the case of the table with more than 255 columns, of course, there’s the added benefit that an update of only the changed columns might limit the update to columns that are in the first rowpiece, eliminating the need (some of the time) to chase pointers to follow-up pieces.
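To make the trade-off concrete, here is a minimal Python sketch of what the two client-side strategies might generate. `build_update` is a hypothetical helper, not the API of any real front-end tool: with `changed_only=False` it produces the “update every column” text, otherwise it sets only the columns whose values differ, which means fewer values to undo, redo and ship, but a different statement text (and so a potential extra parse) for each combination of changed columns.

```python
def build_update(table, pk_col, old_row, new_row, changed_only=True):
    """Return (sql_text, binds) for a single-row UPDATE keyed on pk_col.
    changed_only=False sets every non-key column; otherwise only the
    columns whose new value differs from the old one."""
    cols = [c for c in new_row
            if c != pk_col and (not changed_only or new_row[c] != old_row.get(c))]
    if not cols:
        return None, []            # nothing changed: no statement needed
    set_list = ", ".join(f"{c} = :{i}" for i, c in enumerate(cols, start=1))
    sql = f"update {table} set {set_list} where {pk_col} = :{len(cols) + 1}"
    binds = [new_row[c] for c in cols] + [new_row[pk_col]]
    return sql, binds

old = {"pk_col": 7, "col001": "a", "col002": "b", "col003": "c"}
new = dict(old, col002="x")
print(build_update("tableX", "pk_col", old, new))
print(build_update("tableX", "pk_col", old, new, changed_only=False))
```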

Limiting the update can help with undo and redo, of course, but if the tool always pulls the entire row to the client anyway you still have network costs to consider. With the full row pulled and then updated you may find it takes several SQL*Net roundtrips to transfer the whole row between client and server. In a quick test on a simple 1,000 column table with an update that set every column in a single row to null (using a bind variable) I found that the default setup couldn’t even get 1,000 NULLs (let alone “real values”) across the network without resorting to one wait on “SQL*Net more data from client”:

variable b1 number
exec :b1 := null;

update t1 set
        col0001 = :b1,
        col0002 = :b1,
        -- ... and so on for every column up to col0999 ...
        col1000 = :b1
/

Although “SQL*Net message to/from client” is typically (though not always correctly) seen as an ignorable wait, you need to be a little more suspicious of losing time to “SQL*Net more data to/from client”. The latter two waits mean you’re sending larger volumes of information across the network and maybe you can eliminate some of the data or make the transfer more efficient – perhaps a simple increase in the SDU (session data unit) in the tnsnames.ora, listener.ora, or sqlnet.ora (for default_sdu_size) might be helpful.


One of the features of trouble-shooting from cold is that you don’t know very much about the system you’re looking at – so it’s nice to be able to take a quick overview of a system looking for any potentially interesting anomalies and take a quick note of what they are and what thoughts they suggest before you start asking questions and digging into a system. This article is just a couple of brief notes along the lines of: “that doesn’t look nice – what questions does it prompt?”

August 8, 2019

Free Space

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 1:27 pm BST Aug 8,2019

Several years ago I wrote a note about reporting dba_free_space and dba_extents to produce a map of the space usage in a tablespace in anticipation of messing about with moving or rebuilding objects to try and reduce the size of the files in the tablespace.  In the related page where I published the script I pointed out that a query against dba_extents would be expensive because it makes use of structure x$ktfbue which generates the information dynamically by reading segment header blocks. I also pointed out in a footnote to the original article that if you’ve enabled the recyclebin and have “dropped” some objects then there will be some space that is reported as free but is not quite free since the extents will still be allocated. This brings me to the topic for today’s blog.

While visiting a client site recently I came across an instance that was running a regular report to monitor available space in the database. Basically this was a query against the view dba_free_space. Surprisingly it was taking a rather long time to complete – and the reason for this came in two parts. First, the recyclebin was enabled and had some objects in it; secondly, there were no stats on the fixed object x$ktfbue.

In the case of the client the particular query produced a plan that included the following lines:

Id  Operation             Name              Rows    Bytes  Cost (%CPU)  Time
--  --------------------- ----------------  ----   ------  -----------  --------
63  HASH JOIN                               2785     212K     46  (85)  00:00:01
64    TABLE ACCESS FULL   RECYCLEBIN$       1589    20657      7   (0)  00:00:01
65    FIXED TABLE FULL    X$KTFBUE          100K    6347K     38 (100)  00:00:01 

This is part of the view where Oracle calculates the size of all the extents of objects in the recyclebin so that they can be reported as free space. Notice that in this plan (which is dependent on version, system stats, object_stats and various optimizer parameters) the optimizer has chosen to do a hash join between the recyclebin (recyclebin$) and the x$ structure – and that has resulted in a “full tablescan” of x$ktfbue, which means Oracle reads the segment header block of every single segment in the entire database. (I don’t know where the row stats came from as there were no stats on x$ktfbue, and this plan was pulled from the AWR history tables so the query had been optimised and captured some time in the past.)

If there had been nothing in the recyclebin the hash join and two tablescans wouldn’t have mattered; unfortunately the recyclebin had been enabled and there were a few rows in recyclebin$, so the “tablescan” happened. Here’s a cut-n-paste from a much simpler query, run against a fairly new database (no 3rd-party application), to give you some idea of the impact:

SQL> execute snap_events.start_snap

PL/SQL procedure successfully completed.

SQL> select count(*) from x$ktfbue;


1 row selected.

SQL> execute snap_events.end_snap
Session Events - 01-Aug 21:28:13
Event                                             Waits   Time_outs        Csec    Avg Csec    Max Csec
-----                                             -----   ---------        ----    --------    --------
Disk file operations I/O                              7           0           0        .018           1
db file sequential read                           5,239           0          14        .003           6
SQL*Net message to client                             7           0           0        .000           0
SQL*Net message from client                           7           0       1,243     177.562         572
events in waitclass Other                             3           1           0        .002           0

PL/SQL procedure successfully completed.

On my little laptop, with nothing else going on, I’ve managed to get away with “only” 5,239 single block reads, and squeezed them all into just 14 centiseconds (local SSD helps). The client wasn’t so lucky – they were seeing tens of thousands of real physical reads.
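The arithmetic behind “squeezed them all into just 14 centiseconds” is worth spelling out, because the same kind of workload on slower storage tells a very different story. This plain Python sketch uses the two figures from the snap_events output above. The 30,000 reads and 5 ms per read used in the projection are hypothetical figures, not numbers from the client’s system.

```python
def average_wait_ms(total_centiseconds: float, waits: int) -> float:
    """Average time per wait in milliseconds, from a total reported in centiseconds."""
    return total_centiseconds * 10 / waits


def projected_seconds(reads: int, ms_per_read: float) -> float:
    """Total time in seconds for a number of reads at a given average latency."""
    return reads * ms_per_read / 1000


if __name__ == "__main__":
    # figures from the snap_events output: 5,239 reads in 14 centiseconds
    laptop_ms = average_wait_ms(14, 5_239)
    print(f"laptop average: {laptop_ms:.4f} ms per read")

    # hypothetical: tens of thousands of reads at 5 ms each
    print(f"30,000 reads at 5 ms: {projected_seconds(30_000, 5.0):.0f} s")
```

The laptop works out at roughly 0.027 ms per read. That is only approximate because the reported total is rounded to whole centiseconds, but it is consistent with the .003 in the Avg Csec column. The hypothetical projection comes to 150 seconds.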

The ideal solution, of course, was to purge the recyclebin and disable the feature – it shouldn’t be necessary to enable it on a production system – but that’s something that ought to require at least some paperwork. In the short term, gathering stats on the fixed table helped because the plan changed from a hash join with a “tablescan” of x$ktfbue to a nested loop with an “indexed” access path, looking more like the following (from a query against just recyclebin$ and x$ktfbue):

| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT         |                  |       |       |     4 (100)|          |
|   1 |  NESTED LOOPS            |                  |     7 |   182 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL      | RECYCLEBIN$      |     6 |    66 |     4   (0)| 00:00:01 |
|*  3 |   FIXED TABLE FIXED INDEX| X$KTFBUE (ind:1) |     1 |    15 |     0   (0)|          |

Predicate Information (identified by operation id):

This was still fairly resource-intensive for the client, but was something of an improvement – they had a lot more than 6 items in their recyclebin.
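The Python model below is deliberately crude. It shows why the plan change helped, and why a large recyclebin still hurt. It assumes, hypothetically, that every row x$ktfbue produces costs one segment header read. Under that assumption a full scan costs one read per segment in the database, while the nested loop costs one read per recyclebin$ row. Caching, extent counts and version-specific details are ignored, and the segment counts are invented.

```python
def header_reads_hash_join(total_segments: int, recyclebin_rows: int) -> int:
    """Hash join: recyclebin$ is the build table, then x$ktfbue is scanned in full.

    If the build table is empty there is nothing to probe, so (in this model)
    the scan of x$ktfbue costs nothing.
    """
    return total_segments if recyclebin_rows > 0 else 0


def header_reads_nested_loop(recyclebin_rows: int) -> int:
    """Nested loop: one indexed probe (one segment header read) per recyclebin$ row."""
    return recyclebin_rows


if __name__ == "__main__":
    segments = 50_000  # hypothetical number of segments in the database
    for rows in (0, 6, 1_500):  # hypothetical recyclebin sizes
        print(f"{rows:>5} recyclebin rows: "
              f"hash join {header_reads_hash_join(segments, rows):>6}, "
              f"nested loop {header_reads_nested_loop(rows):>6}")
```

What the model shows is the shape of the costs. The hash join cost is set by the size of the whole database, while the nested loop cost grows with the number of objects in the recyclebin. That is why purging it remains the real fix.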

Part of the problem, of course, is that x$ktfbue is one of the objects that Oracle skips when you gather “fixed object” stats – gathering them can be expensive for exactly the same reason that querying it is expensive: all those single-block segment header reads.

If you want to check the stats and gather them (as a one-off, probably) here’s some suitable SQL:

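select
        table_name, num_rows, avg_row_len, sample_size, last_analyzed
from
        dba_tab_statistics
where
        owner = 'SYS'
and     table_name = 'X$KTFBUE'
;

begin
        dbms_stats.gather_table_stats('SYS','X$KTFBUE');
end;
/
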
You probably shouldn’t have the recyclebin enabled in a production system; but if you do, and if you also run a regular report on free space (as many sites seem to do), make sure (a) you have a regular routine to minimise the number of objects that it accumulates and (b) you gather statistics (occasionally) on x$ktfbue to minimise the overhead of the necessary join between recyclebin$ and x$ktfbue.

August 6, 2019

Parse Solution

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 1:14 pm BST Aug 6,2019

In the “Parse Puzzle” I posted a couple of days ago I showed a couple of extracts from an AWR report that showed contradictory results about the time the instance spent in parsing and hard parsing, and also showed an amazing factor of 4 difference between the DB Time and the “SQL ordered by Elapsed Time”. My example was modelling a real world anomaly I had come across, but was engineered to exaggerate the effect to make it easy to see what was going on.

The key feature was VPD (virtual private database) a.k.a. FGAC (fine-grained access control) or RLS (row-level security). I’ve created a “policy function” (the thing that generates the “security predicate”) to execute an extremely expensive SQL query; then I’ve created a policy with policy_type = ‘DYNAMIC’ so that the function gets executed every time a query against a particular table is executed. In fact my example uses three tables, each table has its own policy function, and each policy function calls the same very expensive piece of SQL.

To see the effect this has on the AWR report I’ll now supply the contents of the “SQL ordered by Elapsed Time” and work through the list (though not in the order shown) explaining what each statement represents:

SQL ordered by Elapsed Time               DB/Inst: OR18/or18  Snaps: 2059-2060
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100
-> %Total - Elapsed Time  as a percentage of Total DB time
-> %CPU   - CPU Time      as a percentage of Elapsed Time
-> %IO    - User I/O Time as a percentage of Elapsed Time
-> Captured SQL account for  302.6% of Total DB Time (s):             158
-> Captured PL/SQL account for  101.7% of Total DB Time (s):             158

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
           156.8              1        156.84   99.4   99.8     .0 1ubpdythth4q1
Module: SQL*Plus
select id, f_rls(n1, n2, n3) from fgac_base where rownum <= 10

           156.8             33          4.75   99.3   99.8     .0 9dhvggqtk2mxh
Module: SQL*Plus
 select count(*) X from waste_cpu connect by n > prior n start with n = 1

            53.2             10          5.32   33.7   99.8     .0 5531kmrvrzcxy
Module: SQL*Plus

            53.2             11          4.83   33.7   99.8     .0 8g2uv26waqm8g
Module: SQL*Plus
begin :con := "FGAC_PACK"."FGAC_PREDICATE3"(:sn, :on); end;

            52.7             10          5.27   33.4   99.7     .0 awk070fhzd4vs
Module: SQL*Plus

            52.7             11          4.79   33.4   99.7     .0 c8pwn9j11gw5s
Module: SQL*Plus
begin :con := "FGAC_PACK"."FGAC_PREDICATE1"(:sn, :on); end;

            50.9             10          5.09   32.3   99.9     .0 964u0zv0rwpw1
Module: SQL*Plus

            50.9             11          4.63   32.3   99.9     .0 bgqf405f34u4v
Module: SQL*Plus
begin :con := "FGAC_PACK"."FGAC_PREDICATE2"(:sn, :on); end;

             2.8              1          2.79    1.8   98.4     .0 fhf8upax5cxsz
BEGIN sys.dbms_auto_report_internal.i_save_report (:rep_ref, :snap_id, :pr_class
, :rep_id, :suc); END;

             2.6              1          2.64    1.7   98.4     .0 0w26sk6t6gq98

             2.4              1          2.43    1.5   98.3     .0 1q1spprb9m55h


The first statement is an SQL statement that calls a PL/SQL function f_rls() for 10 consecutive rows in an “ordinary table”. This is the query that actually takes 157 seconds to complete as far as the client SQL*Plus session is concerned.

The function (called 10 times) is passed three values n1, n2, n3. The function uses n1 to query table FGAC_REF1, n2 to query table FGAC_REF2, and n3 to query FGAC_REF3 – and we can see those three queries appearing as statements 5, 7, and 3 (in that order) in the output. The main query takes 157 seconds to complete because each of those statements takes approximately 52 seconds for its 10 executions.

But each of those three statements references a table with a policy that causes a predicate function to be executed for each parse and execute of the statement (one parse, 10 executes), and we can see 11 calls to each of fgac_pack.fgac_predicateN (N in 1, 2, 3), which take roughly 4.8 seconds each, for a total of about 52 seconds per function across its 11 executions.

But those calls to the functions (11 each) all execute the same “connect by” query that appears as statement 2 in the list – for a total of 33 calls of a SQL statement that averages 4.75 seconds – and almost all of the “real” database time is in those 33 calls (33 * 4.75 = 156.75).

So we count 157 seconds because that’s the time spent in the “connect by” queries, but we count that time again (this time under PL/SQL execution) because it’s called from the policy functions, then count it again (under SQL execution) because the functions are called by the “select count(*) from fgac_refN” queries, then count it one final time (under SQL execution) for the driving query. This gives us a total of 300% of the actual database time apparently being spent in SQL and 100% apparently being spent in PL/SQL.
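The multiple counting is easy to reproduce with a toy model. The Python sketch below builds the call tree described above. The driving query calls three fgac_refN queries. Each of those calls its policy function once for the parse and once per execution, and each policy call runs the connect by query. The sketch then adds up elapsed time per call the way “SQL ordered by Elapsed Time” does, with each call’s elapsed time including everything it calls.

The 4.75-second figure is the average from the report. The f_rls() PL/SQL layer is left out because it doesn’t appear as a separate entry in the listing. The class and function names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    """One call as the AWR sees it: its elapsed time includes everything it calls."""
    kind: str                   # "SQL" or "PLSQL"
    label: str
    self_time: float = 0.0      # time spent in this call excluding its children
    children: list = field(default_factory=list)

    def elapsed(self) -> float:
        return self.self_time + sum(child.elapsed() for child in self.children)


def build_model(rows: int = 10, tables: int = 3, connect_by_secs: float = 4.75) -> Call:
    """Hypothetical re-creation of the call tree described in the text."""
    driving = Call("SQL", "select id, f_rls(n1, n2, n3) from fgac_base ...")
    for n in range(1, tables + 1):
        ref_query = Call("SQL", f"select count(*) from fgac_ref{n}")
        # the policy function runs once for the parse and once per execution
        for _ in range(1 + rows):
            policy = Call("PLSQL", f"fgac_pack.fgac_predicate{n}")
            policy.children.append(Call("SQL", "connect by query", self_time=connect_by_secs))
            ref_query.children.append(policy)
        driving.children.append(ref_query)
    return driving


def elapsed_by_kind(node: Call, totals=None) -> dict:
    """Add every call's elapsed time (children included) to the total for its kind."""
    if totals is None:
        totals = {"SQL": 0.0, "PLSQL": 0.0}
    totals[node.kind] += node.elapsed()
    for child in node.children:
        elapsed_by_kind(child, totals)
    return totals


if __name__ == "__main__":
    root = build_model()
    db_time = root.elapsed()    # the only time "really" spent
    totals = elapsed_by_kind(root)
    print(f"DB time        : {db_time:.2f}s")
    print(f"SQL captured   : {100 * totals['SQL'] / db_time:.0f}%")
    print(f"PL/SQL captured: {100 * totals['PLSQL'] / db_time:.0f}%")
```

Run as is, the model reports 156.75 s of DB time, with 300% captured as SQL and 100% as PL/SQL, matching the reasoning above. The three policy calls made during the first parse of each fgac_refN query account for 3 × 4.75 = 14.25 s in the model. That is in the same ballpark as the 13.6 s of parse time in the Time Model statistics.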

You’ll notice that “real” optimisation of the SQL that does run would have taken just fractions of a second (as we saw in the Instance Activity Statistics); but one execution of the “connect by” query would have been associated with the first parse call of each of the fgac_refN queries, so 15 seconds of policy function / connect by query time would have been accounted under the parse time elapsed / hard parse time elapsed we saw in the Time Model statistics.

One of the strangest things about the reporting showed up in the ASH figures – which only ever managed to sample three SQL_IDs, reporting them as “on CPU” in every single sample, and those three SQL_IDs were for the “select count(*) from fgac_refN” queries; the “connect by” queries – the real consumer of CPU resource – didn’t get into the ASH sample at all.


I did two things to make an important anomaly very obvious – I added a CPU intensive query to the policy function so that it was easy to see where the time was really going, and I made the VPD policy_type “dynamic” so that the policy function would execute on every parse and execute against the underlying table.

In real life the typical SQL called within a policy function is very lightweight, and policies tend to be declared with type “context_sensitive”; this combination minimises the number of calls to the function and the cost of each call. Unfortunately there are 3rd-party applications on the market that include complex PL/SQL frameworks in their policy functions and then have the application server reset the session context for every call to the database.

And that’s where I came in – looking at a system where 10% of the CPU was being spent on parsing that apparently couldn’t possibly be happening.

August 4, 2019

Parse Puzzle

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 10:46 am BST Aug 4,2019

Here are some details from an AWR report covering a few minutes in the lifetime of an instance of 18.3. It’s a carefully constructed demonstration and all I’ve done is take an AWR snapshot, execute a single SQL statement, then take another snapshot, so the only thing captured by the report is the work done in that brief time interval. The purpose of the exercise is to demonstrate how some Oracle features can make a complete nonsense of the AWR. (I have, as I often do, produced a model that reproduces an effect that can appear in production but exaggerates it to make it more clearly visible.)

First the Time Model statistics:

                                                                % of  % of Total
Statistic Name                                       Time (s) DB Time   CPU Time
------------------------------------------ ------------------ ------- ----------
sql execute elapsed time                                157.8   100.0
DB CPU                                                  157.5    99.8       97.3
parse time elapsed                                       13.6     8.6
hard parse elapsed time                                  13.6     8.6
PL/SQL execution elapsed time                             0.3      .2
PL/SQL compilation elapsed time                           0.0      .0
hard parse (sharing criteria) elapsed time                0.0      .0
repeated bind elapsed time                                0.0      .0
DB time                                                 157.8
background elapsed time                                   7.9
background cpu time                                       4.4                2.7
total CPU time                                          161.9

Note particularly the parse times – the Time Model shows 13.6 seconds spent in (hard) parsing.

Note also that (with a small error) DB time = DB CPU = SQL execute elapsed time, and the background time is very small (in particular it’s less than the parse time). This background time, by the way, is probably related to things that Oracle does behind the scenes when you take an AWR snapshot or run an AWR report.

Given the significant amount of time spent in hard parsing let’s take a look at the Instance Activity statistics – picking only the statistics relating to parse calls:

Statistic                                     Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
parse count (describe)                            0            0.0           0.0
parse count (failures)                            0            0.0           0.0
parse count (hard)                              325            2.1         325.0
parse count (total)                           1,662           10.5       1,662.0
parse time cpu                                   39            0.3          39.0
parse time elapsed                               42            0.3          42.0

Although the Time Model thinks Oracle has spent 13.6 seconds in (hard) parse time, the Instance Activity Statistics say it has only spent 0.42 seconds (the statistic is reported in hundredths of a second). That’s a fairly significant difference of opinion. So let’s see if we can find out more from the “SQL ordered by …” sections, and I’m only going to show you one heading as a teaser for the rest of the weekend:

SQL ordered by Elapsed Time               DB/Inst: OR18/or18  Snaps: 2059-2060
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> % Total DB Time is the Elapsed Time of the SQL statement divided
   into the Total Database Time multiplied by 100
-> %Total - Elapsed Time  as a percentage of Total DB time
-> %CPU   - CPU Time      as a percentage of Elapsed Time
-> %IO    - User I/O Time as a percentage of Elapsed Time
-> Captured SQL account for  302.6% of Total DB Time (s):             158
-> Captured PL/SQL account for  101.7% of Total DB Time (s):             158

How do you squeeze 400% of the elapsed time into SQL and PL/SQL executions? (Observation: it’s not an IBM P9 taking advantage of SMT/4)

One last set of stats – which will have some room for statistical error since they come from v$active_session_history:

select
        in_parse, in_hard_parse, sql_id, sql_exec_id, count(*)
from
        v$active_session_history  ash
where
        session_id = &m_session_id
and     sample_id between &m_start_sample_id and &m_end_sample_id
group by
        in_parse, in_hard_parse, sql_id, sql_exec_id
order by
        in_parse desc, in_hard_parse desc, count(*)
;

- - ------------- ----------- ----------
Y Y 964u0zv0rwpw1		       3
Y Y awk070fhzd4vs		       4
Y Y 5531kmrvrzcxy		       5

N N 5531kmrvrzcxy		      42
N N 964u0zv0rwpw1		      42
N N awk070fhzd4vs		      51

6 rows selected.

So the ASH information seems to agree (reasonably closely) with the Time Model statistics – in the interval of the snapshot it has recorded 12 samples (which statistically represent about 12 seconds) of hard parse time (and though my query doesn’t show it, the “event” is null, i.e. “on CPU”).
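ASH samples each active session once per second, so counting samples gives an approximate time. This small Python sketch just adds up the rows from the output above by parse state. The row values are copied from the query output, and the rest is illustrative.

```python
from collections import Counter

# (in_parse, in_hard_parse, sql_id, samples) as reported by the ASH query above
ASH_ROWS = [
    ("Y", "Y", "964u0zv0rwpw1", 3),
    ("Y", "Y", "awk070fhzd4vs", 4),
    ("Y", "Y", "5531kmrvrzcxy", 5),
    ("N", "N", "5531kmrvrzcxy", 42),
    ("N", "N", "964u0zv0rwpw1", 42),
    ("N", "N", "awk070fhzd4vs", 51),
]


def samples_by_parse_state(rows):
    """Sum ASH samples per (in_parse, in_hard_parse) combination."""
    totals = Counter()
    for in_parse, in_hard_parse, _sql_id, samples in rows:
        totals[(in_parse, in_hard_parse)] += samples
    return totals


if __name__ == "__main__":
    totals = samples_by_parse_state(ASH_ROWS)
    # one sample per second per active session, so samples are roughly seconds
    print(f"hard parse samples: {totals[('Y', 'Y')]}")
    print(f"other samples     : {totals[('N', 'N')]}")
    print(f"total samples     : {sum(totals.values())}")
```

This gives 12 hard parse samples and 135 others, 147 in all, a statistical approximation of the 157.8 s of DB time. The hard parse count sits close to the Time Model’s 13.6 s rather than the Instance Activity figure of 0.42 s.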


To be continued, some time later this week …

June 14, 2019


Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 9:19 am BST Jun 14,2019

Here’s an answer I’ve just offered on the ODC database forum to a fairly generic type of problem.

The question was about finding out why a “program” that used to take only 10 minutes to complete is currently taking significantly longer. The system is running Standard Edition, and the program runs once per day. There’s some emphasis on the desirability of taking action while the program is still running with the following as the most recent statement of the requirements:

We have a program which run daily 10minutes and suddenly one day,it is running for more than 10minutes…in this case,we are asked to look into the backend session to check what exactly the session is doing.I understand we have to check the events,last sql etc..but we need to get the work done by that session in terms of buffergets or physical reads(in case of standard edition)

1)In this case,we cannot enable trace to the session as it already started running and 10 minutes passed way.If we enable trace at this point,will it gives the required info?

2)To check the statistics of this specific session,what is the best way to proceed and this is my initial question.One of my friend pointed out to check v$sess_io and he is not sure whether it will return cumulative values because this view has only sid and not having serial#..and sid is not unique

So, answering the questions as stated, with a little bit of padding:

1) In this case, we cannot enable trace to the session as it already started running and 10 minutes passed way.If we enable trace at this point,will it gives the required info?

If the session has started and has been running for 10 minutes it’s still possible to force tracing into the session and, depending what the program does, you may be lucky enough to get enough information in the trace/tkprof file to help you. The “most-approved” way of doing this for a session is probably through a call to dbms_monitor.session_trace_enable(), but if that’s a package that Standard Edition is not licensed to use then there’s dbms_system.set_sql_trace_in_session().

If this doesn’t help, and if the session is still live and running, you could also look at v$open_cursor for that SID to see what SQL statements (sql_id, child_address, last_active_time and first 60 characters of the text) are still open for that session, then query v$sql for more details about those statements (using sql_id and child_address). The stats you find in those statements are accumulated across all executions by all sessions from the moment the cursor went into the library cache, but if this is a program that runs once per day then it’s possible that the critical statement will only be run by that one session, and the cursor from the previous day will have aged out of the library cache so that what you see is just today’s run.

Since you’re on Standard Edition and don’t have access to the AWR you should have installed Statspack – which gives you nearly everything that you can get from the AWR reports (the most significant difference being the absence of the v$active_session_history – but there are open-source emulators that you can install as a fairly good substitute for that functionality). If there is one statement in your program that does a lot of work then it’s possible that it might be one of the top N statements in a Statspack snapshot.

If this program is a known modular piece of code, could you alter the mechanism that calls it to include calls to enable session tracing at the start of the program (and disable it, perhaps, at the end of the program)? This might be done by modifying the code directly, by modifying the wrapper that drives the program, or by adding a logon trigger if there’s a mechanism that would allow Oracle to recognise the specific session that runs this particular program. Alternatively, if something sets an identifiable (and unambiguous) module and/or action as part of calling the program, you could use the dbms_monitor package to enable tracing for (e.g.) a particular module and action on a permanent basis.

2) To check the statistics of this specific session,what is the best way to proceed and this is my initial question.One of my friend pointed out to check v$sess_io and he is not sure whether it will return cumulative values because this view has only sid and not having serial#..and sid is not unique

An answer: the stats are for one SID and SERIAL#, whether you’re looking at v$sess_io, v$sesstat, v$session_event, v$sess_time_model or any other v$sesXXX views that I can’t recall off-hand. In passing, if you can add a wrapper to the calling code, capturing the session activity stats (v$sesstat), wait time (v$session_event) and time model summary (v$sess_time_model) is a very useful debugging aid.

And an explanation: the “session” array is a fixed size array, and the SID is the array subscript of the entry your session acquired at logon. Since the array is fixed size Oracle has to keep re-using the array entries so each time it re-uses an array entry it increments the serial# so that (sid, serial#) becomes the unique identifier across history[1]. As it acquires the array entry it also resets/re-initializes all the other v$sesXXX arrays/linked list/structures.
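Here is a toy Python model of the fixed-size session array that makes the mechanism concrete. The class and its methods are invented for illustration and say nothing about how Oracle actually implements the structure. Real SIDs, for example, don’t start at zero, and the serial# wrap mentioned in the footnote isn’t modelled.

```python
class SessionArray:
    """Toy model of a fixed-size session array.

    The SID is just the index of the slot a session acquires at logon; each time
    a slot is reused its serial# is incremented and its per-session stats reset.
    """

    def __init__(self, size: int):
        self.in_use = [False] * size
        self.serials = [0] * size
        self.stats = [{} for _ in range(size)]

    def logon(self) -> tuple:
        for sid, busy in enumerate(self.in_use):
            if not busy:
                self.in_use[sid] = True
                self.serials[sid] += 1   # (sid, serial#) identifies this session
                self.stats[sid] = {}     # the new session starts from zero
                return sid, self.serials[sid]
        raise RuntimeError("no free slot in the session array")

    def logoff(self, sid: int) -> None:
        self.in_use[sid] = False


if __name__ == "__main__":
    sessions = SessionArray(size=2)
    sid, serial = sessions.logon()
    sessions.stats[sid]["physical reads"] = 500
    sessions.logoff(sid)
    sid2, serial2 = sessions.logon()
    # same SID, different serial#, fresh statistics
    print((sid, serial), (sid2, serial2), sessions.stats[sid2])
```

The second logon picks up the same slot, so it gets the same SID, but the serial# moves on and the statistics restart from nothing.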

The one thing to watch out for when you try to capture any of the session numbers is that you don’t query these v$sesXXX things twice to find a difference unless you also capture the serial# at the same time, so that you can be sure that the first session didn’t log off and a second session log on and reuse the same SID between your two snapshots. (It’s quite possible that this will be obvious even if you haven’t captured the serial#, because you may spot that some of the values that should only increase with time have decreased.)
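The check described above takes only a few lines of Python. The sketch assumes you have already captured the serial# and the statistic values for the SID at each snapshot, for example from v$session and v$sesstat. The function name and the snapshot format are hypothetical.

```python
from typing import Mapping, Optional


def session_delta(before_serial: int, before: Mapping[str, int],
                  after_serial: int, after: Mapping[str, int]) -> Optional[dict]:
    """Difference between two snapshots of one SID's cumulative statistics.

    Returns None when the snapshots can't safely be compared: either the
    serial# changed (a different session reused the SID) or a counter that
    should only grow has gone down.
    """
    if before_serial != after_serial:
        return None
    names = set(before) | set(after)
    delta = {name: after.get(name, 0) - before.get(name, 0) for name in names}
    if any(value < 0 for value in delta.values()):
        return None
    return delta


if __name__ == "__main__":
    # hypothetical snapshots of (serial#, stats) for the same SID
    print(session_delta(41, {"physical reads": 1_000}, 41, {"physical reads": 1_750}))
    print(session_delta(41, {"physical reads": 1_000}, 42, {"physical reads": 200}))
```

The second call returns None because the serial# changed. A decreasing counter with an unchanged serial# is treated the same way, since that is the symptom mentioned in the parenthesis above.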



[1] I think there was a time when restarting an instance would reset the serial# to 1. The serial# has to wrap eventually, and the wrap used to be at 65536 because it was stored as a two-byte number – which doesn’t seem to have changed. Nowadays the serial# seems to be maintained across instance restart (I wonder how that works with carefully timed instance crashes), and given the amount of history that Oracle could maintain in the AWR I suppose there could be some scope for connecting two pieces of history that were really from two different sessions.


June 11, 2019

Redo Dumps

Filed under: ASSM,Infrastructure,Oracle,redo,Troubleshooting — Jonathan Lewis @ 12:53 pm BST Jun 11,2019

A thread started on the Oracle-L list-server a few days ago asking for help analysing a problem where a simple “insert values()” (that handled millions of rows per day) was running very slowly. There are many reasons why this might happen, ranging from the trivial (someone has locked the table in exclusive mode), through the slightly subtle (we’re trying to insert a row that collides on a uniqueness constraint with an uncommitted insert from another session) to the subtle (Oracle has to read through the undo to check current versions of blocks against read-consistent versions) ending up at the esoteric (the ASSM space management blocks are completely messed up again).

A 10046 trace of a session doing an insert showed only that there was a lot of time spent on single block reads. Unfortunately, since this was on an Exadata system, the waits were reported as “cell single block physical read”, and the parameters to this wait event are “cellhash#”, “diskhash#”, and “bytes”, so we don’t see the file_id and block_id, which can be very helpful for a case like this. The only information we got from the trace file was that the object_id was for the table where rows were being inserted.

Before digging into exotic debugging methods, the OP supplied us with a 1-second session report from Tanel Poder’s snapper script:

>  SID @INST, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   1070  @2, SYSADMIN, STAT, session logical reads                                     ,         13865,      7.73k,         ,             ,          ,           ,      14.1k total buffer visits
>   1070  @2, SYSADMIN, STAT, user I/O wait time                                        ,           141,      78.65,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, non-idle wait time                                        ,           141,      78.65,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, non-idle wait count                                       ,         12230,      6.82k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read total IO requests                           ,          6112,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read requests optimized                          ,          6111,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read total bytes optimized                       ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read total bytes                                 ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, cell physical IO interconnect bytes                       ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, gcs messages sent                                         ,             3,       1.67,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block gets                                             ,         13860,      7.73k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block gets from cache                                  ,         13860,      7.73k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block gets from cache (fastpath)                       ,          7737,      4.32k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets                                           ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets from cache                                ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets pin                                       ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets pin (fastpath)                            ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, logical read bytes from cache                             ,     113541120,     63.34M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical reads                                            ,          6111,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical reads cache                                      ,          6111,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read IO requests                                 ,          6112,      3.41k,         ,             ,          ,           ,      8.19k bytes per request
>   1070  @2, SYSADMIN, STAT, physical read bytes                                       ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block changes                                          ,            11,       6.14,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, free buffer requested                                     ,          6112,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, hot buffers moved to head of LRU                          ,           958,     534.39,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, free buffer inspected                                     ,          6144,      3.43k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, shared hash latch upgrades - no wait                      ,             7,        3.9,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, blocks decrypted                                          ,          6110,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo entries                                              ,          6120,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo size                                                 ,        465504,    259.67k,         ,             ,          ,           ,          ~ bytes per user commit
>   1070  @2, SYSADMIN, STAT, redo entries for lost write detection                     ,          6110,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo size for lost write detection                        ,        464756,    259.25k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo subscn max counts                                    ,             7,        3.9,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, file io wait time                                         ,       1408659,    785.78k,         ,             ,          ,           ,   230.47us bad guess of IO wait time per IO request
>   1070  @2, SYSADMIN, STAT, gc current blocks received                                ,             3,       1.67,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, gc local grants                                           ,          6116,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM cbk:blocks examined                                  ,         12366,       6.9k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM gsp:L1 bitmaps examined                              ,          2478,      1.38k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM gsp:L2 bitmaps examined                              ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM gsp:reject db                                        ,         12388,      6.91k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, buffer is pinned count                                    ,           230,      128.3,         ,             ,          ,           ,       1.63 % buffer gets avoided thanks to buffer pin caching
>   1070  @2, SYSADMIN, STAT, cell flash cache read hits                                ,          6723,      3.75k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, TIME, background cpu time                                       ,        365192,   203.71ms,    20.4%, [##        ],          ,           ,
>   1070  @2, SYSADMIN, TIME, background elapsed time                                   ,       1273623,   710.45ms,    71.0%, [########  ],          ,           ,      28.95 % unaccounted time
>   1070  @2, SYSADMIN, WAIT, gc current block busy                                     ,           629,   350.87us,      .0%, [          ],         3,       1.67,   209.67us average wait
>   1070  @2, SYSADMIN, WAIT, cell single block physical read                           ,       1557638,   868.88ms,    86.9%, [WWWWWWWWW ],      6746,      3.76k,    230.9us average wait
> --  End of Stats snap 1, end=2019-05-18 12:58:58, seconds=1.8

My first step was simply to read down the list (using a very small font to get the entire width on screen without wrapping) to see if anything stood out as unusual. The report showed two things I rarely see in the session stats:

blocks decrypted                                          ,          6110
redo entries for lost write detection                     ,          6110

These stats tell me that two “uncommon” features are enabled: db_lost_write_protect and block-level encryption. (So, whatever else is going on, it’s just possible that mixing in two rarely used – and therefore less frequently tested – features may be confusing the issue.)

Lost write protection means Oracle writes a “block read record” (BRR) to the redo log every time it reads a block from disc, so I decided to follow up the 6,110 figure to see what other stats reported similar values.

physical read total IO requests                           ,          6112
physical read requests optimized                          ,          6111
physical reads                                            ,          6111
physical reads cache                                      ,          6111
physical read IO requests                                 ,          6112
free buffer requested                                     ,          6112
redo entries                                              ,          6120
redo entries for lost write detection                     ,          6110
gc local grants                                           ,          6116
cell flash cache read hits                                ,          6723
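Following up a figure like 6,110 is really just a scan for statistics with values in the same neighbourhood. A rough sketch of that scan follows. The ±15% window is an arbitrary choice, the helper name `near` is hypothetical, and the dictionary holds only a handful of the Snapper figures quoted in this note: the ten listed above plus three deliberately unrelated ones.

```python
# Session statistics copied from the Snapper output quoted in this note.
STATS = {
    "physical read total IO requests": 6112,
    "physical read requests optimized": 6111,
    "physical reads": 6111,
    "physical reads cache": 6111,
    "physical read IO requests": 6112,
    "free buffer requested": 6112,
    "redo entries": 6120,
    "redo entries for lost write detection": 6110,
    "gc local grants": 6116,
    "cell flash cache read hits": 6723,
    # Unrelated figures that the scan should discard:
    "db block changes": 11,
    "session logical reads": 13865,
    "redo size": 465504,
}


def near(stats, target, tolerance=0.15):
    """Return (name, value) pairs within +/- tolerance (a fraction) of target."""
    low, high = target * (1 - tolerance), target * (1 + tolerance)
    return [(name, value) for name, value in stats.items() if low <= value <= high]


if __name__ == "__main__":
    for name, value in near(STATS, 6110):
        print(f"{name:<45} {value:>8}")
```

With a 15% window the three unrelated figures drop out and the ten statistics listed above remain. A 10% window would lose “cell flash cache read hits” (6,723), which sits just outside 6,110 × 1.1.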

There’s nothing particularly surprising here – basically we see all the blocks being read as single block reads, into cache. All the necessary global cache (gc) grants are local so it’s possible the table of interest has been remastered to this node. The value for “cell flash cache read hits” looks a little odd as the cache is hit more frequently than blocks are read – but dynamic performance views are not read-consistent and this session is hammering away like crazy so this might just be a side effect of the time to gather the data for the report.

We can chase the redo a little further – the number of redo entries is slightly larger than the number of blocks read, so (even though small inconsistencies are not necessarily meaningful) this might tell us something:

redo entries                                              ,          6120
redo size                                                 ,        465504
redo entries for lost write detection                     ,          6110
redo size for lost write detection                        ,        464756 
db block changes                                          ,            11

The number of “redo entries” that were NOT for lost write detection is 10, totalling 748 bytes (not a lot – so indicative of “non-user” activity). The number of “db block changes” is 11 (close enough to 10), and generally it’s changes to db blocks that require redo to be generated. The final significant number is the one that isn’t there – there’s no undo generated, so no user-change to data. This system is working like crazy achieving absolutely nothing at this point.
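The subtraction behind those figures is trivial. The sketch below reproduces it from the values quoted above and adds one derived number that isn't in the report: the average size of a lost-write BRR redo entry.

```python
# Figures from the Snapper snapshot quoted above.
redo_entries, redo_size = 6120, 465504
lwd_entries, lwd_size = 6110, 464756     # "... for lost write detection"

other_entries = redo_entries - lwd_entries   # redo entries that were not BRRs
other_size = redo_size - lwd_size            # bytes of redo that were not BRRs
avg_brr_bytes = lwd_size / lwd_entries       # average size of one BRR redo entry

print(other_entries, other_size, round(avg_brr_bytes, 1))   # 10 748 76.1
```

The average of roughly 76 bytes per BRR is in the same ballpark as the LEN: 0x004c (76 bytes) of the sample BRR redo record shown later in this note. The statistic and the record header need not measure exactly the same thing, though.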

The next point to ponder is what sort of work it is doing – so let’s check how the physical reads turn into buffer gets.

session logical reads                                     ,         13865
db block gets                                             ,         13860
db block gets from cache                                  ,         13860
db block gets from cache (fastpath)                       ,          7737
consistent gets                                           ,             1
consistent gets from cache                                ,             1
consistent gets pin                                       ,             1
consistent gets pin (fastpath)                            ,             1
hot buffers moved to head of LRU                          ,           958

buffer is pinned count                                    ,           230 

The unusual thing you notice with these figures is that virtually every buffer get is a current get. We’ve also got a number of blocks pinned – this might just be the segment header block, or the segment header and level 2 bitmap block that we keep revisiting. Finally we can see a lot of hot buffers being moved to the head of the LRU; since our session has been doing a lot of work for a long time it seems likely that those buffers are ones that our session is keeping hot – and for a big insert that shouldn’t really be happening unless, perhaps, we were managing to do a lot of maintenance of (well-clustered) indexes.
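The proportions behind “virtually every buffer get is a current get” are easy to check. In the sketch below, the second ratio is an assumption about how Snapper derives its “% buffer gets avoided thanks to buffer pin caching” figure: pinned revisits divided by pinned revisits plus logical reads. It does reproduce the 1.63% in the report, but that is the only evidence offered for the formula.

```python
# Figures from the Snapper snapshot quoted above.
session_logical_reads = 13865
db_block_gets = 13860            # current gets
buffer_is_pinned_count = 230

current_share = db_block_gets / session_logical_reads
# Assumed Snapper formula: pinned visits / (pinned visits + logical reads).
pin_share = buffer_is_pinned_count / (buffer_is_pinned_count + session_logical_reads)

print(f"current gets: {current_share:.2%}, pinned revisits: {pin_share:.2%}")
# current gets: 99.96%, pinned revisits: 1.63%
```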

I’ve isolated the (new in 12.2) “ASSM gsp (get space)” statistics from this output – they’re all about handling blocks, but I wanted to look at them without being distracted by other stats.

ASSM cbk:blocks examined                                  ,         12366 
ASSM gsp:L1 bitmaps examined                              ,          2478 
ASSM gsp:L2 bitmaps examined                              ,             1 
ASSM gsp:reject db                                        ,         12388

We can see that we’ve examined 2,478 “level 1” bitmap blocks. A level 1 block holds the basic “bitmap” that records the state of a number of data blocks (typically 128 blocks once the object gets very large) so our session has worked its way through 2,478 maps trying to find a data block that it could use to insert a row. The “reject db” statistic tells us about data blocks that have been examined and rejected (presumably because the row we want to insert is too large to fit, or maybe because there are no free ITL (interested transaction list) entries available in the block). So we seem to be spending all our time searching for somewhere to insert rows. This shouldn’t really be happening – it’s a type of problem that Oracle has been worrying away at for quite some time: how do you avoid “losing” space by updating bitmap blocks too soon on inserts without going to the opposite extreme and leaving bitmap blocks that claim the space is free when it’s in use by uncommitted transactions.
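To make the “searching for somewhere to insert rows” idea concrete, here is a deliberately toy model. It is not Oracle’s ASSM algorithm: the `L1Bitmap` structure, the 8-blocks-per-bitmap layout, the byte counts and the `fix_bitmaps` switch are all invented for illustration. The model shows that a bitmap which keeps claiming “space available” for blocks that can’t take the row makes every insert re-examine and re-reject the same blocks. Updating the bitmap after the first rejection stops the waste.

```python
from dataclasses import dataclass


@dataclass
class L1Bitmap:
    """Toy stand-in for an L1 bitmap block and the data blocks it covers."""
    has_space: list    # what the bitmap claims for each data block
    free_bytes: list   # what each data block really has free


@dataclass
class Counters:
    l1_examined: int = 0
    blocks_examined: int = 0
    blocks_rejected: int = 0


def find_block(bitmaps, row_len, counters, fix_bitmaps):
    """Return (l1_no, block_no) of the first block with room for row_len bytes.

    With fix_bitmaps=False a rejected block keeps its "has space" flag,
    so the next search will examine (and reject) it all over again.
    """
    for l1_no, l1 in enumerate(bitmaps):
        counters.l1_examined += 1
        for blk_no, claims_space in enumerate(l1.has_space):
            if not claims_space:
                continue
            counters.blocks_examined += 1
            if l1.free_bytes[blk_no] >= row_len:
                return l1_no, blk_no
            counters.blocks_rejected += 1
            if fix_bitmaps:
                l1.has_space[blk_no] = False
    return None


def simulate(n_inserts, fix_bitmaps, n_l1=4, blocks_per_l1=8, row_len=200):
    """Insert n_inserts rows and return the search counters."""
    # Every block claims free space but really has only 100 bytes,
    # except the last block of the last bitmap, which is genuinely empty.
    bitmaps = [L1Bitmap([True] * blocks_per_l1, [100] * blocks_per_l1)
               for _ in range(n_l1)]
    bitmaps[-1].free_bytes[-1] = 8000
    counters = Counters()
    for _ in range(n_inserts):
        found = find_block(bitmaps, row_len, counters, fix_bitmaps)
        if found is None:
            break
        l1_no, blk_no = found
        bitmaps[l1_no].free_bytes[blk_no] -= row_len
    return counters


if __name__ == "__main__":
    print(simulate(10, fix_bitmaps=False))
    print(simulate(10, fix_bitmaps=True))
```

With stale bitmaps, ten inserts end with 310 rejections from 320 block examinations. With the bitmaps corrected, the total stops at the 31 rejections made by the first insert.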

Note to self: I don’t know how we managed to reject more blocks (12,388) than we’ve examined (12,366). Possibly it’s just one of those timing glitches (the error is less than one fifth of one percent); possibly it’s something to do with the reject count including some of the L1 bitmap blocks.
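For the record, here is the size of that discrepancy:

```python
examined, rejected = 12366, 12388   # ASSM cbk:blocks examined / ASSM gsp:reject db
excess = rejected - examined
print(excess, f"{excess / examined:.3%}")   # 22 0.178%
```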

Clearly there’s something funny going on with space management – and we need to look at a few blocks that are exhibiting the problems. But how do we find a few suitable blocks? And that’s where, finally, we get to the title of the piece.

We are in the lucky position of having “lost write protection” enabled – so the redo log file will hold lots of “block read records”. We can’t get the file and block addresses we need from the “cell physical read” wait events so let’s ask the redo log to supply them. We just have to pick a log file (online or archived) and tell Oracle to dump some of it – and we can probably get away with a fairly small dump since we want just a single type of redo record over a short period of time. Here’s an example showing the format of two slightly different commands you could execute:

alter system dump logfile '/u01/app/oracle/oradata/orcl12c/redo03.log'
        rba min 2781    1
        rba max 2781    1000
        layer 23 opcode 2;

alter system dump redo 
        scn min 19859991 scn max 19964263 
        layer 23 opcode 2;

The first command dumps a log file by name – but you may have to fiddle around a bit to find the name of an archived log file, and if you want to restrict the size of the dump with this option you need to know the file’s sequence number (sequence# in v$log_history). The second command simply dumps redo for (in this example) an SCN range, and Oracle will work out for itself which file it has to access; it’s easy to query v$log_history to find dates, times, and SCN ranges. In both cases I’ve restricted the dump to just those redo records that contain change vectors of type BRR (block read records), which is what the layer 23 opcode 2 line is about.
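If you take the log-file-by-name route you first have to map an SCN (or time) range onto sequence numbers. The sketch below shows that lookup. It assumes you’ve already fetched SEQUENCE#, FIRST_CHANGE# and NEXT_CHANGE# from v$log_history. The history rows in the code are made-up values, and the helper names are hypothetical. The SCN range is the one from the example command above.

```python
from typing import NamedTuple


class LogHistoryRow(NamedTuple):
    """The v$log_history columns needed here."""
    sequence: int      # SEQUENCE#
    first_change: int  # FIRST_CHANGE#: lowest SCN in the log
    next_change: int   # NEXT_CHANGE#: lowest SCN in the next log


# Made-up history rows, purely for illustration.
HISTORY = [
    LogHistoryRow(2780, 19_800_000, 19_859_000),
    LogHistoryRow(2781, 19_859_000, 19_900_000),
    LogHistoryRow(2782, 19_900_000, 20_000_000),
]


def sequences_for_scn_range(rows, scn_min, scn_max):
    """Sequence numbers of logs whose [FIRST_CHANGE#, NEXT_CHANGE#) overlaps the range."""
    return [r.sequence for r in rows
            if r.first_change <= scn_max and r.next_change > scn_min]


def dump_redo_command(scn_min, scn_max):
    """SCN-range form of the dump, restricted to BRRs (layer 23 opcode 2)."""
    return (f"alter system dump redo scn min {scn_min} scn max {scn_max} "
            "layer 23 opcode 2;")


if __name__ == "__main__":
    print(sequences_for_scn_range(HISTORY, 19859991, 19964263))  # [2781, 2782]
    print(dump_redo_command(19859991, 19964263))
```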

Here’s an example of a redo record that contains nothing but a single BRR. (It’s from a single-block read; a multi-block read would produce a redo record with multiple change vectors, one vector for each block read.)

REDO RECORD - Thread:1 RBA: 0x000add.00000019.01b0 LEN: 0x004c VLD: 0x10 CON_UID: 2846920952
SCN: 0x00000000025a7c13 SUBSCN:  1 05/23/2019 10:42:51
CHANGE #1 CON_ID:3 TYP:2 CLS:6 AFN:9 DBA:0x00407930 OBJ:40 SCN:0x00000000001a1e2a SEQ:2 OP:23.2 ENC:0 RBL:0 FLG:0x0000
 Block Read - afn: 9 rdba: 0x00407930 BFT:(1024,4225328) non-BFT:(1,31024)
              scn: 0x00000000001a1e2a seq: 0x02
              flags: 0x00000006 ( dlog ckval )
              where: qeilwh05: qeilbk

If you’re wondering about the two interpretations of the rdba (relative datablock address), one is for BFTs (big file tablespaces) and one for non-BFTs. The other thing you’ll notice about the interpretations is that neither file number (1024 or 1) matches the afn (absolute file number). In smaller, non-CDB databases you will probably find that the afn matches the file number in the non-BFT interpretation, but I happen to be testing on a PDB and the first file in my SYSTEM tablespace happens to be the 9th file created in the CDB – connecting as SYS in my PDB I can compare the absolute and “relative” file numbers very easily:

SQL> select file#, rfile#, name from v$datafile;

     FILE#     RFILE# NAME
---------- ---------- ----------------------------------------------------------------
         9          1 /u01/app/oracle/oradata/orcl12c/orcl/system01.dbf
        10          4 /u01/app/oracle/oradata/orcl12c/orcl/sysaux01.dbf
        11          9 /u01/app/oracle/oradata/orcl12c/orcl/undotbs01.dbf
        12         12 /u01/app/oracle/oradata/orcl12c/orcl/users01.dbf
        13         13 /u01/app/oracle/oradata/orcl12c/orcl/APEX_1991375173370654.dbf
        14         14 /u01/app/oracle/oradata/orcl12c/orcl/APEX_1993195660370985.dbf
        22         22 /u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf
        23         23 /u01/app/oracle/oradata/orcl12c/orcl/test_8k.dbf

8 rows selected.

For bigfile tablespaces the “relative” file number is a complete fake and simply reports 1024 – you’re only allowed one file in a bigfile tablespace, so there is no “relativity” involved. (Unless you’re working at CERN and storing data about particle collisions in the LHC.)

The key point to remember when reading BRRs, then, is that you should take the file number from the afn and the block number from the (appropriate) interpretation of the rdba. For the example above I would issue: “alter system dump datafile 9 block 31024;”
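The two rdba interpretations are plain bit arithmetic. For a smallfile tablespace the top 10 bits hold the relative file number and the low 22 bits hold the block number. A bigfile tablespace uses all 32 bits for the block and reports the file as 1024. The sketch below reproduces the BFT:(1024,4225328) non-BFT:(1,31024) decoding from the sample record and builds the block dump command (ALTER SYSTEM form). The function names are hypothetical.

```python
def decode_rdba(rdba):
    """Return ((file, block) as BFT, (file, block) as non-BFT) for a relative DBA."""
    bft = (1024, rdba)                        # bigfile: fake file 1024, 32-bit block
    non_bft = (rdba >> 22, rdba & 0x3FFFFF)   # smallfile: 10-bit file, 22-bit block
    return bft, non_bft


def dump_block_command(afn, rdba, bigfile=False):
    """File number from the afn, block number from the appropriate rdba reading."""
    bft, non_bft = decode_rdba(rdba)
    block = bft[1] if bigfile else non_bft[1]
    return f"alter system dump datafile {afn} block {block};"


if __name__ == "__main__":
    print(decode_rdba(0x00407930))          # ((1024, 4225328), (1, 31024))
    print(dump_block_command(9, 0x00407930))
```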


The originator of the thread hasn’t yet made any public response to the suggestion of dumping and reviewing blocks – possibly they’ve started a private conversation with Stefan Koehler who had suggested a strategy that examined function calls rather than block contents – so we’re unable to do any further analysis on what’s going on behind the scenes.

What we would be looking for is any indication that Oracle is repeatedly re-reading the same bitmap blocks and the same data blocks (by a simple check of block addresses); and if that is the case we would want to get some clue about why that might be happening by examining the contents of the data blocks that are subject to repeated reads without changing their status in the bitmap from “space available” to “full”.  As it is we just have to wait for the OP to tell us if they’ve made any further progress.
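The “simple check of block addresses” could start with something like the following sketch. It pulls the afn and rdba out of every “Block Read” line in a redo dump trace file and counts how often each (file, block) pair appears. The regular expression is based only on the single BRR shown in this note, so treat the trace format as an assumption. The second and third sample lines are made up, and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern taken from the single BRR in this note, e.g.
#   Block Read - afn: 9 rdba: 0x00407930 BFT:(1024,4225328) non-BFT:(1,31024)
BRR_LINE = re.compile(r"Block Read - afn: (\d+) rdba: 0x([0-9a-fA-F]+)")

SAMPLE_TRACE = [
    " Block Read - afn: 9 rdba: 0x00407930 BFT:(1024,4225328) non-BFT:(1,31024)",
    " Block Read - afn: 9 rdba: 0x00407931 BFT:(1024,4225329) non-BFT:(1,31025)",  # made up
    " Block Read - afn: 9 rdba: 0x00407930 BFT:(1024,4225328) non-BFT:(1,31024)",  # made up
]


def count_block_reads(trace_lines, bigfile=False):
    """Count BRRs per (absolute file number, block number)."""
    counts = Counter()
    for line in trace_lines:
        match = BRR_LINE.search(line)
        if match is None:
            continue
        afn = int(match.group(1))
        rdba = int(match.group(2), 16)
        # Bigfile: the whole rdba is the block; smallfile: the low 22 bits.
        block = rdba if bigfile else rdba & 0x3FFFFF
        counts[(afn, block)] += 1
    return counts


def repeated_reads(counts, min_reads=2):
    """Blocks read at least min_reads times, most frequently read first."""
    return [(addr, n) for addr, n in counts.most_common() if n >= min_reads]


if __name__ == "__main__":
    print(repeated_reads(count_block_reads(SAMPLE_TRACE)))   # [((9, 31024), 2)]
```

Against a real trace you would pass an open file object instead of the sample list, e.g. `count_block_reads(open(trace_path))`, where `trace_path` is whatever file your redo dump produced.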


May 7, 2019

Execution Plan Puzzle

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 12:45 pm BST May 7,2019

Here’s an execution plan that’s just been published on the ODC database forum. The plan comes from a call to dbms_xplan.display_cursor() with rowsource execution statistics enabled.

There’s something unusual about the execution statistics that I don’t think I’ve seen before – can anyone else see anything really odd, or (better still) anything which they would expect others to find odd but which they can easily explain?

A couple of hints:

  • It’s nothing to do with the fact that E-Rows and A-Rows don’t match – that’s never a surprise.
  • It’s not really about the fact that huge amounts of time seem to appear out of “nowhere” in the A-Time column
  • It is something to do with the relationship between A-Rows and Starts
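If you want to line up Starts against A-Rows for every operation rather than reading across the plan by eye, a rough sketch like this will do it. It assumes the plan has been saved as plain text in the usual dbms_xplan pipe-delimited layout. It also assumes that a K suffix on a count means thousands; that is an assumption, not something checked against dbms_xplan’s rounding rules. The helper names are hypothetical.

```python
import re

# A plan data row starts with "|", an optional "*" (predicate marker), the Id, then "|".
PLAN_ROW = re.compile(r"^\|[ *]*\d+ \|")
# Assumption: a K/M/G suffix on a count means thousands/millions/billions.
SCALE = {"K": 10**3, "M": 10**6, "G": 10**9}


def parse_count(text):
    """Turn a dbms_xplan count such as '7785' or '5114K' into an int (None if blank)."""
    text = text.strip()
    if not text:
        return None
    if text[-1] in SCALE:
        return int(text[:-1]) * SCALE[text[-1]]
    return int(text)


def parse_plan_row(line):
    """Split one plan row into Id, depth, Operation, Name, Starts and A-Rows."""
    cells = line.strip().strip("|").split("|")
    operation = cells[1].rstrip()
    return {
        "id": int(cells[0].replace("*", "").strip()),
        # Each level of nesting adds one space after the single padding space.
        "depth": len(operation) - len(operation.lstrip()) - 1,
        "operation": operation.strip(),
        "name": cells[2].strip(),
        "starts": parse_count(cells[3]),
        "a_rows": parse_count(cells[5]),
    }


def starts_vs_rows(plan_lines):
    """Yield (id, depth, operation, name, starts, a_rows) for each data row."""
    for line in plan_lines:
        if PLAN_ROW.match(line):
            row = parse_plan_row(line)
            yield (row["id"], row["depth"], row["operation"], row["name"],
                   row["starts"], row["a_rows"])
```

For example, `for row in starts_vs_rows(open("plan.txt")): print(row)` (with a hypothetical `plan.txt`) prints one tuple per operation. Header and separator lines are skipped because they don’t start with an Id.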

I’ve inserted a few separator lines to break the plan into smaller pieces that can be examined in isolation. There are two “Load as Select” sections (presumably from “with” subqueries) and the main body of the query.

We don’t, as at time of writing, have the SQL or the Oracle version number that produced this plan. [Update: version now reported as]

| Id  | Operation                                       | Name                         | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
|   0 | SELECT STATEMENT                                |                              |      1 |        |     50 |00:00:18.00 |     367K|     55 |     55 |       |       |          |
|   1 |  TEMP TABLE TRANSFORMATION                      |                              |      1 |        |     50 |00:00:18.00 |     367K|     55 |     55 |       |       |          |
|   2 |   LOAD AS SELECT                                |                              |      1 |        |      0 |00:00:00.55 |   56743 |      0 |     53 |  1040K|  1040K|          |
|   3 |    TABLE ACCESS BY INDEX ROWID                  | OBJECTS                      |   7785 |      1 |   7785 |00:00:00.03 |    8150 |      0 |      0 |       |       |          |
|*  4 |     INDEX UNIQUE SCAN                           | PK_OBJECTS                   |   7785 |      1 |   7785 |00:00:00.01 |     360 |      0 |      0 |       |       |          |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED          | ATTRIBUTES                   |   7785 |      1 |   5507 |00:00:00.05 |   12182 |      0 |      0 |       |       |          |
|*  6 |     INDEX RANGE SCAN                            | UK_ATTR                      |   7785 |      1 |   5507 |00:00:00.03 |    9621 |      0 |      0 |       |       |          |
|*  7 |      TABLE ACCESS FULL                          | ATTRIBUTE_TYPES              |      1 |      1 |      1 |00:00:00.01 |      38 |      0 |      0 |       |       |          |
|   8 |    TABLE ACCESS BY INDEX ROWID BATCHED          | ATTRIBUTES                   |   7785 |      1 |   5507 |00:00:00.03 |   12182 |      0 |      0 |       |       |          |
|*  9 |     INDEX RANGE SCAN                            | UK_ATTR                      |   7785 |      1 |   5507 |00:00:00.02 |    9621 |      0 |      0 |       |       |          |
|* 10 |      TABLE ACCESS FULL                          | ATTRIBUTE_TYPES              |      1 |      1 |      1 |00:00:00.01 |      38 |      0 |      0 |       |       |          |
|  11 |    TABLE ACCESS BY INDEX ROWID BATCHED          | ATTRIBUTES                   |   1366 |      1 |   1366 |00:00:00.02 |    4592 |      0 |      0 |       |       |          |
|* 12 |     INDEX RANGE SCAN                            | IDX_ATTR_NDC_OBJECT_VALUE    |   1366 |      1 |   1366 |00:00:00.01 |    3227 |      0 |      0 |       |       |          |
|* 13 |      INDEX RANGE SCAN                           | NCI_NODES_COVERING_IDX       |   1366 |      1 |   1366 |00:00:00.01 |     595 |      0 |      0 |       |       |          |
|* 14 |    VIEW                                         |                              |      1 |     12 |   7785 |00:00:00.41 |   24174 |      0 |      0 |       |       |          |
|* 15 |     FILTER                                      |                              |      1 |        |   7891 |00:00:00.39 |   19582 |      0 |      0 |       |       |          |
|* 16 |      CONNECT BY WITH FILTERING                  |                              |      1 |        |  66134 |00:00:00.37 |   19144 |      0 |      0 |  7069K|  1062K| 6283K (0)|
|  17 |       TABLE ACCESS BY INDEX ROWID               | NODES                        |      1 |      1 |      1 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|* 18 |        INDEX UNIQUE SCAN                        | PK_NODES                     |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  19 |       NESTED LOOPS                              |                              |      9 |     11 |  66133 |00:00:00.19 |   19137 |      0 |      0 |       |       |          |
|  20 |        CONNECT BY PUMP                          |                              |      9 |        |  66134 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|* 21 |        TABLE ACCESS BY INDEX ROWID BATCHED      | NODES                        |  66134 |     11 |  66133 |00:00:00.15 |   19137 |      0 |      0 |       |       |          |
|* 22 |         INDEX RANGE SCAN                        | NCI_NODES_PARENT_NODE_ID     |  66134 |     11 |  67807 |00:00:00.08 |   12139 |      0 |      0 |       |       |          |
|  23 |         TABLE ACCESS BY INDEX ROWID             | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|* 24 |          INDEX UNIQUE SCAN                      | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|  25 |       TABLE ACCESS BY INDEX ROWID               | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|* 26 |        INDEX UNIQUE SCAN                        | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|* 27 |      TABLE ACCESS BY INDEX ROWID                | OBJECT_TYPES                 |    219 |      1 |      1 |00:00:00.01 |     438 |      0 |      0 |       |       |          |
|* 28 |       INDEX UNIQUE SCAN                         | PK_OBJECT_TYPES              |    219 |      1 |    219 |00:00:00.01 |     219 |      0 |      0 |       |       |          |
|  29 |   LOAD AS SELECT                                |                              |      1 |        |      0 |00:00:02.86 |   37654 |     53 |      2 |  1040K|  1040K|          |
|  30 |    TABLE ACCESS BY INDEX ROWID                  | OBJECTS                      |    316 |      1 |    316 |00:00:00.01 |     603 |      0 |      0 |       |       |          |
|* 31 |     INDEX UNIQUE SCAN                           | PK_OBJECTS                   |    316 |      1 |    316 |00:00:00.01 |     287 |      0 |      0 |       |       |          |
|  32 |    TABLE ACCESS BY INDEX ROWID BATCHED          | ATTRIBUTES                   |    316 |      1 |    316 |00:00:00.01 |     950 |      0 |      0 |       |       |          |
|* 33 |     INDEX RANGE SCAN                            | UK_ATTR                      |    316 |      1 |    316 |00:00:00.01 |     666 |      0 |      0 |       |       |          |
|* 34 |      TABLE ACCESS FULL                          | ATTRIBUTE_TYPES              |      1 |      1 |      1 |00:00:00.01 |      38 |      0 |      0 |       |       |          |
|  35 |    HASH UNIQUE                                  |                              |      1 |    148 |    316 |00:00:02.86 |   37650 |     53 |      0 |  1041K|  1041K| 1371K (0)|
|* 36 |     FILTER                                      |                              |      1 |        |   5500 |00:00:02.85 |   36097 |     53 |      0 |       |       |          |
|  37 |      MERGE JOIN CARTESIAN                       |                              |      1 |    148 |   5114K|00:00:02.23 |   34073 |     53 |      0 |       |       |          |
|* 38 |       HASH JOIN                                 |                              |      1 |     12 |    657 |00:00:01.05 |   34016 |      0 |      0 |  1003K|  1003K|  728K (0)|
|  39 |        NESTED LOOPS                             |                              |      1 |     69 |    969 |00:00:00.36 |   20145 |      0 |      0 |       |       |          |
|  40 |         NESTED LOOPS                            |                              |      1 |    132 |    970 |00:00:00.36 |   19975 |      0 |      0 |       |       |          |
|  41 |          VIEW                                   |                              |      1 |     12 |    312 |00:00:00.35 |   19582 |      0 |      0 |       |       |          |
|* 42 |           FILTER                                |                              |      1 |        |    312 |00:00:00.35 |   19582 |      0 |      0 |       |       |          |
|* 43 |            CONNECT BY WITH FILTERING            |                              |      1 |        |  66134 |00:00:00.34 |   19144 |      0 |      0 |  6219K|  1010K| 5527K (0)|
|  44 |             TABLE ACCESS BY INDEX ROWID         | NODES                        |      1 |      1 |      1 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|* 45 |              INDEX UNIQUE SCAN                  | PK_NODES                     |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  46 |             NESTED LOOPS                        |                              |      9 |     11 |  66133 |00:00:00.18 |   19137 |      0 |      0 |       |       |          |
|  47 |              CONNECT BY PUMP                    |                              |      9 |        |  66134 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|* 48 |              TABLE ACCESS BY INDEX ROWID BATCHED| NODES                        |  66134 |     11 |  66133 |00:00:00.15 |   19137 |      0 |      0 |       |       |          |
|* 49 |               INDEX RANGE SCAN                  | NCI_NODES_PARENT_NODE_ID     |  66134 |     11 |  67807 |00:00:00.08 |   12139 |      0 |      0 |       |       |          |
|  50 |               TABLE ACCESS BY INDEX ROWID       | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|* 51 |                INDEX UNIQUE SCAN                | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|  52 |             TABLE ACCESS BY INDEX ROWID         | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|* 53 |              INDEX UNIQUE SCAN                  | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|* 54 |            TABLE ACCESS BY INDEX ROWID          | OBJECT_TYPES                 |    219 |      1 |      1 |00:00:00.01 |     438 |      0 |      0 |       |       |          |
|* 55 |             INDEX UNIQUE SCAN                   | PK_OBJECT_TYPES              |    219 |      1 |    219 |00:00:00.01 |     219 |      0 |      0 |       |       |          |
|* 56 |          INDEX RANGE SCAN                       | NCI_NODES_PARENT_NODE_ID     |    312 |     11 |    970 |00:00:00.01 |     393 |      0 |      0 |       |       |          |
|* 57 |         TABLE ACCESS BY INDEX ROWID             | NODES                        |    970 |      6 |    969 |00:00:00.01 |     170 |      0 |      0 |       |       |          |
|* 58 |        VIEW                                     | index$_join$_065             |      1 |     42 |      4 |00:00:00.69 |   13871 |      0 |      0 |       |       |          |
|* 59 |         HASH JOIN                               |                              |      1 |        |    434 |00:00:00.01 |      12 |      0 |      0 |  1519K|  1519K| 1491K (0)|
|  60 |          INDEX FAST FULL SCAN                   | PK_OBJECT_TYPES              |      1 |     42 |    434 |00:00:00.01 |       4 |      0 |      0 |       |       |          |
|  61 |          INDEX FAST FULL SCAN                   | UK_IDX_OBJECT_TYPE_NDC       |      1 |     42 |    434 |00:00:00.01 |       8 |      0 |      0 |       |       |          |
|  62 |       BUFFER SORT                               |                              |    657 |     12 |   5114K|00:00:00.63 |      57 |     53 |      0 |   372K|   372K|  330K (0)|
|  63 |        VIEW                                     |                              |      1 |     12 |   7785 |00:00:00.02 |      57 |     53 |      0 |       |       |          |
|  64 |         TABLE ACCESS FULL                       | SYS_TEMP_0FD9D761B_1445481D  |      1 |     12 |   7785 |00:00:00.02 |      57 |     53 |      0 |       |       |          |
|  65 |      TABLE ACCESS BY INDEX ROWID                | OBJECTS                      |    657 |      1 |    657 |00:00:00.01 |    1068 |      0 |      0 |       |       |          |
|* 66 |       INDEX UNIQUE SCAN                         | PK_OBJECTS                   |    657 |      1 |    657 |00:00:00.01 |     410 |      0 |      0 |       |       |          |
|  67 |      TABLE ACCESS BY INDEX ROWID BATCHED        | ATTRIBUTES                   |    318 |      1 |    318 |00:00:00.01 |     956 |      0 |      0 |       |       |          |
|* 68 |       INDEX RANGE SCAN                          | UK_ATTR                      |    318 |      1 |    318 |00:00:00.01 |     670 |      0 |      0 |       |       |          |
|* 69 |        TABLE ACCESS FULL                        | ATTRIBUTE_TYPES              |      1 |      1 |      1 |00:00:00.01 |      38 |      0 |      0 |       |       |          |
|  70 |   SORT GROUP BY                                 |                              |      1 |      1 |     50 |00:00:14.59 |     273K|      2 |      0 |   619K|   471K|  550K (0)|
|  71 |    VIEW                                         |                              |      1 |      1 |   4375 |00:00:13.31 |     273K|      2 |      0 |       |       |          |
|  72 |     HASH UNIQUE                                 |                              |      1 |      1 |   4375 |00:00:13.31 |     273K|      2 |      0 |  1186K|  1186K| 1400K (0)|
|  73 |      TABLE ACCESS BY INDEX ROWID                | OBJECTS                      |   4606 |      1 |   4606 |00:00:05.59 |   37088 |      0 |      0 |       |       |          |
|* 74 |       INDEX UNIQUE SCAN                         | PK_OBJECTS                   |   4606 |      1 |   4606 |00:00:05.56 |   32472 |      0 |      0 |       |       |          |
|* 75 |      HASH JOIN                                  |                              |      1 |      1 |   4375 |00:00:13.29 |     273K|      2 |      0 |  1410K|  1075K| 1423K (0)|
|  76 |       NESTED LOOPS                              |                              |      1 |      1 |   4375 |00:00:00.07 |   12952 |      2 |      0 |       |       |          |
|  77 |        NESTED LOOPS                             |                              |      1 |      2 |   4375 |00:00:00.06 |   12593 |      2 |      0 |       |       |          |
|  78 |         NESTED LOOPS                            |                              |      1 |      1 |   4375 |00:00:00.05 |   11761 |      2 |      0 |       |       |          |
|* 79 |          HASH JOIN                              |                              |      1 |      1 |   5500 |00:00:00.01 |      60 |      2 |      0 |  1321K|  1321K| 1775K (0)|
|  80 |           VIEW                                  |                              |      1 |     12 |   7785 |00:00:00.01 |      54 |      0 |      0 |       |       |          |
|  81 |            TABLE ACCESS FULL                    | SYS_TEMP_0FD9D761B_1445481D  |      1 |     12 |   7785 |00:00:00.01 |      54 |      0 |      0 |       |       |          |
|  82 |           VIEW                                  |                              |      1 |    148 |    316 |00:00:00.01 |       6 |      2 |      0 |       |       |          |
|  83 |            TABLE ACCESS FULL                    | SYS_TEMP_0FD9D761C_1445481D  |      1 |    148 |    316 |00:00:00.01 |       6 |      2 |      0 |       |       |          |
|  84 |          TABLE ACCESS BY INDEX ROWID BATCHED    | ATTRIBUTES                   |   5500 |      1 |   4375 |00:00:00.04 |   11701 |      0 |      0 |       |       |          |
|* 85 |           INDEX RANGE SCAN                      | IDX_ATTR_NDC_OBJECT_VALUE    |   5500 |      1 |   4375 |00:00:00.02 |    7353 |      0 |      0 |       |       |          |
|* 86 |         INDEX RANGE SCAN                        | NCI_ATTRIBUTE_VALUES_ATTR_ID |   4375 |      2 |   4375 |00:00:00.01 |     832 |      0 |      0 |       |       |          |
|  87 |        TABLE ACCESS BY INDEX ROWID              | ATTRIBUTE_VALUES             |   4375 |      2 |   4375 |00:00:00.01 |     359 |      0 |      0 |       |       |          |
|  88 |       VIEW                                      |                              |      1 |   1730 |   4606 |00:00:13.21 |     260K|      0 |      0 |       |       |          |
|* 89 |        FILTER                                   |                              |      1 |        |   4606 |00:00:00.06 |    2094 |      0 |      0 |       |       |          |
|* 90 |         CONNECT BY WITH FILTERING               |                              |      1 |        |   4922 |00:00:00.05 |    2037 |      0 |      0 |   478K|   448K|  424K (0)|
|  91 |          NESTED LOOPS                           |                              |      1 |    148 |    316 |00:00:00.01 |     953 |      0 |      0 |       |       |          |
|  92 |           NESTED LOOPS                          |                              |      1 |    148 |    316 |00:00:00.01 |     637 |      0 |      0 |       |       |          |
|  93 |            VIEW                                 | VW_NSO_1                     |      1 |    148 |    316 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  94 |             HASH UNIQUE                         |                              |      1 |    148 |    316 |00:00:00.01 |       3 |      0 |      0 |  2170K|  2170K| 2517K (0)|
|  95 |              VIEW                               |                              |      1 |    148 |    316 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|  96 |               TABLE ACCESS FULL                 | SYS_TEMP_0FD9D761C_1445481D  |      1 |    148 |    316 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|* 97 |            INDEX UNIQUE SCAN                    | PK_NODES                     |    316 |      1 |    316 |00:00:00.01 |     634 |      0 |      0 |       |       |          |
|  98 |           TABLE ACCESS BY INDEX ROWID           | NODES                        |    316 |      1 |    316 |00:00:00.01 |     316 |      0 |      0 |       |       |          |
|  99 |          NESTED LOOPS                           |                              |      2 |   1582 |   4606 |00:00:00.01 |    1081 |      0 |      0 |       |       |          |
| 100 |           CONNECT BY PUMP                       |                              |      2 |        |   4922 |00:00:00.01 |       0 |      0 |      0 |       |       |          |
|*101 |           TABLE ACCESS BY INDEX ROWID BATCHED   | NODES                        |   4922 |     11 |   4606 |00:00:00.01 |    1081 |      0 |      0 |       |       |          |
|*102 |            INDEX RANGE SCAN                     | NCI_NODES_PARENT_NODE_ID     |   4922 |     11 |   4608 |00:00:00.01 |     950 |      0 |      0 |       |       |          |
| 103 |            TABLE ACCESS BY INDEX ROWID          | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|*104 |             INDEX UNIQUE SCAN                   | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
| 105 |          TABLE ACCESS BY INDEX ROWID            | OBJECT_TYPES                 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |      0 |       |       |          |
|*106 |           INDEX UNIQUE SCAN                     | UK_IDX_OBJECT_TYPE_NDC       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |      0 |       |       |          |
|*107 |         TABLE ACCESS BY INDEX ROWID             | OBJECT_TYPES                 |      3 |      1 |      1 |00:00:00.01 |      57 |      0 |      0 |       |       |          |
|*108 |          INDEX UNIQUE SCAN                      | PK_OBJECT_TYPES              |      3 |      1 |      3 |00:00:00.01 |       3 |      0 |      0 |       |       |          |

 Predicate Information (identified by operation id):
   4 - access("O"."OBJECT_ID"=:B1)
   6 - access("A"."OBJECT_ID"=:B1 AND "A"."ATTRIBUTE_TYPE_ID"=)
   9 - access("A"."OBJECT_ID"=:B1 AND "A"."ATTRIBUTE_TYPE_ID"=)
  13 - access("NOD"."NODE_ID"=:B1)
  15 - filter(("DELETION_DATE"='01-Jan-1900' AND  IS NOT NULL AND "OBJECT_ID" IS NOT NULL))
  16 - access("N"."PARENT_NODE_ID"=PRIOR NULL)
  18 - access("N"."NODE_ID"=TO_NUMBER(:I_NODE_ID))
  21 - filter("N"."OBJECT_TYPE_ID"<>)
  22 - access("connect$_by$_pump$_029"."PRIOR n.node_id"="N"."PARENT_NODE_ID")
  27 - filter("NAME"=:I_SEARCH_OBJ_TYPE)
  28 - access("OBJECT_TYPE_ID"=:B1)
  31 - access("O"."OBJECT_ID"=:B1)
  33 - access("A"."OBJECT_ID"=:B1 AND "A"."ATTRIBUTE_TYPE_ID"=)
  36 - filter(("CN"."CODELIST"= AND "CN"."CODELIST_MUI"=))
  38 - access("N"."OBJECT_TYPE_ID"="OBJECT_TYPE_ID")
  42 - filter(("DELETION_DATE"='01-Jan-1900' AND  IS NOT NULL))
  43 - access("PARENT_NODE_ID"=PRIOR NULL)
  45 - access("NODE_ID"=TO_NUMBER(:I_NODE_ID))
  48 - filter("OBJECT_TYPE_ID"<>)
  49 - access("connect$_by$_pump$_049"."PRIOR node_id "="PARENT_NODE_ID")
  55 - access("OBJECT_TYPE_ID"=:B1)
  56 - access("N"."PARENT_NODE_ID"="NODE_ID")
  57 - filter("N"."DELETION_DATE"='01-Jan-1900')
  59 - access(ROWID=ROWID)
  66 - access("O"."OBJECT_ID"=:B1)
  68 - access("A"."OBJECT_ID"=:B1 AND "A"."ATTRIBUTE_TYPE_ID"=)
  79 - access("CN"."CODELIST_MUI"="CL"."MUI_VALUE")
  86 - access("A"."ATTRIBUTE_ID"="AV"."ATTRIBUTE_ID")
  89 - filter(("DELETION_DATE"='01-Jan-1900' AND  IS NOT NULL))
  90 - access("N"."PARENT_NODE_ID"=PRIOR NULL)
  97 - access("N"."NODE_ID"="NODE_ID")
 101 - filter("N"."OBJECT_TYPE_ID"<>)
 102 - access("connect$_by$_pump$_082"."PRIOR n.node_id "="N"."PARENT_NODE_ID")
 108 - access("OBJECT_TYPE_ID"=:B1)

Any observations welcome. I’m not expecting many people to see the anomaly I see (and there may be further anomalies I haven’t even looked for that others do see straight away), but it’s possible that the pattern is one that some people frequently see and find totally unsurprising.

Update – where’s the anomaly

The anomaly is the presence of operations 73 and 74.

There are two different observations that make these lines stand out. First, operation 72 is a hash unique which is a “single child” operation that calls its child to supply a rowsource and then reduces that rowsource to a distinct set using a hashing mechanism. But in this plan we can see that operation 72 appears to have two child operations – numbers 73 and 75 – so clearly the plan isn’t following the pure “standard” pattern.

Secondly, notice that operations 73 and 74 both report 4,606 Starts. An operation that reports “N” starts has to have a parent operation calling it N times, which means the parent operation must have reported (at least) N rows under the A-Rows heading. But we know that the hash unique operation will call its child operation exactly once – and we can see that the hash unique here has only been called once. So something else must be causing the 4,606 Starts.

Fortunately we remember that “scalar subqueries in the select list” will report their execution plans above the part of the plan that describes the main body of the query. In fact we can see this several times in the two “load as select” parts of this plan; operations (3,4), (5,6,7), (8,9,10), (11,12,13) describe 4 scalar subqueries that must be embedded in the select list of the first “with” subquery that is described by operations 14 – 28.

So we could assume, for the moment, that operations 73 and 74 are in some way an inline scalar subquery in a select list – and that leads to the next step in the problem. A scalar subquery will operate at most once for each row returned in the main rowsource – though the number of starts might be reduced by the effects of scalar subquery caching. Operations 73 and 74 start 4,606 times; the rowsource that we feel it ought to be associated with is the hash join immediately below it (operation 75) which returns 4,375 rows, moreover the first child of the hash join returns 4,375 rows – so we’re not seeing enough rows returned to justify our second attempt at interpreting the plan.

So where can we find something that returns 4,606 (or more) rows that would allow us to say “maybe that’s where the scalar subquery goes”?

Look further down the plan – operation 88 (the view operation that constitutes the second child of the hash join) reports A-Rows = 4,606. Operations 73 and 74 really ought to be reported between operations 88 and 89. There’s a filter at operation 89 that reduces the 4,922 rows produced by operation 90 to 4,606, and it’s after that filter that the scalar subquery runs to add a column to the rowsource before passing it upwards. (We might be able to see this if we had the projection information for the plan.)
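The counting argument above can be written down as a small check. This is only a sketch under stated assumptions: `scalar_subquery_starts` and `plausible_drivers` are hypothetical helpers, the cache is modelled as an unbounded memo keyed on the correlation value (Oracle's real scalar subquery cache is a limited-size hash table, so it may start the subquery more often than this model, but never more than once per driving row), and the candidate rowsources use the A-Rows figures quoted in the text.

```python
def scalar_subquery_starts(driving_keys, cached=True):
    """Number of times a scalar subquery would start for a driving rowsource.

    Simplified model: an unbounded cache keyed on the correlation value.
    Without caching the subquery starts once per driving row.
    """
    if not cached:
        return len(driving_keys)
    return len(set(driving_keys))


def plausible_drivers(starts, candidates):
    """Return the candidate rowsources that produced at least `starts` rows.

    A subquery that reports N starts must be called by something that
    returned at least N rows, whatever the effect of caching.
    """
    return [name for name, a_rows in candidates.items() if a_rows >= starts]


# A-Rows figures quoted in the text for the two obvious candidates.
candidates = {
    "75 HASH JOIN": 4_375,
    "88 VIEW": 4_606,
}

print(scalar_subquery_starts([10, 20, 20, 30]))          # caching: 3 starts
print(scalar_subquery_starts([10, 20, 20, 30], False))   # no caching: 4 starts
print(plausible_drivers(4_606, candidates))               # ['88 VIEW']
```

Caching can only reduce the number of starts below the number of driving rows, which is why a rowsource of 4,375 rows cannot explain 4,606 starts.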

Corroborating this claim we can look at the A-Time for operation 88: it’s 13.21 seconds and there’s nothing below it that accounts for that time; but if we insert operations 73 and 74 just below operation 88 we suddenly have 4,606 subquery calls which report 5.59 seconds, and that’s a step in the right direction for identifying the 13.21 seconds that appeared “from nowhere” – especially when you notice that the predicate for operation 74 (or 88a) calls a PL/SQL packaged procedure that is either calling three more PL/SQL procedures or 3 user-defined types and probably using a fair amount of unrecorded time.


Scalar subqueries in select lists can dump their execution plans in places you don’t expect. We know that the plan for a scalar subquery in the select list of a simple query will report itself above the main body of the query plan. Here we have an example of a scalar subquery that reports itself an extra step further out than we intuitively suspect and, quite possibly, if we hadn’t had the rowsource execution statistics to guide us, we wouldn’t have been able to work out what the plan was really trying to tell us.


Since Andy Sayer had been commenting on the same ODC thread I emailed him a brief version of the above notes last night, and he created, and emailed to me, a very simple example to reproduce this behaviour which I’ve also tested on

May 3, 2019


Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 1:34 pm BST May 3,2019

Before you comment – I do know that the title has a spelling mistake in it. That’s because the Oracle code uses exactly this spelling in one of the little-used features of tracing.

I wrote a note a few years ago about enabling sql_trace (and other tracing events) system-wide for a single SQL statement. In the note I suggested that you could enable tracing for a few minutes then disable it to minimise the impact on the system while still capturing a reasonable number of statement traces. A recent ODC thread, however, described a problem where a particular statement executed in the order of 1,000,000 times per hour – which is getting on for about 300 executions per second, and you probably don’t want to leave a system-wide trace running for any length of time when things are operating at that rate. Fortunately we can refine the method with the occurence filter to capture a small and limited number of executions, spread over as many sessions as are running. Here’s an example of the syntax:

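rem     Script: trace_occur.sql
rem     Author: Jonathan Lewis
rem     Dated:  April 2019

rem     Only the last define takes effect: edit it to the SQL_ID you want to trace.
define m_sql_id = 'gu1s28n6y73dg'
define m_sql_id = 'fu0ftxk6jcyya'

rem     Trace the statement system-wide, with bind values, but only for
rem     the range of "occurences" given by the filter in curly brackets.
alter system set events 
        '
        sql_trace[SQL:&m_sql_id]
        {occurence: start_after 101, end_after 496}
        bind=true
        '
;

pause   Run the test script here and press return when it ends

rem     Switch tracing off so that new sessions don't start tracing.
alter system set events 
        '
        sql_trace[SQL:&m_sql_id]
        off
        '
;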

All I’ve done, compared to the earlier note, is include in curly brackets, just after identifying the SQL ID, the text: “{occurence: start_after 101 , end_after 496}”. Roughly speaking this means that every session will start counting calls to the given statement and on the hundred and first it will start dumping the trace file, and for a total of 496 calls it will continue dumping the trace file. So it’s possible to make sure that a session does trace but doesn’t dump a huge volume of trace data. Of course I do still execute a call to switch tracing off for the statement otherwise every session that subsequently logs on will still start tracing and dump a few executions into their trace file.

There is, unfortunately, a catch. I don’t know how Oracle is counting for the start_after/end_after values – but it’s not executions of the statement, and it varies with working environment, and it changes as the trace is enabled, and it changes with version, and is probably dependent on the session_cached_cursors parameter, and it behaves differently when interacting with the PL/SQL cursor cache. It is perhaps easiest to show an example.

I have a table called test_lobs (id, bytes …) with a unique index on (id) for this test.

create table test_lobs (
        id        number(8,0) primary key,
        bytes     number(8,0)
);

insert into test_lobs values(-1,999);
commit;

execute dbms_stats.gather_table_stats(user,'test_lobs')

And one of my test scripts is as follows:

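rem     This generates a statement with SQL_ID = gu1s28n6y73dg

declare
        m_result number;
begin
        for i in 1..1000 loop
                begin
                        select bytes into m_result from test_lobs where id = i;
                exception
                        -- only id = -1 exists, so every lookup raises no_data_found
                        when others then null;
                end;
        end loop;
end;
/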

Running 18.3 I start the trace script from one session, then start the test script from another session. As it stands the SQL statement embedded in the PL/SQL loop will have the SQL_ID I am tracing, so the second session will start dumping a trace file. The big question is: which executions of the statement will it dump? Since I’ve enabled bind variable dumping and the bound value is a simple loop counter it will be easy (!) to find the answer to this question.

To stabilise the results I did the following:

  • Session 1: Create the table.
  • Session 1: Start the trace event
  • Session 2: Connect to the database and run the test
  • Session 1: End the trace event
  • Session 1: Start the trace event again
  • Session 2: Connect to the database again and run the test a second time
  • Session 1: End the trace event

I’ll explain the need for looking at the results of the second cycle in a moment.

The trace file I produced started with the first three lines below, and then repeated the 10 line highlighted fragment a number of times:

PARSING IN CURSOR #140126713239784 len=43 dep=1 uid=104 oct=3 lid=104 tim=168304257545 hv=233016751 ad='63b8f0c0' sqlid='gu1s28n6y73dg'

BINDS #140126713239784:

  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
  oacflg=03 fl2=1206001 frm=00 csi=00 siz=24 off=0
  kxsbbbfp=7f71cb0c67c0  bln=22  avl=02  flg=05
EXEC #140126713239784:c=0,e=57,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2054437130,tim=168304262695
FETCH #140126713239784:c=0,e=3,p=0,cr=1,cu=0,mis=0,r=0,dep=1,og=1,plh=2054437130,tim=168304262729
CLOSE #140126713239784:c=0,e=1,dep=1,type=3,tim=168304262772

Notice the bind value report. A key feature that we are interested in is the first “value=” reported and the last “value=”. In my case the low/high were 26 and 87, for a total of 62 executions. A little arithmetic (and a few corroborating checks) highlight the significance of the following:

  • I started tracing after the 25th execution, and 25 * 4 + 1 = 101, my start_after value.
  • I traced 62 executions and 62 * 8 = 496, my end_after value.
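Pulling the first and last bind values out of a large trace file is easy to script. A minimal Python sketch, assuming each bind value appears on its own indented line in the form `value=NN` and that every traced execution supplies exactly one bind (true for this test); `bind_value_range` is a hypothetical helper and the sample text below is synthetic, not taken from the real trace.

```python
import re

# Assumption: each bind value is written on a line of its own as "value=NN",
# and each traced execution of the statement has exactly one bind variable.
BIND_VALUE = re.compile(r"^\s*value=(-?\d+)\s*$")


def bind_value_range(trace_text):
    """Return (first value, last value, number of values), or None if none found."""
    values = []
    for line in trace_text.splitlines():
        match = BIND_VALUE.match(line)
        if match:
            values.append(int(match.group(1)))
    if not values:
        return None
    return values[0], values[-1], len(values)


# Synthetic trace text standing in for a real trace file.
sample = "\n".join(f"BINDS #140126713239784:\n  value={v}" for v in range(26, 88))
print(bind_value_range(sample))  # (26, 87, 62)
```

For a real run you would read the trace file's contents and pass the text in; the count returned is the number of executions traced (assuming no partial dumps from recursive SQL, which the first run of the test did produce).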

Oracle is counting something whenever it hits the SQL_ID we’ve specified but (for SQL inside a PL/SQL loop) it’s counting something which happens 4 times for each execution; then when it hits the start_after and starts tracing whatever it counts happens twice as often each time around the loop.

My general response to this type of detail is: “Argh!!!” – by the way. Maybe a call to Frits or Stefan asking them to trace C functions is in order.

By this time you may have guessed why I examined the trace file from the second run of the test. The counting seems to include counts of database calls that take place in the recursive SQL needed to optimise / hard parse the query – anything that relates to the SQL_ID we specify may be included in the counts. So on the first test I got a load of garbage in the trace file then saw a partial dump of the trace data for value=2 and the trace file ended partway through the trace data for value=17.

As a further test, I had a go with pure SQL calls in a test script:

set serveroutput off
variable b1 number;

exec :b1 := 100
select bytes from test_lobs where id = :b1;

exec :b1 := 101
select bytes from test_lobs where id = :b1;


exec :b1 := 129
select bytes from test_lobs where id = :b1;

After getting a stable result, the two versions I tested behaved differently:

  • One version counted 5 for every execution, so start_after = 16 skipped the first 3 executions and started tracing for value = 103.
  • The other counted 7 for the first execution and 5 thereafter, so start_after = 8 skipped one execution, start_after = 13 skipped two, and so on.
  • Both versions counted 10 for every execution while tracing was enabled, so end_after = 30 traced 3 executions in both cases.
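The arithmetic in this section can be checked with a toy model. It is only a sketch: it assumes the hidden counter advances by a fixed amount per execution (4 or 5 before tracing starts, 8 or 10 once it has started, as inferred from the traces), which the text shows is not guaranteed, and it ignores the extra counts from recursive SQL and hard parsing. `traced_executions` is a hypothetical helper, not anything Oracle exposes.

```python
def traced_executions(start_after, end_after, cost_before, cost_during, first_cost=None):
    """Return the (first, last) execution numbers a session would trace.

    Toy model: the hidden counter advances by `cost_before` per execution
    (optionally `first_cost` for the very first one) until it reaches
    start_after; from then on it advances by `cost_during` per execution,
    so end_after // cost_during executions are traced.
    """
    if cost_before <= 0 or cost_during <= 0:
        raise ValueError("costs must be positive")
    counter = 0
    execution = 0
    while True:
        execution += 1
        cost = first_cost if (execution == 1 and first_cost is not None) else cost_before
        if counter + cost >= start_after:
            break  # the counter reaches start_after during this execution
        counter += cost
    traced = end_after // cost_during
    return execution, execution + traced - 1


# PL/SQL loop: 4 per execution before tracing, 8 while tracing.
print(traced_executions(101, 496, 4, 8))               # (26, 87)
# Pure SQL, first version: 5 per execution, 10 while tracing (4th execution is value = 103).
print(traced_executions(16, 30, 5, 10))                # (4, 6)
# Pure SQL, second version: 7 for the first execution, then 5.
print(traced_executions(13, 30, 5, 10, first_cost=7))  # (3, 5)
```

With 5 and 10 per execution the model reduces to the rule of thumb of roughly start_after/5 executions skipped and end_after/10 executions traced.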

It’s possible, of course, that some differences in the way session_cached_cursors works would account for the small difference – but I suspect I could have spent a couple of days trying to sort out minor variations due to slight parameter and implementation changes between versions. It’s also possible that some of my guesses are wrong and there is more method to the madness than I have spotted.


It is possible to enable tracing system-wide for a limited number of executions per session of a given statement; however the number of executions that might take place before tracing starts and the number of executions actually traced depends on a variety of details of which some may be outside your control.

As a baseline, it looks as if the number of executions before tracing starts is going to be about one-fifth of the value you set for start_after, and the number of executions traced will be about one-tenth of the end_after; however recursive SQL (perhaps even including dynamic sampling) can get caught up in the counts, potentially reducing the number of executions of the target statement that you see.

March 21, 2019

Lost time

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:51 pm BST Mar 21,2019

Here’s a little puzzle that came up in the ODC database forum yesterday – I’ve got a query that has been captured by SQL Monitor, and it’s taking much longer to run than it should but the monitoring report isn’t telling me what I need to know about the time.

Here’s a little model to demonstrate the problem – I’m going to join a table to itself (the self join isn’t a necessary feature of the demonstration, I’ve just been a bit lazy in preparing data). Here’s a (completely truthful) description of the table:

SQL> desc t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ID                                     NUMBER
 MOD_1000                               NUMBER
 V1                                     VARCHAR2(40)
 V2                                     VARCHAR2(40)
 PADDING                                VARCHAR2(100)

SQL> select num_rows, blocks from user_tables where table_name = 'T1';

  NUM_ROWS     BLOCKS
---------- ----------
    400000       7798

1 row selected.

And here’s the text version of the output I get from dbms_sql_monitor.report_sql_monitor() for the query running on

Global Information
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  TEST_USER (261:36685)
 SQL ID              :  g6j671u7zc9mn
 SQL Execution ID    :  16777218
 Execution Started   :  03/21/2019 08:54:56
 First Refresh Time  :  03/21/2019 08:54:56
 Last Refresh Time   :  03/21/2019 08:55:17
 Duration            :  21s
 Module/Action       :  MyModule/MyAction
 Service             :  SYS$USERS
 Program             :  sqlplus@linux183.localdomain (TNS V1-V3)
 Fetch Calls         :  2

Global Stats
| Elapsed |   Cpu   |    IO    | PL/SQL  | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Time(s) | Calls |  Gets  | Reqs | Bytes |
|      21 |      21 |     0.04 |    0.07 |     2 |     3M |  215 | 180MB |

SQL Plan Monitoring Details (Plan Hash Value=83896840)
| Id |       Operation       | Name |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|    |                       |      | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
|  0 | SELECT STATEMENT      |      |         |      |        21 |     +1 |     1 |        1 |      |       |     . |          |                 |
|  1 |   FILTER              |      |         |      |        21 |     +1 |     1 |        1 |      |       |     . |          |                 |
|  2 |    HASH JOIN OUTER    |      |     400 | 2014 |        21 |     +1 |     1 |      400 |      |       |   2MB |          |                 |
|  3 |     TABLE ACCESS FULL | T1   |     400 | 1005 |         1 |     +1 |     1 |      400 |   62 |  60MB |     . |          |                 |
|  4 |     TABLE ACCESS FULL | T1   |    400K | 1001 |        21 |     +1 |     1 |     400K |      |       |     . |          |                 |

As you can see, the total elapsed time is 21 seconds, of which the CPU time is the whole 21 seconds.

This seems a little large for a simple hash join so we should look at the “Activity Detail” section of the plan because that will report any ASH samples that can be found for this execution of this query. And that’s where the problem lies: despite using 21 CPU seconds in 21 seconds there are no ASH samples for the execution! Moreover – looking for other numerical oddities – the plan says we read 60MB in 62 read requests (that’s the first tablescan of the 7,798 blocks of t1), but the summary says we read 180MB – where did we lose (or gain) 120MB?

It might help to see the query (which is why I didn’t show it in the output above) and it might help to see the predicate section (which SQL Monitor doesn’t report). So here’s the query and its plan, with the resulting predicate section, pulled from memory:

SQL_ID  g6j671u7zc9mn, child number 0
select  /*+ monitor */  t1a.mod_1000, t1b.mod_1000 from  t1 t1a,  t1
t1b where  t1a.mod_1000 = 500 and  t1b.id(+) = t1a.id and
nvl(t1b.mod_1000,0) + f2(t1a.id) + f3(t1a.id) > 0

Plan hash value: 83896840

| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT    |      |       |       |  2014 (100)|          |
|*  1 |  FILTER             |      |       |       |            |          |
|*  2 |   HASH JOIN OUTER   |      |   400 |  7200 |  2014   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |   400 |  3600 |  1005   (3)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T1   |   400K|  3515K|  1001   (3)| 00:00:01 |

Predicate Information (identified by operation id):
   1 - filter(NVL("T1B"."MOD_1000",0)+"F2"("T1A"."ID")+"F3"("T1A"."ID")>0)
   2 - access("T1B"."ID"="T1A"."ID")
   3 - filter("T1A"."MOD_1000"=500)

Notice the two function calls that are part of the FILTER operation. That’s where the time is going – but I don’t think you can infer that from the SQL Monitor report (which rather suggests that the problem might be with the second full tablescan of t1 at operation 4).

Functions f2() and f3() both do a table scan of another table that is as large as t1 – and it looks as if they’re both going to be called 400 times, which is a lot of work. The 120MB of “lost” reads is the first table scan of each of the two tables, the 3M buffer gets (in the summary, if you didn’t notice it) is from the repeated tablescans as each row comes out of the hash join and the filter is applied.

Why doesn’t the report show us the CPU (and disk) samples? Because it’s querying ASH (v$active_session_history) by the SQL_ID of the principal SQL statement – and the work is being done by two other statements with different SQL_IDs.
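A toy model makes the point about filtering by SQL_ID. The samples below are hypothetical (not captured from the test), and the field names mirror the sql_id and top_level_sql_id columns of v$active_session_history; whether a given version of SQL Monitor filters on exactly this column is an assumption here, since the text only says it reports by the principal statement’s SQL_ID.

```python
from collections import Counter

MAIN_SQL_ID = "g6j671u7zc9mn"

# Hypothetical ASH-style samples: sql_id is the statement actually running
# when the sample was taken, top_level_sql_id the statement the user called.
samples = [
    {"sql_id": "12ytf1rry45d9", "top_level_sql_id": MAIN_SQL_ID},
    {"sql_id": "12ytf1rry45d9", "top_level_sql_id": MAIN_SQL_ID},
    {"sql_id": "85dmsgqg3bh4w", "top_level_sql_id": MAIN_SQL_ID},
]


def samples_by(column, sql_id):
    """Return the samples whose `column` matches `sql_id`."""
    return [s for s in samples if s[column] == sql_id]


print(len(samples_by("sql_id", MAIN_SQL_ID)))            # 0: nothing attributed to the query
print(len(samples_by("top_level_sql_id", MAIN_SQL_ID)))  # 3: all the work it caused
print(Counter(s["sql_id"] for s in samples_by("top_level_sql_id", MAIN_SQL_ID)))
```

Filtering on the statement that was actually running misses all the work done by SQL called from the PL/SQL functions; filtering on the top-level statement attributes it back to the query.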

Since it’s PL/SQL functions doing all the work why isn’t the 21 CPU seconds showing up in the “PL/SQL time(s)” summary figure? Because the time isn’t being spent in PL/SQL, it’s being spent in the SQL being run by the PL/SQL.

Just to finish off, let’s try to find the SQL being run by the PL/SQL. I’ll post the script to recreate the example at the end of the posting but for the moment I’ll just show you the query against v$sql that I ran to extract the SQL that’s embedded in the functions:

select  sql_id, executions, disk_reads, buffer_gets, sql_text
from    V$sql 
where   sql_text like 'SELECT%T2%' 
or      sql_text like 'SELECT%T3%'
;

SQL_ID        EXECUTIONS DISK_READS BUFFER_GETS SQL_TEXT
------------- ---------- ---------- ----------- --------------------------------------------------
12ytf1rry45d9        400       7683     3072817 SELECT MAX(ID) FROM T2 WHERE MOD_1000 = :B1
85dmsgqg3bh4w          1       7680        7698 SELECT MAX(ID) FROM T3 WHERE MOD_1000 = :B1

As you can see, there are roughly 120MB of disk I/O and 3M buffer gets due to these two statements – and one of them has run the 400 times we expected. It looks as if Oracle has done a cute little optimisation with the other function, though. If you look at the predicate it says:
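A quick arithmetic check shows how these figures reconcile with the SQL Monitor summary. This sketch assumes an 8KB block size, which is a common default but is not stated in the post; the block and read counts are the ones reported above.

```python
# Assumption: 8KB database blocks (a common default); the post does not state it.
BLOCK_SIZE = 8192


def mib(blocks, block_size=BLOCK_SIZE):
    """Convert a block count to MiB."""
    return blocks * block_size / 2**20


t1_first_scan = mib(7_798)            # blocks in t1 (from user_tables)
function_reads = mib(7_683 + 7_680)   # disk_reads of the T2 and T3 queries
gets_per_t2_exec = 3_072_817 / 400    # buffer_gets / executions for the T2 query

print(f"t1 scan        ~{t1_first_scan:.0f} MiB")
print(f"function SQL   ~{function_reads:.0f} MiB")
print(f"total          ~{t1_first_scan + function_reads:.0f} MiB")
print(f"gets per exec  ~{gets_per_t2_exec:.0f}")
```

Under that assumption the figures come out at about 61, 120 and 181 MiB, close to the 60MB, 120MB and 180MB reported, and each of the 400 executions of the T2 query does about 7,682 buffer gets, which is roughly one full scan of a 7,798 block table per call.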

  • NVL("T1B"."MOD_1000",0)+"F2"("T1A"."ID")+"F3"("T1A"."ID")>0

It looks as if Oracle has either cached the result of the f3() function call (in which case why didn’t it also cache the f2() result) or it’s stopped evaluating the predicate as soon as the running total exceeded zero (but that might be a problem since f3() could return a negative number !). I don’t know exactly why Oracle is doing what it’s doing – but if I reverse the order of the functions in the predicates the f3() query will run 400 times and the f2() query will run once.


Following the comment below from “anonymous”, pointing out that the “Active” SQL Monitor (and, of course, the equivalent OEM screen) has a section showing the impact of everything the session has been doing while the query ran, I re-ran my call to dbms_sql_monitor.report_sql_monitor() with the “type” parameter set to ‘ACTIVE’ rather than ‘TEXT’. After spooling this to a text file (set linesize 255, set pagesize 0) with the suffix “html” and trimming the top and bottom off so that the file started and ended with opening and closing html tags, I opened it in Firefox.

You have to have Adobe Flash Player for this to work, and you have to be connected to the Internet as the file contains content that will call to Oracle’s home domain. After getting Adobe Flash to work, here’s a snapshot of the resulting display:

The print may be a little small, but in the lower half of the screen (which usually shows the plan with execution statistics) I’ve selected the “Activity” tab, and this has produced a graphic display of all the samples captured for the session while the query was running. I took this report after re-running the test case, but I had swapped the order of the two functions in the interim, so this output is showing that (from Oracle’s perspective) all the work done during the execution of the query was done by a statement with SQL_ID 85dmsgqg3bh4w.


If you want to re-run some tests here’s the code I used to create the demo:

create table t1
as
with generator as (
        select
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum-1,1000)              mod_1000,
        lpad(rownum,10,'0')             v1,
        lpad(rownum,10,'0')             v2,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 4e5 -- > comment to avoid WordPress format issue
;

-- alter table t1 add constraint t1_pk primary key(id);

create table t2 nologging as select * from t1;
create table t3 nologging as select * from t1;

-- f2() and f3() each do a full scan of a 400,000 row table on every call
create or replace function f2 (i_in number) return number
as
        m_ret number;
begin
        select  max(id)
        into    m_ret
        from    t2
        where   mod_1000 = i_in
        ;

        return m_ret;
end;
/

create or replace function f3 (i_in number) return number
as
        m_ret number;
begin
        select  max(id)
        into    m_ret
        from    t3
        where   mod_1000 = i_in
        ;

        return m_ret;
end;
/

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T3',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter system flush buffer_cache;

variable b1 number
variable b2 number

set timing on

select
        /*+ monitor */
        t1a.mod_1000, t1b.mod_1000
from
        t1      t1a,
        t1      t1b
where
        t1a.mod_1000 = 500
and     t1b.id(+) = t1a.id
and     nvl(t1b.mod_1000,0) + f3(t1a.id) + f2(t1a.id) > 0
;

One interesting little detail, if you try the tests, is that the join has to be an outer join for the FILTER operation to appear.
