Oracle Scratchpad

October 8, 2020

Direct Path

Filed under: Infrastructure,Oracle,Performance,Troubleshooting,Tuning — Jonathan Lewis @ 12:29 pm BST Oct 8,2020

This is a little addendum to a note I wrote a couple of days ago about serial direct path reads and KO (fast object checkpoint) enqueue waits.

The original note was prompted by a problem where someone had set the hidden parameter “_serial_direct_read” to ‘always’ because there were running 11g and wanted some “insert as select” statements to use direct path reads on the select portion and 11g wasn’t co-operating.

Serial direct path reads were introduced as a possibility in (at least) the 8.1.7.4 timeline, but the parameter was set to false until 11gR2 where it changed to auto. (Legal values are: false, true, never, auto, always.)

In 11.2, though, even though a simple select statement could use serial direct path reads for segment scans, Oracle would not use the mechanism for “insert as select”.

This note is just a little piece of code to demonstrate the point.  Run it on 11g and (unless your buffer cache is large enough to make the test table “small”) Oracle will use direct path reads on the select, but scattered reads to cache for the insert. Upgrade to 12.1 and Oracle will use direct path reads on both.

rem
rem     Script:         serial_fail.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem

create table t1
as
select
        ao.*
from
        all_objects     ao,
        (select rownum from dual connect by level <= 16) mult
/

create table t2
as
select  *
from    t1
where   rownum = 0
/

alter system flush buffer_cache;

prompt  =============
prompt  Simple Select
prompt  =============

execute snap_events.start_snap
select * from t1 where object_id = 98765;
execute snap_events.end_snap

prompt  ================
prompt  Insert as select
prompt  ================

execute snap_events.start_snap
insert into t2
select * from t1 where object_id = 98765;
execute snap_events.end_snap

prompt  =====================
prompt  Insert as select with
prompt  _serial_direct=always
prompt  =====================

alter session set "_serial_direct_read"=always;

execute snap_events.start_snap
insert /* serial direct */ into t2
select * from t1 where object_id = 98765;
execute snap_events.end_snap

alter session set "_serial_direct_read"=auto;

The calls to the snap_events package are the to produce the change in v$session_event for my session during the SQL.

You’ll notice I’ve included three main SQL statements rather than two – the third statement (2nd execution of the insert) is to demonstrate that it is possible to get direct path reads on the insert by setting the hidden parameter to ‘always’.

One detail to remember when testing this particular feature (and the same guideline applies to some other features), the “direct / not direct” becomes an attribute of the cursor, it’s not an attribute of the execution plan. This is why I’ve added a comment to the 2nd insert; if I hadn’t done so Oracle would have reused the (identical text) cursor from the first insert, which would have resulted in scattered reads being used instead of direct path reads. This distinction between cursor and plan explains why there is not hint that will allow you to force direct path reads for a specific query (not even the infamous opt_param() hint).

Here are the three sets of results from a system running 11.2.0.4:

=============
Simple Select
=============

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               1           0           0.10        .100           4
direct path read                                    114           0          20.86        .183           6
SQL*Net message to client                             4           0           0.00        .000           0
SQL*Net message from client                           4           0           0.11        .028     174,435

================
Insert as select
================

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                              22           0           0.60        .027           4
db file scattered read                              130           0          35.97        .277           5
SQL*Net message to client                             4           0           0.01        .002           0
SQL*Net message from client                           4           0           0.10        .025     174,435

=====================
Insert as select with
_serial_direct=always
=====================

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
direct path read                                    108           0          17.44        .161           6
SQL*Net message to client                             4           0           0.00        .000           0
SQL*Net message from client                           4           0           0.09        .022     174,435


Note the db file scattered read waits in the mddle test. If you re-run the test on 12.1.0.x (or later) you’ll find that the middle set of results will change to direct path read waits.

For reference, this limitation is covered by MOS note13250070.8: Bug 13250070 – Enh: Serial direct reads not working in DML. The actual bug note is not public.

Footnote (a couple of hours later):

A response from Roger MacNicol to my publication tweet has told us that the bug note says only that direct path reads had been restricted unnecessarily and the restriction has been removed.

October 5, 2020

Direct Path

Filed under: Oracle,Performance,Problem Solving,RAC,Troubleshooting,Tuning — Jonathan Lewis @ 11:29 am BST Oct 5,2020

Here’s a note that I might have written once already – but I can’t find it and I’ve just been reminded about what it (might have) said by a posting that came up on the Oracle database forum in the last few days.

The posting in question is asking why, after setting the hidden parameter _serial_direct_read to ‘always’ a particular query is now taking hours to complete when it used to complete in a minute or so.

The answer is partly “because you’ve forced direct path serial reads”, partly “because you’re running on RAC” and (most directly) because the optimizer is using a really bad execution plan for that query and the direct path reads have had a massive impact as a consequence. (It turns out, after modelling, that the answer might also include “because you’re running 11.2.0.4”)

I’m going to demonstrate the issue by forcing a very simple query to take a very bad execution plan.

rem
rem     Script:         ko.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem 

create table t1
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
;

create table t2
as
select  *
from    all_objects
where   rownum <= 10000  -- > comment to avoid wordpress format issue
;

select  table_name, blocks 
from    user_tables
where   table_name in ('T1','T2')
;

alter system flush buffer_cache;

execute snap_events.start_snap
execute snap_my_stats.start_snap
alter session set "_serial_direct_read"=always;


select
        /*+ 
                leading(t1 t2)
                use_nl(t2)
                full(t2)
        */
        t1.object_type,
        t2.object_type
from
        t1, t2
where
        t2.object_id = t1.object_id + 0.5
;

execute snap_my_stats.end_snap
execute snap_events.end_snap

alter session set "_serial_direct_read"=auto;


My query very carefully ensures that it’s not going to return any rows; but it’s going to do a lot of work finding no data because I’ve forced Oracle into doing a tablescan of t2 for every row in t1 – so 10,000 scans of a table of 140 – 190 (depending on version) data blocks.

The snap_my_stats and snap_events packages are a couple of my simple diagnostic packages that allows me to find the change in some v$ content between the start and end snapshots. In this case it’s v$mystat and v$session_event for the session.

On a test using 11.2.0.4 the query ran for about 41 seconds with the following wait events reported:

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               2           0           0.11        .054           1
direct path read                                  6,527           0         368.30        .056           6
SQL*Net message to client                            13           0           0.00        .000           0
SQL*Net message from client                          13           0      10,689.26     822.251      10,689

The 6,500 “direct path read” waits corresponded to 1.33M “physical reads direct” reported in the session activity stats. Although the t2 table was fairly small Oracle was forced to use direct path reads for every single cycle through the nested loop. As a quick comparison, here are the figures if I don’t force direct path serial scans.

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               2           0           0.09        .047           2
db file scattered read                               34           0           2.45        .072           0
SQL*Net message to client                            11           0           0.01        .001           0
SQL*Net message from client                          11           0         174.36      15.851      82,849



We’ve waited for only 34 “db file scattered reads” and 2.45 centiseconds as we read the t2 (and the t1) tables into the cache for the first time, but then we’ve been able to revisit the blocks in the cache. We also saw a reduction in CPU usage and the total run time dropped from 41 seconds to about 22 seconds.

In this tiny example it hasn’t made a staggering difference to the overall run time, but the OP wasn’t that lucky with his “couple of minutes” to “hours”.

If you look at the fragment of the SQL Monitor report supplied by the OP you’ll see that they have an operation which shows:

=====================================================================================================================================================================================================================================
| Id    |                  Operation                    |            Name              | Rows    | Cost |  Time     |  Start | Execs |    Rows  | Read | Read  | Cell   | Mem | Activity |            Activity Detail               |
|       |                                               |                              | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload |    |    (%)   |              (# samples)                 |
=====================================================================================================================================================================================================================================
| -> 25 |       INDEX STORAGE FAST FULL SCAN            | TMRC_IX1                     |      1  |      |     10745 |     +0 |  268K |     112K | 267K |   2GB |  96.53% | 1M |    99.74 | enq: KO - fast object checkpoint (4783)  |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | Cpu (753)                                |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | gcs drm freeze in enter server mode (25) |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | latch free (1)                           |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | reliable message (3505)                  |
|       |                                               |                              |         |      |           |        |       |          |      |       |         |    |          | cell smart index scan (1635)             |
=====================================================================================================================================================================================================================================

The “index fast full scan” is an example of a “segment scan” and will be subject to direct path reads, just like a tablescan. We don’t really know how big this index is – but we can see that we have read it 268K times (Execs column) reading approximately 2GB after 267K read requests. This suggests the index is at most 1MB in size since it can be read in a single read request, and may consist of only one populated block (2,000,000,000/267,000 = 7,490 bytes. Despite this small size the total time sampled for all these scans is more than 10,600 seconds – roughly 39 millisecond per scan. That’s not very good.

Looking more closely at the sampled time we notice 3 key features:

============================================
|            Activity Detail               |
|              (# samples)                 |
============================================
| enq: KO - fast object checkpoint (4783)  |
| Cpu (753)                                |
| gcs drm freeze in enter server mode (25) |
| latch free (1)                           |
| reliable message (3505)                  |
| cell smart index scan (1635)             |
============================================
  • A massive fraction of the time was spent on “enq: KO – fast object checkpoint”
  • A large fraction of the time was spent on “reliable message”
  • There was some time (relatively small, but large in absolute terms) for “gcs drm freeze …”

The last of these three is about global cache services, distributed resource manager” and is a clear indication that the system is running RAC and we have some hefty competition for “object mastering” between instances. But the actual time lost there is relatively small – though associated chatter between instances could be significant.

But what’s the “KO enqueue”? Every time an instance starts a direct path segment scan it has to get a message to the database writer (possibly via the checkpoint process) – hence the “reliable message” waits – to copy every dirty block for that segment from the buffer cache down to disc and it has to wait for the write to complete. This is necessary to ensure that the tablescan doesn’t miss any changes that have been made in memory without yet being written to disc.

The KO enqueue synchronises this activity – I haven’t worked out the complete chain of events, but the enqueue is negotiated between the session and the checkpoint process – and if you’re running RAC every instance has to write any dirty blocks it is holding for the segment, so you have to have a degree of cross-instance chatter to make this happen.

Thanks to the enforced serial direct reads the OP’s plan – which, surely, shouldn’t expect to do 267K index fast full scans – has a massive overhead thanks to the need for the repeated object checkpoints.

You may ask, at this point, why I didn’t see any KO enqueue waits in my test results – the answer is simple, I’d flushed the buffer cache before I started the test, so there were no dirty blocks for the session to worry about. Let’s see what happens if I introduce a little activity to keep dirtying a few blocks in the t2 table. Here’s a little loop that will update a few rows once per second:

begin
        for i in 1..1 loop
                update t2 set data_object_id = 0 where mod(object_id,1000) = i;
                dbms_output.put_line(sql%rowcount);
                commit;
                dbms_lock.sleep(1);
        end loop;
end;
/

You’ll notice the code runs through the loop just once – I started with a loop count of 30, and discovered it wasn’t necessary, but the option remains for further testing.

If I execute this anonymous block (which updates about 10 rows each time through the loop) from another session just after SQL*Plus reports my “alter session”, so that it runs just after the query starts, this is what the session event report looks like if I run the test against 11.2.0.4 (the version reported by the OP):

Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
enq: KO - fast object checkpoint                  7,645           0         290.34        .038           6
db file sequential read                               2           0           0.13        .066           2
direct path read                                 10,714           0         675.50        .063           6
SQL*Net message to client                            14           0           0.00        .000           0
SQL*Net message from client                          14           0       1,556.23     111.160     101,653
events in waitclass Other                         5,607           0         218.04        .039           8


Suddenly we see a lot of time spent on the KO enqueue waits and the “events in waitclass Other” (which turn out to be “reliable message” waits). Apparently the session keeps finding dirty t2 blocks in the cache and telling the database writer they need to be written to disc before the next tablescan of t2 can take place.

There’s something odd here, though and I’ll introduce it with this comment: when I repeated the test on 19.3 (even with with the constant trickle of updates once per second), we only see a tiny number of KO enqueues and reliable message waits – the fact that we see a huge number of them in 11g is a defect in the design of the code.

Think about what’s happening with the KO enqueue: when you start the first tablescan of t2 you force every dirty block for that segment to be copied from the cache to the disc.

As the tablescan proceeds you may have to apply some undo change vectors to the blocks you’re reading to take them back to the SCN as at the start of query execution, but you know that any data that had been committed before the query started is on disc (even if it has been over-written by committed changes made after the query started, or by uncommitted changes made before the query started). What’s on the disc right now will be usable to get the correct read-consistent version of the data for the duration of the query run, no matter how many newer changes are made, whether or not they over-write the disc blocks before the query ends. There is no need to force write any dirty blocks as the tablescan is repeated and, it seems, by 19.3 the code has been adjusted to accomodate that fact.

Footnote

After I had posted this comment on the forum, the OP raised the question of whether or not the fix might apply to 12c as well – so I ran up a VM of 12.1.0.2 and 12.2.0.1 and re-ran the tests. The results were initially promising – neither version reported an extreme number of KO enqueue waits or reliable message waits.

However when I changed the loop counter from 1 back to 30 I found that the waits re-appeared – though the numbers were significantly less than those from 11g – so perhaps there’s a timing element involved that might need further investigation and stress testing even for 19.3.

Footnote 2

Another snapshot I took in testing was from the v$enqueue stats – which showed that (approximately) for every KO enqueue wait my session reported, the instance reported about 10 – 12 KO enqueue requests.

When checking v$enqueue_stats it’s important to remember that session activity stats (v$sesstat) report both “enqueue requests” and “enqueue conversions”. In v$enqueue_stats the conversions aren’t reported separately the view simply adds the two figures together under “requests”.

In the 11g test the session reported 7,645 KO enqueue waits, but the session activity stats reported 19,399 enqueue requests and 38,796 enqueue conversions; while v$enqueue_stats reported 96,990 KO enqueue requests. The remaining 38,796 KO enqueue requests were made by the checkpoint process (CKPT) – and it was only my session that repoted any waits for KO enqueue requests.

Without further low-level investigation this is what leads me to believe that the session sends CKPT a message that it wants an object-level checkpoint performed and waits for the message to be acknowledged (reliable message) before trying to convert a low-level KO enqueue to an exclusive one. But CKPT has acquired and converted the same KO enqueue before acknowledging the message from the session and will only release the enqueue when the checkpoint is complete. So this mechanism ensures that the session will have to wait until the checkpoint is complete and CKPT releases the enqueue before it can continue processing.

(P.S. If you enable event 10704 you will find that there seem to be two KO enqueues involved – one with id2 = 1, one with id2 = 2.)

August 31, 2020

Tracing Errors

Filed under: Oracle,Problem Solving,trace files,Troubleshooting — Jonathan Lewis @ 10:16 am BST Aug 31,2020

This is a little lesson in trouble-shooting. It assumes you have the privilege to generate and edit trace files, and all I’m going to do is show how I worked out the answer to a fairly simple question that appeared recently on the Oracle Developer Community forum in a thread with the title  Cannot drop table after start dropping unused columns checkpoint.

I have a table t1 which is reasonably large (1M rows) with a column v30, and I’ve issued the command.

rem
rem     Script:         drop_column.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2003
rem

alter table t1 set unused column v30;

After this I’ve issued another command, pressed return, and immediately hit ctrl-C – I’ve got a brief window for this interrupt as the command is going to generate roughly 230MB of redo. If you want to try the experiment it might take a couple of attempts to get the timing right.

alter table t1
        drop unused columns
        checkpoint 1000
/
{ctrl-C}

Then I’ve tried to drop the table with the following result:

drop table t1
           *
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-12986: columns in partially dropped state. Submit ALTER TABLE DROP COLUMNS CONTINUE

This emulates the problem that appeared on the forum but the OP didn’t really want to issue the “continue” command because their table was “large” and the drop had been running for 24 hours when it was interrupted. It’s worth noting that four columns had been specified as unused, which means the drop command was probably generating roughly 1KB of redo per row. It’s also worth mentioning that the table was 600 columns wide – so there may have been a few performance side effects due to the multiple (minimum 3) row-pieces per row and row-chaining.

Ignoring any questions about the possible impact of having 600 columns, the key question in this case is:

  • Why isn’t it possible to drop the table (the manuals suggest it should be possible at this point)
  • Is there a way to bypass the problem?

This is a repeatable error – we can try to drop the table as many times as we like and we will get the same error reported in fractions of a second. so an obvious strategy is to enable tracing and see if the trace file can tell us anything about why the error is happening. This is a particularly sensible strategy to follow since error ORA-00604 is telling us that there’s something wrong in the recursive SQL, which means there’s a very good chance that we will actually find an SQL statement in the trace file that is the rescursive statement triggering the error.

So, enable sql_trace (or set event 10046), or do whatever you like to do to enable basic tracing (no need for anything above level 1 just yet); try to execute the drop; then search the trace file for the word ERROR at the start of a line (“^ERROR”). Here’s what I found as the first match in my trace file:

ERROR #139987889121800:err=12986 tim=424276176462

Note that the err= value is the 12986 that was reported as the error under the ORA-00604 error. Sometimes it’s necessary to search backwards in the trace file until you find the matching cursor number (#139987889121800), but in this case it was already visible just a few lines further up the file. So here’s the fragment of the file around that ERROR line:


PARSING IN CURSOR #139987889121800 len=72 dep=1 uid=0 oct=15 lid=0 tim=424276175961 hv=1894278903 ad='75c06e38' sqlid='4wv7gq1sfhtrr'
ALTER TABLE "TEST_USER"."T1" RENAME TO "BIN$rhxBELdQGMXgUwEAAH93eQ==$0"
END OF STMT
PARSE #139987889121800:c=2896,e=3035,p=0,cr=4,cu=0,mis=1,r=0,dep=1,og=4,plh=0,tim=424276175960
EXEC #139987889121800:c=186,e=185,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=0,tim=424276176418
ERROR #139987889121800:err=12986 tim=424276176462
CLOSE #139987889121800:c=10,e=9,dep=1,type=0,tim=424276176712
EXEC #139987891478792:c=34686,e=35859,p=0,cr=73,cu=12,mis=0,r=0,dep=0,og=1,plh=0,tim=424276176774
ERROR #139987891478792:err=604 tim=424276176808

The error has come from an SQL statement that is trying to rename my table to some wierd and wonderful name which starts with the characters “BIN$” – that’s a “drop” command trying to move a table into the recycle bin by renaming it – and you’re not allowed to rename a table that is in a partially dropped state. So that’s why we get the error; and the obvious way to bypass it is: “drop table t1 purge;” – which works.

You’ll notice that I’ve include a couple of lines after the first ERROR. This is to show you the line that generated the ORA-00604 (err=604) error. It comes from cursor #139987891478792, and seraching backwards up the file for that cursor number I get to:

PARSING IN CURSOR #139987891478792 len=13 dep=0 uid=107 oct=12 lid=107 tim=424276140765 hv=202649412 ad='7f517dd3fe00' sqlid='d47kdkn618bu4'
drop table t1
END OF STMT

That’s not a suprise, of course, but it is important to cross-check that you haven’t been chasing the wrong error. There are some cases where the Oracle code does something to see if an error will occur but has an exception handler that means the error doesn’t end up reaching the application, so you do need to do a check that the error you found first was the one that floated up to the top of the stack.

Footnote

From Oracle 12.2.0.1 you could arrange to read your own trace file – while your session is connected – through the dynamic performance view v$diag_trace_file_contents.

 

August 24, 2020

Flashback Bug

Filed under: 18c,Bugs,Oracle,redo,Troubleshooting — Jonathan Lewis @ 9:34 am BST Aug 24,2020

Here’s a problem with the “flashback versions” technology that showed up at the end of last week. There’s a thread about it on the Oracle Developer community forum, and a chain of tweets that was my initial response to a twitter alert about it that Daniel Stein posted.

The problem appears somewhere in the 18c timeline – it doesn’t seem to be present in 12.2.0.1 – so if you’re running any versions 18+ here’s a modified version of the test case supplied by Daniel Stein to demonstrate the issue.

rem
rem     Script:         flashback_bug.sql
rem     Author:         Jonathan Lewis / Daniel Stein
rem     Dated:          Aug 2020
rem
rem     Last tested 
rem             19.3.0.0        Wrong result
rem             12.2.0.1        Correct result
rem

create table t1 (n1 number(4)) 
segment creation immediate
;

prompt  =============================================================
prompt  Sleeping 10 seconds after create table to avoid subsequent
prompt  ORA-01466: unable to read data - table definition has changed
prompt  =============================================================


execute dbms_lock.sleep(10)

prompt  =========
prompt  Start SCN
prompt  =========

column current_scn new_value scn_vorher2 
select to_char(current_scn,'9999999999999999') current_scn from V$DATABASE;

insert into t1(n1) values (1);
insert into t1(n1) values (2);
insert into t1(n1) values (3);
insert into t1(n1) values (4);
insert into t1(n1) values (5);
insert into t1(n1) values (6);
insert into t1(n1) values (7);
insert into t1(n1) values (8);
insert into t1(n1) values (9);
insert into t1(n1) values (10);
insert into t1(n1) values (11);
insert into t1(n1) values (12);

delete from t1 where n1 in (2, 5, 8, 11);

commit;

prompt  =======
prompt  End SCN
prompt  =======

column current_scn new_value scn_nachher
select to_char(current_scn,'9999999999999999') current_scn from V$DATABASE;

prompt  =============
Propmt  Version Query 
prompt  =============

column VERSIONS_STARTSCN        format 9999999999999999 heading "Start SCN"
column VERSIONS_ENDSCN          format 9999999999999999 heading "End SCN"
column VERSIONS_OPERATION       format A9               heading "Operation"
column SCN_RANGE                format A35              heading "SCN Range"

select 
        n1, rowid, 
        versions_operation, versions_startscn, versions_endscn,
        &scn_vorher2||' and '||&scn_nachher scn_range
from
        t1 versions between scn &scn_vorher2 and &scn_nachher
order by 
        rowid, versions_startscn, versions_operation, versions_endscn
;

prompt  ==================
prompt  Current Table data
prompt  ==================

select n1,rowid from t1;


alter system dump redo scn min &scn_vorher2 scn max &scn_nachher;

I’ve created a table then inserted a few rows and deleted some of them in the same transaction. I’ve captured the database SCN at the start and end of the transaction and then tried to run a “versions between” query across that range of SCNs. Here are the results from the “versions between” query and the final query of the current contents of the table on a test on a database running 19.3:

        N1 ROWID              Operation         Start SCN           End SCN SCN Range
---------- ------------------ --------- ----------------- ----------------- -----------------------------------
         1 AAAXFOAATAAAACDAAA I            12670408139684                   12670408139679 and 12670408139687
         3 AAAXFOAATAAAACDAAC I            12670408139684                   12670408139679 and 12670408139687
         4 AAAXFOAATAAAACDAAD I            12670408139684                   12670408139679 and 12670408139687
         5 AAAXFOAATAAAACDAAE I            12670408139684                   12670408139679 and 12670408139687
         6 AAAXFOAATAAAACDAAF I            12670408139684                   12670408139679 and 12670408139687
         7 AAAXFOAATAAAACDAAG I            12670408139684                   12670408139679 and 12670408139687
         8 AAAXFOAATAAAACDAAH I            12670408139684                   12670408139679 and 12670408139687
         9 AAAXFOAATAAAACDAAI I            12670408139684                   12670408139679 and 12670408139687
        10 AAAXFOAATAAAACDAAJ I            12670408139684                   12670408139679 and 12670408139687
        11 AAAXFOAATAAAACDAAK I            12670408139684                   12670408139679 and 12670408139687

10 rows selected.

        N1 ROWID
---------- ------------------
         1 AAAXFOAATAAAACDAAA
         3 AAAXFOAATAAAACDAAC
         4 AAAXFOAATAAAACDAAD
         6 AAAXFOAATAAAACDAAF
         7 AAAXFOAATAAAACDAAG
         9 AAAXFOAATAAAACDAAI
        10 AAAXFOAATAAAACDAAJ
        12 AAAXFOAATAAAACDAAL

8 rows selected.

There’s a trap in this demonstration. If you follow the link to the forum thread you’ll find that Daniel Stein says there’s a problem with Oracle 19.6 and upwards (but not 12.2) – then various other people test on their versions and find that 19.7, 18.5, and 19.3 are broken. That last one is what made me run the test on the 19.3 I had at hand – and I got the right result, unlike the results above.

If you look at the output from this test run you’ll see that the versioning query says that in the SCN range I’ve inserted 10 rows, and that rows 2 and 12 have never been inserted. Given the SQL I’d run I was expecting to see 16 rows in the output, 12 of them saying I’d inserted values 1 through 12, and then 4 saying I’d deleted 2, 5, 8 and 11 – but (when it works correctly) Oracle manages to combine the insert and delete in the same transaction to cancel the entries. So the versioning query ought to be showing exactly the 8 rows that appear in the current data query.

So I had two puzzles:

  • What’s gone wrong for everyone else ?
  • Why hasn’t it gone wrong for me ?

To answer the second question first: I took a look at my instance parameters and found that I had set statistics_level to all. Clearly this ought to have nothing to do with an error in flashback version queries, but I changed it to typical anyway and re-ran the test: and got the (now expected) wrong results. If you read the twitter thread you’ll find that Oracle support had had an SR opened on this problem for a few days but had been unable to reproduce the problem – perhaps they, too, had a system with statistics_level set to all.

Getting back to the first question: what’s gone wrong. I couldn’t think of a good reason why setting the statistics_level should make a difference, so I took the view that the versioning query is running through the redo log to find all the changes that took place in the range, so maybe the answer will appear in the redo. That’s why the last line in the demo above is a command to dump the redo generated between the two SCNs.

Short-circuiting the details of what I looked at I’m just going to show you the layer 11 (table) redo change vectors for my DML, which I extracted from the trace file using the command grep “OP:11” {filename}.

First the set when statistics_level = all:


CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1b SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1b SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:4 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:5 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:6 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:7 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:8 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:9 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:10 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #3 CON_ID:3 TYP:2 CLS:1 AFN:10 DBA:0x01009c57 OBJ:720 SCN:0x00000b860f2d3cb8 SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1c SEQ:11 OP:11.3 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:1 OP:11.3 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:2 OP:11.3 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:3 OP:11.3 ENC:0 RBL:0 FLG:0x0000

What you can see are 13 op codes “11.2” – which is “insert row piece”. Most of these are for OBJ 94554, which is the object_id for my table, and one of them if for OBJ 720, which is a table owned by sys called exp_head$ (which is something to do with “expression tracking”). After the inserts there are 4 op Ccodes “11.3” which is “delete row piece”. So there’s no obvious error or complication in the redo.

And now when we set statistics_level to typical and re-run the script (which means we get a new OBJ number):

CHANGE #3 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8a SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8a SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:2 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:3 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:4 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:5 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:6 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:7 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:8 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:9 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:10 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #3 CON_ID:3 TYP:2 CLS:1 AFN:10 DBA:0x01009c57 OBJ:720 SCN:0x00000b860f2d3d1d SEQ:1 OP:11.2 ENC:0 RBL:0 FLG:0x0000
CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:11 OP:11.12 ENC:0 RBL:0 FLG:0x0000

In this case we get 13 “insert row piece” Op Codes, and then a single “11.12” which is “delete multiple rows”. So there IS a difference in the redo, and something about that difference is probably behind the error coming and going.

Taking a closer look at the 11.12 and one of the 11.2 redo change vectors, this is what we see:


CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94545 SCN:0x00000b860f2d3d8b SEQ:11 OP:11.12 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x02  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C  uba: 0x04400a24.0672.3c
KDO Op code: QMD row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x04c00083  hdba: 0x04c00082
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 lock: 1 nrow: 4
slot[0]: 1
slot[1]: 4
slot[2]: 7
slot[3]: 10


CHANGE #2 CON_ID:3 TYP:0 CLS:1 AFN:19 DBA:0x04c00083 OBJ:94544 SCN:0x00000b860f2d3d1d SEQ:3 OP:11.3 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x02  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: C  uba: 0x04400467.0595.2c
KDO Op code: DRP row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x04c00083  hdba: 0x04c00082
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 slot: 10(0xa)

The 11.12 Op Code shows us a list of 4 entries in the block’s “row directory” for “table 0”. THe 11.3 Op Code shows us just one – it’s the last of the four deletes so it’s reporting entry 10 in the “row directory”, the previous three Op Code 11.3 vectors were reporting slots 1, 4, and 7 respectively.

Obviously we can’t draw any firm conclusions about what’s going wrong, we simply have a few observations that might give us some ideas of the type of error; and after a few more experiments I decided that the code handling flashback versioning mis-interpreting the “delete multiple row” record. It consistently seemed to delete the first row identified the slot[0] entry and then jumped straight to the last slot but add one to the index entry before attempting to apply the vector.

Footnote

I’ve used dbms_lock.sleep() to pause for 10 seconds after creating the table. The sleep() procedure was copied into the dbms_session package in recent versions of Oracle so that end-user code could access it without having to receive privileges on the rather more dangerous dbms_lock package.

I’ve referenced the statistics_level parameter as the one affecting the generation of the “delete multiple row” redo vector. In fact you can get the same effect by setting the hidden parameter “_rowsource_execution_statistics” to true. Setting statistics_level to all has the effect of enabling rowsource execution statistics so this isn’t surprising.

I pointed out that the flashback versioning code seemed to “add 1” to the final slot[n] identifier before producing the flashback output. Think about that that means if you change the demonstration to delete the LAST row in the list of rows inserted. Here’s what mning query did in a test when I changed 11 to 12 in my delete statement:


ERROR at line 2:
ORA-00600: internal error code, arguments: [ktrvtgr_2], [79691907], [0], [11], [], [], [], [], [], [], [], []

79691907 is the block number of the block where the table rows are stored.
11 is the correct row directory index for the last row (12) in the table – buy my hypothesis is that the flashback code was trying to find (non-existent) directory index 12 because it adds one to the supplied index entry.

Update

In the interval between completing the final draft yesterday and publishing the note this morning, Oracle support replied to the the forum thread with a bug number (28910586 – not yet publicly visible) and a note that a patch could be available on request if an SR were raised giving the Oracle version and platform. It will be interesting to see whether the patch blocks the creation of the 11.12 op codes or whether it corrects the interpretation of 11.12 codes by the flashback versions code.

August 18, 2020

Explain Rewrite

Filed under: Materialized view,Oracle,Troubleshooting — Jonathan Lewis @ 7:07 pm BST Aug 18,2020

This is one of those notes that I thought I’d written years ago. It answers two questions:

  • what can I do with my materialized view?
  • why isn’t the optimizer using my materialized view for query rewrite?

I’ve actually supplied an example of code to address the first question as a throwaway comment in a blog that dealt with a completely different problem, but since the two questions above go together, and the two answers depend on the same package, I’m going to repeat the first answer.

The reason for writing this note now is that the question “why isn’t this query using my materialized view” came up on the Oracle Developer community forum a few days ago – and I couldn’t find the article that I thought I’d written.

Note: a couple of days after I started drafting this note Frank Pachot tweeted the links to a couple of extensive posts on materialized views that included everything I had to say (and more) about “what can my materialized view do”. Fortunately he didn’t get on to second question – so I decided to publish the whole of my note anyway as the two questions go well together.

The key feature is the dbms_mview package, and the two procedures (both overloaded) explain_mview() and explain_rewrite(). To quote the 12.2 “PL/SQL Supplied Packages” manual page, the first procedure:

“explains what is possible with a materialized view or potential [ed: my emphasis] materialized view”

note particularly that the materialized view doesn’t need to have been created before you run the package – the second procedure:

“explains why a query failed to rewrite or why the optimizer chose to rewrite a query with a particular materialized view or materialized views”.

Here’s the relevant extract from the SQL*Plus describe of the package –

PROCEDURE EXPLAIN_MVIEW
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MV                             VARCHAR2                IN
 STMT_ID                        VARCHAR2                IN     DEFAULT

PROCEDURE EXPLAIN_MVIEW
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MV                             CLOB                    IN
 STMT_ID                        VARCHAR2                IN     DEFAULT

PROCEDURE EXPLAIN_MVIEW
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MV                             VARCHAR2                IN
 MSG_ARRAY                      EXPLAINMVARRAYTYPE      IN/OUT

PROCEDURE EXPLAIN_MVIEW
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 MV                             CLOB                    IN
 MSG_ARRAY                      EXPLAINMVARRAYTYPE      IN/OUT

PROCEDURE EXPLAIN_REWRITE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 QUERY                          VARCHAR2                IN
 MV                             VARCHAR2                IN     DEFAULT
 STATEMENT_ID                   VARCHAR2                IN     DEFAULT

PROCEDURE EXPLAIN_REWRITE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 QUERY                          CLOB                    IN
 MV                             VARCHAR2                IN     DEFAULT
 STATEMENT_ID                   VARCHAR2                IN     DEFAULT

PROCEDURE EXPLAIN_REWRITE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 QUERY                          VARCHAR2                IN
 MV                             VARCHAR2                IN     DEFAULT
 MSG_ARRAY                      REWRITEARRAYTYPE        IN/OUT

PROCEDURE EXPLAIN_REWRITE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 QUERY                          CLOB                    IN
 MV                             VARCHAR2                IN     DEFAULT
 MSG_ARRAY                      REWRITEARRAYTYPE        IN/OUT

As you can see there are 4 overloaded versions of explain_mview() and 4 of explain_rewrite(): both procedures take an input of an SQL statement (parameter mv for explain_mv(), parameter query for explain_rewrite()) – which can be either a varchar2() or a CLOB, and both procedures supply an output which can be written to a table or written to an “in/out” pl/sql array.

Two possible input options times two possible output options gives 4 overloaded versions of each procedure. In this note I’ll restrict myself to the first version of the two procedures – varchar2() input, writing to a pre-created table.

Here’s a simple demo script. Before we do anything else we need to call a couple of scripts in the $ORACLE_HOME/rdbms/admin directory to create the target tables. (If you want to do something a little clever you could tweak the scripts to create them as global temporary tables in the sys schema with a public synonym – just like the plan_table used in calls to explain plan.)

rem
rem     Script:         c_explain_mv.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2002
rem

@$ORACLE_HOME/rdbms/admin/utlxmv.sql
@$ORACLE_HOME/rdbms/admin/utlxrw.sql

-- @$ORACLE_HOME/sqlplus/demo/demobld.sql

create materialized view dept_cost
refresh complete on demand
enable query rewrite
as
select 
        d.deptno,sum(e.sal) 
from 
        emp e,dept d
where 
        e.deptno = d.deptno
group by 
        d.deptno
;
 
set autotrace traceonly explain
 
select 
        d.deptno,sum(e.sal) 
from 
        emp e,dept d
where 
        e.deptno = d.deptno
and     d.deptno=10
group by 
        d.deptno
;

set autotrace off

/*

Execution Plan
----------------------------------------------------------
Plan hash value: 3262931184

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |     1 |     7 |     2   (0)| 00:00:01 |
|*  1 |  MAT_VIEW REWRITE ACCESS FULL| DEPT_COST |     1 |     7 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("DEPT_COST"."DEPTNO"=10)

*/

My original script is rather old and in any recent version of Oracle the Scott tables are no longer to be found in demobld.sql, so I’ve added them at the end of this note.

After creating the emp and dept tables I’ve created a materialized view and then enabled autotrace to see if a suitable query will use the materialized view – and as you can see from the execution plan the materialized view can be used.

So let’s do a quick analysis of the view and the rewrite:


column audsid new_value m_audsid
select sys_Context('userenv','sessionid') audsid from dual;

begin
        dbms_mview.explain_mview(
--              mv      => 'DEPT_COST',
                mv      => q'{
                        create materialized view dept_cost
                        refresh complete on demand
                        enable query rewrite
                        as
                        select 
                                d.deptno,sum(e.sal) 
                        from 
                                emp e,dept d
                        where 
                                e.deptno = d.deptno
                        group by 
                                d.deptno
                }',
                stmt_id         => '&m_audsid'
        );
end;
/
 
set linesize 180

column cap_class noprint
column capability_name format a48
column related_text format a15
column short_msg format a90

break on cap_class skip 1
 
select
        substr(capability_name,1,3) cap_class,
        capability_name, possible, related_text, msgno, substr(msgtxt,1,88) short_msg
from
        mv_capabilities_table
where
        mvname       = 'DEPT_COST'
and     statement_id = '&m_audsid'
order by
        substr(capability_name,1,3), related_num, seq
;


I’ve captured the session’s audsid because the mv_capabilities_table table and the rewrite_table table both have a statement_id column and the audsid is a convenient value to use to identify the most recent data you’ve created if you’re sharing the table. Then I’ve called dbms_mview.explain_mview() indicating two possible strategies (with one commented out, of course).

I could pass in a materialized view name as the mv parameter, or I could pass in the text of a statement to create a materialized view (whether or not I have previously created it). As you can see, I’ve also used a substitution variable to pass in my audsid as the statement id.

After setting up a few SQL*Plus format options I’ve then queried some of the columns from the mv_capabilitie_table table, with the following result:


CAPABILITY_NAME                                  P RELATED_TEXT    MSGNO SHORT_MSG
------------------------------------------------ - --------------- ----------------------------------------
PCT_TABLE                                        N EMP             2068 relation is not a partitioned table
PCT_TABLE_REWRITE                                N EMP             2068 relation is not a partitioned table
PCT_TABLE                                        N DEPT            2068 relation is not a partitioned table
PCT_TABLE_REWRITE                                N DEPT            2068 relation is not a partitioned table
PCT                                              N

REFRESH_FAST_AFTER_ONETAB_DML                    N SUM(E.SAL)      2143 SUM(expr) without COUNT(expr)
REFRESH_COMPLETE                                 Y
REFRESH_FAST                                     N
REFRESH_FAST_AFTER_INSERT                        N TEST_USER.EMP   2162 the detail table does not have a materialized view log
REFRESH_FAST_AFTER_INSERT                        N TEST_USER.DEPT  2162 the detail table does not have a materialized view log
REFRESH_FAST_AFTER_ONETAB_DML                    N                 2146 see the reason why REFRESH_FAST_AFTER_INSERT is disabled
REFRESH_FAST_AFTER_ONETAB_DML                    N                 2142 COUNT(*) is not present in the select list
REFRESH_FAST_AFTER_ONETAB_DML                    N                 2143 SUM(expr) without COUNT(expr)
REFRESH_FAST_AFTER_ANY_DML                       N                 2161 see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
REFRESH_FAST_PCT                                 N                 2157 PCT is not possible on any of the detail tables in the materialized view

REWRITE                                          Y
REWRITE_FULL_TEXT_MATCH                          Y
REWRITE_PARTIAL_TEXT_MATCH                       Y
REWRITE_GENERAL                                  Y
REWRITE_PCT                                      N                 2158 general rewrite is not possible or PCT is not possible on any of the detail tables


20 rows selected.

If you want the complete list of possible messages, the msgno is interpreted from $ORACLE_HOME/rdbms/mesg/qsmus.msg (which is shared with the explain_rewrite() procedure. Currently (19.3) it holds a slightly surprising 796 messages. A call to “oerr qsm nnnnn” will translate the number to the message text.

In a similar vein we can call dbms_mview.explain_rewrite(). Before we do so I’m going to do something that will stop the rewrite from taking place, then call the procedure:

update emp set ename = 'JAMESON' where ename = 'JAMES';
commit;

begin
        dbms_mview.explain_rewrite ( 
                query => q'{
                        select 
                                d.deptno,sum(e.sal) 
                        from 
                                emp e,dept d
                        where 
                                e.deptno = d.deptno
                        and     d.deptno=10
                        group by 
                                d.deptno
                        }',
                mv => null,
                statement_id => '&m_audsid'
        );
end;
/

column message format a110

select
        sequence, message 
from 
        rewrite_table
where
        statement_id = '&m_audsid'
order by
        sequence
/

You’ll notice that there’s an input parameter of mv – there may be cases where the optimizer has a choice of which materialized view to use to rewrite a query, you could supply the name of a materialized view for this parameter to find out why Oracle didn’t choose the materialized view you were expecting. (In this case mv has to supply a materialized view name, not the text to create a materialized view)

In my example I get the following results:

  SEQUENCE MESSAGE
---------- --------------------------------------------------------------------------------------------------------------
         1 QSM-01150: query did not rewrite
         2 QSM-01106: materialized view, DEPT_COST, is stale with respect to some partition(s) in the base table(s)
         3 QSM-01029: materialized view, DEPT_COST, is stale in ENFORCED integrity mode

3 rows selected.

The 2nd and 3rd messages are a bit of a clue – so let’s change the session’s query_rewrite_integrity parameter to stale_tolerated and re-execute the procedure; which gets us to:


  SEQUENCE MESSAGE
---------- --------------------------------------------------------------------------------------------------------------
         1 QSM-01151: query was rewritten
         2 QSM-01033: query rewritten with materialized view, DEPT_COST

Footnote:

You’ll notice that I’ve quote-escaping in my inputs for the materialized view definition and query. This is convenience that let’s me easily cut and paste a statement into the scripts – or even use the @@script_name mechanism – writing a query (without a trailing slash, semi-colon, or any blank lines) in a separate file and then citing the script name between the opening and closing quote lines, e.g.

begin
        dbms_mview.explain_rewrite ( 
                query        => q'{
@@test_script
                                }',
                mv           => null,
                statement_id => '&m_audsid'
        );
end;
/

It’s worth noting that the calls to dbms_mview.explain_mv() and dbms_mview.explain_rewrite() don’t commit the rows they write to their target tables, so you ought to include a rollback; at the end of your script to avoid any confusion as you test multiple views and queries.

Footnote:

Here’s the text of the demobld.sql script as it was when I copied it in the dim and distant past. I’m not sure which version it came from – but I don’t think it’s the original v4/v5 release which I’m fairly sure had only the emp and dept tables. (For reference, and hash joins, there used to be a MOS note explaining that EMP was the big table and DEPT was the small table ;)


CREATE TABLE EMP (
        EMPNO           NUMBER(4) NOT NULL,
        ENAME           VARCHAR2(10),
        JOB             VARCHAR2(9),
        MGR             NUMBER(4),
        HIREDATE        DATE,
        SAL             NUMBER(7, 2),
        COMM            NUMBER(7, 2),
        DEPTNO          NUMBER(2)
);

insert into emp values
        (7369, 'SMITH',  'CLERK',     7902,
        to_date('17-DEC-1980', 'DD-MON-YYYY'),  800, NULL, 20);

insert into emp values
        (7499, 'ALLEN',  'SALESMAN',  7698,
        to_date('20-FEB-1981', 'DD-MON-YYYY'), 1600,  300, 30);

insert into emp values
        (7521, 'WARD',   'SALESMAN',  7698,
        to_date('22-FEB-1981', 'DD-MON-YYYY'), 1250,  500, 30);

insert into emp values
        (7566, 'JONES',  'MANAGER',   7839,
        to_date('2-APR-1981', 'DD-MON-YYYY'),  2975, NULL, 20);

insert into emp values
        (7654, 'MARTIN', 'SALESMAN',  7698,
        to_date('28-SEP-1981', 'DD-MON-YYYY'), 1250, 1400, 30);

insert into emp values
        (7698, 'BLAKE',  'MANAGER',   7839,
        to_date('1-MAY-1981', 'DD-MON-YYYY'),  2850, NULL, 30);

insert into emp values
        (7782, 'CLARK',  'MANAGER',   7839,
        to_date('9-JUN-1981', 'DD-MON-YYYY'),  2450, NULL, 10);

insert into emp values
        (7788, 'SCOTT',  'ANALYST',   7566,
        to_date('09-DEC-1982', 'DD-MON-YYYY'), 3000, NULL, 20);

insert into emp values
        (7839, 'KING',   'PRESIDENT', NULL,
        to_date('17-NOV-1981', 'DD-MON-YYYY'), 5000, NULL, 10);

insert into emp values
        (7844, 'TURNER', 'SALESMAN',  7698,
        to_date('8-SEP-1981', 'DD-MON-YYYY'),  1500,    0, 30);

insert into emp values
        (7876, 'ADAMS',  'CLERK',     7788,
        to_date('12-JAN-1983', 'DD-MON-YYYY'), 1100, NULL, 20);

insert into emp values
        (7900, 'JAMES',  'CLERK',     7698,
        to_date('3-DEC-1981', 'DD-MON-YYYY'),   950, NULL, 30);

insert into emp values
        (7902, 'FORD',   'ANALYST',   7566,
        to_date('3-DEC-1981', 'DD-MON-YYYY'),  3000, NULL, 20);

insert into emp values
        (7934, 'MILLER', 'CLERK',     7782,
        to_date('23-JAN-1982', 'DD-MON-YYYY'), 1300, NULL, 10);

CREATE TABLE DEPT(
        DEPTNO  NUMBER(2),
        DNAME   VARCHAR2(14),
        LOC     VARCHAR2(13) 
);

insert into dept values (10, 'ACCOUNTING', 'NEW YORK');
insert into dept values (20, 'RESEARCH',   'DALLAS');
insert into dept values (30, 'SALES',      'CHICAGO');
insert into dept values (40, 'OPERATIONS', 'BOSTON');

CREATE TABLE BONUS
        (
        ENAME VARCHAR2(10)      ,
        JOB VARCHAR2(9)  ,
        SAL NUMBER,
        COMM NUMBER
);

CREATE TABLE SALGRADE
      ( GRADE NUMBER,
        LOSAL NUMBER,
        HISAL NUMBER 
);

INSERT INTO SALGRADE VALUES (1,700,1200);
INSERT INTO SALGRADE VALUES (2,1201,1400);
INSERT INTO SALGRADE VALUES (3,1401,2000);
INSERT INTO SALGRADE VALUES (4,2001,3000);
INSERT INTO SALGRADE VALUES (5,3001,9999);
COMMIT;

 

August 6, 2020

Case and Aggregate bug

Filed under: 12c,Bugs,Oracle,Troubleshooting,Upgrades — Jonathan Lewis @ 12:43 pm BST Aug 6,2020

The following description of a bug appeared on the Oracle Developer Community forum a little while ago – on an upgrade from 12c to 19c a query starting producing the wrong results on a simple call to the average() function. In fact it turned out to be a bug introduced in 12.2.0.1.

The owner of the thread posted a couple of zip files to build a test case – but I had to do a couple of edits, and change the nls_numeric_characters to ‘,.’ in order to get past a formatting error on a call to the to_timestamp() function. I’ve stripped the example to a minimum, and translated column name from German (which was presumably the source of the nls_numeric_characters issue) to make it easier to demonstrate and play with the bug.

First the basic data – you’ll notice that I’ve tested this on 12.1.0.2, 12.2.0.1 and 19.3.0.0 to find out when the bug appeared:

rem
rem     Script:         case_aggregate_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2020
rem     Purpose:        
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem

create table test(
        case_col        varchar2(11), 
        duration        number(*,0), 
        quarter         varchar2(6), 
        q2h_knum_b      varchar2(10)
   )
/

insert into test values('OK',22,'1.2020','AB1234');
insert into test values('OK',39,'1.2020','AB1234');
insert into test values('OK',30,'1.2020','AB1234');
insert into test values('OK',48,'1.2020','AB1234');
commit;

execute dbms_stats.gather_table_stats(user,'test')

create or replace force view v_test
as 
select 
        q2h_knum_b,
        case 
                when b.case_col not like 'err%'
                        then b.duration 
        end     duration,
        case 
                when b.case_col not like 'err%' 
                        then 1 
                        else 0 
        end     status_ok
from
        test b
where
        substr(b.quarter, -4) = 2020
;


break on report
compute avg of duration on report
select * from v_test;

---------------------------------------------

Q2H_KNUM_B   DURATION  STATUS_OK
---------- ---------- ----------
AB1234             22          1
AB1234             39          1
AB1234             30          1
AB1234             48          1
           ----------
avg             34.75


I’ve created a table, loaded some data, gathered stats, then created a view over the table. The view includes a couple of columns that use a simple case expression, and both expressions are based in the same way on the same base column (this may, or may not, be significant in what’s coming). I’ve then run off a simple query with a couple of SQL*Plus commands to report the actual content of the view with the average of the duration column – which is 34.75.

So now we run a couple of queries against the view which aggregate the data down to a single row – including the avg() of the duration – using the coalesce() function – rather than the older nvl() function – to convert any nulls to zero.


select
        coalesce(count(duration), 0)    duration_count,
        coalesce(median(duration), 0)   duration_med,
        coalesce(avg(duration), 0)      duration_avg,
        coalesce(sum(status_ok), 0)     ok_count
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
/

---------------------------------

DURATION_COUNT DURATION_MED DURATION_AVG   OK_COUNT
-------------- ------------ ------------ ----------
             4         34.5            0          4

You’ll notice that the duration_avg is reported as zero (this would be the same if I used nvl(), and would be a null if I omitted the coalesce(). This is clearly incorrect. This was the output from 19.3; 12.2 gives the same result, 12.1.0.2 reports the average correctly as 34.75.

There are several way in which you can modify this query to get the right average – here’s one, just put the ok_count column first in the select list:


select
        coalesce(sum(status_ok), 0)     ok_count,
        coalesce(count(duration), 0)    duration_count,
        coalesce(median(duration), 0)   duration_med,
        coalesce(avg(duration), 0)      duration_avg
from
        v_test  v1
where
        instr('AB1234', q2h_knum_b) > 0
/

---------------------------------

  OK_COUNT DURATION_COUNT DURATION_MED DURATION_AVG
---------- -------------- ------------ ------------
         4              4         34.5        34.75


There’s no obvious reason why the error should occur, but there’s a little hint about what may be happening in the Column projection information from the execution plan. The basic plan is the same in both cases, so I’m only show it once; but it’s followed by two versions of the projection information (restricted to operation 1) which I’ve formatted to improve:

Plan hash value: 2603667166

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |     2 (100)|          |
|   1 |  SORT GROUP BY     |      |     1 |    20 |            |          |
|*  2 |   TABLE ACCESS FULL| TEST |     1 |    20 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter((INSTR('AB1234',"Q2H_KNUM_B")>0 AND
              TO_NUMBER(SUBSTR("B"."QUARTER",(-4)))=2020))

Column Projection Information (Operation 1 only):  (Wrong result)
-----------------------------------------------------------------
PERCENTILE_CONT(.5) WITHIN GROUP ( ORDER BY CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22],
COUNT(CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22], 
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN 1 ELSE 0 END)[22], 
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22]



Column Projection Information (Operation 1 only):  (Right result)
-----------------------------------------------------------------
PERCENTILE_CONT(.5) WITHIN GROUP ( ORDER BY CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22],
COUNT(CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22], 
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN "B"."DURATION" END)[22], 
SUM  (CASE  WHEN "B"."CASE_COL" NOT LIKE 'err%' THEN 1 ELSE 0 END)[22]

As you can see, to report avg() Oracle has projected sum() and count().

When we get the right result the sum() for duration appears immediately after the count().

When we get the wrong result the sum() for ok_count comes between the count() and sum() for duration.

This makes me wonder whether Oracle is somehow just losing track of the sum() for duration and therefore dividing null by the count().

This is purely conjecture, of course, and may simply be a coincidence – particularly since 12.1.0.2 gets the right result and shows exactly the same projection information.

Readers are left to experiment with other variations to see if they can spot other suggestive patterns.

Update (Aug 2020)

This is now logged as Bug 31732779 – WRONG RESULT WITH CASE STATEMENT AGGREGATION , though it’s not yet publicly visible.

July 9, 2020

Execution Plans

Filed under: Execution plans,extended stats,Histograms,Oracle,Performance,Problem Solving,Statistics,Troubleshooting — Jonathan Lewis @ 4:54 pm BST Jul 9,2020

This is an example from the Oracle Developer Community of using the output of SQL Monitor to detect a problem with object statistics that resulted in an extremely poor choice of execution plan.

A short time after posting the original statement of the problem the OP identified where he thought the problem was and the general principle of why he thought he had a problem – so I didn’t have to read the entire execution plan to work out a strategy that would be (at least) a step in the right direction of solving the performance problem.

This note, then, is just a summary of the five minute that I spent confirming the OP’s hypothesis and explaining how to work around the problem he had identified. It does, however, give a little lead-in to the comments I made to the OP in order to give a more rounded picture of what his execution plan wass telling us.

So here’s the top half of the plan (which is the first subquery of a “concatenation”) with the first few predicates:

===============================================================================================================================================================================================================================  
| Id |                 Operation                  |            Name            |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Cell   |  Mem  | Activity |             Activity Detail             |  
|    |                                            |                            | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | Offload | (Max) |   (%)    |               (# samples)               |  
===============================================================================================================================================================================================================================  
|  0 | SELECT STATEMENT                           |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  1 |   CONCATENATION                            |                            |         |      |     12086 |     +5 |     1 |     864K |      |       |         |       |          |                                         |  
|  2 |    FILTER                                  |                            |         |      |     12191 |     +4 |     1 |     864K |      |       |         |       |     0.03 | Cpu (4)                                 |  
|  3 |     FILTER                                 |                            |         |      |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.01 | Cpu (1)                                 |  
|  4 |      NESTED LOOPS                          |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.02 | Cpu (3)                                 |  
|  5 |       NESTED LOOPS                         |                            |     241 | 251K |     12191 |     +4 |     1 |      26M |      |       |         |       |     0.07 | Cpu (8)                                 |  
|  6 |        NESTED LOOPS                        |                            |     241 | 251K |     12232 |     +4 |     1 |      26M |      |       |         |       |     0.05 | Cpu (6)                                 |  
|  7 |         NESTED LOOPS                       |                            |    5407 | 233K |     12242 |     +4 |     1 |      86M |      |       |         |       |          |                                         |  
|  8 |          MERGE JOIN CARTESIAN              |                            |       1 |   35 |     12242 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
|  9 |           TABLE ACCESS BY INDEX ROWID      | REF1                       |       1 |    3 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 10 |            INDEX RANGE SCAN                | REF1_PK                    |       1 |    2 |     12242 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 11 |           BUFFER SORT                      |                            |      84 |   32 |     12242 |     +4 |     1 |     1000 |      |       |         |  104K |          |                                         |  
| 12 |            TABLE ACCESS BY INDEX ROWID     | STAGE                      |      84 |   32 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 13 |             INDEX RANGE SCAN               | STAGE_IDX1                 |      84 |    4 |         1 |     +4 |     1 |     1000 |      |       |         |       |          |                                         |  
| 14 |          PARTITION RANGE ITERATOR          |                            |    8292 | 232K |     12232 |     +4 |  1000 |      86M |      |       |         |       |          |                                         |  
| 15 |           TABLE ACCESS STORAGE FULL        | TAB1                       |    8292 | 232K |     12245 |     +1 |  1000 |      86M | 103M | 521GB |   1.96% |    7M |    51.81 | gc buffer busy acquire (1)              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: cache buffers chains (1)         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (1196)                              |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (2) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | reliable message (5)                    |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (2827)  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell smart table scan (1977)            |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | read by other session (304)             |  
| 16 |         PARTITION RANGE ITERATOR           |                            |       1 |   12 |     12191 |     +4 |   86M |      26M |      |       |         |       |     0.42 | Cpu (51)                                |  
| 17 |          TABLE ACCESS BY LOCAL INDEX ROWID | TAB2                       |       1 |   12 |     12191 |     +4 |   86M |      26M |   4M |  28GB |         |       |    32.14 | gc cr grant 2-way (20)                  |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc cr request (2)                       |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gc remaster (6)                         |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | Cpu (319)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | gcs drm freeze in enter server mode (4) |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | latch: gc element (2)                   |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (3563)  |  
| 18 |           INDEX RANGE SCAN                 | TAB2_IX1                   |     166 |    3 |     12210 |     +2 |   86M |      26M |   1M |  11GB |         |       |    15.17 | Cpu (292)                               |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1557)  |  
| 19 |        INDEX UNIQUE SCAN                   | MTD_PK                     |       1 |    1 |     12242 |     +4 |   26M |      26M |  292 |   2MB |         |       |     0.17 | Cpu (20)                                |  
|    |                                            |                            |         |      |           |        |       |          |      |       |         |       |          | cell single block physical read (1)     |  
| 20 |       TABLE ACCESS BY INDEX ROWID          | REF2                       |       1 |    2 |     12191 |     +4 |   26M |      26M |    7 | 57344 |         |       |     0.11 | Cpu (13)                                |  
| 21 |      TABLE ACCESS BY INDEX ROWID           | CONTROLTAB                 |       1 |    1 |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 22 |       INDEX UNIQUE SCAN                    | CONTROLTAB_PK              |       1 |      |         1 |     +4 |     1 |        1 |      |       |         |       |          |                                         |  
| 23 |     MINUS                                  |                            |         |      |       102 |     +4 |    25 |        3 |      |       |         |       |          |                                         |  
| 24 |      TABLE ACCESS BY INDEX ROWID           | CUST_ORG_PK                |       1 |    3 |       942 |     +4 |    25 |       10 |      |       |         |       |          |                                         |  
| 25 |       INDEX UNIQUE SCAN                    | MC_PK                      |       1 |    2 |       942 |     +4 |    25 |       25 |      |       |         |       |          |                                         |  
| 26 |      SORT UNIQUE NOSORT                    |                            |       1 |    4 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 27 |       TABLE ACCESS BY INDEX ROWID          | REF1                       |       1 |    3 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  
| 28 |        INDEX RANGE SCAN                    | REF1_PK                    |       1 |    2 |         8 |     +4 |    25 |        9 |      |       |         |       |          |                                         |  


Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter( EXISTS ( (SELECT /*+ INDEX_RS_ASC ("CUST_ORG_PK" "MC_PK") */ "CUST_ID" FROM "USER1"."CUST_ORG_PK"  "CUST_ORG_PK" 
               WHERE "CUST_ID"=:B1 AND "CUST_ORG_PK"."INDICATR"='Y') MINUS (SELECT /*+ INDEX_RS_ASC ("REF1" "REF1_PK") */ 
               TO_NUMBER("VAL") FROM "USER1"."REF1" "REF1" WHERE "PUSER"='ZZZ' AND "EDATE" .ge. TRUNC(SYSDATE@!) AND TO_NUMBER("VAL")=:B2  
               AND "SDATE" .le. TRUNC(SYSDATE@!))))  
   3 - filter( EXISTS (SELECT /*+ INDEX_RS_ASC ("CONTROLTAB" "CONTROLTAB_PK") */ 0 FROM  "USER2"."CONTROLTAB" "CONTROLTAB" WHERE
              "CONTROLTAB"."CNTRLID"=9999 AND  NVL("CONTROLTAB"."STATUS",'F')='S'))  
   9 - filter("REF1"."EDATE" .ge. TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE" .le. TRUNC(SYSDATE@!))  
  13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')  

Note: various inequality symbols changed to .le. / .ge. to avoid WordPress format issue.

The first thing to note is that the “Time (active)” shown at the top line is about 12,000 seconds – so it’s a long running query. Your next observation – before you look at the shape of the plan – might be to note that operations 15, 17 and 18 between them record thousands of seconds of time, mostly I/O time but including 1,200 seconds of CPU time. This might draw your eye to the part of the plan that tells you what you are doing at these operations and why you are doing it.

Looking at the detail – operation 15 is a full tablescan that is the child of a partition range iterator (operation 14), and that iterator is the second child of a nested loop join (operation 7). Why is the optimizer so out of balance that it thinks a table scan of (possibly) multiple partitions of a partitioned table is a good candidate for the second child of a nested loop?! The answer comes from the first child – the  Merge Join Cartesian at operation 8 has been given a cardinality estimate of just one row. When the cardinality estimate is one for the first table in a join then it doesn’t matter whether Oracle uses a nested loop join or a hash join, whatever happens next is only supposed to happen once.

Unfortunately when we compare “Rows (Estim)” and “Rows (Actual)” for the operation we see that the Merge Join Cartesian produced 1,000 rows – so the partition tablescan was going to happen 1,000 times (which we can confirm from the “Execs” column of operation 14). As a first step, then, we need to ensure that the optimizer gets a better estimate of the number of rows produced by the Cartesian join. So lets look at its child operations.

  • Operation 9 (Table Access of REF1) is predicted to return one row – with “Rows (actual)” = 1.
  • Operation 11 (Buffer Sort of data from STAGE1) is predicted to return 84 rows – with “Rows (actual)” = 1,000

Since the number of rows produced by a Cartesian join should be the product of the number of rows of the two inputs this tells us that the optimizer’s estimate of the number of rows from REF1 has been rounded up to 1 from a very small fraction (less than 1/84). If we can correct this then we may get Oracle to change the awful nested loop to an acceptable hash join. Wven if we can’t fix this mis-estimate we may be able to do something that improves the estimate for STAGE1 to something sufficienlty large that it will trigger the switch to a hash join. So let’s look at the predicates for these two tables.

REF1 predicates

   9 - filter("REF1"."EDATE">=TRUNC(SYSDATE@!))  
  10 - access("REF1"."PUSER"='XXX' AND  "REF1"."NAME"='CODE' AND "REF1"."SDATE"<=TRUNC(SYSDATE@!))  

The index range scan is based on an access predicate (with no filter predicate), so it looks like there’s a nearly perfect three-column index on REF1, but the optimizer can’t use the number of distinct keys in the index to get a good estimate of cardinality because one of the predicates is range-based. So the arithmetic will look at the three predicates separately and multiply up their selectivities. (It’s possible, of course, that this might be the first three columns of a 4, or more, column index.)

It’s a reasonable guess that the number of distinct combinations of (puser, name) will be much smaller than num_distinct(puser) * num_distinct(name) – so one strategy that might help increase the table’s cardinality estimate is to create extended statistics on the column group (puser, name).

Another reasonable guess is that the number of distinct values for the two columns is (relatively) small, with some skew to the distribution (name = ‘CODE’ looks particularly susceptible to being a commonly occurring value) – so perhaps we need a histogram on one or both of the columns (which would then require a histogram to be created on the column group as well if we wanted the optimizer to use the column group). We’d also have to make sure that the queried values didn’t fall outside the known low/high values for the columns if we wanted the column group to be used.

STAGE1 Predicates

13 - access("STAGE"."NAME"='XXX' AND "STAGE"."STAT"='I')

This is the access(-only) predicate for the index stage_idx1, and there are no filter predicates when we reach the table. So stage_idx1 might be a two-column index on the table that we are using completely, or it may be an index with more columns that we are using only incompletely. We can see that the cardinality estimate is out by a factor of 12 (84 predicted, 1,000 actual) so if this is the complete index (which would allow Oracle to use the distinct_keys value to estimate cardinality) there must be an uneven data distribution in the values; but if this is just the first two columns of a longer index then we might benefit from extended stats (viz: another column group) on this pair of columns.

Again, even if we create a column group, or take automatic advantage of the distinct_keys figure, the predicate STAT=’I’ (is that state, status?) looks a little dangerous – status columns tend to have a small number of distinct values with a signficant skew to the distribution of values – so we may want to create a histogram on the STAT column, which would then require us to create a histogram on the column group if we also wanted the column group stats to have an effect.

What Happened Next?

I made the suggestions about column groups and histogram to the OP – without the level of detail that I’ve given in the explanations above – and got the following response:

You are spot on.

There does exists frequency histogram on column NAME(having 14 distinct value) and STAT(having 7 distinct values) of table STAGE. Also there already exists a frequency histogram on column PUSER and height balanced histogram on column NAME of table REF1. But still they were not helping the existing query.

By creating a column group on both on ref1 (puser, name) and stage(name, stat) with histogram for both i.e. ‘FOR COLUMNS SIZE 254’. The merge Cartesian removed automatically from the path by optimizer and its doing a hash join now with TAB1 finishing the query in quick time.

Summary

When the cardinality (rows) estimate for an operation drops to one (which might mean much less than one and rounded up) then all hell can break loose and the choice of parent operation – and its cardinality estimate – might be insanely stupid, leading to a catastrophically poor execution plan.

Very low cardinality estimates are often the result of multiplying individual column selectivities to produce an estimated selectivity that is unrealistic (much too small) when compared with the actual number of distinct combinations that exist in the table. In such cases creating a column group, (possibly with a histogram) may be all you need to do to get a better cardinality estimate and a vastly improved execution plan.

 

June 5, 2020

Analytic cost error

Filed under: CBO,Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 1:06 pm BST Jun 5,2020

Here’s a surprising costing error that was raised on the Oracle Developer Forum a few days ago. There’s a glitch in the cost atributed to sorting when an analytic over() clause – with corresponding “window sort” operation – makes a “sort order by” operation redundant. Here’s a script to generate the data set I’ll use for a demonstration with a template for a few queries I’ll be running against the data.


rem
rem     Script:         window_sort_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                    id,
        cast(lpad(rownum,30,'0') as varchar2(30)) vc30,
        cast(lpad(rownum,65,'0') as varchar2(65)) vc65,
        lpad('x',100,'x')                         padding
from
        generator
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

select
--      row_number() over (order by vc65) rn1,
--      row_number() over (order by vc30) rn2,
--      vc30,
--      vc65,
        id
from
        t1
-- order by
--      vc65
--      vc30
/


I’m (optionally) using the row_number() analytic function over the entire data set and for each row_number() I include in the select list Oracle will have to sort the data; I’ve also got an (optional) order by on the two columns that appear in the row_number() functions and that may introduce some sorting as well. Here, for example, are a few examples of the queries I might run:


prompt  ===========================================
prompt  Select vc30, order by vc30
prompt  ===========================================

explain plan for
select 
        vc30,
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

prompt  ==========================================
prompt  Select row_number over vc30 - no ordering
prompt  ==========================================

explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

prompt  ===========================================
prompt  Select row_number over vc30 - order by vc65
prompt  ===========================================

explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc65
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

In the first query we select and sort vc30 so (approximately) we sort 10,000 rows x 30(-ish) bytes for 300K of sort space. In the second query we generate the row_number() based on sorting vc30 – the size of the output is much smaller (it’s only 10,000 numbers between 1 and 10,000) but to generate those numbers we do have to select and sort vc30, so the workload (predicted and actual) will probably be similar to that of the firsrt query. In the final query we have to select and sort vc30 to generate the row_number() but we also have to select (without reporting) and sort vc65 in order to report the results in the right order – so we should expect the workload to be roughly 3 times the size (approximately 10,000 * (30 + 65) bytes). Here, from 12.2.0.1, are the execution plans (with a little cosmetic tidying):

===========================================
Select vc30, order by vc30
===========================================
------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|       |   130   (5)|
|   1 |  SORT ORDER BY     |      | 10000 |   341K|   448K|   130   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|       |    42   (5)|
------------------------------------------------------------------------

==========================================
Select row_number over vc30 - no ordering
==========================================
------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|       |   130   (5)|
|   1 |  WINDOW SORT       |      | 10000 |   341K|   448K|   130   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|       |    42   (5)|
------------------------------------------------------------------------

===========================================
Select row_number over vc30 - order by vc65
===========================================
-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   463   (3)|
|   1 |  SORT ORDER BY      |      | 10000 |   986K|  1120K|   463   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   463   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

As expected, the execution plans are roughly consistent with the estimates I gave for volume of data – the agreement between the query with order by vc30 and the query with over(order by vc30), and the increased load of ordering by vc65 when selecting the row_number(over vc30) is good.

So let’s see what the plan looks like when we select row_number(over vc30) and then order by vc30. If the optimizer is smart it will recognise that it’s possible to adopt a strategy that allows it to take advantage of the sorting from the over() clause to avoid a separate sort order by:


explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|    42   (5)|
|   1 |  WINDOW SORT       |      | 10000 |   341K|    42   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|    42   (5)|
----------------------------------------------------------------

The plan shows us that Oracle has used the 10gR2 “order by elimination” feature to bypass the need for a “sort order by” operation because it knows the data will be arriving in the right order from the “Window Sort” operation. Ynfortunately it also shows us that Oracle has lost the cost of doing the Window Sort!

Let’s try another experiment – let’s generate two different row_number() columns, with and without ordering:


prompt  =====================================
prompt  Select Both row_numbers - no ordering
prompt  =====================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

prompt  ======================================
prompt  Select Both row_numbers order by vc30
prompt  ======================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

prompt  ============================================
prompt  Select Both row_numbers order by vc65
prompt  ============================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc65
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

You’ll notice that I’ve included a request for the projection information in the plans for these examples so that you can see what columns are passed up from each operation to its parent. Again, though, we’ll start by focusing on just the costs:


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   463   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   463   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   463   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC65")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC30")[22]
   2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22],
       "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   253   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   253   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC65")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC30")[22]
   2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22],
       "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   253   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   253   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC65"[VARCHAR2,65], "VC30"[VARCHAR2,30],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC30")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC65")[22]
   2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22],
       "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]

The first query – without an order by” clause – reports a cost of 463; add an order by clause and the cost drops to 253 (and the “order by” clause doesn’t appear as a sort order by operation in the plan). The cost differential between the ordered and “unordered” plans , by the way, is 210 (and from there down to the base tablescan is another 211) – and here’s another way to see that number (+/- 1) appearing:


explain plan for
select
        vc65,
        id
from
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   986K|       |   253   (3)|
|   1 |  SORT ORDER BY     |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
------------------------------------------------------------------------

The cost of the query with no “order by” clause is basically the cost of a table scan plus two sorts of (vc65, vc30, plus a few bits). When you add in an “order by” clause the optimizer discards the “order by” clause and then subtracts one of the sort costs as well.

CBO trace file

Every time I say something about 10053 (CBO) trace files I feel compelled to remind everyone that I rarely look at them, and then it’s usually because I think I know there’s a bug and where I’ll find it in the trace. That’s exactly the case here.

I’m expecting to see two differences in the trace files between the “no order” query, and a query where I’ve added in an “order by” clause. One difference is that one trace file will have an “OBYE” (eliminate order by) comment which won’t be in the other trace, one trace file will have a “cost for SORT” calculation which won’t be in the other.

So here are the relevant bits – first from the query without the order by clause:


OBYE:   Considering Order-by Elimination from view SEL$1 (#0)
***************************
Order-by elimination (OBYE)
***************************
OBYE:     OBYE bypassed: no order by to eliminate.

...

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
WiF sort
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
***********************
Best so far:  Table#: 0  cost: 463.384707  card: 10000.000000  bytes: 1010000.000000
***********************

And from one of the plans with an order by:


OBYE:   Considering Order-by Elimination from view SEL$1 (#0)
***************************
Order-by elimination (OBYE)
***************************
OBYE:     OBYE performed.

...

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
WiF sort
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
***********************
Best so far:  Table#: 0  cost: 252.512458  card: 10000.000000  bytes: 1010000.000000
***********************

As you can see, the first (no order) trace file has two sort calculations under WiF sort, (Window Function?) while the second (order by) trace file reports “OBYE performed” and then loses one of its WiF sorts.

Note: If I had ordered by ID I would have seen two calculations of cost under the heading of WiF sort and a third calculation below that with the heading ORDER BY sort. Unfortunately when I ran the test to check this I also found that the OBYE report said: “OBYE performed” when it wasn’t relevant and there wasn’t an “order by” available for elimination.

Interestingly I tested to see if I could change the optimizer’s behaviour by adding the no_eliminate_oby(@sel$1) hint to the “order by” queries but the GENERAL PLANS section didn’t change, even though the trace file report: “OBYE: OBYE bypassed: hinted”, and the “Hint Report” from the Oracle 19i execution plan acknowledge the hint as legal and used.

Summary

If you have an “order by” clause in a query block that includes analytic functions and the optimizer decides that it can eliminate the “order by” and rely on the side effect of an analytic over() clause you may find that the cost of the query block is reduced by the cost of one of the Window Sort operations. (Technically this might lead to cases where the optimizer then made some poor choices in overall shape of the execution plan – though such cases might be very rare given that this costing error doesn’t affect the cardinality estimates.)

Lagniappe

In the last set of tests I added in the projection information as a simple example of a case where it can help you understand a little more of what the plan is supposed to achieve.  If you examine the last two query plans carefully (select both row_number() values and order by vc30 / vc65 respectively) Operation 2 of the first plan reports:

2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22], "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]

while operation 2 of the second plan reports:

2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22], "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]

It’s not until you look at these two lines that the plans show any differences – operations 1 and 2 simply say “Window Sort” without giving any clue about which window sort is for which over() clause. The projection information, though, tells you which way around the over() clauses operated – in the first query the over(order by vc65) is applied to the result of the tablescan first, while in the second query it’s the over(order by vc30) that is applied first.

Lagniappe 2

There’s another little oddity you might spot when you look at the projection information and think about the WiF sort costs from the unordered query. The columns passed from operation 3 to operation 2 are:

 3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]

The columns passed from operation 2 to operation 1 are one of:

2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22], "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]

2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22], "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]

Operation 2 sorts the data from operation 3, and operation 1 sorts the data from operation 2 – but the columns arriving from operation 2 have an extra row_number()[22] added to them. So when you look in the trace file at the two Wif Sort calculations why do they both show:

Blocks to Sort: 150 Row size: 122 Total Rows: 10000

Shouldn’t one of them show a Row size that’s (at least) 22 longer than the other ?

 

 

 

 

 

 

May 5, 2020

Execution Plans

Filed under: Execution plans,Oracle,Performance,Troubleshooting,Tuning — Jonathan Lewis @ 12:36 pm BST May 5,2020

Table Of Contents

1.0 Introduction
2.0 Overview
3.0 The Main Course
4.0 Simplify
5.0 Filling the Gaps
6.0 Looking at the numbers
7.0 Predicate Information
8.0 Resolution
9.0 Summary
Footnote


 

1.0 Introduction

1.1 In a comment to a recent post on reading a non-trivial execution someone asked me to repeat the exercise using a plan I had published a few days previously in a post about tweaking the hints in an outline. The query in question involved a number of subqueries and transformations of different types, which means it’s going to take a little work explaining the details, and it’s probably going to be a fairly long read.

1.2 Here’s the query that produced the plan we’re going to examine. I’ve done some cosmetic alteration  to make it a little easier to read (though it’s still not perfect according to my standards). I’ve also made one very important addition to the query to make it easier to follow my walkthrough of the execution plan; the original text didn’t specify any query block names (/*+ qb_name() */ hints) even though it starts off with 9 separate query blocks, so I’ve walked through the text very carefully adding in the query block names that Oracle would have used (sel$NN) for each query block. In this case I got lucky because there were no views of other recursive problems involved so all I had to do was find each occurence of the word “select” in literal text order and increment the NN in sel$NN for each one.


SELECT  /*+ QB_NAME(SEL$1) */
        COUNT(applicant_id)
FROM    (
        SELECT  /*+ QB_NAME(SEL$2) */
                applicant_id,
                academic_year,
                applicant_gender,
                medium_of_study,
                education_type,
                college_id,
                course_id,
                medium_id,
                hostel_required,
                preference_order,
                status_flag,
                attribute7,  -- Added on 7-mar-20
                college_status_flag,
                percentage,
                caste_category,
                alloted_category,
                NULL allotment_type
        FROM    (
                SELECT   /*+ QB_NAME(SEL$3) */
                        adt.applicant_id,
                        lmt_gender.lov_code applicant_gender,
                        adt.medium_of_study,
                        act.college_id,
                        lmt_education_type.lov_code education_type,
                        act.course_id,
                        act.medium_id,
                        act.hostel_required,
                        act.preference_order,
                        act.status_flag,
                        act.attribute7, -- Added on 7-mar-20
                        adt.college_status_flag,
                        adt.academic_year,
                        adt.percentage,
                        adt.applicant_dob,
                        adt.legacy_appln_date,
                        adt.caste_category,
                        act.attribute1 alloted_category,
                        DECODE (lmt_pass.lov_code,  'ATTFIRST', 1,  'COMPARTL', 2,  3) order_of_pass,
                        DECODE (late_entry_flag,  'N', 1,  'Y', 2,  3)      order_of_entry,
                        DECODE (lmt_appearance.lov_code,  'REGULAR', 1,  'PRIVATE', 2,  3) order_of_appearance,
                        DECODE (adt.is_ttd_employ_ward,  'Y', 1,  'N', 2,  3) order_of_ttd_emp,
                        DECODE (adt.is_balbhavan_studnt,  'Y', 1,  'N', 2,  3) order_of_schooling,
                        act.attribute3 course_qe_priority,
                        adt.is_local_canditature_valid,
                        adt.is_ttd_emp_ward_info_valid,
                        adt.is_sv_bm_student_info_valid,
                        adt.is_social_ctgry_info_valid,
                        DECODE(adt.college_status_flag,'B',1,'O',2,'N',3) order_of_status
                FROM 
                        xxadm.xxadm_applicant_details_tbl    adt,
                        xxadm.xxadm_applicant_coursprefs_tbl act,
                        xxadm.xxadm_college_master_tbl       cmt,
                        xxadm.xxadm_course_master_tbl        crmt,
                        xxadm.xxadm_medium_master_tbl        mmt,
                        xxadm.xxadm_lov_master_tbl           lmt_gender,
                        xxadm.xxadm_lov_master_tbl           lmt_pass,
                        xxadm.xxadm_lov_master_tbl           lmt_appearance,
                        xxadm.xxadm_lov_master_tbl           lmt_religion,
                        xxadm.xxadm_lov_master_tbl           lmt_education_type
                WHERE
                        adt.applicant_id = act.applicant_id
                AND     act.college_id = cmt.college_id
                AND     act.course_id = crmt.course_id
                AND     act.medium_id = mmt.medium_id
                AND     adt.applicant_gender = lmt_gender.lov_id
                AND     adt.pass_type = lmt_pass.lov_id
                AND     adt.appearance_type = lmt_appearance.lov_id
                AND     adt.religion = lmt_religion.lov_id
                AND     cmt.education_type = lmt_education_type.lov_id
                AND     adt.status = 'Active'
                AND     1 = (CASE 
                                WHEN act.hostel_required = 'Y'
                                        THEN (CASE
                                                     WHEN    adt.distance_in_kms >20
                                                     AND     lmt_religion.lov_code = 'HINDU'
                                                     AND     adt.caste_category NOT IN (
                                                                     SELECT  /*+ QB_NAME(SEL$4) */
                                                                             category_id
                                                                     FROM    xxadm.xxadm_category_master_tbl
                                                                     WHERE   category_code IN ('BACKWRDC', 'BACKWRDE')
                                                             )
                                                             THEN 1
                                                             ELSE 2 
                                              END
                                             )
                                        ELSE 1 
                               END
                              )
                AND     1 =  (CASE 
                                WHEN act.hostel_required  = 'Y'
                                        THEN    (CASE 
                                                        WHEN    (    lmt_education_type.lov_code = 'COEDUCOL' 
                                                                 AND mt_gender.lov_code = 'FEMALE'
                                                                )
                                                                THEN 2
                                                                ELSE 1 
                                                 END
                                                )
                                        ELSE 1 
                               END
                              )
                AND     adt.course_applied_for = 'DEG' 
                AND     (adt.college_status_flag IS NULL OR adt.college_status_flag IN ('N','T','C','B','O')) 
                AND     act.preference_order <= NVL( -- > comment to avoid WordPress format issue
                                (SELECT  /*+ QB_NAME(SEL$5) */ 
                                         preference_order 
                                 FROM    xxadm.xxadm_applicant_coursprefs_tbl act1 
                                 WHERE   act1.applicant_id = adt.applicant_id 
                                 AND     status_flag IN('B','T','C','O') 
                                 ), act.preference_order 
                        )
                AND     act.preference_order >=  NVL(
                                (SELECT /*+ QB_NAME(SEL$6) */
                                        preference_order
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act2 
                                WHERE   act2.applicant_id = adt.applicant_id
                                AND     status_flag  = 'C'
                                ), act.preference_order
                        )
                AND     act.preference_order NOT IN (
                                SELECT  /*+ QB_NAME(SEL$7) */
                                        act3.preference_order 
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act3
                                WHERE   act3.applicant_id = adt.applicant_id 
                                AND     act3.status_flag  = 'O'
                        ) 
                AND     act.preference_order NOT IN (
                                SELECT  /*+ QB_NAME(SEL$8) */
                                        act1.preference_order 
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act1 
                                WHERE   act1.applicant_id = adt.applicant_id 
                                AND     act1.status_flag IN ('C','B')
                                AND     act1.attribute1 IN (
                                                SELECT  /*+ QB_NAME(SEL9) */
                                                        category_id 
                                                FROM    xxadm.xxadm_category_master_tbl 
                                                WHERE   category_code IN ('OPENMERT')
                                        ) 
                                AND     NVL(act1.attribute7,'N') = 'N'
                        ) 
                AND     cmt.college_id = :p_college_id
                AND     crmt.course_id = :p_course_id
                AND     mmt.medium_id  = :p_medium_id
                AND     act.hostel_required = :p_hostel_required
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                ) 
        WHERE
                 ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id
;

Figure 1-1

1.3 This query first came to light in a thread on the Oracle Developer forums with an extract from a tkprof file showing that it had executed 842,615  times. That number should be ringing alarms and flashing warning lights, but if we assume that there really is no way of doing some sort of batch processing to get through the data we need to do a little bit of arithmetic to see how much of a threat this query is and how much is matters.

1.4 For every extra 0.01 seconds it takes to execute this query the total run-time goes up by8,426 seconds, which is 2 hours and 20 minutes. If the average execution time is a mere 0.06 seconds you’ll be at it all night long – and it will be a long, long night.


 

2.0 Overview

2.1 Before we look at the execution plan let’s take a moment to pick out a few points from the query. You may want to re-open this post in a separate window so that you can switch easily between the SQL and my comments.

2.2 We start off with a simple select from an inline view – and if we replace the inline view the simple “object name” V_THING we get the following query:


select  count(applicant_id)
from    V_THING
where   applicant_id = :p_applicant_id
;

Figure 2-1

2.3 This should prompt two questions

  • First, how far into the view V_THING will the optimizer be able to push that predicate, possibly the entire content of the view will have to be constructed before the predicate can apply, possibly the nature of the view is such that the optimizer could do a simple filter pushdown to apply the predicate very early. That still leaves (or leads on to) the question of whether the optmizer might then be able to generate further uses of the predicate through transitive closure.
  • Secondly, if the view V_THING is a multiable view will we be able to work out which table applicant_id comes from by the time it becomes visible in the view.  It’s possible that changing the table from which applicant_id comes will change the execution plan.

2.4 Digging down one layer we see that our V_THING is also a simple select from an inline view – let’s call it V_ANOTHER – so if we again forget about the complexity of the inner view we’re looking at a query that goes:


select  /*+ QB_NAME(SEL$1) */
        count(applicant_id)
from    (
        select  /*+ QB_NAME(SEL$2) */
                applicant_id, 
                {15 more columns}
                null    allotment_type
        from
                V_OTHER
        where
                rownum <=  :p_seats
        )       V_THING
where 
        applicant_id = :p_applicant_id
;

Figure 2-2

2.5 A couple of details hit the eye when you look at this: Why are we selecting 17 columns from a complex view, and then counting only one of them and discarding the rest. Let’s hope the optimizer is smart enough to discard the excess columns at the earliest possible moment (which might allow it to do some index-only accesses instead of visiting tables for columns we don’t really need).

2.6 Stranger still, one of those columns is a delberately generated NULL! This hints at the possibility that the client code is doing something like “count how many query X will give me, then run query X”– giving us the pattern “select count(*) from (inlne query X); execute query X” Maybe this whole query is a waste of time, but if it can’t be avoided maybe it should be edited down to the smallest  query that will get the correct count.

2.7 Another thought about this layer of the query, the predicate “rownum <= :bind_variable” may be pushing the optimizer into first_rows(n) optimization and this might be enough to make it choose a bad execution plan. I’d have to check, and check for specific versions, but off the top of my head I think that when comparing rownum with a bind variable the optimizer will optimizer for first_rows(10) unless there’s some other reason for choosing anything else.)

2.8 I’m also a little curious about a requirement that seems to say – “pick at most N rows, then tell me how many you’ve picked”. What’s it actually trying to do and why?

2.9 Let’s dig one layer deeper before we get into the complex stuff. Here’s a version of the code that expands V_OTHER in an extremely stripped down form:


SELECT  /*+ QB_NAME(SEL$1) */
        COUNT(applicant_id)
FROM    (
        SELECT  /*+ QB_NAME(SEL$2) */
                applicant_id, 
                {15 more columns}
                NULL allotment_type
        FROM    (
                SELECT   /*+ QB_NAME(SEL$3) */
                        {lots of columns}
                FROM 
                        {lots of tables}
                WHERE
                        {lots of predicates}
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                )  
        WHERE
                ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id
;

Figure 2-3

2.10 At this point we can start to see reasons for the layering of inline views – we need to select data in the right order before we apply the rownum predicate; as for the excess columns in the select list – even if we selected only the applicant_id in the outer layers the optimizer would still have to acquire the five columns in the order by clause.

2.11 But this emphasises the oddity of the query. If we’re only counting applicant_id to see whether we got :p_seats or fewer rows for a specific applicant_id why does the order matter – surely the order will only matter when we “repeat” the query to get the actual rows (if that’s what we do). As it is, to count a small number of rows this query might have to fetch and sort a large number, then discard most of them. (Some statistics from other posts by the OP indicated that the underlying query might fetch anything between a few hundred and a couple of thousand rows. This particular run showed the query finding 171 rows to sort and then restricting the rowsource to the first two sorted rows)


 

3.0 The Main Course

3.1 To make it a little easier to discuss the detail of the execution plan I’ve laid it out in a small number of sections corresponding to the final (outline_leaf) query blocks the optimizer generated. To do this I applied two sets of information – the Query Block / Object Alias information (which follows the body of the plan) and any appearances of the VIEW operation in the plan.


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                                |      1 |        |   574 (100)|      1 |00:00:00.02 |    3822 |       |       |         |
|   1 |  SORT AGGREGATE                             |                                |      1 |      1 |            |      1 |00:00:00.02 |    3822 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
select count(applicant_id) - above
select where rownum less than - below
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|*  2 |   VIEW                                      |                                |      1 |      1 |   574   (2)|      0 |00:00:00.02 |    3822 |       |       |         |
|*  3 |    COUNT STOPKEY                            |                                |      1 |        |            |      2 |00:00:00.02 |    3822 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Start of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   4 |     VIEW                                    |                                |      1 |      1 |   574   (2)|      2 |00:00:00.02 |    3822 |       |       |         |
|*  5 |      SORT ORDER BY STOPKEY                  |                                |      1 |      1 |   574   (2)|      2 |00:00:00.02 |    3822 |  2048 |  2048 | 2048  (0)|
|*  6 |       FILTER                                |                                |      1 |        |            |    171 |00:00:00.02 |    3822 |       |       |         |
|   7 |        NESTED LOOPS                         |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |       |       |         |
|   8 |         NESTED LOOPS                        |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    2946 |       |       |         |
|   9 |          NESTED LOOPS                       |                                |      1 |      1 |   567   (2)|    182 |00:00:00.02 |    2942 |       |       |         |
|  10 |           NESTED LOOPS                      |                                |      1 |      1 |   566   (2)|    182 |00:00:00.02 |    2938 |       |       |         |
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |       |       |         |
|  12 |             NESTED LOOPS ANTI               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |       |       |         |
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |  1599K|  1599K| 1503K (0)|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
join index transformation query block SEL$082F290F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  14 |               VIEW                          | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |       |       |         |
|* 15 |                HASH JOIN                    |                                |      1 |        |            |    127 |00:00:00.01 |       8 |  1368K|  1368K| 1522K (0)|
|  16 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_CODE_UK             |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
|  17 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_PK                  |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Continuation of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 18 |               HASH JOIN                     |                                |      1 |    478 |   555   (2)|    182 |00:00:00.01 |    2014 |  1245K|  1245K| 1277K (0)|
|  19 |                NESTED LOOPS                 |                                |      1 |    478 |   243   (2)|    209 |00:00:00.01 |     883 |       |       |         |
|  20 |                 NESTED LOOPS                |                                |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |       |       |         |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |       |       |         |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |       |       |         |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |       |       |         |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |       |       |         |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      1 |    478 |   241   (2)|    209 |00:00:00.01 |     879 |       |       |         |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      1 |   6685 |   311   (2)|  10488 |00:00:00.01 |    1131 |       |       |         |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |    182 |   8881 |     1   (0)|      0 |00:00:00.01 |     366 |       |       |         |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     0   (0)|    182 |00:00:00.01 |     184 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unnested subquery SEL$A75BE177 (from sel$8, sel$9)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  29 |             VIEW PUSHED PREDICATE           | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|  30 |              NESTED LOOPS                   |                                |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 31 |               TABLE ACCESS BY INDEX ROWID   | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     2   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 32 |                INDEX UNIQUE SCAN            | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     184 |       |       |         |
|* 33 |               TABLE ACCESS BY INDEX ROWID   | XXADM_CATEGORY_MASTER_TBL      |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |       |       |         |
|* 34 |                INDEX UNIQUE SCAN            | XXADM_CATEGORY_PK              |      0 |      1 |     0   (0)|      0 |00:00:00.01 |       0 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Start of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     186 |       |       |         |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     182 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery, query block SEL$5
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 40 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     3   (0)|     29 |00:00:00.01 |     507 |       |       |         |
|* 41 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_PREFS_UK         |    182 |      5 |     2   (0)|   1450 |00:00:00.01 |     191 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery, query block SEL$6
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  42 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    171 |      1 |     2   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 43 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_APPLICANT_STATUS |    171 |      1 |     1   (0)|      0 |00:00:00.01 |     173 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery SEL$F665FE1B (from sel$4 with tranform for index join)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 44 |        VIEW                                 | index$_join$_014               |      6 |      1 |     0   (0)|      0 |00:00:00.01 |      14 |       |       |         |
|* 45 |         HASH JOIN                           |                                |      6 |        |            |      0 |00:00:00.01 |      14 |  1519K|  1519K|  666K (0)|
|* 46 |          INDEX RANGE SCAN                   | XXADM_CATEGORY_PK              |      6 |      1 |     0   (0)|      6 |00:00:00.01 |       6 |       |       |         |
|  47 |          INLIST ITERATOR                    |                                |      6 |        |            |     12 |00:00:00.01 |       8 |       |       |         |
|* 48 |           INDEX UNIQUE SCAN                 | XXADM_CATEGORY_CODE_UK         |     12 |      1 |     0   (0)|     12 |00:00:00.01 |       8 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$2        / from$_subquery$_001@SEL$1
   3 - SEL$2
   4 - SEL$7E0D484F / from$_subquery$_002@SEL$2
   5 - SEL$7E0D484F
  14 - SEL$082F290F / LMT_GENDER@SEL$3
  15 - SEL$082F290F
  16 - SEL$082F290F / indexjoin$_alias$_001@SEL$082F290F
  17 - SEL$082F290F / indexjoin$_alias$_002@SEL$082F290F
  21 - SEL$7E0D484F / CMT@SEL$3
  22 - SEL$7E0D484F / CMT@SEL$3
  23 - SEL$7E0D484F / LMT_EDUCATION_TYPE@SEL$3
  24 - SEL$7E0D484F / LMT_EDUCATION_TYPE@SEL$3
  25 - SEL$7E0D484F / ACT@SEL$3
  26 - SEL$7E0D484F / ADT@SEL$3
  27 - SEL$7E0D484F / ACT3@SEL$7
  28 - SEL$7E0D484F / ACT3@SEL$7
  29 - SEL$A75BE177 / VW_SQ_1@SEL$67DC521B
  30 - SEL$A75BE177
  31 - SEL$A75BE177 / ACT1@SEL$8
  32 - SEL$A75BE177 / ACT1@SEL$8
  33 - SEL$A75BE177 / XXADM_CATEGORY_MASTER_TBL@SEL$9
  34 - SEL$A75BE177 / XXADM_CATEGORY_MASTER_TBL@SEL$9
  35 - SEL$7E0D484F / LMT_PASS@SEL$3
  36 - SEL$7E0D484F / LMT_PASS@SEL$3
  37 - SEL$7E0D484F / LMT_APPEARANCE@SEL$3
  38 - SEL$7E0D484F / LMT_RELIGION@SEL$3
  39 - SEL$7E0D484F / LMT_RELIGION@SEL$3
  40 - SEL$5        / ACT1@SEL$5
  41 - SEL$5        / ACT1@SEL$5
  42 - SEL$6        / ACT2@SEL$6
  43 - SEL$6        / ACT2@SEL$6
  44 - SEL$F665FE1B / XXADM_CATEGORY_MASTER_TBL@SEL$4
  45 - SEL$F665FE1B
  46 - SEL$F665FE1B / indexjoin$_alias$_001@SEL$F665FE1B
  48 - SEL$F665FE1B / indexjoin$_alias$_002@SEL$F665FE1B

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      OPT_PARAM('_optimizer_use_feedback' 'false')
      OPT_PARAM('_optimizer_dsdir_usage_control' 0)
      OPT_PARAM('_optimizer_adaptive_plans' 'false')
      OPT_PARAM('_optimizer_gather_feedback' 'false')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$F665FE1B")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SEL$6")
      OUTLINE_LEAF(@"SEL$A75BE177")
      PUSH_PRED(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B" 16 15)
      OUTLINE_LEAF(@"SEL$082F290F")
      OUTLINE_LEAF(@"SEL$7E0D484F")
      UNNEST(@"SEL$9D10C90A")
      UNNEST(@"SEL$7")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$180402DE")
      OUTLINE(@"SEL$7E0D484F")
      UNNEST(@"SEL$9D10C90A")
      UNNEST(@"SEL$7")
      OUTLINE(@"SEL$67DC521B")
      OUTLINE(@"SEL$9D10C90A")
      UNNEST(@"SEL$9")
      OUTLINE(@"SEL$7")
      OUTLINE(@"SEL$C04829E0")
      ELIMINATE_JOIN(@"SEL$3" "CRMT"@"SEL$3")
      ELIMINATE_JOIN(@"SEL$3" "MMT"@"SEL$3")
      OUTLINE(@"SEL$8")
      OUTLINE(@"SEL$9")
      OUTLINE(@"SEL$3")
      NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")
      NO_ACCESS(@"SEL$2" "from$_subquery$_002"@"SEL$2")
      INDEX_RS_ASC(@"SEL$7E0D484F" "CMT"@"SEL$3" ("XXADM_COLLEGE_MASTER_TBL"."COLLEGE_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_EDUCATION_TYPE"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      FULL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      FULL(@"SEL$7E0D484F" "ADT"@"SEL$3")
      INDEX_JOIN(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_CODE") ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "ACT3"@"SEL$7" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."PREFERENCE_ORDER"))
      NO_ACCESS(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B")
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_PASS"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_APPEARANCE"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      LEADING(@"SEL$7E0D484F" "CMT"@"SEL$3" "LMT_EDUCATION_TYPE"@"SEL$3" "ACT"@"SEL$3" "ADT"@"SEL$3" "LMT_GENDER"@"SEL$3" "ACT3"@"SEL$7" "VW_SQ_1"@"SEL$67DC521B"
              "LMT_PASS"@"SEL$3" "LMT_APPEARANCE"@"SEL$3" "LMT_RELIGION"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_EDUCATION_TYPE"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "ADT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "ACT3"@"SEL$7")
      USE_NL(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B")
      USE_NL(@"SEL$7E0D484F" "LMT_PASS"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_APPEARANCE"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3")
      NLJ_BATCHING(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3")
      SWAP_JOIN_INPUTS(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3")
      PQ_FILTER(@"SEL$7E0D484F" SERIAL)
      INDEX_RS_ASC(@"SEL$A75BE177" "ACT1"@"SEL$8" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."PREFERENCE_ORDER"))
      INDEX_RS_ASC(@"SEL$A75BE177" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9" ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_ID"))
      LEADING(@"SEL$A75BE177" "ACT1"@"SEL$8" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9")
      USE_NL(@"SEL$A75BE177" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9")
      INDEX_RS_ASC(@"SEL$6" "ACT2"@"SEL$6" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."STATUS_FLAG"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$6" "ACT2"@"SEL$6")
      INDEX_RS_ASC(@"SEL$5" "ACT1"@"SEL$5" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."COLLEGE_ID"
              "XXADM_APPLICANT_COURSPREFS_TBL"."COURSE_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."MEDIUM_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."HOSTEL_REQUIRED"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$5" "ACT1"@"SEL$5")
      INDEX_JOIN(@"SEL$4" "XXADM_CATEGORY_MASTER_TBL"@"SEL$4" ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_ID") ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_CODE"))
      END_OUTLINE_DATA
  */

Figure 3-1

3.2 There’s no rigid rule I can give you about an approach for looking for query blocks and transformations, but it’s worth checking to see which of your original query blocks still exist in the final execution plan and which have disappeared thanks to some transformation.

3.3 If we look down the Query Block Name list above we can see that sel$1, sel$2, sel$5 and sel$6 have “survived” the machinations of the optimizer. We’ve already noted that sel$1 and sel$2 are simply “select from {inline view}” as far as the optimizer is concerned; sel$5 and sel$6 are simple subqueries that appeared as filter subqueries in the original query text and have kept that status by the end of the optimizer’s transformation stage.

3.4 Tracking down the other query blocks that we named we can see the following:

  • sel$3 – most of its tables appear in a new query block called SEL$7E0D484F but one of them appears in a query block called SEL$082F290F; a closer look at SEL$082F290F shows us that it ranges from operations 14 to 17 and holds a “single table” transformation where the optimizer has chosen to use an index join of two indexes on the xxadm_lov_master_tbl rather than doing a tablescan. The index join is represented as a VIEW of a hash join, hence the separate query block. Another little detail we note – the xxadm_lov_master_tbl appears five times in the query, so we need to know which occurrence this is: fortunately the Object Alias information tells us at operation it’s the LMT_GENDER alias.
  • sel$4 is the scalar subquery inside a CASE expression, involving table xxadm_category_master_tbl. We can find the table name (which hasn’t been given an alias) and the query block name in the Object Alias information at operation 44 in a query block called SEL$F665FE1B. There are two points of interest about this query block – it has come into existence because it’s another example where the optimizer has used an index join to avoid a full tablescan; and it has been used in a filter subquery (the parent of operation 44 is the FILTER at operation 6).
  • sel$7 appeared in the original text as a NOT IN subquery against xxadm_applicant_coursprefs_tbl with an alias of act3. The Query Block / Object Alias information tells us that ACT3@SEL$7 appears at operations 27 and 28 – and when we track up the plan from operation 27 we see that it is the second child of operation 12 which is a nested loop anti. The optimizer has unnested the subquery and turned it into an anti join as one of the steps that produced query block SEL$7E0D484F
  • sel$8 appeared in the original text as a NOT IN subquery against xxadm_applicant_coursprefs_tbl aliased as act1 (OUCH – that’s the second time the alias act1 has appeared in this query!). But the subquery had it’s own subquery, named sel$9, against xxadm_category_master_tbl which didn’t have an alias (more OUCH!). When we search for ACT1@SEL$8 and XXADM_CATEGORY_MASTER_TBL@SEL$9 in the Query Block / Object Alias information we find that they both appear in a query block called SEL$A75BE177 which ranges from operations 29 to 34, and a check of the plan shows that operation 29 is a view pushed predicate of a view called VW_SQ_1 – a name that identifies the view as an internally generated non-mergeable view that was created as the result of unnestingn a subquery. The view contains a join between xxadm_applicant_coursprefs_tbl and xxadm_category_master_tbl, so we can say that the optimizer has unnested our sel$9 to create a NOT IN subquery that is a join subquery, then it has unnested again to produce an inline non-mergeable view, and it has then allowed “join predicate pushdown (JPPD)” so that the non-mergeable view can be the second table of a nested loop. To confirm the last comment we track up the plan to discover that operation 29 is the second child of operation 11 which is, indeed, a nested loop and (since the subquery was a NOT IN subquery) a nested loop anti.
  • sel$9 – see sel$8.

3.5 As you can see, when you’re using the execution plan output to identify what’s happened to the individual query blocks from your original query you’re likely to jump around from the query to the plan body, to the Query Block / Object Alias information in a fairly arbitrary way.

3.6 I’ll close this chapter of the analysis with a quick look at the Outline Data – in particular two of the hint types that appear: OUTLINE() and OUTLINE_LEAF() – which I’ve extracted and sorted for ease of reading:

      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$7")
      OUTLINE(@"SEL$8")
      OUTLINE(@"SEL$9")

      OUTLINE(@"SEL$180402DE")
      OUTLINE(@"SEL$67DC521B")
      OUTLINE(@"SEL$7E0D484F")
      OUTLINE(@"SEL$9D10C90A")
      OUTLINE(@"SEL$C04829E0")

      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SEL$6")
      OUTLINE_LEAF(@"SEL$082F290F")
      OUTLINE_LEAF(@"SEL$7E0D484F")
      OUTLINE_LEAF(@"SEL$A75BE177")
      OUTLINE_LEAF(@"SEL$F665FE1B")

Figure 3-2

3.7 In this context an OUTLINE() is a query block that existed at some point in the optimization sequence that got us to the final execution plan but did not appear as a query block in the final plan. In the previous paragraphs we described how the original query blocks sel$3, sel$4, sel$7, sel$8 and sel$9 disappeared through transformation so (apart from sel$4 which is a bit of an anomaly that I’ll pick up in a moment) they appear in an   OUTLINE() hint. I described how sel$9 would have been unnested into sel$8 to create a join which would still have been a filter subquery until that too was unnested – that join subquery would have been one of the five OUTLINE() query blocks above with the 8 character hexadecimal names.

3.8 An OUTLINE_LEAF() is a “final” query block – one that is present in the final execution plan. If you ignore sel$4, you’ll see that the other 8 query blocks correspond to the 8 query block names that appear in the Query Block Name / Object Alias information. The appearance of sel$4 as an OUTLINE_LEAF() looks like an anomaly to me; I can’t think of a good reason why it should be in the list.


 

4.0 Simplify

4.1 We’re just about ready to do a full read-through of the execution plan. We’ve taken the two outer layers off the query/plan because they represent such simple in-line views, and we’ve discussed the disappearance of some of our initial query blocks and identified and explained all the different query blocks that have appeared in the final plan. So with the extra bits of information in hand let’s take a couple more steps in simplifying the execution plan.

4.2 First I’ll replace each of the three VIEW operations and their descendants with a single line that says “this is a rowsource”. I’ll distinguish between the two variants of the operation (VIEW and VIEW PUSHED PREDICATE) by calling them BULK ROWSOURCE and PRECISION ROWSOURCE respectively: it’s not a perfect description but broadly speaking we expect a VIEW to be called once by its parent to produce a “large” data set and we expect a VIEW PUSHED PREDICATE to be called many times by its parent to produce (each time) a “small” data set using extremely efficient methods.

4.3 Then I’ll remind you that a multichild FILTER operation calls the first child once to supply a rowsource then, for each row returned, calls the other child operations in turn to determine whether or not to keep the row from the first child. This means we can examine just the first child in isolation to see how the optimizer wants to get the driving bulk of the data (and we can examine the other children later, bearing in mind how often they might need to be called and checking how efficient each call is likely to be).

4.4 Finally I’ll note that query block SEL$7E0D484F (the “real main query” as I labelled it in the plan above) starts: “VIEW -> SORT ORDER BY STOPKEY -> FILTER” – after we’ve filtered our data we simply sort it with the intention of keeping only the “top few” rows. That part of the plan is so simple we’ll ignore those lines of the plan and focus on just the first child of the FILTER- leaving the core plan looking like this:


-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   7 |        NESTED LOOPS                         |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |
|   8 |         NESTED LOOPS                        |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    2946 |
|   9 |          NESTED LOOPS                       |                                |      1 |      1 |   567   (2)|    182 |00:00:00.02 |    2942 |
|  10 |           NESTED LOOPS                      |                                |      1 |      1 |   566   (2)|    182 |00:00:00.02 |    2938 |
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |
|  12 |             NESTED LOOPS ANTI               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |
|  14 |               BULK ROWSOURCE                | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |
|* 18 |               HASH JOIN                     |                                |      1 |    478 |   555   (2)|    182 |00:00:00.01 |    2014 |
|  19 |                NESTED LOOPS                 |                                |      1 |    478 |   243   (2)|    209 |00:00:00.01 |     883 |
|  20 |                 NESTED LOOPS                |                                |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      1 |    478 |   241   (2)|    209 |00:00:00.01 |     879 |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      1 |   6685 |   311   (2)|  10488 |00:00:00.01 |    1131 |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |    182 |   8881 |     1   (0)|      0 |00:00:00.01 |     366 |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     0   (0)|    182 |00:00:00.01 |     184 |
|  29 |             PRECISION ROWSOURCE             | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     186 |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     182 |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Figure 4-1

4.5 We need to examine a plan of only 25 lines with no complicated bits (because we’ve hidden any bits that looked complicated and will get back to them later). The thing now looks like a single query block which means we can think “First Child First”, so:

  • operation 7 calls operation 8 which calls operation 9 which calls operation 10 which calls operation 11 which calls operation 12 which calls operation 13 which calls operation 14 which is the first operation to produce a rowsource (though we don’t care how at present).
  • Operation 13 is a hash join, so the rowsource from operation 14 becomes its “build” table, and we call operation 18 to supply the “probe” table.
  • Operaion 18 calls operation 19 which calls operation 20 which calls operation 21 which is a table access by rowid that has to call operation 22 to get rowids. So operation 22 supplies the second rowsource (in our collapsed plan). It’s an INDEX UNIQUE SCAN of the index that appears (judging from its name)to be the primary key index of a table, so operation 22 will produce at most one rowid that is passed up to operation 21 that will use that rowid to get the one row from the table. (Operation 21 supplies the 3rd rowsource).
  • Operation 21 passes a row up to operation 20 which calls operation 23 which calls operation 24 (4th rowsource) to do another unique scan of a unique index to get a rowid to pass up to operation 23 to find (and test) a row from the table (5th rowsource) which it passes up to operation 20 to do the join and pass the result up (6th rowsource) to operation 19.
  • Operation 19 calls its second child (operatiomn 25) for each row it receives – but because of the primary key/unique scans the optimizer knows that the first child will return at most one row and sees no problem with using a full tablescan as the second child of the nested loop. So the tablescan of XXADM_APPLICANT_COURSPREFS_TBL is the 7th rowsource. Any rows survinging the join are passed up to operation 18 (making operation 19 the 8th rowsource).
  • Operation 18 uses the incoming rowsource to build its in-memory hash table, and calls operation 26 to supply its second (probe table) rowsource. Operation 26 is the the 9th rowsource, executing a full tablescan of XXADM_APPLICANT_DETAILS_TBL and passing the results up to operation 18, which performs the join and passes the results up to its parent, making it the 10th rowsource.
  • Operation 18 was the second child of the hash join at operation 13, which now uses the incoming data as the probe table to generate the 11th rowsource and pass the results up to operation 12.
  • Operation 12 is a nested loop anti-join and operation 13 has just supplied it with its first child rowsource, so operation 12 will now call its second child once for each row in the first child. Its second child is operation 27 (table access by rowid) which calls its first child (operation 28 index range scan) which fetches rowids from the index passes them up to operation 27 which fetches table rows and passes them up to operation 12. So operation 28 supplies the 12th rowsource, operation 27 the 13th. Since operation 12 is an ANTI join a row from the first child survives if operation 27 doesn’t find a row to return. Operation 12 passes any survivors (14th rowsource) up to operation 11.
  • Operation 11 is another ANTI-join nested loop so for each row from operation 12 it will call its second child, passing in values from its first child to drive an efficient access path and forwarding any rows from the first child where the second child returns no rows. Its second child is operation 29 – our “precision rowsource” – and we can postpone worrying about the exact details of how that works. Operation 29 produces the 15th rowsource in our reduced plan, which it passes up to operation 11.
  • Operation 11 is the first child of the nested loop at operation 10 – and from this point onwards we have 4 nested loop joins and we can simply list through the order of operation. Operation 11 produces the 16th rowsource, then Operation 10 calls its second child (operation 35) which calls operation 36 which passes rowids (17th rowsource) up to operation 35 which passes rows (18th rowsource) up to operation 10.
  • Operation 10 passes its data (19th rowsource) up to operation 9 which calls operation 37 as its second child. Operation 37 (20th rowsource) passes index entries up to operation 9 which performs the join and passes the result (21st rowsource) up to operation 8.
  • Operation 8 calls operation 38 as its second child. Operation 38 (22nd rowsource) passes index entries up to operation 8 which performs the join and passes the result (23rd rowsource) up to operation 7.
  • Operation 7 calls operation 39 as its second child. Operation 39 (24th rowsource) passes index entries up to operation 7 which performs the join and that’s the final (25th) rowsource as far as our reduced execution plan is concerned.

4.6 Here’s the reduced plan, cut back to minimum width, with the order of rowsource generation included:


-----------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           |  Order |
-----------------------------------------------------------------------------------------------
|   7 |        NESTED LOOPS                         |                                |     25 |
|   8 |         NESTED LOOPS                        |                                |     23 |
|   9 |          NESTED LOOPS                       |                                |     21 |
|  10 |           NESTED LOOPS                      |                                |     19 |
|  11 |            NESTED LOOPS ANTI                |                                |     16 |
|  12 |             NESTED LOOPS ANTI               |                                |     14 |
|* 13 |              HASH JOIN                      |                                |     11 |
|  14 |               BULK ROWSOURCE                | index$_join$_008               |      1 |
|* 18 |               HASH JOIN                     |                                |     10 |
|  19 |                NESTED LOOPS                 |                                |      8 |
|  20 |                 NESTED LOOPS                |                                |      6 |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      3 |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      2 |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      5 |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      4 |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      7 |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      9 |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |     13 |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |     12 |
|  29 |             PRECISION ROWSOURCE             | VW_SQ_1                        |     15 |
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |     18 |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |     17 |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |     20 |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |     22 |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |     24 |
-----------------------------------------------------------------------------------------------

Figure 4-2

4.7 Once we’ve got this image sorted out we still have a few details to fill in before we’ve gpt the full picture of the execution plan.

  • What does Oracle do to generate the “bulk rowsource” at operation 14
  • What does Oracle do on every call to the “precision rowsource” at operation 29
  • We know that the reduced plan above is the first child of a FILTER operation and if we refer back to previous “real main query” we know that there are three further children to the FILTER that might have to execute once for each row produced by the first child. So that’s another 3 (small) query blocks we need to examine in detail.
  • We need to bring in the predicates to see how the optimizer has used them
  • We need to look at the Starts and A-Rows to compare what happened with the optimizer’s expectation
  • We need to look at disk reads and buffer gets to see how much excess work we did to acquire the data


 

5.0 Filling the Gaps

5.1 After getting the overall shape of the query’s execution we can go back and examine the bits we have so far postponed view. There are three pieces to consider

  • the “bulk rowsource” at operation 14 that was the first child of a hash join.
  • the “precision rowsource” at operation 29 that was the second child of a nested loop anti-join
  • the filter subqueries that were the 2nd, 3rd and 4th children of the explicit FILTER at operation 6

5.2 We start with the “bulk rowsource” that was a VIEW with a highly suggestive name of index$_join$_008. This shows Oracle finding a way of selecting data from a table without visiting the table, using a couple of indexes as if they were skinny tables that could be scanned and joined. In effect Oracle has transformed “select key1, key2 from tableX” into something like:


select  ix1.key1, ix2.key2
from
        (select key1, rowid r1 from tableX) ix1,
        (select key2, rowid r2 from tableX) ix2
where
        ix1.r1 = ix2.r2
;

5.3 This strategy can only be used when Oracle knows that every relevant row will appear in the two indexes – which basically means you have to be careful about NULLs. In the simplest case you might have to have a NOT NULL constraint on at least one column in each of the target indexes; or a predicate in each inline view that ensures that Oracle can use just the index without losing some of the rowids that it needs. After acquiring key values and rowids from each index in turn, Oracle then joins the two sets of data using a hash join. Technically there is no limit to the number of indexes that Oracle can join in this fashion, the choice of strategy depends largely on how big the table is compared to the sum of the sizes of the indexes that could be used instead; practically (as in our main query) it’s rare to see more than two indexes used for this “index join” mechanism.


join index transformation query block SEL$082F290F, with parent
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |  1599K|  1599K| 1503K (0)|
|  14 |               VIEW                          | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |       |       |         |
|* 15 |                HASH JOIN                    |                                |      1 |        |            |    127 |00:00:00.01 |       8 |  1368K|  1368K| 1522K (0)|
|  16 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_CODE_UK             |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
|  17 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_PK                  |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.4 Moving on to the “precision rowsource” that appears in the original plan as a VIEW PUSHED PREDICATE. This means that Oracle has optimised a non-mergeable view allowing for an input value from its parent. If you take operations 30 to 34 in complete isolation it’s just a simple nested loop join and you might wonder why the view is non-mergable. But when you look back at the parent you discover that it’s an ANTI-join, so Oracle has to say (for each driving row) “join these two tables and see if you get any rows as a result”, it doesn’t have a generic mechanism for doing two separate but consecutive (anti-)join operations at this point.


Unnested subquery SEL$A75BE177 (from sel$8, sel$9) with parent and its first child
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |       |       |         |
|  12 |             Driving Rowsource               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |       |       |         |
|  29 |             VIEW PUSHED PREDICATE           | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|  30 |              NESTED LOOPS                   |                                |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 31 |               TABLE ACCESS BY INDEX ROWID   | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     2   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 32 |                INDEX UNIQUE SCAN            | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     184 |       |       |         |
|* 33 |               TABLE ACCESS BY INDEX ROWID   | XXADM_CATEGORY_MASTER_TBL      |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |       |       |         |
|* 34 |                INDEX UNIQUE SCAN            | XXADM_CATEGORY_PK              |      0 |      1 |     0   (0)|      0 |00:00:00.01 |       0 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.5 Finally we have the three filter subqueries, which I’ve shown with their parent FILTER and a one-liner for the driving rowsource. For each row in operation 7 we call operations 40, 42 and 44 in turn although the parent filter may decide after calling operation 40 that it doesn’t need to call the other two and can simply move on to the next row from operation 7. Similarly the filter might call operations 40 and 42 and not need to call operation 44. It’s also possible that, thanks to scalar subquery caching, Oracle can say “I’ve already called operation 40 for this value, so I know the result and don’t need to call it again”. When we look at the Starts and A-Rows columns for the three operations we will get some idea of how the “notional” execution sequence turned into an actual workload.


Filter subqueries SEL$5, SEL$6 and SEL$F665FE1B, with their parent and its first child
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|*  6 |       FILTER                                |                                |      1 |        |            |    171 |00:00:00.02 |    3822 |       |       |         |
|   7 |        "Simplify" Plan                      |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |       |       |         |
|* 40 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     3   (0)|     29 |00:00:00.01 |     507 |       |       |         |
|* 41 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_PREFS_UK         |    182 |      5 |     2   (0)|   1450 |00:00:00.01 |     191 |       |       |         |
|  42 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    171 |      1 |     2   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 43 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_APPLICANT_STATUS |    171 |      1 |     1   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 44 |        VIEW                                 | index$_join$_014               |      6 |      1 |     0   (0)|      0 |00:00:00.01 |      14 |       |       |         |
|* 45 |         HASH JOIN                           |                                |      6 |        |            |      0 |00:00:00.01 |      14 |  1519K|  1519K|  666K (0)|
|* 46 |          INDEX RANGE SCAN                   | XXADM_CATEGORY_PK              |      6 |      1 |     0   (0)|      6 |00:00:00.01 |       6 |       |       |         |
|  47 |          INLIST ITERATOR                    |                                |      6 |        |            |     12 |00:00:00.01 |       8 |       |       |         |
|* 48 |           INDEX UNIQUE SCAN                 | XXADM_CATEGORY_CODE_UK         |     12 |      1 |     0   (0)|     12 |00:00:00.01 |       8 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.6 The operations for the first two subqueries (40,41) and (42,43) shouldn’t need any explanation. The third set of operations (44 to 48) is a little more complex. In fact it’s similar to the mechanism that appeared in our first “bulk rowsource” – we have Oracle turning a table access into an “index (only) join”, collecting key values and rowids from two different indexes and doing a hash join on the rowids. The plan is just a little more subtle, though – instead of getting their data from “index fast full scans”, one index is accessed by a range scan (possibly using a predicate value passed in by the filter operation) and the other is using an INLIST ITERATOR with unique scan. In the second case we must be handling a predicate of the form: “key_column in {list of values}”. (Taking a first look at the Starts column we can see that the INLIST ITERATOR runs 6 times calling the INDEX UNIQUE SCAN a total of 12 times so it’s fairly obvious that there are exactly two elements in the list in this case.)


 

6.0 Looking at the Numbers

6.1 Rather than walking through the entire plan again putting the pieces together I’m going to assume that I can carry on to the next stage of analysis, and assume that the pieces will fall into place as we talk about some of the critical numbers.

6.2 It’s probably best to open a copy of the note in a separate window so that you can examine the plan and read my comments at the same time. Starting at the top we apply “first child first”, noting in passing that the examples of the VIEW operations we have in this plan don’t have a significicant impact on the order of operation as we work down the plan – they simply remind us that there are “non-mergeable” parts of the plan. Eventually we get to operation 13 which is a hash join and operations 14 through 17 give us the build (first) table – a quick check shows 127 rows for estimated (E-rows) and actual (A-rows); then we see the probe (second) table is itself a hash join returning 182 rows (estimate 487, so in the right ballpark) and the hash join at operation 13 produces a result set of 182 rows.

6.3 At this point a quick check back UP the plan tells us that the 182 rows survive all the way up to operation 6, where a FILTER eliminates just a few of them; then the result set drops to just 2 rows at operation 5. Then a quick check of operation 5 (with a cross reference to the query) reminds us that we have an inline view that does an “order by” followed by a “rownum < :bind” predicate – so the sort order by stop key at operation 5 is sorting all the data but only passing on the first two rows: so there’s no way we could have modified the join order to eliminate the redundant rows sooner.

6.4 So we see that we get the right volume of data at about the right moment in the plan, and probably can’t do much to avoid the volume of data access – but let’s check how we got the 182 rows at operation 18. Using “first child first” – we see a nested loop joining two “single row by unique index” rowsources, then a nested loop to a full tablescan of XXADM_APPLICANT_COURSPREFS_TBL. The knee-jerk reaction to a “full tablescan as the second table in a nested loop” is that it must be bad – but in this case we know that it will happen no more than once, so we don’t need to panic immediately. Applying a little more thought (and arithmetic): we note that the tablescan returns 209 rows (estimated 478) using 879 buffer gets; that’s not an extreme number of buffer gets per row (especially if the 478 is a reasonably accurate average for the operation). We’ll postpone worrying about the tablescan for the moment but take note that it might be worth revisiting.

6.5 The second child to the hash join at operation 18 is another full tablescan (of XXADM_APPLICANT_DETAILS_TBL) which requires 1,141 buffer gets. Again, though this might not be a bad move, since the alternative would be an actual 209 index probes (or an estimated average 478 index probes). The workload is, again, in the right ballpark but, again, something we might come back to. In fact in both tablescans it might be more important to worry about the work done at each row by the row predicates rather than worrying about the fact that the operation is a tablescan; a predicate involving a CPU-intensive PL/SQL function might be the thing that makes 478 index probes to 478 rows a better option than a tablescan of (say) 100,000 rows.

6.6 From this point onwards (operation 13) we have 6 nested loop joins (though the first two are anti-joins) so it’s “call the second child for each row in the first child” all the way down, and we’ve seen that we don’t eliminate any data as we go. If we want to make the execution plan any faster by “local” tweaking we’ll just have to make sure each “second child” operation is as efficient as possible, which tends to mean looking for cases where we supply a lots of rows (rowids) from an index range scan but find that we then discard the table rows after visiting the table. So …

6.6.1 Operation 12 – nested loop anti join – calls 28/27 (table access by unique index). We find an index entry on each call, but the table row doesn’t qualify – which is what we want for a “successful” anti-join. We could make this a little more efficient by adding a column to the (already unique) index and avoid visiting the table.

6.6.2 Operation 11 – nested loop anti join – calls operation 29 (view with pushed predicate) which operates a high-precision nested loop join at operation 30. The first child of operation 30 is a table access by index, but the table never returns a row (which is nice) so we never call the second child of operation 30. Again we could make this view access a little more efficient by adding extra columns to (already unique) indexes to avoid any need to visit tables.

6.6.3 Operations 10, 9, 8, 7 – nested loop joins – operating very efficiently, some not even visiting tables to acquire data. The order of operation at this point is: 11, 36, 35, 10, 37, 9. 38, 8, 19, 7. And then we get to the FILTER operation, which has 3 subqueries to operate in turn,

6.7 Operation 6 executes in turn the two subqueries we named sel$5 and sel$6 respectively, the first one 182 times, the second one 171 times. Since operation 6 produces 171 rows it seems likely that the initial 182 rows dropped to 171 rows as a consequence of the sel$5 subquery resulting in the smaller number of calls to the sel$6 subquery. It’s worth noting here that operation 41, the index range scan, returned 1,450 rowids, but the subsequent table accesses returned a total of only 29 rows after an additional 316 buffer gets (507 – 191). There may be an opportunity here (yet again) for adding an extra column to the index so that we visit the table only 29 times rather than visiting it 1,450 times. In fact, though it’s not obvious in this SQL Monitor report, the indications from other examples from the same query suggested that this subquery was the single more resource intensive part of the plan.

6.8 The last subquery executed by Operation 6 is the one identified by query block sel$4. The sub-plan starts with a VIEW operation because the table (identified as originating in sel$4 in the query block / object alias information) is “accessed” by way of an index hash join. This subquery is executed only 6 times. Given that there are (at least) 171 rows for which this subquery could be called this means one of two things.  First we can from the query text that this subquery is part of a complex CASE expression – so maybe the simple conditions in the expression mean we rarely need to call the subquery;. secondly it could mean the run-time engine has managed to take advantage of scalar subquery caching and the query doesn’t have many distinct inputs for this subquery – and when we check the predicate section we can see the relevant predicate for a query against the XXADM_CATEGORY table was “category_id = {correlation variable}” which has the look of a table with a few rows and distinct ids..

6.9 In summary, then, there may be a few “localised” tweaks that we can to do improve performance of this plan – largely by adding columns to existing indexes and using them effectively. There are indications that one of the filter subqueries might be a particularly good target for this type of tweak; after which we might want to look at what could be done with the two tablescans which are in that grey area where it’s not easy to decide whether an indexed access or a tablescan is the better option. We have to remember, though, that this query was originaly reported as executing 842,000 times – so maybe we need to do much better than just a little tweaking.


 

7.0 Predicate Information

7.1 Why are we running a query 842,000 times in a batch? The right way to find an answer to that question is to ask the right person – if you can find them. A slightly more long winded way is to find out what is driving the 842,000 executions – and you might be able to do that if you have the full tkprof output from the trace file. (Hint: statement X runs 842,000 times, and if statement Y executes 13 times and produces 842,000 rows maybe Y is driving X.) Sometimes, though, you don’t have the people, or the full data set, or the access you need, so you might take a look at the query and the predicates and start making some reasonable guesses.

7.2 Here’s the tail end of the query, conveniently capturing all the input bind variables:

                AND     cmt.college_id = :p_college_id
                AND     crmt.course_id = :p_course_id
                AND     mmt.medium_id  = :p_medium_id
                AND     act.hostel_required = :p_hostel_required
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                ) 
        WHERE
                 ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id

7.3 There are two things we can note about these predicates – first they don’t follow the pattern of “:Bnnn” so they’re not from a statement statically embedded in PL/SQL, secondly the names are intelligible and meaningful, so we might draw some tentative conclusions from them, in particular how many distinct values there might be and how this lead to 842,000 executions of the query.

7.4 The variable name that stands out is :p_applicant_id. We seem to be looking at a query about applicants for courses at colleges – and the latter pair probably gives us a relatively small number of combinations. The variable :p_hostel_required is surely just going to be a “yes/no/maybe/null” option. The :p_medium_id is a bit of a puzzle but scanning through the query it looks like it’s the id for the “medium of study” so probably another variable with a small number of values. So where in the plan do these variables get used? Here’s the full list of predicates from the plan, followed by an extra few lines showing just the predicates that reference the bind variables:

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("APPLICANT_ID"=:P_APPLICANT_ID)
   3 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   5 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   6 - filter((    "ACT"."PREFERENCE_ORDER"<=NVL(,"ACT"."PREFERENCE_ORDER") -- > comment added to avoid wordpress format issue
               AND "ACT"."PREFERENCE_ORDER">=NVL(,"ACT"."PREFERENCE_ORDER") AND CASE "ACT"."HOSTEL_REQUIRED"
              WHEN 'Y' THEN CASE  WHEN ("ADT"."DISTANCE_IN_KMS">20 AND "LMT_RELIGION"."LOV_CODE"='HINDU' AND  IS NULL) THEN 1 ELSE 2 END  ELSE 1 END =1))
  13 - access("ADT"."APPLICANT_GENDER"="LMT_GENDER"."LOV_ID")
       filter(CASE "ACT"."HOSTEL_REQUIRED" WHEN 'Y' THEN CASE  WHEN ("LMT_EDUCATION_TYPE"."LOV_CODE"='COEDUCOL' AND "LMT_GENDER"."LOV_CODE"='FEMALE') THEN 2 ELSE 1 END
               ELSE 1 END =1)
  15 - access(ROWID=ROWID)
  18 - access("ADT"."APPLICANT_ID"="ACT"."APPLICANT_ID")
  22 - access("CMT"."COLLEGE_ID"=:P_COLLEGE_ID)
  24 - access("CMT"."EDUCATION_TYPE"="LMT_EDUCATION_TYPE"."LOV_ID")
  25 - filter(("ACT"."COURSE_ID"=:P_COURSE_ID AND "ACT"."COLLEGE_ID"=:P_COLLEGE_ID AND "ACT"."MEDIUM_ID"=:P_MEDIUM_ID AND "ACT"."HOSTEL_REQUIRED"=:P_HOSTEL_REQUIRED))
  26 - filter(("ADT"."STATUS"='Active' AND "ADT"."COURSE_APPLIED_FOR"='DEG' AND (INTERNAL_FUNCTION("ADT"."COLLEGE_STATUS_FLAG") OR "ADT"."COLLEGE_STATUS_FLAG" IS
              NULL)))
  27 - filter("ACT3"."STATUS_FLAG"='O')
  28 - access("ACT3"."APPLICANT_ID"="ADT"."APPLICANT_ID" AND "ACT"."PREFERENCE_ORDER"="ACT3"."PREFERENCE_ORDER")
  31 - filter((INTERNAL_FUNCTION("ACT1"."STATUS_FLAG") AND NVL("ACT1"."ATTRIBUTE7",'N')='N'))
  32 - access("ACT1"."APPLICANT_ID"="ADT"."APPLICANT_ID" AND "ACT1"."PREFERENCE_ORDER"="ACT"."PREFERENCE_ORDER")
  33 - filter("CATEGORY_CODE"='OPENMERT')
  34 - access("CATEGORY_ID"=TO_NUMBER("ACT1"."ATTRIBUTE1"))
  36 - access("ADT"."PASS_TYPE"="LMT_PASS"."LOV_ID")
  37 - access("ADT"."APPEARANCE_TYPE"="LMT_APPEARANCE"."LOV_ID")
  38 - access("ADT"."RELIGION"="LMT_RELIGION"."LOV_ID")
  40 - filter(("STATUS_FLAG"='B' OR "STATUS_FLAG"='C' OR "STATUS_FLAG"='O' OR "STATUS_FLAG"='T'))
  41 - access("ACT1"."APPLICANT_ID"=:B1)
  43 - access("ACT2"."APPLICANT_ID"=:B1 AND "STATUS_FLAG"='C')
  44 - filter(("CATEGORY_ID"=:B1 AND INTERNAL_FUNCTION("CATEGORY_CODE")))
  45 - access(ROWID=ROWID)
  46 - access("CATEGORY_ID"=:B1)
  48 - access(("CATEGORY_CODE"='BACKWRDC' OR "CATEGORY_CODE"='BACKWRDE'))


   2 - filter("APPLICANT_ID"=:P_APPLICANT_ID)
   3 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   5 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
  22 - access("CMT"."COLLEGE_ID"=:P_COLLEGE_ID)
  25 - filter(("ACT"."COURSE_ID"=:P_COURSE_ID AND "ACT"."COLLEGE_ID"=:P_COLLEGE_ID AND "ACT"."MEDIUM_ID"=:P_MEDIUM_ID AND "ACT"."HOSTEL_REQUIRED"=:P_HOSTEL_REQUIRED))

Figure 7-1

7.5 Apart from the predicates in the final shortlist you probably noticed further bind variables at operation 41, 43, 44, and 46 – but these are all named :B1, which is Oracle flagging up the need to pass correlating values into filter subqueries.

7.6 Operation 25 (where we test almost all the predicates) is one of the first operations to drive the query while operation 2 (where we test the :p_applicant_id) is close to the very last thing we do in the execution plan. So we generate a load of data for a college, course, and couple of other predicates, sort it then – at the last moment – decide that we only want a few (:p_seats) rows and count how many rows we’ve found for the specific applicant – and we do that a very large number of times. This takes me back to section 2 where I asked a couple of critical questions:

7.6.1 (2.3) First, how far into the view V_THING will the optimizer be able to push that predicate, possibly the entire content of the view will have to be constructed before the predicate can apply,

7.6.2 (2.8) I’m also a little curious about a requirement that seems to say – “pick at most N rows, then tell me how many you’ve picked”. What’s it actually trying to do and why?

7.7 We now know the answer to the first query – that predicate isn’t going anywhere, and we recognise why not, of course: it’s a consequence of the “rownum” pseudo-column which has to be evaluated for all the generated data before the rownum restriction can be applied: “select for the applicant then apply the rownum” is very different from “apply the rownum (column and predicate) then restrict to the applicant”. That brings us to the second question – why would you generate all the data, then order it, then restrict it to the first N, and then count how many times a specific applicant appeared? And there’s one “valid” answer to the last bit of the question – what if you’re not really trying to count how many times the applicant appeared, you’re only trying to find out whether the count is zero or non-zero.

7.8 The intent of the query is to answer the question: “does this applicant appear in the first N candidates”. Once you’ve realised this the underlying performance problem with the query becomes clear. In the monitored example show here the query found 171 applicants that matched the initial predicates – and at some point in the batch it’s going to do the same work all over again for each of the other 170 applicants that we’ve discarded. For each combination of the initial predicates (excluding applicant id and seat count) we run the query N times if there are N candidates that match that combination. It’s bad enough that this query took 0.02 seconds to run and would have run 172 times (for a total of 3,4 seconds) but another sample run took 0.05 seconds to run identifying 1,835 applicants (which means another 1,834 executions for a total of 91 seconds run time).


 

8.0 Resolution

8.1 There is a serious flaw in the design of this application. We are seeing a piece of code running once per applicant_id when (with some minor variations) it looks as if it should be running no more than once per set of distinct combinations of (course_id, college_id, medium_id, hostel_required). In fact, if the set of distinct combinations could be generated by a simple query, you could imagine the entire required result set as a join between two non-mergable views, with a little row-by-row post processing – but that might be too ambitious a change to implement in the short term.

8.2 Realistically (as a low-risk strategy) it might be possible to keep a very large percentage of the existing code structure for whatever this task does but precede it with a PL/SQL loop that steps through each of the distinct combinations required, populating a table (perhaps an IOT) with {applicant_id, (combination columns), “rownum”); and then replace our problem query by a simple primary key look up to find the saved “rownum” for each applicant and combination, to check whether the stored “rownum” was fell within the required seat count.


 

9.0 Summaryi

9.1 For a DBA working on-site, or a consultant on a short-term visit, the analysis shown in this post is probably not how things would have progressed. I could imagine the sequence of events being more like:

9.1.1 This “start of year / start of term” batch job takes too long

9.1.2 What is it trying to achieve (business overview) – sketch an outline of the process (technical overview)

9.1.3 Trace the job and discover most of the time went on this query

9.1.4 Investigate the logic of this query and why it is run for every applicant_id

9.1.5 Recognise the fundamental design threat then choose between three possible strategies:

9.1.5.1 make the query much faster

9.1.5.2 re-engineer this part of the batch completely

9.1.5.3 subvert this section of the batch to pre-build a “materialized result” and use a much simpler query to query it

9.2 Effectively, however, we’ve come in at 9.1.5.1 and run the consultation backwards. As we did so we raised an early question about the applicant_id and pushing predicates and the oddity of counting a limited list, and we finally came back to those points towards the end of the post with an educated guess about what the query was trying to achieve and how it should be reduced from “once per applicant” to “once per combination”.

9.3 Despite the post starting at the wrong place it’s quite possible that we would have reached 9.1.5.1 by following a sensible order of problem analysis, and still want to think about how the query might run more quickly – so this investigation wasn’t a total waste of time and it’s allowed us to work through a real-world query and plan in a realistic way which we can sum up in the following stages:

9.3.1 Simplify: cross-referencing between the overall plan shape, the Query Block / Object Alias information, and original query we can take out sections of the plan (sub-plans) and analyse them separately. In particular we can identify and reduce to a minimum the core of the plan that generates the final result set, calling on the various sub-plans as it goes.

9.3.2 Follow the workload: in this case we didn’t get much help from the timing information, but buffer gets, disk reads and A-rows also supply clues to where most work is done. Critically we noted that the volume of data we picked up early on in the query was needed all the way through the query – until the last moment – and we didn’t waste resources carrying and processing unnecessary rows. Along the way, of course, we compare Oracle’s predictions of data volume with the actual data volume (A-Rows = Starts * E-Rows as a guideline). We also noted a couple of opportunities where modifying indexes might eliminate table visits, potentially reducing I/O and buffer gets.

9.3.3 Check the predicates: which goes hand in hand with following the workload – how and where are our predicates used. What predicates have been generated (or eliminated) by transitive closure; which predicates are (or could be) pushed further down the plan tree to eliminate data earlier; will multiple predicates result in bad optimizer estimates followed by bad choices for access paths.

9.4 It would be nice to think that there was a simple progression, a fixed sequence of steps that one could follow to interpret an execution plan quickly and accurately. Unfortunately (like the optimisation process itself) interpretation requires a measure of looping and recursion. It’s probably always best to start with simplifying – but how much you simplify and how you pick which subplan to simplify (or start analysing in detail) depends on being able to spot where the biggest workload appears; and before you get stuck too deeply into a sub-plan you might glance down at the use of predicates because a change in one predicate might make the optimizer completely re-engineer its choice of plan. And maybe, before anything else, you’ll see a single operation which you know should (for exanple) generate about 10 rows when the optimizer is predicting 25,000 rows (or vice versa) and you’ll want to check why there’s such a bad estimate at one point in the plan before you tackle any of the harder questions.

9.5 The bottom line with execution plans is simply this: the more you practice the faster you get at spotting the clues that are worth pursuing; and the faster you spot the clues the less time you waste unpicking every little detail, and the less time you spend on the preripheral pieces of the plan the easier it becomes to keep the big picture in your mind and see how the optimizer got to where it is, and how you might want to redirect it. So pick a couple of random queries each week that produce plans of about 20 lines and use them to exercise your interpretation skills; and increase the complexity of the queries every couple of weeks.

The End

 
 

 

Footnote

I think I’ve spent more than 20 hours writing a detailed description of something that would normally take me a few minutes to do [and some poeple wonder why I’ve not yet written another book on the optimizer]. In part the difference in time is because with practice the “intuitive” skill grows and the pattern of reading is more like –

  • How does it start (ignoring the “trivial” bits around the edges)
  • How does it end ( ditto )
  • Where do we do most work reading and discarding data

What I’ve done in this note is talk about every single query block and every single line whereas in real-life I might have scanned the plan, examined about 10 lines, and done a quick check on the corresponding predicates as the first step to deciding whether or not the plan was reasonably efficient.

April 23, 2020

date_to_date

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:00 pm BST Apr 23,2020

Every now and again someone posts a piece of SQL on the Oracle Developer Forum that includes a predicate with an expression like to_date(date_column). This is a problem for several reasons – not the least being the type of performance problem that showed up in a post from a couple of years back that has just been resurrected.

Before I examine the performance detail, here’s a simple demo of the “wrong data” problem that can go unnoticed, cut-n-paste from a 12.2.0.1 session of SQL*Plus:


SQL> create table t1 (d1 date);

Table created.

SQL> insert into t1 values(sysdate);

1 row created.

SQL> select * from t1 where d1 = to_date(d1);

no rows selected

SQL> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';

Session altered.

SQL> select * from t1 where d1 = to_date(d1);

D1
--------------------
22-apr-2020 15:12:36

1 row selected.


Note particularly how changing the nls_date_format can change the result of a query! (There’s another simple example on the referenced blog note.)

What’s going on? to_date(date_col) is equivalent to to_date(to_char(date_col)) using the nls_date_format to do the two conversions, and the most common default format is one that truncates the date column to date-only. So in may systems  to_date(date_col) is nearly (though doing it an expensive way) the same as trunc(date_col).

But let’s go further and see how we wreck the benefit of an index, even if we’ve made sure that we still get the correct results. The following is a minimalist model of a common billing requirement: conversion between currencies:


rem
rem     Script:         date_to_date.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem 

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        'GBP'                           from_currency,
        'USD'                           to_currency,
        trunc(sysdate - 1000) + rownum  conversion_date,
        'Corporate'                     conversion_type,
        round(
                1.25 + dbms_random.value/4,
                6
        )                               conversion_rate,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1
where
        rownum <= 1000 -- > comment to avoid WordPress format issue
;

create unique index t1_i1 on t1(from_currency, to_currency, conversion_date, conversion_type)
/
alter table t1 add constraint t1_pk primary key(
        from_currency, to_currency, conversion_date, conversion_type
)
/

create table driver(
        invoice_currency        varchar2(3),
        billing_currency        varchar2(3),
        client_type             varchar2(10),
        invoice_date            date
);

insert into driver values(
        'GBP', 'USD','Corporate',trunc(sysdate)
);

commit;

execute dbms_stats.gather_table_stats(null,'driver');

I’ve created table t1 to model the exchange rates between US dollars and UK pounds over a range of about three years ending “today”. There’s also a conversion_type column in the unique key to this table that allows us to have multiple reasons for exchanges, allowing multiple exchange rates on the same day. I really ought to have a check constraint on this table that says something like: check (conversion_date = trunc(conversion_date)).

I’ve also created a “driver” table that holds the data that might be exactly the data we need to extract an exchange rate for a single invoice. So let’s run the SQL that gets the appropriate exchange rate for this one invoice:


set serveroutput off
alter session set statistics_level = all;

prompt  =====================================
prompt  First run with simple date comparison
prompt  =====================================

select
        /*+
                leading(driver t1)
                use_nl_with_index(t1)
        */
        driver.*,
        t1.conversion_rate
from
        driver, t1
where
        t1.from_currency = driver.invoice_currency
and     t1.to_currency = driver.billing_currency
and     t1.conversion_type = driver.client_type
and     t1.conversion_date = driver.invoice_date
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  =================================
prompt  Now run with to_date(date_column)
prompt  =================================

select
        /*+
                leading(driver t1)
                use_nl_with_index(t1)
        */
        driver.*,
        t1.conversion_rate
from
        driver, t1
where
        t1.from_currency = driver.invoice_currency
and     t1.to_currency = driver.billing_currency
and     t1.conversion_type = driver.client_type
and     to_date(t1.conversion_date) = to_date(driver.invoice_date)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

In the first case I’ve used the correct join predicate between these tables; in the second I’ve put in a redundant to_date() function call at both ends of the predicate. (If you think this is unrealistic – it’s an exact match for the production code I reported in the blog note I cited above).

Here are the two execution plans – with their rowsource execution stats:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      1 |00:00:00.01 |      10 |      8 |
|   1 |  NESTED LOOPS                |        |      1 |      1 |      1 |00:00:00.01 |      10 |      8 |
|   2 |   NESTED LOOPS               |        |      1 |      1 |      1 |00:00:00.01 |       9 |      8 |
|   3 |    TABLE ACCESS FULL         | DRIVER |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |
|*  4 |    INDEX UNIQUE SCAN         | T1_I1  |      1 |      1 |      1 |00:00:00.01 |       2 |      8 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."FROM_CURRENCY"="DRIVER"."INVOICE_CURRENCY" AND
              "T1"."TO_CURRENCY"="DRIVER"."BILLING_CURRENCY" AND "T1"."CONVERSION_DATE"="DRIVER"."INVOICE_DATE"
              AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE")


----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      1 |00:00:00.01 |      15 |      4 |
|   1 |  NESTED LOOPS                |        |      1 |      1 |      1 |00:00:00.01 |      15 |      4 |
|   2 |   NESTED LOOPS               |        |      1 |      1 |      1 |00:00:00.01 |      14 |      4 |
|   3 |    TABLE ACCESS FULL         | DRIVER |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |
|*  4 |    INDEX RANGE SCAN          | T1_I1  |      1 |      1 |      1 |00:00:00.01 |       7 |      4 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."FROM_CURRENCY"="DRIVER"."INVOICE_CURRENCY" AND
              "T1"."TO_CURRENCY"="DRIVER"."BILLING_CURRENCY" AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE")
       filter((TO_DATE(INTERNAL_FUNCTION("T1"."CONVERSION_DATE"))=TO_DATE(INTERNAL_FUNCTION("DRIVE
              R"."INVOICE_DATE")) AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE"))

Three things to take note of:

  1. The index unique scan at operation 4 has changed to an index range scan.
  2. The predicate information for operation 4 has changed from a pure access predicate to an access predicate plus a very messy filter predicate
  3. The range scan that now appears at operation 4 gets 7 buffers (that’s one root block and 6 leaf blocks) to find the one rowid we need, and will have applied the messy filter predicate to all 1,000 index entries where the two currency codes were GBP/USD.

In the case of the older article the problem query was spending almost all of its time in a join like this, doing 59 buffer gets to find a single rowid for every invoice presented – possibly doing thousands of comparisons as it went.

 

April 22, 2020

Eureka!

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 1:08 pm BST Apr 22,2020

I woke up last night with a brilliant solution to a problem that’s been bugging me for more than a year. How does a call to report_sql_monitor() manage to produce output like this:

SQL Plan Monitoring Details (Plan Hash Value=4262489872)
======================================================================================================================================================
| Id |            Operation            | Name  |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  | Activity | Activity Detail |
|    |                                 |       | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes |   (%)    |   (# samples)   |
======================================================================================================================================================
|  0 | SELECT STATEMENT                |       |         |      |        13 |     +2 |     1 |        1 |       |       |          |                 |
|  1 |   SORT AGGREGATE                |       |       1 |      |        13 |     +2 |     1 |        1 |       |       |          |                 |
|  2 |    NESTED LOOPS                 |       |      5M |  14M |        13 |     +2 |     1 |       4M |       |       |          |                 |
|  3 |     NESTED LOOPS                |       |      5M |  14M |        13 |     +2 |     1 |       4M |       |       |          |                 |
|  4 |      TABLE ACCESS FULL          | T1    |      5M | 4686 |        13 |     +2 |     1 |       4M |  1050 | 260MB |          |                 |
|  5 |      INDEX RANGE SCAN           | T2_I1 |       1 |    2 |        14 |     +1 |    5M |       4M |  5494 |  83MB |          |                 |
|  6 |     TABLE ACCESS BY INDEX ROWID | T2    |       1 |    3 |        13 |     +2 |    5M |       4M | 17743 | 260MB |          |                 |
======================================================================================================================================================


If you’re wondering why this is a problematic output take a look at the A-Rows column for operation 4 (and 5), and the Execs colukmn for operations 5 (and 6). Given the nature of a nested loop join the number of Execs of operation 5 should match the number of rows (A-rows) generated by operation 4, and the number of Execs of operation 6 should match the number of rows generated by operation 5.

But Oracle claims to have generated 4 million rows then executed 5 million times in both cases. How does that happen?

The very simple thought I had last night was “formatting” – if the actual value is 4,500,000 maybe the A-rows rounds down and the Execs rounds up, so I set up a little test to check the hypothesis. Since the version where I’d first seen the anomaly was 12.1.0.2 that’s the version I tested first.

rem
rem     Script:         report_sql_monitor_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem 

drop table t2 purge;
drop table t1 purge;

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',25,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 4500100 -- > comment to avoid WordPress format issue
/

create index t1_i1 on t1(id);

create table t2 as select * from t1;
create index t2_i1 on t2(id);

alter system flush shared_pool;
alter system flush buffer_cache;

select  /*+
                monitor
                qb_name(main)
                leading(@inline t1@inline t2@inline)
                use_nl_with_index(@inline t2@inline)
        */
        count(n2)
from    (
        select
                /*+
                        qb_name(inline)
                */
                t1.n1, t2.n1 n2
        from    t1, t2
        where   t1.id <= 4499999
--      where   t1.id <= 4500000
--      where   t1.id <= 4500001
        and     t2.id =  t1.n1
        )
;

prompt  =======================================
prompt  Now run report_sql_monitor.sql from SYS
prompt  =======================================

You’ll note that I’ve allowed three possible values for the number of rows I get from t1 to drive the index probe into t2 – I thought I’d have to do several experiments to make the oddity appear. In fact it appeared on the very first attempt: A-Rows reported 4M, and Execs report a clearly incorrect 5M. (Repeating the test on 12.2.0.1 the anomaly disappeared – all the output reported 4M for 4,499,999; and 5M for 4,500,000.)

You’ll notice the “flush buffer_cache” command in this script, this came about because of some secondary testing. Because I wanted to check a few variations I commented out the “drop table” commands. The consequence of not dropping the tables was that the anomaly disappeared – in some way it was connected to reading blocks from disc and didn’t appear once the table and index were cached. In fact when I set the driving count to 4,998,000 in one of my tests I got a result that looked slightly more counter-intuitive than the original:

=====================================================================================================================================================
| Id |            Operation            | Name  |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity | Activity Detail |
|    |                                 |       | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |   (# samples)   |
=====================================================================================================================================================
|  0 | SELECT STATEMENT                |       |         |      |        11 |     +2 |     1 |        1 |      |       |          |                 |
|  1 |   SORT AGGREGATE                |       |       1 |      |        11 |     +2 |     1 |        1 |      |       |          |                 |
|  2 |    NESTED LOOPS                 |       |      4M |  14M |        11 |     +2 |     1 |       4M |      |       |          |                 |
|  3 |     NESTED LOOPS                |       |      4M |  14M |        11 |     +2 |     1 |       4M |      |       |          |                 |
|  4 |      TABLE ACCESS FULL          | T1    |      4M | 4686 |        11 |     +2 |     1 |       4M | 1050 | 260MB |          |                 |
|  5 |      INDEX RANGE SCAN           | T2_I1 |       1 |    2 |        12 |     +1 |    4M |       4M |  345 |  83MB |          |                 |
|  6 |     TABLE ACCESS BY INDEX ROWID | T2    |       1 |    3 |        11 |     +2 |    5M |       4M | 1048 | 262MB |          |                 |
=====================================================================================================================================================


Notice how I’ve (“correctly”) acquried 4M index rowids at operation 5, but (apparently) managed to access the table 5M times!

As I noted above, the oddity seems to have disappeared in 12.2.0.1 so it’s not worth pursuing further. I think it might relate either to Oracle double counting the Exec when its first access is a phyiscal read, or maybe it’s managing to increment the wrong counter when it visits a space management block to find the next batch of blocks to read. The detail is really not very important.

What might be important, though, is that sometimes the number of Execs you see is truthfully greater than ought to be possible because of the strange effects that can appear with read-consistency. For a very long time I believed that the oddity I’ve show here was something to do with read-consistency and query restart, but I couldn’t come up with a hypothesis to explain how that could be happening. So I was very relieved last night to come up with a simple explanation (which actually turned out to be wrong, but did at least help me recognise how the anomaly could happen in perfectly harmless circumstances).

Footnote:

What’s the difference between 4 million and 5 million – sometimes it’s one (just one, not one million). When testing under 12.2.0.1 I only had to add one row to the driving rowsource (changing 4,499,999 to 4,500,000) to see the A-rows and Execs change from 4M to 5M. If you forget the implicit rounding errors in reports like this you can end up chasing a tiny anomaly because it looks like a huge anomaly.

 

April 10, 2020

raw timestamp

Filed under: Histograms,Oracle,Troubleshooting — Jonathan Lewis @ 6:59 pm BST Apr 10,2020

Many years ago I wrote a note with some comments about conveting the low_value and high_value columns of user_tab_columns to see what values they actually held. At the time the dbms_stats package held a few procedures to convert raw values and it was necessary to wrap these procedures in user-defined functions if you wanted a function to do the job.

In recent versions of Oracle (18c onwards) the package has finally acquired several functions to do a proper job, these are:

  • convert_raw_to_bin_double()
  • convert_raw_to_bin_float()
  • convert_raw_to_date()
  • convert_raw_to_number()
  • convert_raw_to_nvarchar()
  • convert_raw_to_rowid()
  • convert_raw_to_varchar()

You might note that these functions do not appear in the PL/SQL Packages and Types Reference, but they are documented in the script $ORACLE_HOME/rdbms/admin/dbmsstat.sql that creates the package:


-- Convert the internal representation of a minimum or maximum value
-- into a datatype-specific value. The minval and maxval fields
-- of the StatRec structure as filled in by get_column_stats or
-- prepare_column_values are appropriate values for input.

One thing you’ll notice about the list is that there’s no convert_raw_to_timestamp(), and a question came up recently on the oracle-l list server asking how to do this. This note answers that question and, in passing, demonstrates the typical use of the other functions.

As ever we start by creating some data:


rem
rem     Script:         raw_to_timestamp.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t1(
        t0 timestamp(0),
        t3 timestamp(3),
        t6 timestamp(6),
        t9 timestamp(9),
        ts timestamp
);

insert into t1
with t_tab as (
        select to_timestamp('01-Apr-2020 21:15:10.123456789','dd-mon-yyyy hh24:mi:ss.ff9') ts
        from dual
)
select  t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts
from    t_tab
;

insert into t1
with t_tab as (
        select to_timestamp('01-Apr-2020 21:15:10.987654321','dd-mon-yyyy hh24:mi:ss.ff9') ts
        from dual
)
select  t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts
from    t_tab
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 2'
        );
end;
/

I’ve creaed a table to hold timestamps constrained to different levels of precision. The maximum precision allowed is 9 decimal places for the seconds, the default (column ts) is 6. I’ve then created two rows in the table using slightly different timestamps for the rows but giving all the columns in a single row the same value. Then I’ve gathered stats – including a histogram – on the table and all its columns.

I can now query user_tab_cols to pick up the low ahd high values:


select
        column_name, low_value, high_value
from
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
/

COLUMN_NAME          LOW_VALUE                  HIGH_VALUE
-------------------- -------------------------- --------------------------
T0                   7878040116100B             7878040116100C
T3                   7878040116100B0754D4C0     7878040116100B3AE3AF00
T6                   7878040116100B075BCDE8     7878040116100B3ADE6770
T9                   7878040116100B075BCD15     7878040116100B3ADE68B1
TS                   7878040116100B075BCDE8     7878040116100B3ADE6770


And here’s a dump of the columns so that you can see the actual values held in the table in their internal representation.


select
        dump(t0,16), 
        dump(t3,16), 
        dump(t6,16), 
        dump(t9,16), 
        dump(ts,16) 
from 
        t1;


DUMP(T0,16)
-------------------------------------------------------------------------------------
DUMP(T3,16)
-------------------------------------------------------------------------------------
DUMP(T6,16)
-------------------------------------------------------------------------------------
DUMP(T9,16)
-------------------------------------------------------------------------------------
DUMP(TS,16)
-------------------------------------------------------------------------------------
Typ=180 Len=7: 78,78,4,1,16,10,b
Typ=180 Len=11: 78,78,4,1,16,10,b,7,54,d4,c0
Typ=180 Len=11: 78,78,4,1,16,10,b,7,5b,cd,e8
Typ=180 Len=11: 78,78,4,1,16,10,b,7,5b,cd,15
Typ=180 Len=11: 78,78,4,1,16,10,b,7,5b,cd,e8

Typ=180 Len=7: 78,78,4,1,16,10,c
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,e3,af,0
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,de,67,70
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,de,68,b1
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,de,67,70

You’ll notice that, despite the attempt to limit the precision, all the columns other than the first use 11 bytes. The first column is different, with zero precision we use only 7 bytes and if you look closely (and know your internal storage formats) you’ll realise that those 7 bytes are using the standard format for stored dates. The last 4 bytes hold the fraction of the second stored as a number of nano-seconds as a simple 32-bit binary number. (Warning: it’s possible that you will see a difference in this set of bytes if your machine uses a different endianness from mine – I can’t test that for myself at present.)

So let’s see what we get if we try to convert the raw values using the call to dbms_stats.convert_raw_to_date(). I’ll start by setting the nls_XXX formats to get the full conntent of a date column or a timestamp column reported from SQL*Plus.


alter session set nls_date_format = 'dd-Mon-yyyy hh24:mi:ss';
alter session set nls_timestamp_format = 'dd-Mon-yyyy hh24:mi:ss.ff9';

set linesize 165

column t0 format a32
column t3 format a32
column t6 format a32
column t9 format a32
column ts format a32

select  t0, t3, t6, t9, ts from t1;


select
        column_name, 
        dbms_stats.convert_raw_to_date(low_value)       date_low,
        dbms_stats.convert_raw_to_date(high_value)      date_high
from 
        user_tab_cols
where
        table_name = 'T1'
order by 
        column_name
/



T0                               T3                               T6                               T9                               TS
-------------------------------- -------------------------------- -------------------------------- -------------------------------- --------------------------------
01-Apr-2020 21:15:10.000000000   01-Apr-2020 21:15:10.123000000   01-Apr-2020 21:15:10.123457000   01-Apr-2020 21:15:10.123456789   01-Apr-2020 21:15:10.123457000
01-Apr-2020 21:15:11.000000000   01-Apr-2020 21:15:10.988000000   01-Apr-2020 21:15:10.987654000   01-Apr-2020 21:15:10.987654321   01-Apr-2020 21:15:10.987654000

2 rows selected.


COLUMN_NAME          DATE_LOW             DATE_HIGH
-------------------- -------------------- --------------------
T0                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:11
T3                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10
T6                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10
T9                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10
TS                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10

5 rows selected.

Since I’ve inserted only two rows, with each row holding a single value with different precision, we expect those values to be reflected in the low_value and high_value for the columns – and we’re not disappointed (probably) but, as you might have expected, Oracle has processed the timestamp data type into a date data type by simply ignoring the last 4 bytes and dealing with the first 7 bytes as if they were a date. We need to do a little more check that we can convert the low and high values into exactly the values that appear in the table itself.

So we can write a messy bit of SQL that extracts the first 7 bytes, converts them using the convert_to_date() call, then extracts the last 4 bytes and converts them to a number using a simple to_number() call (with an nvl() thrown in to deal with the special case of there being no nanosecond component), divides by 1e9, converts the result to an interval in seconds using the numtointervalds() function, and adds that to the date. To keep things clean I’ll only apply the mess to the low_value column. You’ll note that I’ve


select 
        column_name, 
        low_value,
--
--      convert bytes 8 onwards to numeric with an nvl()
--      to handle the case of zero precision when there 
--      are no bytes to show the number of nanoseconds
--
--      Note - I've used substr() to bytes 8 onwards turns
--      into characters 15 - 22.
--
        nvl(
                to_number(
                        hextoraw(substr(low_value,15,8)),'XXXXXXXX'
                ),
                0
        ) nanoseconds,
--
--      repeat the conversion, and wrap with numtodsinterval()
--      to show the resulting interval in seconds.
--
        numtodsinterval(
                nvl(
                        to_number(
                                hextoraw(substr(low_value,15,8)),'XXXXXXXX'
                        )/1e9,
                        0
                )+0,
                'SECOND'
        ) interval_val,
--
--      Converted the 1st 7 bytes (14 characters) to a date,
--      coerce the result to a timestamp, then convert the 
--      nanoseconds bytes to an interval and add
--
        to_timestamp(
                dbms_stats.convert_raw_to_date(hextoraw(substr(low_value,1,14)))
        ) + 
        numtodsinterval(
                nvl(
                        to_number(
                                hextoraw(substr(low_value,15,8)),'XXXXXXXX'
                        )/1e9,
                        0
                ),
                'SECOND'
        ) timestamp_val
from 
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
/

COLUMN_NAME          LOW_VALUE                  NANOSECONDS INTERVAL_VAL                     TIMESTAMP_VAL
-------------------- -------------------------- ----------- -------------------------------- --------------------------------
T0                   7878040116100B                       0 +000000000 00:00:00.000000000    01-Apr-2020 21:15:10.000000000
T3                   7878040116100B0754D4C0       123000000 +000000000 00:00:00.123000000    01-Apr-2020 21:15:10.123000000
T6                   7878040116100B075BCDE8       123457000 +000000000 00:00:00.123457000    01-Apr-2020 21:15:10.123457000
T9                   7878040116100B075BCD15       123456789 +000000000 00:00:00.123456789    01-Apr-2020 21:15:10.123456789
TS                   7878040116100B075BCDE8       123457000 +000000000 00:00:00.123457000    01-Apr-2020 21:15:10.123457000

5 rows selected.

If you go back to the original output we got from selecting the actual values from the table you can see that the timestamp_val column we’ve generated correctly matches the column values for the lower of the two original values we inserted into the table.

Q.E.D.

March 12, 2020

Sequence Costs

Filed under: Bugs,Oracle,Performance,trace files,Troubleshooting — Jonathan Lewis @ 8:18 pm GMT Mar 12,2020

[Update: make sure you read the comments – the behaviour described here is a bug introduced in 19c]

You’re probably aware of the “identity” construct that appeared in 12.1 and uses Oracle’s sequence mechanism to model the Autonumber or Identity columns that other databases have. A posting from Clay Jackson on the Oracle-L list server suggests that something about their code path has introduced a surprising overhead in 19c … when you’re not using them.

The following code is a slightly modified version of a test case that Clay Jackson posted to demonstrate a strange difference in performance between 12.2 and 19.3

rem
rem     Script:         19c_sequence_cost.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2020
rem
rem     Last tested
rem             19.3.0.0
rem             12.2.0.1
rem

drop table test_tab2;
drop table test_tab1;

drop sequence test_seq1;
drop sequence test_seq2;

create sequence test_seq1;
create sequence test_seq2;

create table test_tab1 (
   id number primary key,
   c1 varchar2(1000)
);

create table test_tab2 (
   id number,
   t1_id number,
   c2 varchar2(1000),
   constraint test_tab2_fk foreign key (t1_id)
        references test_tab1 (id) on delete cascade
);

spool 19c_sequence_cost.lst

set feedback off
set pagesize 0
set serveroutput off

execute snap_my_stats.start_snap
alter session set events '10046 trace name context forever';

prompt  ==============================================================================
prompt  start_1000.sql cascades to start_1.sql
prompt  start_1.sql holds:
prompt
prompt  insert into test_tab1 values (test_seq1.nextval,'hello');
prompt  insert into test_tab2 values (test_seq2.nextval, test_seq1.currval, 'byebye');
prompt  commit;
prompt
prompt  ==============================================================================

start start_1000

set serveroutput on
set pagesize 60
alter session set events '10046 trace name context off';
execute snap_my_stats.end_snap

spool off

I’ve got a couple of tables with columns that I plan to generate from sequences, and I’ve got a referential integrity constraint between those tables. I’ll be using nextval from one sequence to populate the first table, then use currval from the same sequence for the foreign key and nextval from the other sequence as a unique identifier for the second table.

I’ve used my “repeater” code to run a short script from the SQL*Plus prompt 1,000 times (as suggested by Clay Jackson in his posting). I’ve also added lines to enable SQL_trace at the basic level, and taken a snapshot of the session activity stats.

Just as Clay Jackson had reported – 19.3.0.0 took significantly longer than 12.2.0.1 to complete the 1,000 calls to the script. (Though in absolute terms we’re only talking fractions of a second for a small single user test.) Examination of the stats – which prompted me to add the 10046 trace and repeat – made it very clear where the difference lay. Here are a few figures picked from the session activity stats:

Version    Statistic                        Value
========   ========================         =====
12.2.0.1   CPU used by this session            53
19.3.0.0   CPU used by this session           114

12.2.0.1   recursive cpu usage                 18
19.3.0.0   recursive cpu usage                 44

12.2.0.1   recursive calls                  1,182
19.3.0.0   recursive calls                  7,215

12.2.0.1   execute count                    2,137
19.3.0.0   execute count                    8,151

Clearly it’s worth finding out what was happening in those 6,000 extra recursive calls. And this is what the 19.3 trace file (after passing through tkprof … sort=execnt) showed:


select count(1)
from
 sys.idnseq$ where seqobj# = :1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute   6000      0.07       0.09          0          0          0           0
Fetch     6000      0.03       0.04          0       6000          0        6000
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    12001      0.10       0.14          0       6000          0        6000

If you can’t guess what the table idnseq$ is about, the following definition appears in $ORACLE_HOME/rdbms/admin/dcore.bsq:

create table idnseq$     /* stores table identity column to sequence mapping */
( obj#         number not null,                       /* table object number */
  intcol#      number not null,                    /* identity column number */
  seqobj#      number not null,                    /* sequence object number */
  startwith    number not null,                   /* sequence starting value */
  supplemental log group idnseq$_log_grp (obj#, intcol#, seqobj#) always
)
/

We’re not using identity columns in this test, but every time we run the script we do six checks against this data dictionary table to see (presumably) if we’re using a sequence that is internally associated with an identity column. The requirement doesn’t seem completely unreasonable – but it’s a bit surprising that we do every check twice. (Note – we have to check test_seq1 for both statements but test_seq2 only for the second statement, which is how we get 6,000 executions: 2 * (2 checks for test_seq1 + 1 check for test_seq2) * 1,000 executions of the script.

The doubling looks like a bug, and you have to wonder why a suitable piece of information isn’t kept in the dictionary cache anyway to allow Oracle to avoid executing the recursive statement. I should point out that the surprisingly large impact is visible because I’ve executed 2,000 separate statements from the client side; mechanisms like array inserts and pl/sql loops might only have to do this check once for each array or pl/sql call. (Tests of this hypothesis are left as exercise for the interested reader.)

January 13, 2020

Collections

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 2:31 pm GMT Jan 13,2020

This is a note I drafted in September 2015 and only rediscovered a couple of days ago while searching for something I was sure I’d written about collections and/or table functions. The intention of collections and table functions is that they should behave like tables when you use them in a query – but there are cases where a real table and something cast to a table() aren’t treated the same way by the optimizer – and this 4-year old note (which is still valid in 2020 for 19c) is one of those cases.

 

There was a question – with test case – on Oracle-L recently [ed: now more than 4 years ago] about the behaviour of a query that changed plans as you switched from using a global temporary table to a collection – why was Oracle doing something inefficient with the collection. The answer was: “Bad luck, it’s a limitation in the optimizer”.  (Sub-text: collections are a pain).

The test case was short and simple so I thought I’d post it – with an h/t to Patrick Jolliffe who presented the probem and Timur Akhmadeev and Stefan Koehler who explained the problems.

Here’s the script (with a little cosmetic editing) to create the necessary objects and data:

rem
rem     Script:         collections.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2015
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem

create or replace type number_table is table of number;
/

create table test_objects as select * from all_objects;
create /* unique */ index test_objects_idx on test_objects(object_id);

exec dbms_stats.gather_table_stats(null, 'test_objects');

create global temporary table gtt_test_objects (object_id number);
insert into gtt_test_objects values (1);


In this example I’ve created a type which is a simple table of number. In a more general case you might create a simple object type, and then a type that was a table of that object type, then you might create a function that returned a variable of that table type, or a function that was declared to return the table type “pipelined” and uses the “pipe row” instruction in the code to return one value of the simple object type at a time. Whichever variation you used you could then use the table() operator to tell Oracle to treat the content of the table type as if it were a relational table. (In recent versions of Oracle the table() operator is redundant).

Here’s the first query, which uses the global temporary table in an “IN” subquery, followed by its execution plan – again with a little cosmetic editing and the addition of query block names across the board:


prompt  ==================================
prompt  Query using global temporary table
prompt  ==================================

select  
        /*+ qb_name(main) */ 
        null 
from    (
        select
                /*+ qb_name(inline) */
                distinct object_id 
        from    test_objects
        ) 
where   object_id in (
                select 
                        /*+
                                qb_name(subq)
                                cardinality(gtt_test_objects 1) 
                        */ 
                        gtt_test_objects.object_id
                from
                        gtt_test_objects 
        )
;


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |      1 |        |      0 |00:00:00.01 |       5 |       |       |          |
|   1 |  VIEW                 | VM_NWVW_1        |      1 |      1 |      0 |00:00:00.01 |       5 |       |       |          |
|   2 |   SORT UNIQUE NOSORT  |                  |      1 |      1 |      0 |00:00:00.01 |       5 |       |       |          |
|   3 |    NESTED LOOPS       |                  |      1 |      1 |      0 |00:00:00.01 |       5 |       |       |          |
|   4 |     SORT UNIQUE       |                  |      1 |      1 |      1 |00:00:00.01 |       3 |  2048 |  2048 | 2048  (0)|
|   5 |      TABLE ACCESS FULL| GTT_TEST_OBJECTS |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|*  6 |     INDEX RANGE SCAN  | TEST_OBJECTS_IDX |      1 |      1 |      0 |00:00:00.01 |       2 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("OBJECT_ID"="GTT_TEST_OBJECTS"."OBJECT_ID")

As you can see I’ve set statistics_level to all, and used dbms_xplan.display_cursor() to pull the actual execution plan from memory. This plan tells us that the optimizer unnested the IN subquery to generate a unique set of values and used that unique set to drive a nested loop join into the test_objects table (with an index-only probe). Moreover, before this step, the optimizer used complex view merging and cost-based query transformation to postpone the “distinct” from the original query to do the join before distinct. The E-rows at operation 5 also tells us that the optimizer “knew” that there was only one row in the GTT – it took note of my cardinality() hint.

Now we replace with gtt_test_objects table with the collection – casting it to a table() and giving Oracle the same cardinality() hint – as follows:


select 
        /*+ 
                qb_name(main)
--              no_use_hash_aggregation(@sel$1)
        */ 
        null
from    (
        select  
                /*+ inline */
                distinct object_id 
        from    test_objects
        )
where   object_id in (
                select 
                        /*+ 
                                qb_name(subq)
                                cardinality(gtt_test_objects 1) 
                        */ 
                        column_value object_id
                from
                        table(number_table(1)) gtt_test_objects
        )
;

-------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                  |      1 |        |      0 |00:00:00.08 |     132 |       |       |          |
|   1 |  MERGE JOIN SEMI                        |                  |      1 |      1 |      0 |00:00:00.08 |     132 |       |       |          |
|   2 |   SORT JOIN                             |                  |      1 |  56762 |      1 |00:00:00.08 |     132 |  1470K|   606K| 1306K (0)|
|   3 |    VIEW                                 |                  |      1 |  56762 |  56762 |00:00:00.03 |     132 |       |       |          |
|   4 |     HASH UNIQUE                         |                  |      1 |  56762 |  56762 |00:00:00.03 |     132 |  4122K|  2749K| 3418K (0)|
|   5 |      INDEX FAST FULL SCAN               | TEST_OBJECTS_IDX |      1 |  56762 |  56762 |00:00:00.01 |     132 |       |       |          |
|*  6 |   SORT UNIQUE                           |                  |      1 |      1 |      0 |00:00:00.01 |       0 |  2048 |  2048 | 2048  (0)|
|   7 |    COLLECTION ITERATOR CONSTRUCTOR FETCH|                  |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access("OBJECT_ID"=VALUE(KOKBF$))
       filter("OBJECT_ID"=VALUE(KOKBF$))

The second plan is completely different. The optimizer has unnested the subquery to produce a join, but instead of using the unique set of values that it generated from the collection to drive a nested loop it’s decide to do a merge semi-join, which has entailed an expensive fast full scan of the test_objects_idx index to acquire all the key values first.

I tried to make the optimizer use the collection to drive a nested loop, adding some carefully targeted hints to force the join order and dictate a nested loop join with pushed predicate: but the optimizer wouldn’t push the “obvious” join predicate and continued to do an index fast full scan and sort of the text_object_idx. If you’re interested here are the hints and the resulting plan:

/*+
        qb_name(main)
        leading( @sel$8969f1c9 kokbf$0@sel$2 "from$_subquery$_001"@main)
        use_nl( @sel$8969f1c9 "from$_subquery$_001"@main)
        push_pred(@sel$8969f1c9 "from$_subquery$_001"@main)
*/

-------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                  |      1 |        |      0 |00:00:00.03 |     132 |       |       |          |
|   1 |  NESTED LOOPS                           |                  |      1 |      1 |      0 |00:00:00.03 |     132 |       |       |          |
|   2 |   SORT UNIQUE                           |                  |      1 |      1 |      1 |00:00:00.01 |       0 |  2048 |  2048 | 2048  (0)|
|   3 |    COLLECTION ITERATOR CONSTRUCTOR FETCH|                  |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|*  4 |   VIEW                                  |                  |      1 |      1 |      0 |00:00:00.03 |     132 |       |       |          |
|   5 |    SORT UNIQUE                          |                  |      1 |  56762 |  56762 |00:00:00.03 |     132 |  2604K|   728K| 2314K (0)|
|   6 |     INDEX FAST FULL SCAN                | TEST_OBJECTS_IDX |      1 |  56762 |  56762 |00:00:00.01 |     132 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("OBJECT_ID"=VALUE(KOKBF$))

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 6 (U - Unused (1))
---------------------------------------------------------------------------
0 - SEL$102722C0
- qb_name(subq)

1 - SEL$8969F1C9
- leading( @sel$8969f1c9 kokbf$0@sel$2 "from$_subquery$_001"@main)
- qb_name(main)

1 - SEL$8969F1C9 / from$_subquery$_001@MAIN
U - push_pred(@sel$8969f1c9 "from$_subquery$_001"@main)
- use_nl( @sel$8969f1c9 "from$_subquery$_001"@main)

5 - SEL$1
- inline

In the previous post we had a “NOT IN” subquery against a collection/pipelined table function that couldn’t even be unnested (even in 19c); in this example we have an IN subquery that does unnest but then can’t drive a nested loop efficiently because the optimizer won’t push the collection values into the distinct view, and won’t do complex view merging to avoid having to do that join predicate pushdown. Collections and table functions() just don’t play nicely with the optimizer!

In fact this plan also shows one of those “generic” approaches in the optimizer that allows a human operator to see a special case that could have been further optimized: if the optimizer had used a sort unique rather than a hash unique at operation 4 then the sort join at operation 2 would have been redundant – with an overall reduction in memory and CPU usage that I managed to get in a separate test by adding the hint /*+ no_use_hash_aggregation(@sel$1) */ to the query. (Since operation 6 is also a sort unique the merge join semi could, in principle, have become a merge join with no risk of producing duplicates – but the semi-join code path is probably a little more efficient, anyway, and a balance has to be struck between the risk of introducing complexity for a special case and the potential frequency and scale of the benefit it might produce.)

Conclusion

You can often see collections and table functions behaving very like tables when you use them in the from clause of queries – but there are some restrictions on the transformations that the optimizer can use when your query isn’t using “real” tables.

Footnote

There are many ways that you can play around with this starting model to investigate where the boundaries might be. For example, if I make the index on test_objects unique the plan changes to a simple nested loop driven by the unnested collection (there’s no longer a non-mergeable view in the way). If I eliminate the distinct from the original query the same thing happens (for the same reason). If I force the join order to start with the collection (using the leading() hint) but don’t hint a nested loop Oracle produces (at least in my case) a hash join with a Bloom filter that minimised the memory and and CPU requirement.

I mentioned at the start that Timur Akhmadeev and Stefan Koehler supplied explanations for what was going on behind the scenes. Critically Stefan also referenced one of two posts from the Oracle blog on complex view merging and its restrictions: part 1, part 2.

The related problem that led me to re-discover and complete this note is at this URL (published a couple of days ago).

January 10, 2020

Collection limitation

Filed under: Execution plans,Hints,Oracle,Troubleshooting — Jonathan Lewis @ 5:43 pm GMT Jan 10,2020

The ODC SQL and PL/SQL forum came up with an example a couple of days ago that highlighted an annoying limitation in the optimizer’s handling of table functions. The requirement was for a piece of SQL that would generate “installments” information from a table of contract agreements and insert into another table any installments that were not yet recorded there.

The mechanism to turn a single row of contract data into a set of installments was a (optionally pipelined) table function that involved some business logic that (presumably) dealt with the timing and size of the installments. The final SQL to create the data that needed to be inserted was reported as follows (though it had clearly been somewhat modified):

select 
        y.*,
        trunc(sysdate,'mm'),
        user
from 
        table_a a
join 
        table(
                my_function(
                        a.loan_acct_nbr, 
                        a.start_dt,
                        a.maturity_dt,
                        a.num_of_terms
                )
        ) y
on 
        a.loan_acct_nbr = y.loan_acct_nbr
where 
        a.yr_mnth_cd = to_char(add_months(sysdate,-1),'yyyymm')       -- last month
and     a.loan_typ   = 'ABC'
and     not exists ( 
                select  1 
                from
                        final_load_table l
                where
                        l.loan_acct_nbr = y.loan_acct_nbr
                and     l.yr_mnth_cd    = y.yr_mnth_cd 
        )
;

In this query, table_a is the set of contracts, final_load_table is the set of installments, and my_function() is the pipelined function returning a table of installments derived from the start date, maturity date, and number of installments for a contract. The query needs a “not exists” subquery to eliminate any installments that are already known to the database. Once this query is operating efficiently it could be used either to drive a PL/SQL loop or (generally the better strategy) to do a simple “insert as select”.

We were told that the function would return around 60 rows per contract; that the starting size of the final_load_table would be about 60M rows, and the size of the result set from the initial join would be about 60M or 70M rows (which suggests about 1M rows acquired from table_a).

The owner of this problem seemed to be saying that the query had returned no data after 4 or 5 hours – which suggests (a) the optimizer chose a bad plan and/or (b) the PL/SQL function is working very inefficiently and/or (c) the join had generated a vast amount of data but the effect of the subquery was to discard all of it .

Clearly we needed to see an execution plan (preferably from SQL Monitor) and be given have some idea of how much of the 60M/70M rows predicted for the join would be discarded because it already existed.

The owner did send us an execution plan – which included a very obvious threat and suggested a couple of optimizer errors – but supplied it as a picture rather than a flat text.

You’ll notice, of course, that the tables and columns have changed from the original statement. More significantly, though, there are serious problems with the numbers – the estimated row count for the basic join is only 5,326 rather than 50 Million which, at first sight, is probably why the optimizer has decided that a filter subquery (operation 1) to do an index-only probe (operation 5) is a good way of handling the subquery. Perhaps if the estimates had been a little more accurate (e.g. through the addition of a couple of column groups or, if necessary, by opt_estimate() or cardinality() hints) the subquery would have been unnested and turned into a hash anti-join.

I suggested a quick test of a suitable cardinality() hint – but ran up a little model to check that I’d got the hint right – and found that I had but it wasn’t going to help. So I decided to write up the model (and a possible solution for the owner of the problem) in this blog note.

Here’s the code to create the set of objects I’m going to work with. The naming follows the naming in the original statement of the problem suggested by the owner:


rem
rem     Script:         table_function_plan.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table table_a (
        loan_acct_nbr   number,
        yr_mnth_cd      varchar2(6),
        start_dt        date,
        maturity_dt     date    ,
        number_of_terms number,
        constraint ta_pk primary key(loan_acct_nbr, yr_mnth_cd)
)
;

execute dbms_random.seed(0)

insert /*+ append */
into    table_a
with generator as (
        select rownum id
        from    dual
        connect by
                level <= 4000 -- > comment to avoid wordpress issue
)
select
        trunc(dbms_random.value(1e9, 2e9)),
        to_char(sysdate-(365-mod(rownum,365)),'yyyymm'),
        sysdate-(365-mod(rownum,365)),
        sysdate+(1500+mod(rownum,365)),
        60
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create table final_load_table_l(
        loan_acct_nbr   number,
        yr_mnth_cd      varchar2(6),
        v1              varchar2(10),
        padding         varchar2(200),
        constraint lt_pk primary key(loan_acct_nbr, yr_mnth_cd)
)
;

insert /*+ append */ into final_load_table_l
with generator as (
        select rownum id
        from    dual
        connect by
                level <= 4000 -- > comment to avoid wordpress issue
)
select
        trunc(dbms_random.value(1e9, 2e9)),
        to_char(sysdate-(365-mod(rownum,365)),'yyyymm'),
        lpad(rownum,10),
        lpad('x',200,'x')
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;


begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'table_a',
                method_opt  => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'final_load_table_l',
                method_opt  => 'for all columns size 1'
        );
end;
/

create type my_row_type as object (
        loan_acct_nbr   number,
        yr_mnth_cd      varchar2(6),
        v1              varchar2(10),
        padding         varchar2(200)
);
/

create type my_table_type as table of my_row_type;
/

create  or replace function my_function (
        i_loan_acct_nbr         in      number,
        i_yr_mnth_cd            in      varchar2,
        i_start_dt              in      date,
        i_maturity_dt           in      date,
        i_number_of_terms       in      number
)       return  my_table_type pipelined
as
begin
        for i in 1..i_number_of_terms loop
                pipe row (
                        my_row_type(
                                i_loan_acct_nbr,
                                to_char(i_start_dt+32*i,'yyyymm'),
                                i,
                                lpad('x',200,'x')
                        )
                );
        end loop;
        return;
end;
/

I was planning to create some large tables – hence the option to generate 16M rows from my generator CTEs – but I discovered the critical problem almost as soon as I had some data and code in place, so I didn’t need to go large.

I’ve had to create an object type and table type in order to create a pipelined function that returns the table type by piping rows of the object type. The data I’ve created, and the way the function generates data probably doesn’t bear much resemblance to the real system of course, but I don’t think it needs to be very close to make the crucial point.

Here’s the text of the select statement the OP wants to run, with the execution plan I got from my data set after running the query and pulling the plan from memory:

alter session set statistics_level = all;

select 
        /*+ find this 1 */
        y.* 
from 
        table_a a, 
        table(my_function(
                a.loan_acct_nbr,
                a.yr_mnth_cd,
                a.start_dt,
                a.maturity_dt,
                a.number_of_terms
        )) y
where 
        a.yr_mnth_cd = '202001'
and     not exists (
                select  
                        /*+ unnest */
                        null
                from    final_load_table_l l
                where   l.loan_acct_nbr = y.loan_acct_nbr
                and     l.yr_mnth_cd    = y.yr_mnth_cd
        )
;

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |      1 |        |  5059K(100)|  14580 |00:00:00.15 |   16330 |
|*  1 |  FILTER                             |             |      1 |        |            |  14580 |00:00:00.15 |   16330 |
|   2 |   NESTED LOOPS                      |             |      1 |   6283K| 10421   (8)|  14580 |00:00:00.11 |     335 |
|*  3 |    TABLE ACCESS FULL                | TABLE_A     |      1 |    769 |    10  (10)|    243 |00:00:00.01 |     297 |
|   4 |    COLLECTION ITERATOR PICKLER FETCH| MY_FUNCTION |    243 |   8168 |    14   (8)|  14580 |00:00:00.10 |      38 |
|*  5 |   INDEX UNIQUE SCAN                 | LT_PK       |  14580 |      1 |     1   (0)|      0 |00:00:00.02 |   15995 |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NULL)
   3 - filter("A"."YR_MNTH_CD"='202001')
   5 - access("L"."LOAN_ACCT_NBR"=:B1 AND "L"."YR_MNTH_CD"=:B2)


I’ve put in a hint to tell the optimizer to unnest the subquery – and it didn’t. Oracle does not ignore hints (unless they’re illegal, or out of context, or the optimizer never got to them, or you’ve found a bug) so why did Oracle appear to ignore this hint? There’s a really nice feature in 19.3 execution plans – you can request a hint report for a plan, and here’s the hint report for this query (ignore the bits about “find this” being an error):


Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 3 (U - Unused (1), E - Syntax error (2))
---------------------------------------------------------------------------
   0 -  SEL$1
         E -  find
         E -  this

   5 -  SEL$3
         U -  unnest / Invalid correlated predicates


I put in an /*+ unnest */ hint to unnest the subquery, and I’ve been told that the predicates are not valid. The only thing about them that could be invalid is that they come from a pipelined function that has returned an object type. The pipelined function does not behave exactly like a table. But this gives me a clue about forcing the unnest to happen – hide the pipelined function inside a non-mergeable view.


select
        /*+ find this 2 */
        v.*
from    (
        select  /*+ no_merge */
                y.* 
        from 
                table_a a, 
                table(my_function(
                        a.loan_acct_nbr,
                        a.yr_mnth_cd,
                        a.start_dt,
                        a.maturity_dt,
                        a.number_of_terms
                )) y
        where 
                a.yr_mnth_cd = '202001'
        )       v
where   not exists (
                select
                        /*+ unnest */
                        null
                from    final_load_table_l l
                where   l.loan_acct_nbr = v.loan_acct_nbr
                and     l.yr_mnth_cd    = v.yr_mnth_cd
        )
/


------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |             |      1 |        | 10628 (100)|  14580 |00:00:00.12 |     387 |       |       |          |
|*  1 |  HASH JOIN RIGHT ANTI                |             |      1 |   6283K| 10628  (10)|  14580 |00:00:00.12 |     387 |  1878K|  1878K| 2156K (0)|
|   2 |   INDEX FAST FULL SCAN               | LT_PK       |      1 |  10000 |     6  (17)|  10000 |00:00:00.01 |      91 |       |       |          |
|   3 |   VIEW                               |             |      1 |   6283K| 10371   (8)|  14580 |00:00:00.11 |     296 |       |       |          |
|   4 |    NESTED LOOPS                      |             |      1 |   6283K| 10371   (8)|  14580 |00:00:00.10 |     296 |       |       |          |
|*  5 |     TABLE ACCESS FULL                | TABLE_A     |      1 |    769 |    10  (10)|    243 |00:00:00.01 |     296 |       |       |          |
|   6 |     COLLECTION ITERATOR PICKLER FETCH| MY_FUNCTION |    243 |   8168 |    13   (0)|  14580 |00:00:00.10 |       0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("L"."LOAN_ACCT_NBR"="V"."LOAN_ACCT_NBR" AND "L"."YR_MNTH_CD"="V"."YR_MNTH_CD")
   5 - filter("A"."YR_MNTH_CD"='202001')

I’ve wrapped the basic join between table_a and the table function in an inline view called (unimaginatively) v, and I’ve added a /*+ no_merge */ hint to that inline view. So the main query becomes a select from a non-mergeable view with a “not exists” subquery applied to a couple of view columns. And Oracle thinks that that’s okay – and my unnest hint encouraged the optimizer to use a hash anti-join.

So here’s an option for the owner of the problem – but with a few outstanding questions: will a rewrite of their query in this form do the same, will the performance of the select be good enough, and will the subsequent “insert as select” keep the same driving plan.

There’s one significant detail to worry about: the build table in this hash (anti-)join is going to be constructed from 50M (load_acct_bfr, yr_mnth_cd) pairs – which means the workarea requirement is likely to be about 1.2GB for an optimial (i.e. in-memory) hash join; otherwise the join may spill to disc and do a lot of I/O – probably as a one-pass hash join.

(Did you notice,by the way, that the word storage appeared at operation 3 in the original plan?  That suggests a nice big Exadata box; however, there’s no storage predicate in the Predicate Information section for that operation and you would have thought that lease_type = ‘US_SSP’ would be pushed to storage, so maybe this is a ZFS Pillar backing a less exotic server.)

Conclusion

Some (if not all) types of correlated subqueries behave badly if the correlation predicates involve table functions. But you may be able to work around the issue by hiding part of the query, including the table function, inside a non-mergeable inline view before applying the subquery to the view.

Footnote

When I realised that the table function was causing a problem unnesting I remembered that I had written about a similar problem a few years ago – after searching for a little while I discovered a draft note that I had started in September 2015 but had not quite finished; so I’ll be finishing it off and publishing it some time in the next few days. (Update: now published)

 

 

Next Page »

Powered by WordPress.com.