Oracle Scratchpad

June 5, 2020

Analytic cost error

Filed under: CBO,Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 1:06 pm BST Jun 5,2020

Here’s a surprising costing error that was raised on the Oracle Developer Forum a few days ago. There’s a glitch in the cost atributed to sorting when an analytic over() clause – with corresponding “window sort” operation – makes a “sort order by” operation redundant. Here’s a script to generate the data set I’ll use for a demonstration with a template for a few queries I’ll be running against the data.


rem
rem     Script:         window_sort_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                    id,
        cast(lpad(rownum,30,'0') as varchar2(30)) vc30,
        cast(lpad(rownum,65,'0') as varchar2(65)) vc65,
        lpad('x',100,'x')                         padding
from
        generator
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

select
--      row_number() over (order by vc65) rn1,
--      row_number() over (order by vc30) rn2,
--      vc30,
--      vc65,
        id
from
        t1
-- order by
--      vc65
--      vc30
/


I’m (optionally) using the row_number() analytic function over the entire data set and for each row_number() I include in the select list Oracle will have to sort the data; I’ve also got an (optional) order by on the two columns that appear in the row_number() functions and that may introduce some sorting as well. Here, for example, are a few examples of the queries I might run:


prompt  ===========================================
prompt  Select vc30, order by vc30
prompt  ===========================================

explain plan for
select 
        vc30,
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

prompt  ==========================================
prompt  Select row_number over vc30 - no ordering
prompt  ==========================================

explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

prompt  ===========================================
prompt  Select row_number over vc30 - order by vc65
prompt  ===========================================

explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc65
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

In the first query we select and sort vc30 so (approximately) we sort 10,000 rows x 30(-ish) bytes for 300K of sort space. In the second query we generate the row_number() based on sorting vc30 – the size of the output is much smaller (it’s only 10,000 numbers between 1 and 10,000) but to generate those numbers we do have to select and sort vc30, so the workload (predicted and actual) will probably be similar to that of the firsrt query. In the final query we have to select and sort vc30 to generate the row_number() but we also have to select (without reporting) and sort vc65 in order to report the results in the right order – so we should expect the workload to be roughly 3 times the size (approximately 10,000 * (30 + 65) bytes). Here, from 12.2.0.1, are the execution plans (with a little cosmetic tidying):

===========================================
Select vc30, order by vc30
===========================================
------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|       |   130   (5)|
|   1 |  SORT ORDER BY     |      | 10000 |   341K|   448K|   130   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|       |    42   (5)|
------------------------------------------------------------------------

==========================================
Select row_number over vc30 - no ordering
==========================================
------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|       |   130   (5)|
|   1 |  WINDOW SORT       |      | 10000 |   341K|   448K|   130   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|       |    42   (5)|
------------------------------------------------------------------------

===========================================
Select row_number over vc30 - order by vc65
===========================================
-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   463   (3)|
|   1 |  SORT ORDER BY      |      | 10000 |   986K|  1120K|   463   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   463   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

As expected, the execution plans are roughly consistent with the estimates I gave for volume of data – the agreement between the query with order by vc30 and the query with over(order by vc30), and the increased load of ordering by vc65 when selecting the row_number(over vc30) is good.

So let’s see what the plan looks like when we select row_number(over vc30) and then order by vc30. If the optimizer is smart it will recognise that it’s possible to adopt a strategy that allows it to take advantage of the sorting from the over() clause to avoid a separate sort order by:


explain plan for
select 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

----------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   341K|    42   (5)|
|   1 |  WINDOW SORT       |      | 10000 |   341K|    42   (5)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   341K|    42   (5)|
----------------------------------------------------------------

The plan shows us that Oracle has used the 10gR2 “order by elimination” feature to bypass the need for a “sort order by” operation because it knows the data will be arriving in the right order from the “Window Sort” operation. Ynfortunately it also shows us that Oracle has lost the cost of doing the Window Sort!

Let’s try another experiment – let’s generate two different row_number() columns, with and without ordering:


prompt  =====================================
prompt  Select Both row_numbers - no ordering
prompt  =====================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

prompt  ======================================
prompt  Select Both row_numbers order by vc30
prompt  ======================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

prompt  ============================================
prompt  Select Both row_numbers order by vc65
prompt  ============================================

explain plan for
select 
        row_number() over (order by vc65) rn1, 
        row_number() over (order by vc30) rn2, 
        id 
from 
        t1
order by
        vc65
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes projection'));

You’ll notice that I’ve included a request for the projection information in the plans for these examples so that you can see what columns are passed up from each operation to its parent. Again, though, we’ll start by focusing on just the costs:


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   463   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   463   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   463   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC65")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC30")[22]
   2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22],
       "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   253   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   253   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC65")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC30")[22]
   2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22],
       "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]


-------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 10000 |   986K|       |   253   (3)|
|   1 |  WINDOW SORT        |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   WINDOW SORT       |      | 10000 |   986K|  1120K|   253   (3)|
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
-------------------------------------------------------------------------

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "VC65"[VARCHAR2,65], "VC30"[VARCHAR2,30],
       "ID"[NUMBER,22], ROW_NUMBER() OVER ( ORDER BY "VC30")[22], ROW_NUMBER()
       OVER ( ORDER BY "VC65")[22]
   2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22],
       "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]
   3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]

The first query – without an order by” clause – reports a cost of 463; add an order by clause and the cost drops to 253 (and the “order by” clause doesn’t appear as a sort order by operation in the plan). The cost differential between the ordered and “unordered” plans , by the way, is 210 (and from there down to the base tablescan is another 211) – and here’s another way to see that number (+/- 1) appearing:


explain plan for
select
        vc65,
        id
from
        t1
order by
        vc30
/

select * from table(dbms_xplan.display(null,null,'basic cost rows bytes'));

------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   986K|       |   253   (3)|
|   1 |  SORT ORDER BY     |      | 10000 |   986K|  1120K|   253   (3)|
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   986K|       |    42   (5)|
------------------------------------------------------------------------

The cost of the query with no “order by” clause is basically the cost of a table scan plus two sorts of (vc65, vc30, plus a few bits). When you add in an “order by” clause the optimizer discards the “order by” clause and then subtracts one of the sort costs as well.

CBO trace file

Every time I say something about 10053 (CBO) trace files I feel compelled to remind everyone that I rarely look at them, and then it’s usually because I think I know there’s a bug and where I’ll find it in the trace. That’s exactly the case here.

I’m expecting to see two differences in the trace files between the “no order” query, and a query where I’ve added in an “order by” clause. One difference is that one trace file will have an “OBYE” (eliminate order by) comment which won’t be in the other trace, one trace file will have a “cost for SORT” calculation which won’t be in the other.

So here are the relevant bits – first from the query without the order by clause:


OBYE:   Considering Order-by Elimination from view SEL$1 (#0)
***************************
Order-by elimination (OBYE)
***************************
OBYE:     OBYE bypassed: no order by to eliminate.

...

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
WiF sort
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
***********************
Best so far:  Table#: 0  cost: 463.384707  card: 10000.000000  bytes: 1010000.000000
***********************

And from one of the plans with an order by:


OBYE:   Considering Order-by Elimination from view SEL$1 (#0)
***************************
Order-by elimination (OBYE)
***************************
OBYE:     OBYE performed.

...

GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
Permutations for Starting Table :0
Join order[1]:  T1[T1]#0
WiF sort
    SORT ressource         Sort statistics
      Sort width:         497 Area size:      435200 Max Area size:    87240704
      Degree:               1
      Blocks to Sort: 150 Row size:     122 Total Rows:          10000
      Initial runs:   2 Merge passes:  1 IO Cost / pass:         56
      Total IO sort cost: 206.000000      Total CPU sort cost: 12180621
      Total Temp space used: 1147000
***********************
Best so far:  Table#: 0  cost: 252.512458  card: 10000.000000  bytes: 1010000.000000
***********************

As you can see, the first (no order) trace file has two sort calculations under WiF sort, (Window Function?) while the second (order by) trace file reports “OBYE performed” and then loses one of its WiF sorts.

Note: If I had ordered by ID I would have seen two calculations of cost under the heading of WiF sort and a third calculation below that with the heading ORDER BY sort. Unfortunately when I ran the test to check this I also found that the OBYE report said: “OBYE performed” when it wasn’t relevant and there wasn’t an “order by” available for elimination.

Interestingly I tested to see if I could change the optimizer’s behaviour by adding the no_eliminate_oby(@sel$1) hint to the “order by” queries but the GENERAL PLANS section didn’t change, even though the trace file report: “OBYE: OBYE bypassed: hinted”, and the “Hint Report” from the Oracle 19i execution plan acknowledge the hint as legal and used.

Summary

If you have an “order by” clause in a query block that includes analytic functions and the optimizer decides that it can eliminate the “order by” and rely on the side effect of an analytic over() clause you may find that the cost of the query block is reduced by the cost of one of the Window Sort operations. (Technically this might lead to cases where the optimizer then made some poor choices in overall shape of the execution plan – though such cases might be very rare given that this costing error doesn’t affect the cardinality estimates.)

Lagniappe

In the last set of tests I added in the projection information as a simple example of a case where it can help you understand a little more of what the plan is supposed to achieve.  If you examine the last two query plans carefully (select both row_number() values and order by vc30 / vc65 respectively) Operation 2 of the first plan reports:

2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22], "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]

while operation 2 of the second plan reports:

2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22], "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]

It’s not until you look at these two lines that the plans show any differences – operations 1 and 2 simply say “Window Sort” without giving any clue about which window sort is for which over() clause. The projection information, though, tells you which way around the over() clauses operated – in the first query the over(order by vc65) is applied to the result of the tablescan first, while in the second query it’s the over(order by vc30) that is applied first.

Lagniappe 2

There’s another little oddity you might spot when you look at the projection information and think about the WiF sort costs from the unordered query. The columns passed from operation 3 to operation 2 are:

 3 - "ID"[NUMBER,22], "VC30"[VARCHAR2,30], "VC65"[VARCHAR2,65]

The columns passed from operation 2 to operation 1 are one of:

2 - (#keys=1) "VC65"[VARCHAR2,65], "ID"[NUMBER,22], "VC30"[VARCHAR2,30], ROW_NUMBER() OVER ( ORDER BY "VC65")[22]

2 - (#keys=1) "VC30"[VARCHAR2,30], "ID"[NUMBER,22], "VC65"[VARCHAR2,65], ROW_NUMBER() OVER ( ORDER BY "VC30")[22]

Operation 2 sorts the data from operation 3, and operation 1 sorts the data from operation 2 – but the columns arriving from operation 2 have an extra row_number()[22] added to them. So when you look in the trace file at the two Wif Sort calculations why do they both show:

Blocks to Sort: 150 Row size: 122 Total Rows: 10000

Shouldn’t one of them show a Row size that’s (at least) 22 longer than the other ?

 

 

 

 

 

 

June 3, 2020

Fetch First Update

Filed under: 12c,Hints,Oracle,Tuning — Jonathan Lewis @ 1:48 pm BST Jun 3,2020

A question about mixing the (relatively new) “fetch first” syntax with “select for update” appeared a few days ago on the Oracle Developer Forum. The requirement was for a query something like:


select
        *
from
        t1
order by
        n1
fetch
        first 10 rows only
for     update
;

The problem with this query is that it results in Oracle raising error ORA-02014: cannot select FOR UPDATE from view with DISTINCT, GROUP BY, etc. The error doesn’t seem to be particularly relevant, of course, until you remember that “fetch first” creates an inline view using the analytic row_number() under the covers.

One suggested solution was to use PL/SQL to open a cursor with a pure select then use a loop to lock each row in turn. This would need a little defensive programming, of course, since each individual “select for update” would be running at a different SCN from the driving loop, and there would be some risk of concurrency problems (locking, or competing data change) occuring.

There is a pure – thought contorted – SQL solution though where we take the driving SQL and put it into a subquery that generates the rowids of the rows we want to lock, as follows:


select
        /*+
                qb_name(main)
        */
        *
from
        t1
where
        t1.rowid in (
                select
                        /*+ qb_name(inline) unnest no_merge */
                        t1a.rowid
                from
                        t1 t1a
                order by
                        t1a.n1
                fetch 
                        first 10 rows only
        )
for update
;

The execution plan for this query is critical – so once you can get it working it would be a good idea to create a baseline (or SQL Patch) and attach it to the query. It is most important that the execution plan should be the equivalent of the following:


select  /*+   qb_name(main)  */  * from  t1 where  t1.rowid in (
select    /*+ qb_name(inline) unnest no_merge */    t1a.rowid   from
t1 t1a   order by    t1a.n1   fetch    first 10 rows only  ) for update

Plan hash value: 1286935441

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |      1 |        |     10 |00:00:00.01 |     190 |       |       |          |
|   1 |  FOR UPDATE                   |      |      1 |        |     10 |00:00:00.01 |     190 |       |       |          |
|   2 |   BUFFER SORT                 |      |      2 |        |     20 |00:00:00.01 |     178 |  2048 |  2048 | 2048  (0)|
|   3 |    NESTED LOOPS               |      |      1 |     10 |     10 |00:00:00.01 |     178 |       |       |          |
|*  4 |     VIEW                      |      |      1 |     10 |     10 |00:00:00.01 |     177 |       |       |          |
|*  5 |      WINDOW SORT PUSHED RANK  |      |      1 |  10000 |     10 |00:00:00.01 |     177 |  2048 |  2048 | 2048  (0)|
|   6 |       TABLE ACCESS FULL       | T1   |      1 |  10000 |  10000 |00:00:00.01 |     177 |       |       |          |
|   7 |     TABLE ACCESS BY USER ROWID| T1   |     10 |      1 |     10 |00:00:00.01 |       1 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("from$_subquery$_003"."rowlimit_$$_rownumber"<=10)
   5 - filter(ROW_NUMBER() OVER ( ORDER BY "T1A"."N1")<=10)

Critically you need the VIEW operation to be the driving query of a nested loop join that does the “table access by user rowid” joinback. In my case the query has used a full tablescan to identify the small number of rowids needed – in a production system that would be the part of the statement that should first be optimised.

It’s an unfortunate feature of this query structure (made messier by the internal rewrite for the analytic function) that it’s not easy to generate a correct set of hints to force the plan until after you’ve already managed to get the plan. Here’s the outline information that shows the messiness of the hints I would have needed:


Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"INLINE")
      OUTLINE_LEAF(@"SEL$A3F38ADC")
      UNNEST(@"SEL$1")
      OUTLINE(@"INLINE")
      OUTLINE(@"MAIN")
      OUTLINE(@"SEL$1")
      NO_ACCESS(@"SEL$A3F38ADC" "from$_subquery$_003"@"SEL$1")
      ROWID(@"SEL$A3F38ADC" "T1"@"MAIN")
      LEADING(@"SEL$A3F38ADC" "from$_subquery$_003"@"SEL$1" "T1"@"MAIN")
      USE_NL(@"SEL$A3F38ADC" "T1"@"MAIN")
      FULL(@"INLINE" "T1A"@"INLINE")
      END_OUTLINE_DATA
  */

You’ll notice that my /*+ unnest */ hint is now modified – for inclusion at the start of the query – to /*+ unnest(@sel1) */ rather than the /*+ unnest(@inline) */ that you might have expected. That’s the side effect of the optimizer doing the “fetch first” rewrite before applying “missing” query block names. If I wanted to write a full hint set into the query itself (leaving the qb_name() hints in place but removing the unnest and merge I had originally) I would need the following:


/*+
        unnest(@sel$1)
        leading(@sel$a3f38adc from$_subquery$_003@sel$1 t1@main)
        use_nl( @sel$a3f38adc t1@main)
        rowid(  @sel$a3f38adc t1@main)
*/

I did make a bit of a fuss about the execution plan. I think it’s probably very important that everyone who runs this query gets exactly the same plan and the plan should be this nested loop. Although there’s a BUFFER SORT at operation 2 that is probably ensuring that every would get the same data in the same order regardless of the execution plan before locking any of it, I would be a little worried that different plans might somehow be allowed to lock the data in a different order, thus allowing for deadlocks.

June 1, 2020

Order By

Filed under: Bugs,CBO,Indexing,Oracle — Jonathan Lewis @ 1:05 pm BST Jun 1,2020

This is a brief note with an odd history – and the history is more significant than the note.

While searching my library for an example of an odd costing effect for the “order by” clause I discovered a script that looked as if I’d written for 11.1.0.6 in 2008 to demonstrate a redundant sort operation appearing in an execution plan; and then I discovered a second script written for 11.2.0.4 in 2014 demonstrating a variant of the same thing (presumably because I’d not found the original script in 2014) and the second script referenced a MOS bug number

Bug 18701129 : SORT ORDER BY ISN’T AVOIDED WHEN ROWID IS ADDED TO ORDER BY CLAUSE

Whenever I “discover” an old bug test I tend to re-run it to check whether or not the bug has been fixed.  So that’s what I did, and found that the anomaly was still present in 19.3.0.0. The really odd thing, though, was that the bug note no longer existed – and even after I’d done a few searches involving the text in the description I couldn’t manage to find it!

For the record, here’s the original 2008 script (with a couple of minor edits)


rem
rem     Script:         order_by_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2008
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0        Still sorting
rem             12.2.0.1
rem             11.1.0.6
rem

set linesize 180
set pagesize 60

create table test 
as 
select  * 
from    all_objects 
where   rownum <= 10000 -- >  comment to avoid wordpress format issue
;

alter table test modify object_name not null;
create index i_test_1 on test(object_name);

analyze table test compute statistics;

set serveroutput off
alter session set statistics_level = all;

select  * 
from    (select * from test order by object_name) 
where 
        rownum < 11 -- > comment to avoid wordpress format issue
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));



select  * 
from    (select /*+ index(test) */ * from test order by object_name,rowid) 
where
        rownum < 11 -- > comment to avoid wordpress format issue
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
set serveroutput on

Yes, that is an analyze command – it’s a pretty old script and I must have been a bit lazy about writing it. (Or, possibly, it’s a script from an Oracle-l or Oracle forum posting and I hadn’t re-engineered it.)

I’ve run two queries – the first uses an inline view to impose an order on some data and then selects the first 10 rows. The second query does nearly the same thing but adds an extra column to the “order by” clause – except it’s not a real column it’s the rowid pseudo-column. Conveniently there’s an index on the table that is a perfect match for the “order by” clause and it’s on a non-null column so the optimizer can walk the index in order and stop after 10 rows.

Adding the rowid to the “order by” clause shouldn’t make any difference to the plan as the index Oracle is using is a single column non-unique index, which means that the internal representation makes it a two-column index where the rowid is (quite literally) stored as the second column. But here are the two execution plans:


----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |      1 |        |     10 |00:00:00.01 |       7 |
|*  1 |  COUNT STOPKEY                |          |      1 |        |     10 |00:00:00.01 |       7 |
|   2 |   VIEW                        |          |      1 |     10 |     10 |00:00:00.01 |       7 |
|   3 |    TABLE ACCESS BY INDEX ROWID| TEST     |      1 |  10000 |     10 |00:00:00.01 |       7 |
|   4 |     INDEX FULL SCAN           | I_TEST_1 |      1 |     10 |     10 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<11)



----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |          |      1 |        |     10 |00:00:00.01 |    4717 |       |       |          |
|*  1 |  COUNT STOPKEY                         |          |      1 |        |     10 |00:00:00.01 |    4717 |       |       |          |
|   2 |   VIEW                                 |          |      1 |  10000 |     10 |00:00:00.01 |    4717 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY               |          |      1 |  10000 |     10 |00:00:00.01 |    4717 |  4096 |  4096 | 4096  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| TEST     |      1 |  10000 |  10000 |00:00:00.01 |    4717 |       |       |          |
|   5 |      INDEX FULL SCAN                   | I_TEST_1 |      1 |  10000 |  10000 |00:00:00.01 |      44 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<11)
   3 - filter(ROWNUM<11)


When I add the rowid to the “order by” clause the optimizer no longer sees walking the index as an option for avoiding work; it wants to collect all the rows from the table, sort them, and then report the first 10. In fact walking the index became such an expensive option that I had to hint the index usage (hence the non-null declaration) to make the optimizer choose it, the default plan for 19.3 was a full tablescan and sort.

It’s just a little example of an edge case, of course. It’s a pity that the code doesn’t recognise the rowid as (effectively) a no-op addition to the ordering when the rest of the “order by” clause matches the index declaration, but in those circumstances the rowid needn’t be there at all and you wouldn’t expect anyone to include it.

As I said at the start – the interesting thing about this behaviour is that it was once described in a bug note that has since disappeared from public view.

 

May 12, 2020

from$_subquery$_NNN

Filed under: Execution plans,Oracle — Jonathan Lewis @ 4:26 pm BST May 12,2020

This is a reference note for a question that came up as a comment on a lengthy note I wrote about reading execution plans.

How do you interpret something like: from$_subquery$_001@SEL$1 in the Query Block Name / Object Alias section of an execution plan.

The simple answer is that if you’ve got an inline view in the FROM clause of a query and you haven’t given the inline view an alias the optimizer will have to invent one – and this is what they look like.

As a quick demo here’s a script to create a couple of tables and then run a query that joins two inline views (using “ANSI”-style SQL), with variations on which of the inline views are named:


rem
rem     Script:         from_subquery.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1
as
select  * 
from    all_objects
where   rownum <= 100
;

create table t2
as
select  *
from    all_objects
where   rownum <= 100
;

set serveroutput off

prompt  =========================
prompt  Neither inline view named
prompt  =========================

select 
        count(*)
from    (select /*+ no_merge */ * from t1)
join 
        (select /*+ no_merge */ * from t2)
using
        (object_id)
;

select * from table(dbms_xplan.display_cursor(null,null,'alias'));

prompt  ============================
prompt  Only first inline view named
prompt  ============================

select 
        count(*)
from    (select /*+ no_merge */ * from t1) v1
join 
        (select /*+ no_merge */ * from t2)
using
        (object_id)
;

select * from table(dbms_xplan.display_cursor(null,null,'alias'));

prompt  =============================
prompt  Only second inline view named
prompt  =============================

select 
        count(*)
from    (select /*+ no_merge */ * from t1)
join 
        (select /*+ no_merge */ * from t2) v2
using
        (object_id)
;

select * from table(dbms_xplan.display_cursor(null,null,'alias'));

prompt  =======================
prompt  Both inline views named
prompt  =======================

select 
        count(*)
from    (select /*+ no_merge */ * from t1) v1
join 
        (select /*+ no_merge */ * from t2) v2
using
        (object_id)
;

select * from table(dbms_xplan.display_cursor(null,null,'alias'));

In all four examples I’ve added the /*+ no_merge */ hint to the inline views; if I hadn’t done that the optimizer would simply have reduced each query to a single query block joining two tables. As it is here are the resulting execution plans (with a little cosmetic editing) reporting the plan hash value, plan, outline and predicates:


=========================
Neither inline view named
=========================

Plan hash value: 1978226902

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    26 |            |          |
|*  2 |   HASH JOIN          |      |   100 |  2600 |     4   (0)| 00:00:01 |
|   3 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |   100 |   500 |     2   (0)| 00:00:01 |
|   5 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |   100 |   500 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$16C51A37
   3 - SEL$2        / from$_subquery$_001@SEL$1
   4 - SEL$2        / T1@SEL$2
   5 - SEL$3        / from$_subquery$_003@SEL$1
   6 - SEL$3        / T2@SEL$3

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("from$_subquery$_001"."OBJECT_ID"="from$_subquery$_003"."O
              BJECT_ID")


============================
Only first inline view named
============================

Plan hash value: 1978226902

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    26 |            |          |
|*  2 |   HASH JOIN          |      |   100 |  2600 |     4   (0)| 00:00:01 |
|   3 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |   100 |   500 |     2   (0)| 00:00:01 |
|   5 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |   100 |   500 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$16C51A37
   3 - SEL$2        / V1@SEL$1
   4 - SEL$2        / T1@SEL$2
   5 - SEL$3        / from$_subquery$_003@SEL$1
   6 - SEL$3        / T2@SEL$3

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."OBJECT_ID"="from$_subquery$_003"."OBJECT_ID")


=============================
Only second inline view named
=============================

Plan hash value: 1978226902

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    26 |            |          |
|*  2 |   HASH JOIN          |      |   100 |  2600 |     4   (0)| 00:00:01 |
|   3 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |   100 |   500 |     2   (0)| 00:00:01 |
|   5 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |   100 |   500 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$16C51A37
   3 - SEL$2        / from$_subquery$_001@SEL$1
   4 - SEL$2        / T1@SEL$2
   5 - SEL$3        / V2@SEL$1
   6 - SEL$3        / T2@SEL$3

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("from$_subquery$_001"."OBJECT_ID"="V2"."OBJECT_ID")


=======================
Both inline views named
=======================

Plan hash value: 1978226902

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    26 |            |          |
|*  2 |   HASH JOIN          |      |   100 |  2600 |     4   (0)| 00:00:01 |
|   3 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1   |   100 |   500 |     2   (0)| 00:00:01 |
|   5 |    VIEW              |      |   100 |  1300 |     2   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |   100 |   500 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$16C51A37
   3 - SEL$2        / V1@SEL$1
   4 - SEL$2        / T1@SEL$2
   5 - SEL$3        / V2@SEL$1
   6 - SEL$3        / T2@SEL$3

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("V1"."OBJECT_ID"="V2"."OBJECT_ID")

As you scan down the plans you can see that they are all the same – with only a change in naming where V1 is synonymous with from$_subquery$_001 and v2 is synonymous with from$_subquery$_003.

Don’t ask me to explain how the optimizer chooses the names – I had thought I might see a from$_subquery$_002 somewhere in the optimizer trace file, but apart from the two aliases I’ve shown here the only other alias I got was one occurrence of from$_subquery$_005@sel$4.

 

May 5, 2020

Execution Plans

Filed under: Execution plans,Oracle,Performance,Troubleshooting,Tuning — Jonathan Lewis @ 12:36 pm BST May 5,2020

Table Of Contents

1.0 Introduction
2.0 Overview
3.0 The Main Course
4.0 Simplify
5.0 Filling the Gaps
6.0 Looking at the numbers
7.0 Predicate Information
8.0 Resolution
9.0 Summary
Footnote


 

1.0 Introduction

1.1 In a comment to a recent post on reading a non-trivial execution someone asked me to repeat the exercise using a plan I had published a few days previously in a post about tweaking the hints in an outline. The query in question involved a number of subqueries and transformations of different types, which means it’s going to take a little work explaining the details, and it’s probably going to be a fairly long read.

1.2 Here’s the query that produced the plan we’re going to examine. I’ve done some cosmetic alteration  to make it a little easier to read (though it’s still not perfect according to my standards). I’ve also made one very important addition to the query to make it easier to follow my walkthrough of the execution plan; the original text didn’t specify any query block names (/*+ qb_name() */ hints) even though it starts off with 9 separate query blocks, so I’ve walked through the text very carefully adding in the query block names that Oracle would have used (sel$NN) for each query block. In this case I got lucky because there were no views of other recursive problems involved so all I had to do was find each occurence of the word “select” in literal text order and increment the NN in sel$NN for each one.


SELECT  /*+ QB_NAME(SEL$1) */
        COUNT(applicant_id)
FROM    (
        SELECT  /*+ QB_NAME(SEL$2) */
                applicant_id,
                academic_year,
                applicant_gender,
                medium_of_study,
                education_type,
                college_id,
                course_id,
                medium_id,
                hostel_required,
                preference_order,
                status_flag,
                attribute7,  -- Added on 7-mar-20
                college_status_flag,
                percentage,
                caste_category,
                alloted_category,
                NULL allotment_type
        FROM    (
                SELECT   /*+ QB_NAME(SEL$3) */
                        adt.applicant_id,
                        lmt_gender.lov_code applicant_gender,
                        adt.medium_of_study,
                        act.college_id,
                        lmt_education_type.lov_code education_type,
                        act.course_id,
                        act.medium_id,
                        act.hostel_required,
                        act.preference_order,
                        act.status_flag,
                        act.attribute7, -- Added on 7-mar-20
                        adt.college_status_flag,
                        adt.academic_year,
                        adt.percentage,
                        adt.applicant_dob,
                        adt.legacy_appln_date,
                        adt.caste_category,
                        act.attribute1 alloted_category,
                        DECODE (lmt_pass.lov_code,  'ATTFIRST', 1,  'COMPARTL', 2,  3) order_of_pass,
                        DECODE (late_entry_flag,  'N', 1,  'Y', 2,  3)      order_of_entry,
                        DECODE (lmt_appearance.lov_code,  'REGULAR', 1,  'PRIVATE', 2,  3) order_of_appearance,
                        DECODE (adt.is_ttd_employ_ward,  'Y', 1,  'N', 2,  3) order_of_ttd_emp,
                        DECODE (adt.is_balbhavan_studnt,  'Y', 1,  'N', 2,  3) order_of_schooling,
                        act.attribute3 course_qe_priority,
                        adt.is_local_canditature_valid,
                        adt.is_ttd_emp_ward_info_valid,
                        adt.is_sv_bm_student_info_valid,
                        adt.is_social_ctgry_info_valid,
                        DECODE(adt.college_status_flag,'B',1,'O',2,'N',3) order_of_status
                FROM 
                        xxadm.xxadm_applicant_details_tbl    adt,
                        xxadm.xxadm_applicant_coursprefs_tbl act,
                        xxadm.xxadm_college_master_tbl       cmt,
                        xxadm.xxadm_course_master_tbl        crmt,
                        xxadm.xxadm_medium_master_tbl        mmt,
                        xxadm.xxadm_lov_master_tbl           lmt_gender,
                        xxadm.xxadm_lov_master_tbl           lmt_pass,
                        xxadm.xxadm_lov_master_tbl           lmt_appearance,
                        xxadm.xxadm_lov_master_tbl           lmt_religion,
                        xxadm.xxadm_lov_master_tbl           lmt_education_type
                WHERE
                        adt.applicant_id = act.applicant_id
                AND     act.college_id = cmt.college_id
                AND     act.course_id = crmt.course_id
                AND     act.medium_id = mmt.medium_id
                AND     adt.applicant_gender = lmt_gender.lov_id
                AND     adt.pass_type = lmt_pass.lov_id
                AND     adt.appearance_type = lmt_appearance.lov_id
                AND     adt.religion = lmt_religion.lov_id
                AND     cmt.education_type = lmt_education_type.lov_id
                AND     adt.status = 'Active'
                AND     1 = (CASE 
                                WHEN act.hostel_required = 'Y'
                                        THEN (CASE
                                                     WHEN    adt.distance_in_kms >20
                                                     AND     lmt_religion.lov_code = 'HINDU'
                                                     AND     adt.caste_category NOT IN (
                                                                     SELECT  /*+ QB_NAME(SEL$4) */
                                                                             category_id
                                                                     FROM    xxadm.xxadm_category_master_tbl
                                                                     WHERE   category_code IN ('BACKWRDC', 'BACKWRDE')
                                                             )
                                                             THEN 1
                                                             ELSE 2 
                                              END
                                             )
                                        ELSE 1 
                               END
                              )
                AND     1 =  (CASE 
                                WHEN act.hostel_required  = 'Y'
                                        THEN    (CASE 
                                                        WHEN    (    lmt_education_type.lov_code = 'COEDUCOL' 
                                                                 AND mt_gender.lov_code = 'FEMALE'
                                                                )
                                                                THEN 2
                                                                ELSE 1 
                                                 END
                                                )
                                        ELSE 1 
                               END
                              )
                AND     adt.course_applied_for = 'DEG' 
                AND     (adt.college_status_flag IS NULL OR adt.college_status_flag IN ('N','T','C','B','O')) 
                AND     act.preference_order <= NVL( -- > comment to avoid WordPress format issue
                                (SELECT  /*+ QB_NAME(SEL$5) */ 
                                         preference_order 
                                 FROM    xxadm.xxadm_applicant_coursprefs_tbl act1 
                                 WHERE   act1.applicant_id = adt.applicant_id 
                                 AND     status_flag IN('B','T','C','O') 
                                 ), act.preference_order 
                        )
                AND     act.preference_order >=  NVL(
                                (SELECT /*+ QB_NAME(SEL$6) */
                                        preference_order
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act2 
                                WHERE   act2.applicant_id = adt.applicant_id
                                AND     status_flag  = 'C'
                                ), act.preference_order
                        )
                AND     act.preference_order NOT IN (
                                SELECT  /*+ QB_NAME(SEL$7) */
                                        act3.preference_order 
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act3
                                WHERE   act3.applicant_id = adt.applicant_id 
                                AND     act3.status_flag  = 'O'
                        ) 
                AND     act.preference_order NOT IN (
                                SELECT  /*+ QB_NAME(SEL$8) */
                                        act1.preference_order 
                                FROM    xxadm.xxadm_applicant_coursprefs_tbl act1 
                                WHERE   act1.applicant_id = adt.applicant_id 
                                AND     act1.status_flag IN ('C','B')
                                AND     act1.attribute1 IN (
                                                SELECT  /*+ QB_NAME(SEL9) */
                                                        category_id 
                                                FROM    xxadm.xxadm_category_master_tbl 
                                                WHERE   category_code IN ('OPENMERT')
                                        ) 
                                AND     NVL(act1.attribute7,'N') = 'N'
                        ) 
                AND     cmt.college_id = :p_college_id
                AND     crmt.course_id = :p_course_id
                AND     mmt.medium_id  = :p_medium_id
                AND     act.hostel_required = :p_hostel_required
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                ) 
        WHERE
                 ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id
;

Figure 1-1

1.3 This query first came to light in a thread on the Oracle Developer forums with an extract from a tkprof file showing that it had executed 842,615  times. That number should be ringing alarms and flashing warning lights, but if we assume that there really is no way of doing some sort of batch processing to get through the data we need to do a little bit of arithmetic to see how much of a threat this query is and how much is matters.

1.4 For every extra 0.01 seconds it takes to execute this query the total run-time goes up by8,426 seconds, which is 2 hours and 20 minutes. If the average execution time is a mere 0.06 seconds you’ll be at it all night long – and it will be a long, long night.


 

2.0 Overview

2.1 Before we look at the execution plan let’s take a moment to pick out a few points from the query. You may want to re-open this post in a separate window so that you can switch easily between the SQL and my comments.

2.2 We start off with a simple select from an inline view – and if we replace the inline view the simple “object name” V_THING we get the following query:


select  count(applicant_id)
from    V_THING
where   applicant_id = :p_applicant_id
;

Figure 2-1

2.3 This should prompt two questions

  • First, how far into the view V_THING will the optimizer be able to push that predicate, possibly the entire content of the view will have to be constructed before the predicate can apply, possibly the nature of the view is such that the optimizer could do a simple filter pushdown to apply the predicate very early. That still leaves (or leads on to) the question of whether the optmizer might then be able to generate further uses of the predicate through transitive closure.
  • Secondly, if the view V_THING is a multiable view will we be able to work out which table applicant_id comes from by the time it becomes visible in the view.  It’s possible that changing the table from which applicant_id comes will change the execution plan.

2.4 Digging down one layer we see that our V_THING is also a simple select from an inline view – let’s call it V_ANOTHER – so if we again forget about the complexity of the inner view we’re looking at a query that goes:


select  /*+ QB_NAME(SEL$1) */
        count(applicant_id)
from    (
        select  /*+ QB_NAME(SEL$2) */
                applicant_id, 
                {15 more columns}
                null    allotment_type
        from
                V_OTHER
        where
                rownum <=  :p_seats
        )       V_THING
where 
        applicant_id = :p_applicant_id
;

Figure 2-2

2.5 A couple of details hit the eye when you look at this: Why are we selecting 17 columns from a complex view, and then counting only one of them and discarding the rest. Let’s hope the optimizer is smart enough to discard the excess columns at the earliest possible moment (which might allow it to do some index-only accesses instead of visiting tables for columns we don’t really need).

2.6 Stranger still, one of those columns is a delberately generated NULL! This hints at the possibility that the client code is doing something like “count how many query X will give me, then run query X”– giving us the pattern “select count(*) from (inlne query X); execute query X” Maybe this whole query is a waste of time, but if it can’t be avoided maybe it should be edited down to the smallest  query that will get the correct count.

2.7 Another thought about this layer of the query, the predicate “rownum <= :bind_variable” may be pushing the optimizer into first_rows(n) optimization and this might be enough to make it choose a bad execution plan. I’d have to check, and check for specific versions, but off the top of my head I think that when comparing rownum with a bind variable the optimizer will optimizer for first_rows(10) unless there’s some other reason for choosing anything else.)

2.8 I’m also a little curious about a requirement that seems to say – “pick at most N rows, then tell me how many you’ve picked”. What’s it actually trying to do and why?

2.9 Let’s dig one layer deeper before we get into the complex stuff. Here’s a version of the code that expands V_OTHER in an extremely stripped down form:


SELECT  /*+ QB_NAME(SEL$1) */
        COUNT(applicant_id)
FROM    (
        SELECT  /*+ QB_NAME(SEL$2) */
                applicant_id, 
                {15 more columns}
                NULL allotment_type
        FROM    (
                SELECT   /*+ QB_NAME(SEL$3) */
                        {lots of columns}
                FROM 
                        {lots of tables}
                WHERE
                        {lots of predicates}
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                )  
        WHERE
                ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id
;

Figure 2-3

2.10 At this point we can start to see reasons for the layering of inline views – we need to select data in the right order before we apply the rownum predicate; as for the excess columns in the select list – even if we selected only the applicant_id in the outer layers the optimizer would still have to acquire the five columns in the order by clause.

2.11 But this emphasises the oddity of the query. If we’re only counting applicant_id to see whether we got :p_seats or fewer rows for a specific applicant_id why does the order matter – surely the order will only matter when we “repeat” the query to get the actual rows (if that’s what we do). As it is, to count a small number of rows this query might have to fetch and sort a large number, then discard most of them. (Some statistics from other posts by the OP indicated that the underlying query might fetch anything between a few hundred and a couple of thousand rows. This particular run showed the query finding 171 rows to sort and then restricting the rowsource to the first two sorted rows)


 

3.0 The Main Course

3.1 To make it a little easier to discuss the detail of the execution plan I’ve laid it out in a small number of sections corresponding to the final (outline_leaf) query blocks the optimizer generated. To do this I applied two sets of information – the Query Block / Object Alias information (which follows the body of the plan) and any appearances of the VIEW operation in the plan.


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                                |      1 |        |   574 (100)|      1 |00:00:00.02 |    3822 |       |       |         |
|   1 |  SORT AGGREGATE                             |                                |      1 |      1 |            |      1 |00:00:00.02 |    3822 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
select count(applicant_id) - above
select where rownum less than - below
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|*  2 |   VIEW                                      |                                |      1 |      1 |   574   (2)|      0 |00:00:00.02 |    3822 |       |       |         |
|*  3 |    COUNT STOPKEY                            |                                |      1 |        |            |      2 |00:00:00.02 |    3822 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Start of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   4 |     VIEW                                    |                                |      1 |      1 |   574   (2)|      2 |00:00:00.02 |    3822 |       |       |         |
|*  5 |      SORT ORDER BY STOPKEY                  |                                |      1 |      1 |   574   (2)|      2 |00:00:00.02 |    3822 |  2048 |  2048 | 2048  (0)|
|*  6 |       FILTER                                |                                |      1 |        |            |    171 |00:00:00.02 |    3822 |       |       |         |
|   7 |        NESTED LOOPS                         |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |       |       |         |
|   8 |         NESTED LOOPS                        |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    2946 |       |       |         |
|   9 |          NESTED LOOPS                       |                                |      1 |      1 |   567   (2)|    182 |00:00:00.02 |    2942 |       |       |         |
|  10 |           NESTED LOOPS                      |                                |      1 |      1 |   566   (2)|    182 |00:00:00.02 |    2938 |       |       |         |
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |       |       |         |
|  12 |             NESTED LOOPS ANTI               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |       |       |         |
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |  1599K|  1599K| 1503K (0)|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
join index transformation query block SEL$082F290F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  14 |               VIEW                          | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |       |       |         |
|* 15 |                HASH JOIN                    |                                |      1 |        |            |    127 |00:00:00.01 |       8 |  1368K|  1368K| 1522K (0)|
|  16 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_CODE_UK             |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
|  17 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_PK                  |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Continuation of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 18 |               HASH JOIN                     |                                |      1 |    478 |   555   (2)|    182 |00:00:00.01 |    2014 |  1245K|  1245K| 1277K (0)|
|  19 |                NESTED LOOPS                 |                                |      1 |    478 |   243   (2)|    209 |00:00:00.01 |     883 |       |       |         |
|  20 |                 NESTED LOOPS                |                                |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |       |       |         |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |       |       |         |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |       |       |         |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |       |       |         |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |       |       |         |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      1 |    478 |   241   (2)|    209 |00:00:00.01 |     879 |       |       |         |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      1 |   6685 |   311   (2)|  10488 |00:00:00.01 |    1131 |       |       |         |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |    182 |   8881 |     1   (0)|      0 |00:00:00.01 |     366 |       |       |         |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     0   (0)|    182 |00:00:00.01 |     184 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unnested subquery SEL$A75BE177 (from sel$8, sel$9)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  29 |             VIEW PUSHED PREDICATE           | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|  30 |              NESTED LOOPS                   |                                |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 31 |               TABLE ACCESS BY INDEX ROWID   | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     2   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 32 |                INDEX UNIQUE SCAN            | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     184 |       |       |         |
|* 33 |               TABLE ACCESS BY INDEX ROWID   | XXADM_CATEGORY_MASTER_TBL      |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |       |       |         |
|* 34 |                INDEX UNIQUE SCAN            | XXADM_CATEGORY_PK              |      0 |      1 |     0   (0)|      0 |00:00:00.01 |       0 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Start of "real" main query, query block SEL$7E0D484F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     186 |       |       |         |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     182 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery, query block SEL$5
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 40 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     3   (0)|     29 |00:00:00.01 |     507 |       |       |         |
|* 41 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_PREFS_UK         |    182 |      5 |     2   (0)|   1450 |00:00:00.01 |     191 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery, query block SEL$6
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  42 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    171 |      1 |     2   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 43 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_APPLICANT_STATUS |    171 |      1 |     1   (0)|      0 |00:00:00.01 |     173 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Filter subquery SEL$F665FE1B (from sel$4 with tranform for index join)
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 44 |        VIEW                                 | index$_join$_014               |      6 |      1 |     0   (0)|      0 |00:00:00.01 |      14 |       |       |         |
|* 45 |         HASH JOIN                           |                                |      6 |        |            |      0 |00:00:00.01 |      14 |  1519K|  1519K|  666K (0)|
|* 46 |          INDEX RANGE SCAN                   | XXADM_CATEGORY_PK              |      6 |      1 |     0   (0)|      6 |00:00:00.01 |       6 |       |       |         |
|  47 |          INLIST ITERATOR                    |                                |      6 |        |            |     12 |00:00:00.01 |       8 |       |       |         |
|* 48 |           INDEX UNIQUE SCAN                 | XXADM_CATEGORY_CODE_UK         |     12 |      1 |     0   (0)|     12 |00:00:00.01 |       8 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$2        / from$_subquery$_001@SEL$1
   3 - SEL$2
   4 - SEL$7E0D484F / from$_subquery$_002@SEL$2
   5 - SEL$7E0D484F
  14 - SEL$082F290F / LMT_GENDER@SEL$3
  15 - SEL$082F290F
  16 - SEL$082F290F / indexjoin$_alias$_001@SEL$082F290F
  17 - SEL$082F290F / indexjoin$_alias$_002@SEL$082F290F
  21 - SEL$7E0D484F / CMT@SEL$3
  22 - SEL$7E0D484F / CMT@SEL$3
  23 - SEL$7E0D484F / LMT_EDUCATION_TYPE@SEL$3
  24 - SEL$7E0D484F / LMT_EDUCATION_TYPE@SEL$3
  25 - SEL$7E0D484F / ACT@SEL$3
  26 - SEL$7E0D484F / ADT@SEL$3
  27 - SEL$7E0D484F / ACT3@SEL$7
  28 - SEL$7E0D484F / ACT3@SEL$7
  29 - SEL$A75BE177 / VW_SQ_1@SEL$67DC521B
  30 - SEL$A75BE177
  31 - SEL$A75BE177 / ACT1@SEL$8
  32 - SEL$A75BE177 / ACT1@SEL$8
  33 - SEL$A75BE177 / XXADM_CATEGORY_MASTER_TBL@SEL$9
  34 - SEL$A75BE177 / XXADM_CATEGORY_MASTER_TBL@SEL$9
  35 - SEL$7E0D484F / LMT_PASS@SEL$3
  36 - SEL$7E0D484F / LMT_PASS@SEL$3
  37 - SEL$7E0D484F / LMT_APPEARANCE@SEL$3
  38 - SEL$7E0D484F / LMT_RELIGION@SEL$3
  39 - SEL$7E0D484F / LMT_RELIGION@SEL$3
  40 - SEL$5        / ACT1@SEL$5
  41 - SEL$5        / ACT1@SEL$5
  42 - SEL$6        / ACT2@SEL$6
  43 - SEL$6        / ACT2@SEL$6
  44 - SEL$F665FE1B / XXADM_CATEGORY_MASTER_TBL@SEL$4
  45 - SEL$F665FE1B
  46 - SEL$F665FE1B / indexjoin$_alias$_001@SEL$F665FE1B
  48 - SEL$F665FE1B / indexjoin$_alias$_002@SEL$F665FE1B

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      OPT_PARAM('_optimizer_use_feedback' 'false')
      OPT_PARAM('_optimizer_dsdir_usage_control' 0)
      OPT_PARAM('_optimizer_adaptive_plans' 'false')
      OPT_PARAM('_optimizer_gather_feedback' 'false')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$F665FE1B")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SEL$6")
      OUTLINE_LEAF(@"SEL$A75BE177")
      PUSH_PRED(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B" 16 15)
      OUTLINE_LEAF(@"SEL$082F290F")
      OUTLINE_LEAF(@"SEL$7E0D484F")
      UNNEST(@"SEL$9D10C90A")
      UNNEST(@"SEL$7")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$180402DE")
      OUTLINE(@"SEL$7E0D484F")
      UNNEST(@"SEL$9D10C90A")
      UNNEST(@"SEL$7")
      OUTLINE(@"SEL$67DC521B")
      OUTLINE(@"SEL$9D10C90A")
      UNNEST(@"SEL$9")
      OUTLINE(@"SEL$7")
      OUTLINE(@"SEL$C04829E0")
      ELIMINATE_JOIN(@"SEL$3" "CRMT"@"SEL$3")
      ELIMINATE_JOIN(@"SEL$3" "MMT"@"SEL$3")
      OUTLINE(@"SEL$8")
      OUTLINE(@"SEL$9")
      OUTLINE(@"SEL$3")
      NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")
      NO_ACCESS(@"SEL$2" "from$_subquery$_002"@"SEL$2")
      INDEX_RS_ASC(@"SEL$7E0D484F" "CMT"@"SEL$3" ("XXADM_COLLEGE_MASTER_TBL"."COLLEGE_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_EDUCATION_TYPE"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      FULL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      FULL(@"SEL$7E0D484F" "ADT"@"SEL$3")
      INDEX_JOIN(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_CODE") ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "ACT3"@"SEL$7" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."PREFERENCE_ORDER"))
      NO_ACCESS(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B")
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_PASS"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_APPEARANCE"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      LEADING(@"SEL$7E0D484F" "CMT"@"SEL$3" "LMT_EDUCATION_TYPE"@"SEL$3" "ACT"@"SEL$3" "ADT"@"SEL$3" "LMT_GENDER"@"SEL$3" "ACT3"@"SEL$7" "VW_SQ_1"@"SEL$67DC521B"
              "LMT_PASS"@"SEL$3" "LMT_APPEARANCE"@"SEL$3" "LMT_RELIGION"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_EDUCATION_TYPE"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "ADT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "ACT3"@"SEL$7")
      USE_NL(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B")
      USE_NL(@"SEL$7E0D484F" "LMT_PASS"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_APPEARANCE"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3")
      NLJ_BATCHING(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3")
      SWAP_JOIN_INPUTS(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3")
      PQ_FILTER(@"SEL$7E0D484F" SERIAL)
      INDEX_RS_ASC(@"SEL$A75BE177" "ACT1"@"SEL$8" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."PREFERENCE_ORDER"))
      INDEX_RS_ASC(@"SEL$A75BE177" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9" ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_ID"))
      LEADING(@"SEL$A75BE177" "ACT1"@"SEL$8" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9")
      USE_NL(@"SEL$A75BE177" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9")
      INDEX_RS_ASC(@"SEL$6" "ACT2"@"SEL$6" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."STATUS_FLAG"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$6" "ACT2"@"SEL$6")
      INDEX_RS_ASC(@"SEL$5" "ACT1"@"SEL$5" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."COLLEGE_ID"
              "XXADM_APPLICANT_COURSPREFS_TBL"."COURSE_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."MEDIUM_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."HOSTEL_REQUIRED"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$5" "ACT1"@"SEL$5")
      INDEX_JOIN(@"SEL$4" "XXADM_CATEGORY_MASTER_TBL"@"SEL$4" ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_ID") ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_CODE"))
      END_OUTLINE_DATA
  */

Figure 3-1

3.2 There’s no rigid rule I can give you about an approach for looking for query blocks and transformations, but it’s worth checking to see which of your original query blocks still exist in the final execution plan and which have disappeared thanks to some transformation.

3.3 If we look down the Query Block Name list above we can see that sel$1, sel$2, sel$5 and sel$6 have “survived” the machinations of the optimizer. We’ve already noted that sel$1 and sel$2 are simply “select from {inline view}” as far as the optimizer is concerned; sel$5 and sel$6 are simple subqueries that appeared as filter subqueries in the original query text and have kept that status by the end of the optimizer’s transformation stage.

3.4 Tracking down the other query blocks that we named we can see the following:

  • sel$3 – most of its tables appear in a new query block called SEL$7E0D484F but one of them appears in a query block called SEL$082F290F; a closer look at SEL$082F290F shows us that it ranges from operations 14 to 17 and holds a “single table” transformation where the optimizer has chosen to use an index join of two indexes on the xxadm_lov_master_tbl rather than doing a tablescan. The index join is represented as a VIEW of a hash join, hence the separate query block. Another little detail we note – the xxadm_lov_master_tbl appears five times in the query, so we need to know which occurrence this is: fortunately the Object Alias information tells us at operation it’s the LMT_GENDER alias.
  • sel$4 is the scalar subquery inside a CASE expression, involving table xxadm_category_master_tbl. We can find the table name (which hasn’t been given an alias) and the query block name in the Object Alias information at operation 44 in a query block called SEL$F665FE1B. There are two points of interest about this query block – it has come into existence because it’s another example where the optimizer has used an index join to avoid a full tablescan; and it has been used in a filter subquery (the parent of operation 44 is the FILTER at operation 6).
  • sel$7 appeared in the original text as a NOT IN subquery against xxadm_applicant_coursprefs_tbl with an alias of act3. The Query Block / Object Alias information tells us that ACT3@SEL$7 appears at operations 27 and 28 – and when we track up the plan from operation 27 we see that it is the second child of operation 12 which is a nested loop anti. The optimizer has unnested the subquery and turned it into an anti join as one of the steps that produced query block SEL$7E0D484F
  • sel$8 appeared in the original text as a NOT IN subquery against xxadm_applicant_coursprefs_tbl aliased as act1 (OUCH – that’s the second time the alias act1 has appeared in this query!). But the subquery had it’s own subquery, named sel$9, against xxadm_category_master_tbl which didn’t have an alias (more OUCH!). When we search for ACT1@SEL$8 and XXADM_CATEGORY_MASTER_TBL@SEL$9 in the Query Block / Object Alias information we find that they both appear in a query block called SEL$A75BE177 which ranges from operations 29 to 34, and a check of the plan shows that operation 29 is a view pushed predicate of a view called VW_SQ_1 – a name that identifies the view as an internally generated non-mergeable view that was created as the result of unnestingn a subquery. The view contains a join between xxadm_applicant_coursprefs_tbl and xxadm_category_master_tbl, so we can say that the optimizer has unnested our sel$9 to create a NOT IN subquery that is a join subquery, then it has unnested again to produce an inline non-mergeable view, and it has then allowed “join predicate pushdown (JPPD)” so that the non-mergeable view can be the second table of a nested loop. To confirm the last comment we track up the plan to discover that operation 29 is the second child of operation 11 which is, indeed, a nested loop and (since the subquery was a NOT IN subquery) a nested loop anti.
  • sel$9 – see sel$8.

3.5 As you can see, when you’re using the execution plan output to identify what’s happened to the individual query blocks from your original query you’re likely to jump around from the query to the plan body, to the Query Block / Object Alias information in a fairly arbitrary way.

3.6 I’ll close this chapter of the analysis with a quick look at the Outline Data – in particular two of the hint types that appear: OUTLINE() and OUTLINE_LEAF() – which I’ve extracted and sorted for ease of reading:

      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$7")
      OUTLINE(@"SEL$8")
      OUTLINE(@"SEL$9")

      OUTLINE(@"SEL$180402DE")
      OUTLINE(@"SEL$67DC521B")
      OUTLINE(@"SEL$7E0D484F")
      OUTLINE(@"SEL$9D10C90A")
      OUTLINE(@"SEL$C04829E0")

      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SEL$6")
      OUTLINE_LEAF(@"SEL$082F290F")
      OUTLINE_LEAF(@"SEL$7E0D484F")
      OUTLINE_LEAF(@"SEL$A75BE177")
      OUTLINE_LEAF(@"SEL$F665FE1B")

Figure 3-2

3.7 In this context an OUTLINE() is a query block that existed at some point in the optimization sequence that got us to the final execution plan but did not appear as a query block in the final plan. In the previous paragraphs we described how the original query blocks sel$3, sel$4, sel$7, sel$8 and sel$9 disappeared through transformation so (apart from sel$4 which is a bit of an anomaly that I’ll pick up in a moment) they appear in an   OUTLINE() hint. I described how sel$9 would have been unnested into sel$8 to create a join which would still have been a filter subquery until that too was unnested – that join subquery would have been one of the five OUTLINE() query blocks above with the 8 character hexadecimal names.

3.8 An OUTLINE_LEAF() is a “final” query block – one that is present in the final execution plan. If you ignore sel$4, you’ll see that the other 8 query blocks correspond to the 8 query block names that appear in the Query Block Name / Object Alias information. The appearance of sel$4 as an OUTLINE_LEAF() looks like an anomaly to me; I can’t think of a good reason why it should be in the list.


 

4.0 Simplify

4.1 We’re just about ready to do a full read-through of the execution plan. We’ve taken the two outer layers off the query/plan because they represent such simple in-line views, and we’ve discussed the disappearance of some of our initial query blocks and identified and explained all the different query blocks that have appeared in the final plan. So with the extra bits of information in hand let’s take a couple more steps in simplifying the execution plan.

4.2 First I’ll replace each of the three VIEW operations and their descendants with a single line that says “this is a rowsource”. I’ll distinguish between the two variants of the operation (VIEW and VIEW PUSHED PREDICATE) by calling them BULK ROWSOURCE and PRECISION ROWSOURCE respectively: it’s not a perfect description but broadly speaking we expect a VIEW to be called once by its parent to produce a “large” data set and we expect a VIEW PUSHED PREDICATE to be called many times by its parent to produce (each time) a “small” data set using extremely efficient methods.

4.3 Then I’ll remind you that a multichild FILTER operation calls the first child once to supply a rowsource then, for each row returned, calls the other child operations in turn to determine whether or not to keep the row from the first child. This means we can examine just the first child in isolation to see how the optimizer wants to get the driving bulk of the data (and we can examine the other children later, bearing in mind how often they might need to be called and checking how efficient each call is likely to be).

4.4 Finally I’ll note that query block SEL$7E0D484F (the “real main query” as I labelled it in the plan above) starts: “VIEW -> SORT ORDER BY STOPKEY -> FILTER” – after we’ve filtered our data we simply sort it with the intention of keeping only the “top few” rows. That part of the plan is so simple we’ll ignore those lines of the plan and focus on just the first child of the FILTER- leaving the core plan looking like this:


-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   7 |        NESTED LOOPS                         |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |
|   8 |         NESTED LOOPS                        |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    2946 |
|   9 |          NESTED LOOPS                       |                                |      1 |      1 |   567   (2)|    182 |00:00:00.02 |    2942 |
|  10 |           NESTED LOOPS                      |                                |      1 |      1 |   566   (2)|    182 |00:00:00.02 |    2938 |
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |
|  12 |             NESTED LOOPS ANTI               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |
|  14 |               BULK ROWSOURCE                | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |
|* 18 |               HASH JOIN                     |                                |      1 |    478 |   555   (2)|    182 |00:00:00.01 |    2014 |
|  19 |                NESTED LOOPS                 |                                |      1 |    478 |   243   (2)|    209 |00:00:00.01 |     883 |
|  20 |                 NESTED LOOPS                |                                |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      1 |    478 |   241   (2)|    209 |00:00:00.01 |     879 |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      1 |   6685 |   311   (2)|  10488 |00:00:00.01 |    1131 |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |    182 |   8881 |     1   (0)|      0 |00:00:00.01 |     366 |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     0   (0)|    182 |00:00:00.01 |     184 |
|  29 |             PRECISION ROWSOURCE             | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     186 |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     182 |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Figure 4-1

4.5 We need to examine a plan of only 25 lines with no complicated bits (because we’ve hidden any bits that looked complicated and will get back to them later). The thing now looks like a single query block which means we can think “First Child First”, so:

  • operation 7 calls operation 8 which calls operation 9 which calls operation 10 which calls operation 11 which calls operation 12 which calls operation 13 which calls operation 14 which is the first operation to produce a rowsource (though we don’t care how at present).
  • Operation 13 is a hash join, so the rowsource from operation 14 becomes its “build” table, and we call operation 18 to supply the “probe” table.
  • Operaion 18 calls operation 19 which calls operation 20 which calls operation 21 which is a table access by rowid that has to call operation 22 to get rowids. So operation 22 supplies the second rowsource (in our collapsed plan). It’s an INDEX UNIQUE SCAN of the index that appears (judging from its name)to be the primary key index of a table, so operation 22 will produce at most one rowid that is passed up to operation 21 that will use that rowid to get the one row from the table. (Operation 21 supplies the 3rd rowsource).
  • Operation 21 passes a row up to operation 20 which calls operation 23 which calls operation 24 (4th rowsource) to do another unique scan of a unique index to get a rowid to pass up to operation 23 to find (and test) a row from the table (5th rowsource) which it passes up to operation 20 to do the join and pass the result up (6th rowsource) to operation 19.
  • Operation 19 calls its second child (operatiomn 25) for each row it receives – but because of the primary key/unique scans the optimizer knows that the first child will return at most one row and sees no problem with using a full tablescan as the second child of the nested loop. So the tablescan of XXADM_APPLICANT_COURSPREFS_TBL is the 7th rowsource. Any rows survinging the join are passed up to operation 18 (making operation 19 the 8th rowsource).
  • Operation 18 uses the incoming rowsource to build its in-memory hash table, and calls operation 26 to supply its second (probe table) rowsource. Operation 26 is the the 9th rowsource, executing a full tablescan of XXADM_APPLICANT_DETAILS_TBL and passing the results up to operation 18, which performs the join and passes the results up to its parent, making it the 10th rowsource.
  • Operation 18 was the second child of the hash join at operation 13, which now uses the incoming data as the probe table to generate the 11th rowsource and pass the results up to operation 12.
  • Operation 12 is a nested loop anti-join and operation 13 has just supplied it with its first child rowsource, so operation 12 will now call its second child once for each row in the first child. Its second child is operation 27 (table access by rowid) which calls its first child (operation 28 index range scan) which fetches rowids from the index passes them up to operation 27 which fetches table rows and passes them up to operation 12. So operation 28 supplies the 12th rowsource, operation 27 the 13th. Since operation 12 is an ANTI join a row from the first child survives if operation 27 doesn’t find a row to return. Operation 12 passes any survivors (14th rowsource) up to operation 11.
  • Operation 11 is another ANTI-join nested loop so for each row from operation 12 it will call its second child, passing in values from its first child to drive an efficient access path and forwarding any rows from the first child where the second child returns no rows. Its second child is operation 29 – our “precision rowsource” – and we can postpone worrying about the exact details of how that works. Operation 29 produces the 15th rowsource in our reduced plan, which it passes up to operation 11.
  • Operation 11 is the first child of the nested loop at operation 10 – and from this point onwards we have 4 nested loop joins and we can simply list through the order of operation. Operation 11 produces the 16th rowsource, then Operation 10 calls its second child (operation 35) which calls operation 36 which passes rowids (17th rowsource) up to operation 35 which passes rows (18th rowsource) up to operation 10.
  • Operation 10 passes its data (19th rowsource) up to operation 9 which calls operation 37 as its second child. Operation 37 (20th rowsource) passes index entries up to operation 9 which performs the join and passes the result (21st rowsource) up to operation 8.
  • Operation 8 calls operation 38 as its second child. Operation 38 (22nd rowsource) passes index entries up to operation 8 which performs the join and passes the result (23rd rowsource) up to operation 7.
  • Operation 7 calls operation 39 as its second child. Operation 39 (24th rowsource) passes index entries up to operation 7 which performs the join and that’s the final (25th) rowsource as far as our reduced execution plan is concerned.

4.6 Here’s the reduced plan, cut back to minimum width, with the order of rowsource generation included:


-----------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           |  Order |
-----------------------------------------------------------------------------------------------
|   7 |        NESTED LOOPS                         |                                |     25 |
|   8 |         NESTED LOOPS                        |                                |     23 |
|   9 |          NESTED LOOPS                       |                                |     21 |
|  10 |           NESTED LOOPS                      |                                |     19 |
|  11 |            NESTED LOOPS ANTI                |                                |     16 |
|  12 |             NESTED LOOPS ANTI               |                                |     14 |
|* 13 |              HASH JOIN                      |                                |     11 |
|  14 |               BULK ROWSOURCE                | index$_join$_008               |      1 |
|* 18 |               HASH JOIN                     |                                |     10 |
|  19 |                NESTED LOOPS                 |                                |      8 |
|  20 |                 NESTED LOOPS                |                                |      6 |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      3 |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      2 |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      5 |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      4 |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      7 |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      9 |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |     13 |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |     12 |
|  29 |             PRECISION ROWSOURCE             | VW_SQ_1                        |     15 |
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |     18 |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |     17 |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |     20 |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |     22 |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |     24 |
-----------------------------------------------------------------------------------------------

Figure 4-2

4.7 Once we’ve got this image sorted out we still have a few details to fill in before we’ve gpt the full picture of the execution plan.

  • What does Oracle do to generate the “bulk rowsource” at operation 14
  • What does Oracle do on every call to the “precision rowsource” at operation 29
  • We know that the reduced plan above is the first child of a FILTER operation and if we refer back to previous “real main query” we know that there are three further children to the FILTER that might have to execute once for each row produced by the first child. So that’s another 3 (small) query blocks we need to examine in detail.
  • We need to bring in the predicates to see how the optimizer has used them
  • We need to look at the Starts and A-Rows to compare what happened with the optimizer’s expectation
  • We need to look at disk reads and buffer gets to see how much excess work we did to acquire the data


 

5.0 Filling the Gaps

5.1 After getting the overall shape of the query’s execution we can go back and examine the bits we have so far postponed view. There are three pieces to consider

  • the “bulk rowsource” at operation 14 that was the first child of a hash join.
  • the “precision rowsource” at operation 29 that was the second child of a nested loop anti-join
  • the filter subqueries that were the 2nd, 3rd and 4th children of the explicit FILTER at operation 6

5.2 We start with the “bulk rowsource” that was a VIEW with a highly suggestive name of index$_join$_008. This shows Oracle finding a way of selecting data from a table without visiting the table, using a couple of indexes as if they were skinny tables that could be scanned and joined. In effect Oracle has transformed “select key1, key2 from tableX” into something like:


select  ix1.key1, ix2.key2
from
        (select key1, rowid r1 from tableX) ix1,
        (select key2, rowid r2 from tableX) ix2
where
        ix1.r1 = ix2.r2
;

5.3 This strategy can only be used when Oracle knows that every relevant row will appear in the two indexes – which basically means you have to be careful about NULLs. In the simplest case you might have to have a NOT NULL constraint on at least one column in each of the target indexes; or a predicate in each inline view that ensures that Oracle can use just the index without losing some of the rowids that it needs. After acquiring key values and rowids from each index in turn, Oracle then joins the two sets of data using a hash join. Technically there is no limit to the number of indexes that Oracle can join in this fashion, the choice of strategy depends largely on how big the table is compared to the sum of the sizes of the indexes that could be used instead; practically (as in our main query) it’s rare to see more than two indexes used for this “index join” mechanism.


join index transformation query block SEL$082F290F, with parent
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |  1599K|  1599K| 1503K (0)|
|  14 |               VIEW                          | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |       |       |         |
|* 15 |                HASH JOIN                    |                                |      1 |        |            |    127 |00:00:00.01 |       8 |  1368K|  1368K| 1522K (0)|
|  16 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_CODE_UK             |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
|  17 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_PK                  |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.4 Moving on to the “precision rowsource” that appears in the original plan as a VIEW PUSHED PREDICATE. This means that Oracle has optimised a non-mergeable view allowing for an input value from its parent. If you take operations 30 to 34 in complete isolation it’s just a simple nested loop join and you might wonder why the view is non-mergable. But when you look back at the parent you discover that it’s an ANTI-join, so Oracle has to say (for each driving row) “join these two tables and see if you get any rows as a result”, it doesn’t have a generic mechanism for doing two separate but consecutive (anti-)join operations at this point.


Unnested subquery SEL$A75BE177 (from sel$8, sel$9) with parent and its first child
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |       |       |         |
|  12 |             Driving Rowsource               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |       |       |         |
|  29 |             VIEW PUSHED PREDICATE           | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|  30 |              NESTED LOOPS                   |                                |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 31 |               TABLE ACCESS BY INDEX ROWID   | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     2   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 32 |                INDEX UNIQUE SCAN            | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     184 |       |       |         |
|* 33 |               TABLE ACCESS BY INDEX ROWID   | XXADM_CATEGORY_MASTER_TBL      |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |       |       |         |
|* 34 |                INDEX UNIQUE SCAN            | XXADM_CATEGORY_PK              |      0 |      1 |     0   (0)|      0 |00:00:00.01 |       0 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.5 Finally we have the three filter subqueries, which I’ve shown with their parent FILTER and a one-liner for the driving rowsource. For each row in operation 7 we call operations 40, 42 and 44 in turn although the parent filter may decide after calling operation 40 that it doesn’t need to call the other two and can simply move on to the next row from operation 7. Similarly the filter might call operations 40 and 42 and not need to call operation 44. It’s also possible that, thanks to scalar subquery caching, Oracle can say “I’ve already called operation 40 for this value, so I know the result and don’t need to call it again”. When we look at the Starts and A-Rows columns for the three operations we will get some idea of how the “notional” execution sequence turned into an actual workload.


Filter subqueries SEL$5, SEL$6 and SEL$F665FE1B, with their parent and its first child
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|*  6 |       FILTER                                |                                |      1 |        |            |    171 |00:00:00.02 |    3822 |       |       |         |
|   7 |        "Simplify" Plan                      |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |       |       |         |
|* 40 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     3   (0)|     29 |00:00:00.01 |     507 |       |       |         |
|* 41 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_PREFS_UK         |    182 |      5 |     2   (0)|   1450 |00:00:00.01 |     191 |       |       |         |
|  42 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    171 |      1 |     2   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 43 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_APPLICANT_STATUS |    171 |      1 |     1   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 44 |        VIEW                                 | index$_join$_014               |      6 |      1 |     0   (0)|      0 |00:00:00.01 |      14 |       |       |         |
|* 45 |         HASH JOIN                           |                                |      6 |        |            |      0 |00:00:00.01 |      14 |  1519K|  1519K|  666K (0)|
|* 46 |          INDEX RANGE SCAN                   | XXADM_CATEGORY_PK              |      6 |      1 |     0   (0)|      6 |00:00:00.01 |       6 |       |       |         |
|  47 |          INLIST ITERATOR                    |                                |      6 |        |            |     12 |00:00:00.01 |       8 |       |       |         |
|* 48 |           INDEX UNIQUE SCAN                 | XXADM_CATEGORY_CODE_UK         |     12 |      1 |     0   (0)|     12 |00:00:00.01 |       8 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

5.6 The operations for the first two subqueries (40,41) and (42,43) shouldn’t need any explanation. The third set of operations (44 to 48) is a little more complex. In fact it’s similar to the mechanism that appeared in our first “bulk rowsource” – we have Oracle turning a table access into an “index (only) join”, collecting key values and rowids from two different indexes and doing a hash join on the rowids. The plan is just a little more subtle, though – instead of getting their data from “index fast full scans”, one index is accessed by a range scan (possibly using a predicate value passed in by the filter operation) and the other is using an INLIST ITERATOR with unique scan. In the second case we must be handling a predicate of the form: “key_column in {list of values}”. (Taking a first look at the Starts column we can see that the INLIST ITERATOR runs 6 times calling the INDEX UNIQUE SCAN a total of 12 times so it’s fairly obvious that there are exactly two elements in the list in this case.)


 

6.0 Looking at the Numbers

6.1 Rather than walking through the entire plan again putting the pieces together I’m going to assume that I can carry on to the next stage of analysis, and assume that the pieces will fall into place as we talk about some of the critical numbers.

6.2 It’s probably best to open a copy of the note in a separate window so that you can examine the plan and read my comments at the same time. Starting at the top we apply “first child first”, noting in passing that the examples of the VIEW operations we have in this plan don’t have a significicant impact on the order of operation as we work down the plan – they simply remind us that there are “non-mergeable” parts of the plan. Eventually we get to operation 13 which is a hash join and operations 14 through 17 give us the build (first) table – a quick check shows 127 rows for estimated (E-rows) and actual (A-rows); then we see the probe (second) table is itself a hash join returning 182 rows (estimate 487, so in the right ballpark) and the hash join at operation 13 produces a result set of 182 rows.

6.3 At this point a quick check back UP the plan tells us that the 182 rows survive all the way up to operation 6, where a FILTER eliminates just a few of them; then the result set drops to just 2 rows at operation 5. Then a quick check of operation 5 (with a cross reference to the query) reminds us that we have an inline view that does an “order by” followed by a “rownum < :bind” predicate – so the sort order by stop key at operation 5 is sorting all the data but only passing on the first two rows: so there’s no way we could have modified the join order to eliminate the redundant rows sooner.

6.4 So we see that we get the right volume of data at about the right moment in the plan, and probably can’t do much to avoid the volume of data access – but let’s check how we got the 182 rows at operation 18. Using “first child first” – we see a nested loop joining two “single row by unique index” rowsources, then a nested loop to a full tablescan of XXADM_APPLICANT_COURSPREFS_TBL. The knee-jerk reaction to a “full tablescan as the second table in a nested loop” is that it must be bad – but in this case we know that it will happen no more than once, so we don’t need to panic immediately. Applying a little more thought (and arithmetic): we note that the tablescan returns 209 rows (estimated 478) using 879 buffer gets; that’s not an extreme number of buffer gets per row (especially if the 478 is a reasonably accurate average for the operation). We’ll postpone worrying about the tablescan for the moment but take note that it might be worth revisiting.

6.5 The second child to the hash join at operation 18 is another full tablescan (of XXADM_APPLICANT_DETAILS_TBL) which requires 1,141 buffer gets. Again, though this might not be a bad move, since the alternative would be an actual 209 index probes (or an estimated average 478 index probes). The workload is, again, in the right ballpark but, again, something we might come back to. In fact in both tablescans it might be more important to worry about the work done at each row by the row predicates rather than worrying about the fact that the operation is a tablescan; a predicate involving a CPU-intensive PL/SQL function might be the thing that makes 478 index probes to 478 rows a better option than a tablescan of (say) 100,000 rows.

6.6 From this point onwards (operation 13) we have 6 nested loop joins (though the first two are anti-joins) so it’s “call the second child for each row in the first child” all the way down, and we’ve seen that we don’t eliminate any data as we go. If we want to make the execution plan any faster by “local” tweaking we’ll just have to make sure each “second child” operation is as efficient as possible, which tends to mean looking for cases where we supply a lots of rows (rowids) from an index range scan but find that we then discard the table rows after visiting the table. So …

6.6.1 Operation 12 – nested loop anti join – calls 28/27 (table access by unique index). We find an index entry on each call, but the table row doesn’t qualify – which is what we want for a “successful” anti-join. We could make this a little more efficient by adding a column to the (already unique) index and avoid visiting the table.

6.6.2 Operation 11 – nested loop anti join – calls operation 29 (view with pushed predicate) which operates a high-precision nested loop join at operation 30. The first child of operation 30 is a table access by index, but the table never returns a row (which is nice) so we never call the second child of operation 30. Again we could make this view access a little more efficient by adding extra columns to (already unique) indexes to avoid any need to visit tables.

6.6.3 Operations 10, 9, 8, 7 – nested loop joins – operating very efficiently, some not even visiting tables to acquire data. The order of operation at this point is: 11, 36, 35, 10, 37, 9. 38, 8, 19, 7. And then we get to the FILTER operation, which has 3 subqueries to operate in turn,

6.7 Operation 6 executes in turn the two subqueries we named sel$5 and sel$6 respectively, the first one 182 times, the second one 171 times. Since operation 6 produces 171 rows it seems likely that the initial 182 rows dropped to 171 rows as a consequence of the sel$5 subquery resulting in the smaller number of calls to the sel$6 subquery. It’s worth noting here that operation 41, the index range scan, returned 1,450 rowids, but the subsequent table accesses returned a total of only 29 rows after an additional 316 buffer gets (507 – 191). There may be an opportunity here (yet again) for adding an extra column to the index so that we visit the table only 29 times rather than visiting it 1,450 times. In fact, though it’s not obvious in this SQL Monitor report, the indications from other examples from the same query suggested that this subquery was the single more resource intensive part of the plan.

6.8 The last subquery executed by Operation 6 is the one identified by query block sel$4. The sub-plan starts with a VIEW operation because the table (identified as originating in sel$4 in the query block / object alias information) is “accessed” by way of an index hash join. This subquery is executed only 6 times. Given that there are (at least) 171 rows for which this subquery could be called this means one of two things.  First we can from the query text that this subquery is part of a complex CASE expression – so maybe the simple conditions in the expression mean we rarely need to call the subquery;. secondly it could mean the run-time engine has managed to take advantage of scalar subquery caching and the query doesn’t have many distinct inputs for this subquery – and when we check the predicate section we can see the relevant predicate for a query against the XXADM_CATEGORY table was “category_id = {correlation variable}” which has the look of a table with a few rows and distinct ids..

6.9 In summary, then, there may be a few “localised” tweaks that we can to do improve performance of this plan – largely by adding columns to existing indexes and using them effectively. There are indications that one of the filter subqueries might be a particularly good target for this type of tweak; after which we might want to look at what could be done with the two tablescans which are in that grey area where it’s not easy to decide whether an indexed access or a tablescan is the better option. We have to remember, though, that this query was originaly reported as executing 842,000 times – so maybe we need to do much better than just a little tweaking.


 

7.0 Predicate Information

7.1 Why are we running a query 842,000 times in a batch? The right way to find an answer to that question is to ask the right person – if you can find them. A slightly more long winded way is to find out what is driving the 842,000 executions – and you might be able to do that if you have the full tkprof output from the trace file. (Hint: statement X runs 842,000 times, and if statement Y executes 13 times and produces 842,000 rows maybe Y is driving X.) Sometimes, though, you don’t have the people, or the full data set, or the access you need, so you might take a look at the query and the predicates and start making some reasonable guesses.

7.2 Here’s the tail end of the query, conveniently capturing all the input bind variables:

                AND     cmt.college_id = :p_college_id
                AND     crmt.course_id = :p_course_id
                AND     mmt.medium_id  = :p_medium_id
                AND     act.hostel_required = :p_hostel_required
                ORDER BY
                        order_of_pass,
                        course_qe_priority,
                        percentage DESC,
                        applicant_dob,
                        legacy_appln_date
                ) 
        WHERE
                 ROWNUM <=  :p_seats
        ) 
WHERE 
        applicant_id = :p_applicant_id

7.3 There are two things we can note about these predicates – first they don’t follow the pattern of “:Bnnn” so they’re not from a statement statically embedded in PL/SQL, secondly the names are intelligible and meaningful, so we might draw some tentative conclusions from them, in particular how many distinct values there might be and how this lead to 842,000 executions of the query.

7.4 The variable name that stands out is :p_applicant_id. We seem to be looking at a query about applicants for courses at colleges – and the latter pair probably gives us a relatively small number of combinations. The variable :p_hostel_required is surely just going to be a “yes/no/maybe/null” option. The :p_medium_id is a bit of a puzzle but scanning through the query it looks like it’s the id for the “medium of study” so probably another variable with a small number of values. So where in the plan do these variables get used? Here’s the full list of predicates from the plan, followed by an extra few lines showing just the predicates that reference the bind variables:

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("APPLICANT_ID"=:P_APPLICANT_ID)
   3 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   5 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   6 - filter((    "ACT"."PREFERENCE_ORDER"<=NVL(,"ACT"."PREFERENCE_ORDER") -- > comment added to avoid wordpress format issue
               AND "ACT"."PREFERENCE_ORDER">=NVL(,"ACT"."PREFERENCE_ORDER") AND CASE "ACT"."HOSTEL_REQUIRED"
              WHEN 'Y' THEN CASE  WHEN ("ADT"."DISTANCE_IN_KMS">20 AND "LMT_RELIGION"."LOV_CODE"='HINDU' AND  IS NULL) THEN 1 ELSE 2 END  ELSE 1 END =1))
  13 - access("ADT"."APPLICANT_GENDER"="LMT_GENDER"."LOV_ID")
       filter(CASE "ACT"."HOSTEL_REQUIRED" WHEN 'Y' THEN CASE  WHEN ("LMT_EDUCATION_TYPE"."LOV_CODE"='COEDUCOL' AND "LMT_GENDER"."LOV_CODE"='FEMALE') THEN 2 ELSE 1 END
               ELSE 1 END =1)
  15 - access(ROWID=ROWID)
  18 - access("ADT"."APPLICANT_ID"="ACT"."APPLICANT_ID")
  22 - access("CMT"."COLLEGE_ID"=:P_COLLEGE_ID)
  24 - access("CMT"."EDUCATION_TYPE"="LMT_EDUCATION_TYPE"."LOV_ID")
  25 - filter(("ACT"."COURSE_ID"=:P_COURSE_ID AND "ACT"."COLLEGE_ID"=:P_COLLEGE_ID AND "ACT"."MEDIUM_ID"=:P_MEDIUM_ID AND "ACT"."HOSTEL_REQUIRED"=:P_HOSTEL_REQUIRED))
  26 - filter(("ADT"."STATUS"='Active' AND "ADT"."COURSE_APPLIED_FOR"='DEG' AND (INTERNAL_FUNCTION("ADT"."COLLEGE_STATUS_FLAG") OR "ADT"."COLLEGE_STATUS_FLAG" IS
              NULL)))
  27 - filter("ACT3"."STATUS_FLAG"='O')
  28 - access("ACT3"."APPLICANT_ID"="ADT"."APPLICANT_ID" AND "ACT"."PREFERENCE_ORDER"="ACT3"."PREFERENCE_ORDER")
  31 - filter((INTERNAL_FUNCTION("ACT1"."STATUS_FLAG") AND NVL("ACT1"."ATTRIBUTE7",'N')='N'))
  32 - access("ACT1"."APPLICANT_ID"="ADT"."APPLICANT_ID" AND "ACT1"."PREFERENCE_ORDER"="ACT"."PREFERENCE_ORDER")
  33 - filter("CATEGORY_CODE"='OPENMERT')
  34 - access("CATEGORY_ID"=TO_NUMBER("ACT1"."ATTRIBUTE1"))
  36 - access("ADT"."PASS_TYPE"="LMT_PASS"."LOV_ID")
  37 - access("ADT"."APPEARANCE_TYPE"="LMT_APPEARANCE"."LOV_ID")
  38 - access("ADT"."RELIGION"="LMT_RELIGION"."LOV_ID")
  40 - filter(("STATUS_FLAG"='B' OR "STATUS_FLAG"='C' OR "STATUS_FLAG"='O' OR "STATUS_FLAG"='T'))
  41 - access("ACT1"."APPLICANT_ID"=:B1)
  43 - access("ACT2"."APPLICANT_ID"=:B1 AND "STATUS_FLAG"='C')
  44 - filter(("CATEGORY_ID"=:B1 AND INTERNAL_FUNCTION("CATEGORY_CODE")))
  45 - access(ROWID=ROWID)
  46 - access("CATEGORY_ID"=:B1)
  48 - access(("CATEGORY_CODE"='BACKWRDC' OR "CATEGORY_CODE"='BACKWRDE'))


   2 - filter("APPLICANT_ID"=:P_APPLICANT_ID)
   3 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
   5 - filter(ROWNUM<=:P_SEATS) -- > comment added to avoid wordpress format issue
  22 - access("CMT"."COLLEGE_ID"=:P_COLLEGE_ID)
  25 - filter(("ACT"."COURSE_ID"=:P_COURSE_ID AND "ACT"."COLLEGE_ID"=:P_COLLEGE_ID AND "ACT"."MEDIUM_ID"=:P_MEDIUM_ID AND "ACT"."HOSTEL_REQUIRED"=:P_HOSTEL_REQUIRED))

Figure 7-1

7.5 Apart from the predicates in the final shortlist you probably noticed further bind variables at operation 41, 43, 44, and 46 – but these are all named :B1, which is Oracle flagging up the need to pass correlating values into filter subqueries.

7.6 Operation 25 (where we test almost all the predicates) is one of the first operations to drive the query while operation 2 (where we test the :p_applicant_id) is close to the very last thing we do in the execution plan. So we generate a load of data for a college, course, and couple of other predicates, sort it then – at the last moment – decide that we only want a few (:p_seats) rows and count how many rows we’ve found for the specific applicant – and we do that a very large number of times. This takes me back to section 2 where I asked a couple of critical questions:

7.6.1 (2.3) First, how far into the view V_THING will the optimizer be able to push that predicate, possibly the entire content of the view will have to be constructed before the predicate can apply,

7.6.2 (2.8) I’m also a little curious about a requirement that seems to say – “pick at most N rows, then tell me how many you’ve picked”. What’s it actually trying to do and why?

7.7 We now know the answer to the first query – that predicate isn’t going anywhere, and we recognise why not, of course: it’s a consequence of the “rownum” pseudo-column which has to be evaluated for all the generated data before the rownum restriction can be applied: “select for the applicant then apply the rownum” is very different from “apply the rownum (column and predicate) then restrict to the applicant”. That brings us to the second question – why would you generate all the data, then order it, then restrict it to the first N, and then count how many times a specific applicant appeared? And there’s one “valid” answer to the last bit of the question – what if you’re not really trying to count how many times the applicant appeared, you’re only trying to find out whether the count is zero or non-zero.

7.8 The intent of the query is to answer the question: “does this applicant appear in the first N candidates”. Once you’ve realised this the underlying performance problem with the query becomes clear. In the monitored example show here the query found 171 applicants that matched the initial predicates – and at some point in the batch it’s going to do the same work all over again for each of the other 170 applicants that we’ve discarded. For each combination of the initial predicates (excluding applicant id and seat count) we run the query N times if there are N candidates that match that combination. It’s bad enough that this query took 0.02 seconds to run and would have run 172 times (for a total of 3,4 seconds) but another sample run took 0.05 seconds to run identifying 1,835 applicants (which means another 1,834 executions for a total of 91 seconds run time).


 

8.0 Resolution

8.1 There is a serious flaw in the design of this application. We are seeing a piece of code running once per applicant_id when (with some minor variations) it looks as if it should be running no more than once per set of distinct combinations of (course_id, college_id, medium_id, hostel_required). In fact, if the set of distinct combinations could be generated by a simple query, you could imagine the entire required result set as a join between two non-mergable views, with a little row-by-row post processing – but that might be too ambitious a change to implement in the short term.

8.2 Realistically (as a low-risk strategy) it might be possible to keep a very large percentage of the existing code structure for whatever this task does but precede it with a PL/SQL loop that steps through each of the distinct combinations required, populating a table (perhaps an IOT) with {applicant_id, (combination columns), “rownum”); and then replace our problem query by a simple primary key look up to find the saved “rownum” for each applicant and combination, to check whether the stored “rownum” was fell within the required seat count.


 

9.0 Summaryi

9.1 For a DBA working on-site, or a consultant on a short-term visit, the analysis shown in this post is probably not how things would have progressed. I could imagine the sequence of events being more like:

9.1.1 This “start of year / start of term” batch job takes too long

9.1.2 What is it trying to achieve (business overview) – sketch an outline of the process (technical overview)

9.1.3 Trace the job and discover most of the time went on this query

9.1.4 Investigate the logic of this query and why it is run for every applicant_id

9.1.5 Recognise the fundamental design threat then choose between three possible strategies:

9.1.5.1 make the query much faster

9.1.5.2 re-engineer this part of the batch completely

9.1.5.3 subvert this section of the batch to pre-build a “materialized result” and use a much simpler query to query it

9.2 Effectively, however, we’ve come in at 9.1.5.1 and run the consultation backwards. As we did so we raised an early question about the applicant_id and pushing predicates and the oddity of counting a limited list, and we finally came back to those points towards the end of the post with an educated guess about what the query was trying to achieve and how it should be reduced from “once per applicant” to “once per combination”.

9.3 Despite the post starting at the wrong place it’s quite possible that we would have reached 9.1.5.1 by following a sensible order of problem analysis, and still want to think about how the query might run more quickly – so this investigation wasn’t a total waste of time and it’s allowed us to work through a real-world query and plan in a realistic way which we can sum up in the following stages:

9.3.1 Simplify: cross-referencing between the overall plan shape, the Query Block / Object Alias information, and original query we can take out sections of the plan (sub-plans) and analyse them separately. In particular we can identify and reduce to a minimum the core of the plan that generates the final result set, calling on the various sub-plans as it goes.

9.3.2 Follow the workload: in this case we didn’t get much help from the timing information, but buffer gets, disk reads and A-rows also supply clues to where most work is done. Critically we noted that the volume of data we picked up early on in the query was needed all the way through the query – until the last moment – and we didn’t waste resources carrying and processing unnecessary rows. Along the way, of course, we compare Oracle’s predictions of data volume with the actual data volume (A-Rows = Starts * E-Rows as a guideline). We also noted a couple of opportunities where modifying indexes might eliminate table visits, potentially reducing I/O and buffer gets.

9.3.3 Check the predicates: which goes hand in hand with following the workload – how and where are our predicates used. What predicates have been generated (or eliminated) by transitive closure; which predicates are (or could be) pushed further down the plan tree to eliminate data earlier; will multiple predicates result in bad optimizer estimates followed by bad choices for access paths.

9.4 It would be nice to think that there was a simple progression, a fixed sequence of steps that one could follow to interpret an execution plan quickly and accurately. Unfortunately (like the optimisation process itself) interpretation requires a measure of looping and recursion. It’s probably always best to start with simplifying – but how much you simplify and how you pick which subplan to simplify (or start analysing in detail) depends on being able to spot where the biggest workload appears; and before you get stuck too deeply into a sub-plan you might glance down at the use of predicates because a change in one predicate might make the optimizer completely re-engineer its choice of plan. And maybe, before anything else, you’ll see a single operation which you know should (for exanple) generate about 10 rows when the optimizer is predicting 25,000 rows (or vice versa) and you’ll want to check why there’s such a bad estimate at one point in the plan before you tackle any of the harder questions.

9.5 The bottom line with execution plans is simply this: the more you practice the faster you get at spotting the clues that are worth pursuing; and the faster you spot the clues the less time you waste unpicking every little detail, and the less time you spend on the preripheral pieces of the plan the easier it becomes to keep the big picture in your mind and see how the optimizer got to where it is, and how you might want to redirect it. So pick a couple of random queries each week that produce plans of about 20 lines and use them to exercise your interpretation skills; and increase the complexity of the queries every couple of weeks.

The End

 
 

 

Footnote

I think I’ve spent more than 20 hours writing a detailed description of something that would normally take me a few minutes to do [and some poeple wonder why I’ve not yet written another book on the optimizer]. In part the difference in time is because with practice the “intuitive” skill grows and the pattern of reading is more like –

  • How does it start (ignoring the “trivial” bits around the edges)
  • How does it end ( ditto )
  • Where do we do most work reading and discarding data

What I’ve done in this note is talk about every single query block and every single line whereas in real-life I might have scanned the plan, examined about 10 lines, and done a quick check on the corresponding predicates as the first step to deciding whether or not the plan was reasonably efficient.

May 1, 2020

Execution Plans

Filed under: Execution plans,Oracle — Jonathan Lewis @ 1:58 pm BST May 1,2020

In previous articles on reading execution plans I’ve made the point that the optimizer is very “keen” to transform complex queries into queries consisting of a single query block and that there’s a simple “First Child First (FCF)” rule for reading the plan for a single query block. I’ve then pointed out that when the optimizer can’t transform your query into a single query block you can still apply FCF to each “final” query block (outline_leaf) in turn, but you then have to work out how Oracle is connecting those query blocks and FCF is not guaranteed to apply between query blocks.

In this note I want to follow-up an earlier comment that “The FILTER operation covers a multitude of sins.” because the filter operation (and variations thereof) often goes hand in hand with multiple query blocks and often (especially in recent versions of Oracle) needs a little care when you’re looking at a larger plan.

First though – a silly little question:

How many rows will be returned by the query “select * from tableX where 1 = 2”?

The answer is obviously “none”. But here’s the execution plan for a query of that form (cut and pasted from an SQL*Plus session in 19.3):

SQL> create table t1 as select * from all_objects where rownum <= 1000 -- > hint to avoid wordpress format issue
 2   /

Table created.

SQL> set autotrace traceonly explain
SQL> select * from t1 where 1 = 2;

Execution Plan
----------------------------------------------------------
Plan hash value: 3332582666

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |   116 |     0   (0)|          |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   |  1000 |   113K|     4   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NULL IS NOT NULL)

Although it’s clearly impossible for any rows to be returned Oracle (apparently) wants to do a full tablescan. Of course it doesn’t actually do that tablescan; if you check the Predicate Information there’s a filter predicate at Operation 1 that tests for “null is not null”[1], which is never true, and one of the refinements on the basic “first child first” is that a child operation is called only if the parent thinks it is needed. Consider, for example, the following query with its execution plan (again cut-n-pasted from SQL*Plus 19.3.0.0 with the same table t1):


SQL> set serveroutput off
SQL> alter session set statistics_level = all;

Session altered.

SQL> select t1.*
  2  from   t1, t1 t2
  3  where  t1.object_id < 0 -- > comment to avoid wordpress format issue
  4  and    t2.object_id = t1.object_id 
  5  /

no rows selected 

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  86ytbkc1fpbzt, child number 0
-------------------------------------
select t1.* from   t1, t1 t2 where  t1.object_id < 0 and
t2.object_id = t1.object_id

Plan hash value: 1734879205

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      0 |00:00:00.01 |      22 |     18 |       |       |          |
|*  1 |  HASH JOIN         |      |      1 |      1 |      0 |00:00:00.01 |      22 |     18 |   799K|   799K|  199K (0)|
|*  2 |   TABLE ACCESS FULL| T1   |      1 |      1 |      0 |00:00:00.01 |      19 |     18 |       |       |          |
|*  3 |   TABLE ACCESS FULL| T1   |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_ID"="T1"."OBJECT_ID")
   2 - filter("T1"."OBJECT_ID"<0)
   3 - filter("T2"."OBJECT_ID"<0)

If you look at the Starts column for operation 3 you’ll see that the second tablescan doesn’t run. This makes sense since the A-Rows column of operation 2 reports zero rows, which means the hash join at operation 1 has no data for its build table, so there’s no point  in it calling operation 3 to search for probe  data that has nothing to match. (This optimization doesn’t necessarily appear for parallel hash joins).

You probably won’t see many cases of people writing code with literal predicates quite like “1 = 2”, of course, but you might see predicates like “:bindvar1 = ‘M'” fairly often, either because the client code really does contain the predicate or because the optimizer has produced it through some transformation (such as concatentation or nvl_or_expansion or transitive closure).

[1] In older versions of Oracle a predicate that was always false (i.e. a contradiction) would have been transformed into the predicate “1=0”

Constant Subqueries

Another example of a less common appearance of the FILTER operation comes from a requirement like: “if there are more than 10 orders outstanding show them to me”. Using the t1 table above, this could be modelled with a query like:


SQL> set serveroutput off
SQL> alter session set statistics_level = all;

Session altered.

SQL> select * from t1 
  2  where 10 <= (select count(*) from t1 where object_id < 0) -- > comment to avoid wordpress format issue
  3  /

no rows selected

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last alias'));

PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID  2pha6dc0b9zzq, child number 1
-------------------------------------
select * from t1 where 10 <= (select count(*) from t1 where object_id < 0) -- > comment to avoid wordpress format issue

Plan hash value: 2626881942

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      0 |00:00:00.01 |      19 |
|*  1 |  FILTER             |      |      1 |        |      0 |00:00:00.01 |      19 |
|   2 |   TABLE ACCESS FULL | T1   |      0 |   1000 |      0 |00:00:00.01 |       0 |
|   3 |   SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      19 |
|*  4 |    TABLE ACCESS FULL| T1   |      1 |      1 |      0 |00:00:00.01 |      19 |
--------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$1 / T1@SEL$1
   3 - SEL$2
   4 - SEL$2 / T1@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(>=10)
   4 - filter("OBJECT_ID"<0)

If you go at this plan in too much of a rush you might think it is a single query block and apply FCF to produce the description:
“For each row in the tablescan of t1 at operation 2 the FILTER operation at operation 1 calls operation 3 to do a full tablescan (operation 4) of the second copy of t1 – scalar subquery caching means we only do that tablescan once and then cache the result.”

But the query clearly starts out with two (unnamed) query blocks, and the Query Block Name / Object Alias information shows us we still have two query blocks, and the Starts column tells us that operation 2 didn’t run at all. This is a case where we have to think carefully about how to combine multiple query blocks after interpreting the sub-plan for each query block individually.

In this example, which I call the “constant subquery”, Oracle can evaluate the subquery first to decide whether or not to run the main query, and the visual impact of this is that the second (last) child of the FILTER runs before the first child. This is not breaking FCF – it’s just one of the patterns you have to recognise as Oracle combines multiple query blocks in a single plan.

Correlated Filter Subqueries

Moving on to one of the most common types of filter subquery – a simple correlated subquery – there are four patterns to watch out for, and a fifth pattern that isn’t a filter subquery that might fool you if you get too confident of handling filter subqueries. I’ll be using hints fairly aggressively to force the plans I want to see, but all the plans that I’ll show in this section could appear as a consequence of basic costing.

We start with an order_lines table and a products table, and have a requirement to report any order lines for a given product class where the number of items ordered is 6.


select  /*+ 
                qb_name(main) 
        */
        orl.*
from    order_lines     orl
where
        orl.quantity = 6
and     orl.id_product in (
                select  /*+ 
                                qb_name(class) 
                        */
                        prd.id
                from    products prd
                where   prd.class = 'Group25'
        )
;


In the four plans I’m going to show you I’ve added the /*+ no_unnest */ hint to the subquery (though I’ve not shown it in the text above) to make sure that the optimizer doesn’t transform the subquery into a join.

In the first plan I don’t have any indexes on the order_lines table that could help eliminate data early, I’ve also added the hint /*+ no_push_subq */ to the subquery so that the optimizer doesn’t try to move the subquery to the earlies possible point in the plan. The resulting plan looks like the “traditional” plan for a filter subquery – reporting a FILTER operation that (notionally) executes the subquery for each row returned by a tablescan.

NO_UNNEST, NO_PUSH_SUBQ
-----------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |       |          |
|*  1 |  FILTER                      |             |       |          |
|*  2 |   TABLE ACCESS FULL          | ORDER_LINES |  5556 | 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS    |     1 | 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | PRD_PK      |     1 | 00:00:01 |
-----------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - MAIN
   2 - MAIN  / ORL@MAIN
   3 - CLASS / PRD@CLASS
   4 - CLASS / PRD@CLASS

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   2 - filter("ORL"."QUANTITY"=6)
   3 - filter("PRD"."CLASS"='Group25')
   4 - access("PRD"."ID"=:B1)

   1 - filter( EXISTS (SELECT /*+ NO_PUSH_SUBQ NO_UNNEST QB_NAME ("CLASS") */ 0 
                       FROM "PRODUCTS" "PRD" 
                       WHERE "PRD"."ID"=:B1 AND "PRD"."CLASS"='Group25'))

As you can see, I’ve reported the Query Block / Object Alias information and the two original query blocks are still clearly visible in this plan. Since I’ve pulled this plan from memory you’ll also note in the Predicate Information that the FILTER operation has “lost” the text of the filter() predicate.  This is why I’ve added at the end of the output the filter predicate reported by using explain plan with dbms_xplan.display().

For the next run I’m going to add an index on the quantity column of the order_lines table, and I’m going to tell the optimizer to run the subquery at the earliest possible moment (using the push_subq hint). This can produce two different plans – depending on whether or not the optimizer thinks it would be efficient to use the new index.

First, when the optimizer ignores the index:

NO_UNNEST, PUSH_SUBQ, order_lines index ignored
-----------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |   282 | 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | ORDER_LINES |   282 | 00:00:01 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS    |     1 | 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | PRD_PK      |     1 | 00:00:01 |
-----------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - MAIN  / ORL@MAIN
   2 - CLASS / PRD@CLASS
   3 - CLASS / PRD@CLASS

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ORL"."QUANTITY"=6 AND  IS NOT NULL)
   2 - filter("PRD"."CLASS"='Group25')
   3 - access("PRD"."ID"=:B1)


   1 - filter("ORL"."QUANTITY"=6 AND  EXISTS (SELECT /*+ PUSH_SUBQ
              NO_UNNEST QB_NAME ("CLASS") */ 0 FROM "PRODUCTS" "PRD" WHERE
              "PRD"."ID"=:B1 AND "PRD"."CLASS"='Group25'))

In the above you can see that Operation 2 appears to be the first child of operation 1 and if we applied FCF we would read this as “first use the primary key on products to pick up a single product and use it during a full tablescan of order_lines. Fortunately we can see the separate query blocks and so we ignore FCF. The filter() predicate for operation 1 (again revealed by the explain plan version) tells us that we have a filter subquery so “for each row we read in the tablescan we check if the quantity is 6 and if so we execute the subquery to see if the product is in class 25″. You’ll notice the :B1 in the filter predicate – this is the correlation variable where Oracle passes the order_lines.id_product to the subquery.

Now when the optimizer uses the index (which I had to hint in this case):

NO_UNNEST, PUSH_SUBQ, order_lines index used (quantity)
------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |       |          |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ORDER_LINES |   122 | 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ORL_QTY     |  5556 | 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID       | PRODUCTS    |     1 | 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN                | PRD_PK      |     1 | 00:00:01 |
------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - MAIN  / ORL@MAIN
   2 - MAIN  / ORL@MAIN
   3 - CLASS / PRD@CLASS
   4 - CLASS / PRD@CLASS

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   2 - access("ORL"."QUANTITY"=6)
   3 - filter("PRD"."CLASS"='Group25')
   4 - access("PRD"."ID"=:B1)


   1 - filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST QB_NAME ("CLASS")
              */ 0 FROM "PRODUCTS" "PRD" WHERE "PRD"."ID"=:B1 AND
              "PRD"."CLASS"='Group25'))

In this case operation 1 is a “table access by rowid” that appears to have two child operations! Again, though, the query block information tells us that there is a separate query block starting at operation 3. So FCF applies only to operations 1 and 2, and we have to decide how to bring in the query block described by operations 3 and 4.

Checking the predicate information we can see, once again, that there is a “filter() gone empty” at operation 1. And the filter() predicate from explain plan tells us that this corresponds to executing a subquery against the products table. In this case we have simply used the index on (order_lines.quantity) to access only the rows with the correct quantity, and then executed the products subquery for each of those rows.

Finally I’ve created an index on order_lines(quantity, id_product), and forced the optimizer to use it with the following effect:

NO_UNNEST, PUSH_SUBQ, order_lines index used (quantity, id_product)
------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |       |          |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ORDER_LINES |   122 | 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ORL_QTY_PRD |   122 | 00:00:01 |
|*  3 |    TABLE ACCESS BY INDEX ROWID      | PRODUCTS    |     1 | 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN               | PRD_PK      |     1 | 00:00:01 |
------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - MAIN  / ORL@MAIN
   2 - MAIN  / ORL@MAIN
   3 - CLASS / PRD@CLASS
   4 - CLASS / PRD@CLASS

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORL"."QUANTITY"=6)
       filter( IS NOT NULL)
   3 - filter("PRD"."CLASS"='Group25')
   4 - access("PRD"."ID"=:B1)


   2 - access("ORL"."QUANTITY"=6)
       filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST QB_NAME ("CLASS")
              */ 0 FROM "PRODUCTS" "PRD" WHERE "PRD"."ID"=:B1 AND
              "PRD"."CLASS"='Group25'))

It would be very easy to read this plan using FCF from top to bottom – and then have to wonder how Oracle managed to find a single product to drive the query. But the query block information rescues us from this error, and we realise that there’s no predicate associatecd with operation 1 so we need to see a way that we can connect the index range scan at operation 2 with the subquery at operations 3 and 4.

Checking the predicate section (and the explan plan predicate) at operation 2 we can see that we execute the subquery as we are running the index range scan of the order_lines index and before we use any rowids to visit the table. So Oracle picks up an index entry (which includes a product id), executes the subquery for that entry’s product id, and only visits the order_lines table if the subquery says the product is in class 25.

Summary Warning

If the optimizer needs to execute a subquery as a filter subquery there are 4 basic patterns you might see in the execution plan. One of them includes an explicit, standalone, FILTER operation; the other three “hide” the filter operation and it appears only as a filter() predicate.

In two of the “hidden filter” cases the shape of the plan is distorted in a way that looks a little strange until you realise that there are two query blocks involved and FCF doesn’t apply across the query blocks; in the third case it would be very easy to read down the plan thinking that FCF was appropriate because the shape of the plan looks perfectly reasonable.

Footnote (The “driving” subquery)

It’s worth seeing one more query involving the order_lines table and giving us a fifth execution plan that highlights the ease with which subqueries can cause (temporary) confusion.

select  /*+ qb_name(main) */
        orl.*
from    order_lines orl
where   orl.id_ord = (
                select
                        /*+ qb_name(max) */ 
                        max(ord.id) 
                from    orders ord
        )
/

---------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |             |     5 |    90 |     9   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ORDER_LINES |     5 |    90 |     7   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ORL_PK      |     5 |       |     2   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE                   |             |     1 |     4 |            |          |
|   4 |     INDEX FULL SCAN (MIN/MAX)       | ORD_PK      |     1 |     4 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - MAIN / ORL@MAIN
   2 - MAIN / ORL@MAIN
   3 - MAX
   4 - MAX  / ORD@MAX

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ORL"."ID_ORD"=)

   2 - access("ORL"."ID_ORD"= (SELECT /*+ QB_NAME ("MAX") */ MAX("ORD"."ID") FROM "ORDERS"
              "ORD"))

As in earlier examples in this note I’ve reported the Predicate Information from the in-memory plan then followed this with the predicate section I got from using explain plan.

The basic shape of the plan suggests a simple FCF, and in this case if you follow that route you’ll get the right interpretation of how Oracle executes the query – but you’ll have got it right for the wrong reason.

The query block information tells you that operations 3 and 4 come from a separate query block so should be examined in isolation from operations 1 and 2, which form the main query block. To combine the query blocks we then note that operation 2 (the index range scan) uses the subquery as an access predicate, not a filter predicate, so the subquery has to execute first to supply a value for the main query to use as a driving value for the index range scan.

When you have to handle subqueries in execution plans make sure you check whether they operate as filter subqueries or access subqueries before you try to interpret how the interact with the rest of the plan.

 

Hint hacking

Filed under: Execution plans,Hints,Oracle — Jonathan Lewis @ 8:49 am BST May 1,2020

How do you work out what hints you need to tweak an execution plan into the shape you want?

Here’s a “case study” that’s been playing out over a few weeks on the Oracle Developer Community (here and here) and most recently ended up (in one of its versions) as a comment on one of my blog notes. It looks like a long note, but it’s a note about how to find the little bit of information you need from a large output – so it’s really a short note that has to include a long output.

 

Problem: a query is not running fast enough, and it runs a very large number of times in a single batch (the original trace/tkprof file reported 842,000 executions). Each individual execution, though, is very quick (as far as we know – the individual examples we have seen take a few hundredths of a second). Here’s one execution plan for the query with Query Block / Object Alias information and Outline Data pulled from memory with rowsource execution statistics enabled.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                                |      1 |        |   574 (100)|      1 |00:00:00.02 |    3822 |       |       |         |
|   1 |  SORT AGGREGATE                             |                                |      1 |      1 |            |      1 |00:00:00.02 |    3822 |       |       |         |
|*  2 |   VIEW                                      |                                |      1 |      1 |   574   (2)|      0 |00:00:00.02 |    3822 |       |       |         |
|*  3 |    COUNT STOPKEY                            |                                |      1 |        |            |      2 |00:00:00.02 |    3822 |       |       |         |
|   4 |     VIEW                                    |                                |      1 |      1 |   574   (2)|      2 |00:00:00.02 |    3822 |       |       |         |
|*  5 |      SORT ORDER BY STOPKEY                  |                                |      1 |      1 |   574   (2)|      2 |00:00:00.02 |    3822 |  2048 |  2048 | 2048  (0)|
|*  6 |       FILTER                                |                                |      1 |        |            |    171 |00:00:00.02 |    3822 |       |       |         |
|   7 |        NESTED LOOPS                         |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    3128 |       |       |         |
|   8 |         NESTED LOOPS                        |                                |      1 |      1 |   568   (2)|    182 |00:00:00.02 |    2946 |       |       |         |
|   9 |          NESTED LOOPS                       |                                |      1 |      1 |   567   (2)|    182 |00:00:00.02 |    2942 |       |       |         |
|  10 |           NESTED LOOPS                      |                                |      1 |      1 |   566   (2)|    182 |00:00:00.02 |    2938 |       |       |         |
|  11 |            NESTED LOOPS ANTI                |                                |      1 |      1 |   565   (2)|    182 |00:00:00.02 |    2752 |       |       |         |
|  12 |             NESTED LOOPS ANTI               |                                |      1 |      1 |   562   (2)|    182 |00:00:00.02 |    2388 |       |       |         |
|* 13 |              HASH JOIN                      |                                |      1 |      5 |   557   (2)|    182 |00:00:00.02 |    2022 |  1599K|  1599K| 1503K (0)|
|  14 |               VIEW                          | index$_join$_008               |      1 |    127 |     2   (0)|    127 |00:00:00.01 |       8 |       |       |         |
|* 15 |                HASH JOIN                    |                                |      1 |        |            |    127 |00:00:00.01 |       8 |  1368K|  1368K| 1522K (0)|
|  16 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_CODE_UK             |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
|  17 |                 INDEX FAST FULL SCAN        | XXADM_LOVS_PK                  |      1 |    127 |     1   (0)|    127 |00:00:00.01 |       4 |       |       |         |
|* 18 |               HASH JOIN                     |                                |      1 |    478 |   555   (2)|    182 |00:00:00.01 |    2014 |  1245K|  1245K| 1277K (0)|
|  19 |                NESTED LOOPS                 |                                |      1 |    478 |   243   (2)|    209 |00:00:00.01 |     883 |       |       |         |
|  20 |                 NESTED LOOPS                |                                |      1 |      1 |     2   (0)|      1 |00:00:00.01 |       4 |       |       |         |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |       |       |         |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |       |       |         |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      1 |      1 |     1   (0)|      1 |00:00:00.01 |       2 |       |       |         |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      1 |      1 |     0   (0)|      1 |00:00:00.01 |       1 |       |       |         |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      1 |    478 |   241   (2)|    209 |00:00:00.01 |     879 |       |       |         |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      1 |   6685 |   311   (2)|  10488 |00:00:00.01 |    1131 |       |       |         |
|* 27 |              TABLE ACCESS BY INDEX ROWID    | XXADM_APPLICANT_COURSPREFS_TBL |    182 |   8881 |     1   (0)|      0 |00:00:00.01 |     366 |       |       |         |
|* 28 |               INDEX UNIQUE SCAN             | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     0   (0)|    182 |00:00:00.01 |     184 |       |       |         |
|  29 |             VIEW PUSHED PREDICATE           | VW_SQ_1                        |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|  30 |              NESTED LOOPS                   |                                |    182 |      1 |     3   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 31 |               TABLE ACCESS BY INDEX ROWID   | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     2   (0)|      0 |00:00:00.01 |     364 |       |       |         |
|* 32 |                INDEX UNIQUE SCAN            | XXADM_APPLCNT_PREF_ORDER_UK    |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     184 |       |       |         |
|* 33 |               TABLE ACCESS BY INDEX ROWID   | XXADM_CATEGORY_MASTER_TBL      |      0 |      1 |     1   (0)|      0 |00:00:00.01 |       0 |       |       |         |
|* 34 |                INDEX UNIQUE SCAN            | XXADM_CATEGORY_PK              |      0 |      1 |     0   (0)|      0 |00:00:00.01 |       0 |       |       |         |
|  35 |            TABLE ACCESS BY INDEX ROWID      | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     186 |       |       |         |
|* 36 |             INDEX UNIQUE SCAN               | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|* 37 |           INDEX UNIQUE SCAN                 | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|* 38 |          INDEX UNIQUE SCAN                  | XXADM_LOVS_PK                  |    182 |      1 |     0   (0)|    182 |00:00:00.01 |       4 |       |       |         |
|  39 |         TABLE ACCESS BY INDEX ROWID         | XXADM_LOV_MASTER_TBL           |    182 |      1 |     1   (0)|    182 |00:00:00.01 |     182 |       |       |         |
|* 40 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    182 |      1 |     3   (0)|     29 |00:00:00.01 |     507 |       |       |         |
|* 41 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_PREFS_UK         |    182 |      5 |     2   (0)|   1450 |00:00:00.01 |     191 |       |       |         |
|  42 |        TABLE ACCESS BY INDEX ROWID BATCHED  | XXADM_APPLICANT_COURSPREFS_TBL |    171 |      1 |     2   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 43 |         INDEX RANGE SCAN                    | XXADM_APPLCNT_APPLICANT_STATUS |    171 |      1 |     1   (0)|      0 |00:00:00.01 |     173 |       |       |         |
|* 44 |        VIEW                                 | index$_join$_014               |      6 |      1 |     0   (0)|      0 |00:00:00.01 |      14 |       |       |         |
|* 45 |         HASH JOIN                           |                                |      6 |        |            |      0 |00:00:00.01 |      14 |  1519K|  1519K|  666K (0)|
|* 46 |          INDEX RANGE SCAN                   | XXADM_CATEGORY_PK              |      6 |      1 |     0   (0)|      6 |00:00:00.01 |       6 |       |       |         |
|  47 |          INLIST ITERATOR                    |                                |      6 |        |            |     12 |00:00:00.01 |       8 |       |       |         |
|* 48 |           INDEX UNIQUE SCAN                 | XXADM_CATEGORY_CODE_UK         |     12 |      1 |     0   (0)|     12 |00:00:00.01 |       8 |       |       |         |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   2 - SEL$2        / from$_subquery$_001@SEL$1
   3 - SEL$2
   4 - SEL$7E0D484F / from$_subquery$_002@SEL$2
   5 - SEL$7E0D484F
  14 - SEL$082F290F / LMT_GENDER@SEL$3
  15 - SEL$082F290F
  16 - SEL$082F290F / indexjoin$_alias$_001@SEL$082F290F
  17 - SEL$082F290F / indexjoin$_alias$_002@SEL$082F290F
  21 - SEL$7E0D484F / CMT@SEL$3
  22 - SEL$7E0D484F / CMT@SEL$3
  23 - SEL$7E0D484F / LMT_EDUCATION_TYPE@SEL$3
  24 - SEL$7E0D484F / LMT_EDUCATION_TYPE@SEL$3
  25 - SEL$7E0D484F / ACT@SEL$3
  26 - SEL$7E0D484F / ADT@SEL$3
  27 - SEL$7E0D484F / ACT3@SEL$7
  28 - SEL$7E0D484F / ACT3@SEL$7
  29 - SEL$A75BE177 / VW_SQ_1@SEL$67DC521B
  30 - SEL$A75BE177
  31 - SEL$A75BE177 / ACT1@SEL$8
  32 - SEL$A75BE177 / ACT1@SEL$8
  33 - SEL$A75BE177 / XXADM_CATEGORY_MASTER_TBL@SEL$9
  34 - SEL$A75BE177 / XXADM_CATEGORY_MASTER_TBL@SEL$9
  35 - SEL$7E0D484F / LMT_PASS@SEL$3
  36 - SEL$7E0D484F / LMT_PASS@SEL$3
  37 - SEL$7E0D484F / LMT_APPEARANCE@SEL$3
  38 - SEL$7E0D484F / LMT_RELIGION@SEL$3
  39 - SEL$7E0D484F / LMT_RELIGION@SEL$3
  40 - SEL$5        / ACT1@SEL$5
  41 - SEL$5        / ACT1@SEL$5
  42 - SEL$6        / ACT2@SEL$6
  43 - SEL$6        / ACT2@SEL$6
  44 - SEL$F665FE1B / XXADM_CATEGORY_MASTER_TBL@SEL$4
  45 - SEL$F665FE1B
  46 - SEL$F665FE1B / indexjoin$_alias$_001@SEL$F665FE1B
  48 - SEL$F665FE1B / indexjoin$_alias$_002@SEL$F665FE1B

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      OPT_PARAM('_optimizer_use_feedback' 'false')
      OPT_PARAM('_optimizer_dsdir_usage_control' 0)
      OPT_PARAM('_optimizer_adaptive_plans' 'false')
      OPT_PARAM('_optimizer_gather_feedback' 'false')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$F665FE1B")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SEL$6")
      OUTLINE_LEAF(@"SEL$A75BE177")
      PUSH_PRED(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B" 16 15)
      OUTLINE_LEAF(@"SEL$082F290F")
      OUTLINE_LEAF(@"SEL$7E0D484F")
      UNNEST(@"SEL$9D10C90A")
      UNNEST(@"SEL$7")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$180402DE")
      OUTLINE(@"SEL$7E0D484F")
      UNNEST(@"SEL$9D10C90A")
      UNNEST(@"SEL$7")
      OUTLINE(@"SEL$67DC521B")
      OUTLINE(@"SEL$9D10C90A")
      UNNEST(@"SEL$9")
      OUTLINE(@"SEL$7")
      OUTLINE(@"SEL$C04829E0")
      ELIMINATE_JOIN(@"SEL$3" "CRMT"@"SEL$3")
      ELIMINATE_JOIN(@"SEL$3" "MMT"@"SEL$3")
      OUTLINE(@"SEL$8")
      OUTLINE(@"SEL$9")
      OUTLINE(@"SEL$3")
      NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")
      NO_ACCESS(@"SEL$2" "from$_subquery$_002"@"SEL$2")
      INDEX_RS_ASC(@"SEL$7E0D484F" "CMT"@"SEL$3" ("XXADM_COLLEGE_MASTER_TBL"."COLLEGE_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_EDUCATION_TYPE"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      FULL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      FULL(@"SEL$7E0D484F" "ADT"@"SEL$3")
      INDEX_JOIN(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_CODE") ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "ACT3"@"SEL$7" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."PREFERENCE_ORDER"))
      NO_ACCESS(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B")
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_PASS"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX_RS_ASC(@"SEL$7E0D484F" "LMT_APPEARANCE"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      INDEX(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3" ("XXADM_LOV_MASTER_TBL"."LOV_ID"))
      LEADING(@"SEL$7E0D484F" "CMT"@"SEL$3" "LMT_EDUCATION_TYPE"@"SEL$3" "ACT"@"SEL$3" "ADT"@"SEL$3" "LMT_GENDER"@"SEL$3" "ACT3"@"SEL$7" "VW_SQ_1"@"SEL$67DC521B"
              "LMT_PASS"@"SEL$3" "LMT_APPEARANCE"@"SEL$3" "LMT_RELIGION"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_EDUCATION_TYPE"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "ADT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "ACT3"@"SEL$7")
      USE_NL(@"SEL$7E0D484F" "VW_SQ_1"@"SEL$67DC521B")
      USE_NL(@"SEL$7E0D484F" "LMT_PASS"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_APPEARANCE"@"SEL$3")
      USE_NL(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3")
      NLJ_BATCHING(@"SEL$7E0D484F" "LMT_RELIGION"@"SEL$3")
      SWAP_JOIN_INPUTS(@"SEL$7E0D484F" "LMT_GENDER"@"SEL$3")
      PQ_FILTER(@"SEL$7E0D484F" SERIAL)
      INDEX_RS_ASC(@"SEL$A75BE177" "ACT1"@"SEL$8" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."PREFERENCE_ORDER"))
      INDEX_RS_ASC(@"SEL$A75BE177" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9" ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_ID"))
      LEADING(@"SEL$A75BE177" "ACT1"@"SEL$8" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9")
      USE_NL(@"SEL$A75BE177" "XXADM_CATEGORY_MASTER_TBL"@"SEL$9")
      INDEX_RS_ASC(@"SEL$6" "ACT2"@"SEL$6" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."STATUS_FLAG"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$6" "ACT2"@"SEL$6")
      INDEX_RS_ASC(@"SEL$5" "ACT1"@"SEL$5" ("XXADM_APPLICANT_COURSPREFS_TBL"."APPLICANT_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."COLLEGE_ID"
              "XXADM_APPLICANT_COURSPREFS_TBL"."COURSE_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."MEDIUM_ID" "XXADM_APPLICANT_COURSPREFS_TBL"."HOSTEL_REQUIRED"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$5" "ACT1"@"SEL$5")
      INDEX_JOIN(@"SEL$4" "XXADM_CATEGORY_MASTER_TBL"@"SEL$4" ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_ID") ("XXADM_CATEGORY_MASTER_TBL"."CATEGORY_CODE"))
      END_OUTLINE_DATA
  */

This is just one of a handful of variations that all look fairly similar and there was plenty that could be said about the query and the plan; I only want to look at one idea, though. The point came where the suggestion came to eliminate the the full tablescans at operations 25 and 26. Here’s the relevant section of the plan, stripped back a bit to make it narrower:


--------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                           | Starts | E-Rows |
--------------------------------------------------------------------------------------------------------
|* 18 |               HASH JOIN                     |                                |      1 |    478 |
|  19 |                NESTED LOOPS                 |                                |      1 |    478 |
|  20 |                 NESTED LOOPS                |                                |      1 |      1 |
|  21 |                  TABLE ACCESS BY INDEX ROWID| XXADM_COLLEGE_MASTER_TBL       |      1 |      1 |
|* 22 |                   INDEX UNIQUE SCAN         | XXADM_COLLEGES_PK              |      1 |      1 |
|  23 |                  TABLE ACCESS BY INDEX ROWID| XXADM_LOV_MASTER_TBL           |      1 |      1 |
|* 24 |                   INDEX UNIQUE SCAN         | XXADM_LOVS_PK                  |      1 |      1 |
|* 25 |                 TABLE ACCESS FULL           | XXADM_APPLICANT_COURSPREFS_TBL |      1 |    478 |
|* 26 |                TABLE ACCESS FULL            | XXADM_APPLICANT_DETAILS_TBL    |      1 |   6685 |
--------------------------------------------------------------------------------------------------------

To isolate the above as a relevant, self-contained, part of the plan I’ve checked that operation 26 has no child operations, and I’ve scanned up the plan to find the parent of child 26 – which turns out to be operation 18, which is a hash join with a nested loop (operation 19) as its first child and operation 26 as its second chlid.

We want to change operations 25 and 26 from full tablescans to indexed accesses; that’s the only change we need make for operation 25 which is the second table of a nested loop join, but we’ll also want to change the hash join at operation 18 into a nested loop join. To make it easy to create the right hints we start by checking the Query Block / Object Alias information to identify exactly what we’re dealing with and “where” we’re dealing with it in operations 25 and 26.

  25 - SEL$7E0D484F / ACT@SEL$3
  26 - SEL$7E0D484F / ADT@SEL$3

Now we can look in the Outline Data section for the hints which will say “do full tablescans on acr@sel$3 and adt@sel$3 in query block sel$7E0D484F; and we’ll need to find a hint that tells us to do a hash join with adt4@sel$3 – and this is what we find:

      FULL(@"SEL$7E0D484F" "ACT"@"SEL$3")
      FULL(@"SEL$7E0D484F" "ADT"@"SEL$3")
      USE_HASH(@"SEL$7E0D484F" "ADT"@"SEL$3")

We were a little lucky with the use_hash() hint here, as the situation could have been made a little murkier if the table we were after had also been subject to swapping join inputs (the swap_join_inputs() hint).

So all we need to do now is change those hints which (getting rid of redundant quotes, and converting to lower case because I don’t like block capitals everywhere) gives us the following:


index( @sel$7e0d484f act@sel$3 {name/definition of index})
index( @sel$7e0d484f adt@sel$3 {name/definition of index})
use_nl(@sel$7e0d484f adt@sel$3)

You have to decide your strategy for getting these hints in place, of course. Just sticking the three hints into the query probably isn’t a stable solution. Editing the outline information to include these hints (replacing the previous 3) then copying the whole outline into the query is a little messy and may not be allowed at your site. Creating an SQL Patch (with recent versions of Oracle) or an SQL Plan Baseline is probably the most appropriate strategy (possibly hacked into an SQL Profile, but I don’t like doing that). That’s a topic for another blog note, though, which I don’t need to write.

Summary

If you have a complex plan that needs a little tweaking, it’s fairly easy to find out how to change the current Outline Data to get where you want to be if you start by looking at the Query Block / Object Alias section of the plan for the operations you want to change, and then search the Outline  Data for the query blocks, aliases and operations you’ve identified.

April 29, 2020

Execution Plans

Filed under: Execution plans,Hints,Oracle — Jonathan Lewis @ 1:52 pm BST Apr 29,2020

A couple of days ago I discussed an execution plan that displayed some variation in the way it handled subqueries and even threw in a little deception by displaying an anti-join that was the result of transforming a “not exists” subquery and a semi-join that looked at first sight as if it were going to be the result of transforming an “exists” subquery.

As part of the dissection I reverse engineered the query into a set of tables that would allow me to reproduce the execution plan so that I could report the “final query blocks” (outline_leafs). As a brief appendix to that blog note I’m publishing here the script to create those tables and three plans that I went through to get to the plan I needed.


rem
rem     Script:         anti_semi.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.1.0.2
rem

create table ip_spells 
partition by range (admission_dttm) interval (numtoyminterval(1,'MONTH')) (
        partition p_start values less than (to_date('01-Jan-2020'))
)
as
with g as (
        select rownum id from dual
        connect by level <= 1e4 -- > avoid wordpress format issue
)
select
        rownum                                  spell_id,
        to_char(mod(rownum,75),'FM00')          admission_method_code,
        to_date('01-jan_2020') + rownum / 1000  admission_dttm,
        to_char(mod(rownum,57),'FM00')          administrative_category_code,
        lpad(rownum,10)                         v1,
        rpad('x',100)                           padding
from
        g,g
where
        rownum <= 365000
;

alter table ip_spells add constraint sp_pk primary key(spell_id);

create table ip_episodes
as
select
        spell_id,
        spell_id        episode_id,
        1+mod(rownum,6) episode_order,
        to_char(mod(rownum,125)+ 500,'FM999')   main_specialty_code,
        v1,
        padding
from
        ip_spells
order by
        dbms_random.value
;

alter table ip_episodes add constraint ep_pk primary key(episode_id);
alter table ip_episodes add constraint ep_fk_sp foreign key (spell_id) references ip_spells;
create index idx02_ip_episodes on ip_episodes(spell_id);


create table ip_diagnosis
as
select
        episode_id,
        chr(mod(rownum,25) + 65) ||
                to_char(dbms_random.value(30,512),'FM999')      diagnosis_code,
        mod(rownum,5)                                           diagnosis_sequence,
        lpad(rownum,10)                 v1,
        rpad('x',100)                   padding
from
        (select rownum id from dual connect by level <= 5),
        ip_episodes
;

alter table ip_diagnosis add constraint di_fk_ep foreign key(episode_id) references ip_episodes;
create index idx01_ip_diagnosis on ip_diagnosis(episode_id);

The original ip_spells table was range partitioned and there was a date-based predicate in the query that encouraged me to use the date column as the partitioning column. I also decided to generate data for one year at 1,000 rows per day from the start of a year, with interval partitioning of one month to get a small number of partitions.

The original plan suggested that the number of ip_episodes was similar to the number of ip_spells, so I just used a copy of the rows from ip_spells to create ip_epsisodes, and then gave it some appropriate primary and foreign key constraints and indexes.

Finally, the ip_diagnosis table looked as if it held an average of 5 rows per ip_episodes, so I generated it from ip_episodes by joining to a 5-row set generated by the usual “connect by” trick with dual.

I’ve only got a small data set, and most of the indexes are sequence based with excellent clustering_factors, so I wasn’t going to be surprised if my data and stats didn’t immediately produce the execution plan of the original query.

Here’s the original query (just as a reminder), and the first plan I got with no hinting (running 12.1.0.2):


select
        * 
from 
        ip_spells a
where 
        not exists (
                select
                        1
                from
                        ip_episodes e
                inner join 
                        ip_diagnosis d 
                on
                        d.episode_id = e.episode_id
                where
                        a.spell_id = e.spell_id
                and     (
                           substr(d.diagnosis_code,1,1) = 'C'
                        or substr(d.diagnosis_code,1,3) between 'D37' and 'D48'
                        or substr(d.diagnosis_code,1,1)  = 'V'
                        or d.diagnosis_code = 'Z511'
                )
        )
and     exists (
                select
                        1
                from
                        ip_episodes e
                left join 
                        ip_diagnosis d 
                on
                        d.episode_id = e.episode_id
                where
                        a.spell_id = e.spell_id
                and     (
                            e.episode_order = '1'
                        and e.main_specialty_code not in ('501','560','610')
                        and d.diagnosis_sequence = 1
                        and substr(d.diagnosis_code,1,1) <> 'O'
                        )
                )
and     substr(a.admission_method_code,1,1) = '2'                       -- 1% selectivity on substr()
and     a.admission_dttm >= to_date('01-jan-2011', 'dd-mon-yyyy')
and     a.administrative_category_code = '01'                           -- 1 / 57 by definition
;


Plan hash value: 1492475845

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |     1 |   131 |  1299   (5)| 00:00:06 |       |       |
|*  1 |  FILTER                               |                    |       |       |            |          |       |       |
|   2 |   PARTITION RANGE ALL                 |                    |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|*  3 |    TABLE ACCESS FULL                  | IP_SPELLS          |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|   4 |   NESTED LOOPS SEMI                   |                    |     1 |    20 |    11   (0)| 00:00:01 |       |       |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        |     1 |    10 |     4   (0)| 00:00:01 |       |       |
|*  6 |     INDEX RANGE SCAN                  | IDX02_IP_EPISODES  |     1 |       |     3   (0)| 00:00:01 |       |       |
|*  7 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       | 40931 |   399K|     7   (0)| 00:00:01 |       |       |
|*  8 |     INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS |     5 |       |     2   (0)| 00:00:01 |       |       |
|   9 |   NESTED LOOPS SEMI                   |                    |     1 |    30 |    11   (0)| 00:00:01 |       |       |
|* 10 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        |     1 |    17 |     4   (0)| 00:00:01 |       |       |
|* 11 |     INDEX RANGE SCAN                  | IDX02_IP_EPISODES  |     1 |       |     3   (0)| 00:00:01 |       |       |
|* 12 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       | 18250 |   231K|     7   (0)| 00:00:01 |       |       |
|* 13 |     INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS |     5 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------

As you can see, both subqueries ran as FILTER subqueries. If I were to include the query block information you would see that operations 4 to 8 are the (transformed) “not exists” subquery, operations 9 to 13 are the transformed “exists” subquery, and operations 1 to 3 represent the main body of the query.

I needed to see the “not exists” subquery unnested and transformed into a hash anti-join, so my first attempt at hinting was to add an /*+ unnest */ hint to that subquery, producing the following plan:


Plan hash value: 147447036
------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                    |     1 |   133 |  1652   (4)| 00:00:07 |       |       |
|*  1 |  FILTER                                 |                    |       |       |            |          |       |       |
|   2 |   NESTED LOOPS ANTI                     |                    |     1 |   133 |  1641   (4)| 00:00:07 |       |       |
|   3 |    PARTITION RANGE ALL                  |                    |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|*  4 |     TABLE ACCESS FULL                   | IP_SPELLS          |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|   5 |    VIEW PUSHED PREDICATE                | VW_SQ_1            |     1 |     2 |    11   (0)| 00:00:01 |       |       |
|   6 |     NESTED LOOPS SEMI                   |                    |     1 |    20 |    11   (0)| 00:00:01 |       |       |
|   7 |      TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        |     1 |    10 |     4   (0)| 00:00:01 |       |       |
|*  8 |       INDEX RANGE SCAN                  | IDX02_IP_EPISODES  |     1 |       |     3   (0)| 00:00:01 |       |       |
|*  9 |      TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       | 40931 |   399K|     7   (0)| 00:00:01 |       |       |
|* 10 |       INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS |     5 |       |     2   (0)| 00:00:01 |       |       |
|  11 |   NESTED LOOPS SEMI                     |                    |     1 |    30 |    11   (0)| 00:00:01 |       |       |
|* 12 |    TABLE ACCESS BY INDEX ROWID BATCHED  | IP_EPISODES        |     1 |    17 |     4   (0)| 00:00:01 |       |       |
|* 13 |     INDEX RANGE SCAN                    | IDX02_IP_EPISODES  |     1 |       |     3   (0)| 00:00:01 |       |       |
|* 14 |    TABLE ACCESS BY INDEX ROWID BATCHED  | IP_DIAGNOSIS       | 18250 |   231K|     7   (0)| 00:00:01 |       |       |
|* 15 |     INDEX RANGE SCAN                    | IDX01_IP_DIAGNOSIS |     5 |       |     2   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------------

You can see the unnested subquery in the name vw_sq_1 at operation 5; but we have a nested loop anti at operation 2 when I want a hash join anti, and we also see the option “pushed predicate” for the view at operation 5.

If I published the query block information in this case you would still see 3 “final” query blocks. Operations 11 to 15 would be the “exists” subquery; operation 5 to 10 would be the query block for the non-mergeable view that the optimizer produced by unnesting the “not exists” subquery; and operations 1 to 4 represent the overall parent query block.

This example highlights a detail which is a little easy to miss in earlier discussions of the query and its plans. Some operations in a plan look as if they could be associated with two query block names – the query block for which they are the top line, and the query block by which they are used.

Taking this plan as an example, operation 5 is clearly the starting point of the query block from operations 5 to 10, and operation 11 is clearly the starting point for the query block from operations 11 to 15. On the other hand when we collapse query blocks to understand the overall structure of how the query operates we get the following plan – in which we view “Rowsource 2” and “Rowsource 3” as simple data sources in the main query block – and it would be nice to see operations 5 and 11 with the same query block name as operations 1 to 4.

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |     1 |   133 |  1652   (4)| 00:00:07 |       |       |
|*  1 |  FILTER                                 |                 |       |       |            |          |       |       |
|   2 |   NESTED LOOPS ANTI                     |                 |     1 |   133 |  1641   (4)| 00:00:07 |       |       |
|   3 |    PARTITION RANGE ALL                  |                 |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|*  4 |     TABLE ACCESS FULL                   | IP_SPELLS       |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|   5 |    Rowsource 2 : VIEW PUSHED PREDICATE  | VW_SQ_1         |     1 |     2 |    11   (0)| 00:00:01 |       |       |
|  11 |   Rowsource 3 : NESTED LOOPS SEMI       |                 |     1 |    30 |    11   (0)| 00:00:01 |       |       |
---------------------------------------------------------------------------------------------------------------------------

We still have a little work to do to get where we want to be: the /*+ unnest */ hint has got us part way to the plan we want –  so (taking my cue from the  pushed predicate option) to get from the nested loop anti join to the hash anti join I decided to try changing the hint in the “not exists” subquery to /*+ unnest no_push_pred */ – and this is the plan that appeared as a result:


Plan hash value: 2721384176
----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |     1 |   144 |  6633  (11)| 00:00:26 |       |       |
|*  1 |  FILTER                               |                    |       |       |            |          |       |       |
|*  2 |   HASH JOIN ANTI                      |                    |     1 |   144 |  6622  (11)| 00:00:26 |       |       |
|   3 |    PARTITION RANGE ALL                |                    |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|*  4 |     TABLE ACCESS FULL                 | IP_SPELLS          |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|   5 |    VIEW                               | VW_SQ_1            | 40931 |   519K|  5685  (12)| 00:00:23 |       |       |
|*  6 |     HASH JOIN                         |                    | 40931 |   799K|  5685  (12)| 00:00:23 |       |       |
|*  7 |      TABLE ACCESS FULL                | IP_DIAGNOSIS       | 40931 |   399K|  4761  (13)| 00:00:19 |       |       |
|   8 |      TABLE ACCESS FULL                | IP_EPISODES        |   365K|  3564K|   906   (5)| 00:00:04 |       |       |
|   9 |   NESTED LOOPS SEMI                   |                    |     1 |    30 |    11   (0)| 00:00:01 |       |       |
|* 10 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        |     1 |    17 |     4   (0)| 00:00:01 |       |       |
|* 11 |     INDEX RANGE SCAN                  | IDX02_IP_EPISODES  |     1 |       |     3   (0)| 00:00:01 |       |       |
|* 12 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       | 18250 |   231K|     7   (0)| 00:00:01 |       |       |
|* 13 |     INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS |     5 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------

It looks the right shape, it’s got the right mechanisms in place, and (very importantly – though not a 100% guarantee) it’s got the same plan_hash_value as the orginally published plan. So at this point I felt the final query block names it reported would reflect the ones that would have been used in the original plan.

In a production system, of course, you don’t just stick a couple of hints into a query and hope it will be enough to stabilise the plan. Here’s the full set of hints that appeared in the Outline Data when I added my two hints to the query:


Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      PARTIAL_JOIN(@"SEL$2B0A33EA" "D"@"SEL$2")
      USE_NL(@"SEL$2B0A33EA" "D"@"SEL$2")
      LEADING(@"SEL$2B0A33EA" "E"@"SEL$3" "D"@"SEL$2")
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$2B0A33EA" "D"@"SEL$2")
      INDEX_RS_ASC(@"SEL$2B0A33EA" "D"@"SEL$2" ("IP_DIAGNOSIS"."EPISODE_ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$2B0A33EA" "E"@"SEL$3")
      INDEX_RS_ASC(@"SEL$2B0A33EA" "E"@"SEL$3" ("IP_EPISODES"."SPELL_ID"))
      USE_HASH(@"SEL$8D33959D" "E"@"SEL$1")
      LEADING(@"SEL$8D33959D" "D"@"SEL$1" "E"@"SEL$1")
      FULL(@"SEL$8D33959D" "E"@"SEL$1")
      FULL(@"SEL$8D33959D" "D"@"SEL$1")
      PQ_FILTER(@"SEL$2B969259" SERIAL)
      USE_HASH(@"SEL$2B969259" "VW_SQ_1"@"SEL$F49409E0")
      LEADING(@"SEL$2B969259" "A"@"SEL$4" "VW_SQ_1"@"SEL$F49409E0")
      NO_ACCESS(@"SEL$2B969259" "VW_SQ_1"@"SEL$F49409E0")
      FULL(@"SEL$2B969259" "A"@"SEL$4")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$4")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$5")
      MERGE(@"SEL$2")
      OUTLINE(@"SEL$64EAE176")
      OUTLINE(@"SEL$6")
      OUTLINE(@"SEL$F49409E0")
      MERGE(@"SEL$1")
      OUTLINE(@"SEL$8C3A16E3")
      MERGE(@"SEL$64EAE176")
      OUTLINE(@"SEL$4B169FC8")
      UNNEST(@"SEL$8C3A16E3")
      OUTLINE_LEAF(@"SEL$2B969259")
      OUTLINE_LEAF(@"SEL$8D33959D")
      OUTER_JOIN_TO_INNER(@"SEL$4B169FC8" "D"@"SEL$2")
      OUTLINE_LEAF(@"SEL$2B0A33EA")
      ALL_ROWS
      DB_VERSION('12.1.0.2')
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

When I put my two hints into the query text and reran the test under 19.3 the FILTER operation disappeared and the “exists” subquery also unnested (to become vw_sq_2), turning into a nested loop semi-join. With the full set of 40 hints in place the plan from 12.1.0.2 re-appeared.

I did actually have an alternative strategy for the manual hint test. The plan with the nested loop anti join reported the following query block information:


Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$2B969259
   4 - SEL$2B969259 / A@SEL$4
   5 - SEL$D276C01C / VW_SQ_1@SEL$F49409E0
   6 - SEL$D276C01C
   7 - SEL$D276C01C / E@SEL$1
   8 - SEL$D276C01C / E@SEL$1
   9 - SEL$D276C01C / D@SEL$1
  10 - SEL$D276C01C / D@SEL$1
  11 - SEL$2B0A33EA
  12 - SEL$2B0A33EA / E@SEL$3
  13 - SEL$2B0A33EA / E@SEL$3

This prompted me to tell Oracle to do a hash join in query block SEL$2B969259 between A@SEL$4 and VW_SQ_1@SEL$F49409E0 (in that order) by adding a simple set of hints to the start of the query while leaving (just) the /*+ unnest */ hint in the “not exists” subquery.


select
        /*+ 
                leading (@sel$2b969259  a@sel$4  vw_sq_1@sel$f49409e0)
                use_hash(@sel$2b969259  vw_sq_1@sel$f49409e0)
                no_swap_join_inputs(@sel$2b969259 vw_sq_1@sel$f49409e0)
        */
        * 
from 
 

Again, this produced the required execution path.

It’s not always this easy to reproduce an execution plan by looking at a query text – sometimes the actual plan depends on knowing about things like unique constraints, foreign key constraints, and not null constraints. But if you can get your hands on the SQL to create the objects involved it’s usually possible to re-create the plan by hinting, which then allows you to discover how you change the hints to modify the plan.

 

April 27, 2020

Execution Plans

Filed under: Execution plans,Oracle — Jonathan Lewis @ 11:57 am BST Apr 27,2020

In a recent blog note I made the point that there is a very simple rule (“first child first”) for reading execution plans if the query (as written or after transformation by the optimizer) consists of a single “query block”. However, if you have a plan that is reporting multiple query blocks you have to be careful that you identify the boundaries of the individual query blocks and manage to link them together correctly.

In this note I’m going to work through an example that appeared on the Oracle Developer Forum a couple of years ago where someone produced a query that may have fooled them into misreading the execution plan. It’s a very good example of a production plan that is sufficiently complex to be interesting and make a couple of useful points but still produces a plan that is short enough to pick apart in just a couple of pages of text.

The initial question was about an oddity (actually a bug) in the optimizer’s cardinality estimate for one of the plan operations, but my answer to the question produced a follow-up question which said:

“Okay so it’s an error, was wondering why for one of the subqueries it’s using a hash join (7,8) and the other a nested loop.”

Here’s the query that produced this question. Before jumping straight to the plan it’s a good idea to take a close look at the query, take note of any optimizer strategies you might see, any features of the query that might make a big difference to the optimizer strategies, and any details that might mean the plan doesn’t do what you might expect.

Since the Oracle version can have a big impact on what the optimizer can do, I’ll point out that this was running on 12.1.0.2


select  /*+ gather_plan_statistics */ 
        * 
from 
        dwh_prod.ip_spells a
where 
        not exists (
                select  1
                from
                        dwh_prod.ip_episodes e
                inner join 
                        dwh_prod.ip_diagnosis d 
                on
                        d.episode_id = e.episode_id
                where
                        a.spell_id = e.spell_id
                and     (
                           substr(d.diagnosis_code,1,1) = 'C'
                        or substr(d.diagnosis_code,1,3) between 'D37' and 'D48'
                        or substr(d.diagnosis_code,1,1)  = 'V'
                        or d.diagnosis_code = 'Z511'
                )
        )
and     exists (
                select  1
                from
                        dwh_prod.ip_episodes e
                left join 
                        dwh_prod.ip_diagnosis d 
                on
                        d.episode_id = e.episode_id
                where
                        a.spell_id = e.spell_id
                and     (
                            e.episode_order = '1'
                        and substr(d.diagnosis_code,1,1) <> 'O'
                        and d.diagnosis_sequence = 1
                        and e.main_specialty_code not in ('501','560','610')
                        )
                )
and     substr(a.admission_method_code,1,1) = '2'
and     a.admission_dttm >= to_date('01-jan-2011', 'dd-mon-yyyy')
and     administrative_category_code = '01'

Point to note:

  • It’s a fairly simple query – one table, with two correlated subqueries.
  • The correlating predicate for the first (not exists) subquery is at line 15
  • The correlating predicate for the second (exists) subquery is at line 32
  • Both subqueries are joins between two tables, and the two tables are the same in both cases.
  • The first subquery is an “inner join” (line 10), the second subquery is a “left join” (line 27)
  • There’s an error in the code at line 36!
    • the predicate “d.diagnosis_sequence = 1” will eliminate any ip_episode (e) rows that have been preserved by the outer join
    • so the optimizer will automatically treat the outer join as an inner join
  • We might see either subquery running as a filter subquery
  • We might see either subquery unnested into an inline view – with, or without, “pushed predicate”
  • We might then see see complex view merging turn the “exists” subquery into a semi-join and/or the “not exists” into an anti-join

That last comment is something that makes it easy to jump to conclusions while reading the plan and head off in the wrong direction. So let’s take a careful look at the execution plan – which is an actual run-time plan reporting the rowsource execution statistics:


SELECT * FROM TABLE(DBMS_XPLAN.display_cursor(format=>'ALLSTATS LAST'));

Plan hash value: 2721384176
----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |      1 |        |     50 |00:00:06.13 |     283K|  94413 |       |       |          |
|*  1 |  FILTER                               |                    |      1 |        |     50 |00:00:06.13 |     283K|  94413 |       |       |          |
|*  2 |   HASH JOIN ANTI                      |                    |      1 |    156 |     51 |00:00:05.70 |     282K|  94247 |   143M|  7865K|  143M (0)|
|   3 |    PARTITION RANGE ALL                |                    |      1 |  15592 |    425K|00:00:00.65 |   79553 |      0 |       |       |          |
|*  4 |     TABLE ACCESS FULL                 | IP_SPELLS          |     24 |  15592 |    425K|00:00:00.58 |   79553 |      0 |       |       |          |
|   5 |    VIEW                               | VW_SQ_1            |      1 |    530K|    464K|00:00:04.74 |     203K|  94247 |       |       |          |
|*  6 |     HASH JOIN                         |                    |      1 |    530K|    464K|00:00:04.65 |     203K|  94247 |    26M|  3954K|   34M (0)|
|*  7 |      TABLE ACCESS FULL                | IP_DIAGNOSIS       |      1 |    528K|    464K|00:00:03.12 |     109K|      0 |       |       |          |
|   8 |      TABLE ACCESS FULL                | IP_EPISODES        |      1 |   2491K|   2495K|00:00:00.44 |   94253 |  94247 |       |       |          |
|   9 |   NESTED LOOPS SEMI                   |                    |     51 |      1 |     50 |00:00:00.36 |     514 |    166 |       |       |          |
|* 10 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        |     51 |      1 |     51 |00:00:00.15 |     229 |     92 |       |       |          |
|* 11 |     INDEX RANGE SCAN                  | IDX02_IP_EPISODES  |     51 |      1 |     76 |00:00:00.13 |     153 |     51 |       |       |          |
|* 12 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       |     51 |   2688K|     50 |00:00:00.21 |     285 |     74 |       |       |          |
|* 13 |     INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS |     51 |      5 |    174 |00:00:00.21 |     153 |     74 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   2 - access(A.SPELL_ID=ITEM_1)
   4 - filter((SUBSTR(A.ADMISSION_METHOD_CODE,1,1)='2' AND A.ADMISSION_DTTM>=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND ADMINISTRATIVE_CATEGORY_CODE='01'))
   6 - access(D.EPISODE_ID=E.EPISODE_ID)
   7 - filter((SUBSTR(D.DIAGNOSIS_CODE,1,1)='C' OR SUBSTR(D.DIAGNOSIS_CODE,1,1)='V' OR (SUBSTR(D.DIAGNOSIS_CODE,1,3)>='D37' AND
              SUBSTR(D.DIAGNOSIS_CODE,1,3)<='D48') OR D.DIAGNOSIS_CODE='Z511'))
  10 - filter((E.EPISODE_ORDER=1 AND E.MAIN_SPECIALTY_CODE<>'501' AND E.MAIN_SPECIALTY_CODE<>'560' AND
              E.MAIN_SPECIALTY_CODE<>'610'))
  11 - access(E.SPELL_ID=:B1)
  12 - filter((D.DIAGNOSIS_SEQUENCE=1 AND SUBSTR(D.DIAGNOSIS_CODE,1,1)<>'O'))
  13 - access(D.EPISODE_ID=E.EPISODE_ID)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - this is an adaptive plan
   - 6 Sql Plan Directives used for this statement
   -

Take note, by the way, that we’ve been told that the plan was “adaptive” and we’ve used 6 SQL Plan Directives, which would (probably) have been telling the optimizer to use dynamic sampling of the data in various ways to get to the best possible plan.

Given that the query starts as three separate query blocks and uses two of the tables twice it would have been nice to see the Query Block / Object Alias information for this query as that would have made it just a little easier to connect the plan to the query.

At first sight we can spot that we’ve got a hash join anti at operation 2 and a nested loop semi at operation 9 and given my earlier comments about how the optimizer can treat subqueries it would be easy to jump to the conclusion that the anti represented the “not exists” subquery and the semi was the “exists” subquery and work onwards from there. However operation 1 is a filter, and when we check the Predicate Information for operation 1 it holds the slighltly cryptic content “filter (IS NOT NULL)” this is an example of how filter subqueries “go missing” from the predicate information when you pull the execution plan from memory. Operation 1 is a multi-child filter with operation 2 and operation 9 as its child operations; for each row it receives from operation 2 it calls operation 9. If we could see the full predicate information for operation 1 we would see that it was the existence subquery. The nested loop semi isn’t the result of Oracle executing the existence subquery as a sem-join, it’s a semi-join that has appeared for some other reason – that we will pursue in a little while.

Let’s examine operation 2 (and its descendents) in detail.  It’s a hash join so it’s first child will be used as the build table and it’s second child will be used as the probe table.  The first child is a full tablescan (operation 4)  reading every partition of a range partitioned (operation 3)  table – IP_SPELLS table, which is the one table in the main body of the query. The second child is a view called VW_SQ_1 – an example of an internally named view that can appear when Oracle unnests, but doesn’t merge, a subquery – it represents the body of the “not exists” subquery. Oracle has used a hash join with IP_DIAGNOSIS as the build table and IP_EPISODES as the probe table.

To make things a little easier to comprehend I’ve created tables and indexes that let me emulate this query and plan, using the alias format option to report the query block names and fully qualified aliases that were finally used. Then I’ve edited the output to put the query block names and aliases beside the operation they refer to (rather than leaving the in a block under the body of the plan) and I’ve inserted line-breaks to help highlight the separate query blocks . This is the result:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Query Block  / Alias                |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |                                     |
|*  1 |  FILTER                               |                    | SEL$2B969259                        |
|*  2 |   HASH JOIN ANTI                      |                    |                                     |
|   3 |    PARTITION RANGE ALL                |                    |                                     |
|*  4 |     TABLE ACCESS FULL                 | IP_SPELLS          | SEL$2B969259 / A@SEL$4              |

|   5 |    VIEW                               | VW_SQ_1            | SEL$8D33959D / VW_SQ_1@SEL$F49409E0 |
|*  6 |     HASH JOIN                         |                    | SEL$8D33959D                        |
|*  7 |      TABLE ACCESS FULL                | IP_DIAGNOSIS       | SEL$8D33959D / D@SEL$1              |
|   8 |      TABLE ACCESS FULL                | IP_EPISODES        | SEL$8D33959D / E@SEL$1              |

|   9 |   NESTED LOOPS SEMI                   |                    | SEL$2B0A33EA                        |
|* 10 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        | SEL$2B0A33EA / E@SEL$3              |
|* 11 |     INDEX RANGE SCAN                  | IDX02_IP_EPISODES  | SEL$2B0A33EA / E@SEL$3              |
|* 12 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       | SEL$2B0A33EA / D@SEL$3              |
|* 13 |     INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS | SEL$2B0A33EA / D@SEL$3              |
----------------------------------------------------------------------------------------------------------

As I’ve said before – if an execution plan is too complex to read in one go you can pick virtually any line of the plan and examine that line and its descendants in isolations as a way of getting started. Now I want to point out that if you can see the final query blocks this simply then any operations that starts a query block is a useful starting point for reading part of the plan because each query block has been optimized separately, and once you’ve understood how that block operates you can replace it (mentally) with a one line “here’s a rowsource”. So we might separate this plan into pieces as following:


----------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Query Block  / Alias                |
----------------------------------------------------------------------------------------------------------
|   5 |    VIEW                               | VW_SQ_1            | SEL$8D33959D / VW_SQ_1@SEL$F49409E0 |
|*  6 |     HASH JOIN                         |                    | SEL$8D33959D                        |
|*  7 |      TABLE ACCESS FULL                | IP_DIAGNOSIS       | SEL$8D33959D / D@SEL$1              |
|   8 |      TABLE ACCESS FULL                | IP_EPISODES        | SEL$8D33959D / E@SEL$1              |
----------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Query Block  / Alias                |
----------------------------------------------------------------------------------------------------------
|   9 |   NESTED LOOPS SEMI                   |                    | SEL$2B0A33EA                        |
|* 10 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        | SEL$2B0A33EA / E@SEL$3              |
|* 11 |     INDEX RANGE SCAN                  | IDX02_IP_EPISODES  | SEL$2B0A33EA / E@SEL$3              |
|* 12 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       | SEL$2B0A33EA / D@SEL$3              |
|* 13 |     INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS | SEL$2B0A33EA / D@SEL$3              |
----------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Query Block  / Alias                |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |                                     |
|*  1 |  FILTER                               |                    | SEL$2B969259                        |
|*  2 |   HASH JOIN ANTI                      |                    |                                     |
|   3 |    PARTITION RANGE ALL                |                    |                                     |
|*  4 |     TABLE ACCESS FULL                 | IP_SPELLS          | SEL$2B969259 / A@SEL$4              |
|   5 |    Rowsource "not exists"             | VW_SQ_1            | SEL$8D33959D / VW_SQ_1@SEL$F49409E0 |
|   9 |   Rowsource "exists"                  |                    | SEL$2B0A33EA                        |
----------------------------------------------------------------------------------------------------------

  • The first sub-plan is the unnested “not exists” subquery – which is a very simple plan to analyze.
  • The second sub-plan is the “exists” subquery – which is a very simple plan to analyze
  • The third sub-plan (with the substitutes for the other two query blocks) says: do a hash (anti) join between IP_SPELLS and the “not exists” rowsource and for each row that is produced create and check the “exists” rowsource to see if it produces a match, and forward the row if it does.

In this particular case the shape of the two subsidiary query blocks, and the pattern that we use to stitch the pieces together makes it look as if “first child first” applies across the whole plan. This, I think, is a key reason why people have had difficulty interpreting complex plans in the past. If every plan you’ve previously examined looks as if “first child first” always works you’re going to be baffled by a plan where the combination of query blocks seems to breaks that rule.

Putting the first pieces together, this is what happens as the FILTER operation calls its first child:

  • Operation 1 (FILTER) calls operation 2 (hash join anti) which calls operation 3 (partition range all) which calls operation 4 (table access full of IP_SPELLS)
  • Operation 4 is the first operation to generate a rowsource,which is passes up to operation 3.
  • Operation 3 immediately passes the rowsource on up to operation 2 (making operation 3 the second operation to generate a rowsource)
  • Operation 2 creates its build table then calls operation 5 (VIEW – its second rowsource) to supply the probe table.
  • Operation 5 calls operation 6 (hash join) which calls its first child, operation 7 (table scan full of IP_DIAGNOSIS)
  • Operation 7 is the third  operation to generate a rowsource, which it passes up to operation 6 to use as its build table.
  • Operation 6 creates its build table from the rowsource then calls operation 8 (table access full of IP_EPISODE)
  • Operation 8 is the fourth operation to generate a rowsource, which it passes up to operation 6 as its probe table
  • Operation 6 probes its build table as rows arrive from operation 8, and passes join results up to operation 5
  • Operation 6 is the fifth operation to generate a rowsource
  • Operation 5 immediately passes the rows up to its parent (operation 2), making it the sixth operaton to generate a rowsource.
  • Operation 2 is an “anti” hash join, so as each row arrives from operation 5 it probes its build table and if it finds a match it marks the row in the build table as one to be discarded. When operation 2 has exhausted the input from operation 5 it can pass any unmarked rows up to its parent (operation 1), making it the seventh operation to generate a rowsource.

So we’re now at operation 1 with an incoming rowsource which is all the rows in IP_SPELLS where a certain type of match in the IP_DIAGNOSIS and IP_EPISODE tables does not exist. This is how the order looks so far if we strip the plan to a minimum and edit in an “order” column:

----------------------------------------------------------------------------
| Id  | Operation                             | Name               | Order |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |       |
|*  1 |  FILTER                               |                    |       |
|*  2 |   HASH JOIN ANTI                      |                    |     7 |
|   3 |    PARTITION RANGE ALL                |                    |     2 |
|*  4 |     TABLE ACCESS FULL                 | IP_SPELLS          |     1 |
|   5 |    VIEW                               | VW_SQ_1            |     6 |
|*  6 |     HASH JOIN                         |                    |     5 |
|*  7 |      TABLE ACCESS FULL                | IP_DIAGNOSIS       |     3 |
|   8 |      TABLE ACCESS FULL                | IP_EPISODES        |     4 |
----------------------------------------------------------------------------

So now we come to the part of the plan that matches the “exists” subquery. For each row that operation 1 (FILTER) receives from its first child it calls operation 9 to run the “exists” subquery, which is a correlated subquery with a join between IP_EPISODES and IP_DIAGNOSIS. So why does Oracle use a nested loop join in the subquery rather than a hash join, and why. in particular, does that nested loop turn into a semi-join? What’s the difference between the “not exists” and the “exists”?

Imagine that for every row in IP_EPISODES there are 10 matching rows in IP_DIAGNOSIS.  When we check for “not exists” correlated to IP_EPISODES we will have to join to every single matching IP_DIAGNOSIS row because it might be the 10th which fails the test because it matches some extra filter predicate on IP_DIAGNOSIS. So the join is going to be a “high volume” join (hence a hash join is likely to become more appropriate).  On the other hand when we test for “exists” we may get lucky and only get as far as the first IP_DIAGNOSIS for an IP_EPISODES to prove existence – so it makes more sense to adopt a nested loop so that we can stop on the first match – and that, in fact, is why the nested loop in this case is a nested loop semi, it’s a code path specially designed to stop early.  (In fact, it’s also capable of using the same caching mechanism as scalar subquery caching so it can be even more efficient than just “stop on first match” – it can even stop before trying because “I’ve done that one before”).

So for each row that operation 1 (FILTER) receives from its first child it calls its second child (operation 9) to see if it will return a  row.

  • Operation 9 (nested loop semi) calls operation 10 (table access by rowid) which calls operation 11 (index range scan)
  • Operation 11 will be the eighth operation to return a rowsource, containing rowids, to operation 10
  • Operation 10 will be the ninth operation to return a rowsource, containing row data, to operation 9
  • For each row it receives operation 9 will call its second child (operation 12 (table access by rowid)) which calls operation 13 (index range scan)
  • Operation 13 will be the tenth operation to return a rowsource, containing rowids, to operation 12
  • Operation 12 will be the eleventh operation to return a rowsource, containing row data, to operation 9
  • Operation 9 will stop on the first row it receives and pass it up to opereation 1, making operation 9 the twelfth operation to return a rowsource
  • Operation 1, if it receives a row from operation 9, will pass the current row to the client interface, making it the thirteenth operation to return a rowsource.

So the final ordering is:

----------------------------------------------------------------------------
| Id  | Operation                             | Name               | Order |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |       |
|*  1 |  FILTER                               |                    |    13 |
|*  2 |   HASH JOIN ANTI                      |                    |     7 |
|   3 |    PARTITION RANGE ALL                |                    |     2 |
|*  4 |     TABLE ACCESS FULL                 | IP_SPELLS          |     1 |
|   5 |    VIEW                               | VW_SQ_1            |     6 |
|*  6 |     HASH JOIN                         |                    |     5 |
|*  7 |      TABLE ACCESS FULL                | IP_DIAGNOSIS       |     3 |
|   8 |      TABLE ACCESS FULL                | IP_EPISODES        |     4 |
|   9 |   NESTED LOOPS SEMI                   |                    |    12 |
|* 10 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        |     9 |
|* 11 |     INDEX RANGE SCAN                  | IDX02_IP_EPISODES  |     8 |
|* 12 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       |    11 |
|* 13 |     INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS |    10 |
----------------------------------------------------------------------------

Numbers

As a closing point it’s worth picking up a couple of numbers.

One number that raised the original question is the E-Rows of 2688K at operation 12 – it’s obviously wrong give that it’s supposed to be the number of rows you get from the table after getting 5 rowids from the index range scan at operation 13. Odd little glitches like this seem to appear from time to time as new internal features are tweaked and odd bits of code aren’t made totally consistent. It’s a pity that it’s there, but if an odd cardinality glitch hasn’t been echoed into the cost (which could result in a change in execution plan) it’s not really important. Since I created a model of this query to get at the query block names I’ve also run the model against 19.3 and the error is still present in that version.

You will note that the E-rows at operation 2 is 156 – when you compare it with the 15,592 for operation 3 you can see that it’s a classic 1% guess (in this case for the effects of a “not exists” subquery).

It’s also significant that the E-Rows for operation 3 is only 15,592 when the A-Rows is 425K: the error is fairly large and if there’s a performance problem with this query this error might be the first point of investigation. Maybe we need a histogram on administrative_category_code, or need to do something to help Oracle with the (1% guess from) substr(admission_method_code,1,1) – but we may have some trouble because you can’t mix and match virtual columns and column groups in 12.1.0.2

 

April 24, 2020

Conversion Errors

Filed under: Oracle — Jonathan Lewis @ 11:03 am BST Apr 24,2020

I’ve been meaning to write this note for at least three years and was prompted to write up my draft notes this morning as a follow-up to yesterday’s note on the perils of applying a to_date() function to a date column. But then I took a look at the most recent questions on the Oracle Developer Forum and discovered that Tim Hall (@oraclebase) had (inevitably) already done the necessary write-up, so I’ve just left a brief note here (more for my own benefit than anything else) of the highlights with a link to his page.

Key features available in 12.2 to avoid conversion errors are:

  • The validate_conversion() function that returns a 1 or 0 depending whether on not an expression can be converted successfully to a specific type using a particular – returns null if the expression evaluates to null.
  • Extension to generic conversion functions (e.g. to_date()) that allow a “default” value to be used to replace the supplied value if the attempted conversion raises a conversion error.

Code Sample – report rows with a valid date (in French) that is earlier than the start of this year’s (English) tax year:


rem
rem     Script:         validate_conversion.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2017
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem

create table t1 (v1 varchar2(42));
insert into t1 values('15-June-2016');
insert into t1 values('15-Juin-2016');
commit;

prompt  ==================================================================
prompt  Single predicate test - only rows that validate as dates in French
prompt  ==================================================================

select  *
from    t1
where   validate_conversion(v1 as date, 'dd-month-yyyy' , 'nls_date_language=french') = 1
/

prompt  ===========================================================
prompt  Valid French dates that are before 6th April 2017 (English)
prompt  ===========================================================

select
        *
from    t1
where
        validate_conversion(v1 as date, 'dd-month-yyyy' , 'nls_date_language=french') = 1
and     to_date(v1, 'dd-month-yyyy' , 'nls_date_language=french') < to_date('06-Apr-2017','dd-mon-yyyy')
/

prompt  =========================================
prompt  Repeat the above with reversed predicates
prompt  =========================================

select
        *
from    t1
where
        to_date(v1, 'dd-month-yyyy' , 'nls_date_language=french') < to_date('06-Apr-2017','dd-mon-yyyy')
and     validate_conversion(v1 as date, 'dd-month-yyyy' , 'nls_date_language=french') = 1
/

prompt  ====================================================
prompt  Repeat the above but force the order of evaluation
prompt  This will raise error ORA-01843: not a invalid month
prompt  ====================================================

select
        /*+ ordered_predicates */
        *
from    t1
where
        to_date(v1, 'dd-month-yyyy' , 'nls_date_language=french') < to_date('06-Apr-2017','dd-mon-yyyy')
and     validate_conversion(v1 as date, 'dd-month-yyyy' , 'nls_date_language=french') = 1
/

prompt  ========================================================================
prompt  Handle the requirement with the 12.2 extended "to_date()" functionality
prompt  ========================================================================

select
        *
from
        t1
where
        to_date(
                v1 default '06-Avril-2017' on conversion error,
                'dd-month-yyyy',
                'nls_date_language=french'
        ) < to_date(
                '06-Apr-2017',
                'dd-mon-yyyy',
                'nls_date_language=English'
        )
/

Is there any guarantee that a validate_conversion() function will always be called before a call that attempts to use the conversion in another predicate. I doubt it, there’s no indication in the manuals and no general guarantee from Oracle about order of execution of predicates, so I wouldn’t risk it. Maybe the only safe use would be a CASE expression (which short-circuits) with a “when successful then” clause for evaluation of the actual conversion. You’d need to take a little extra care to remember to handle nulls correctly.

Maybe replace the table with a non-mergeable inline view that eliminates the failures? Possibly not, the optimizer might still do a simple filter pushdown.

 

 

 

 

April 23, 2020

date_to_date

Filed under: Execution plans,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:00 pm BST Apr 23,2020

Every now and again someone posts a piece of SQL on the Oracle Developer Forum that includes a predicate with an expression like to_date(date_column). This is a problem for several reasons – not the least being the type of performance problem that showed up in a post from a couple of years back that has just been resurrected.

Before I examine the performance detail, here’s a simple demo of the “wrong data” problem that can go unnoticed, cut-n-paste from a 12.2.0.1 session of SQL*Plus:


SQL> create table t1 (d1 date);

Table created.

SQL> insert into t1 values(sysdate);

1 row created.

SQL> select * from t1 where d1 = to_date(d1);

no rows selected

SQL> alter session set nls_date_format = 'dd-mon-yyyy hh24:mi:ss';

Session altered.

SQL> select * from t1 where d1 = to_date(d1);

D1
--------------------
22-apr-2020 15:12:36

1 row selected.


Note particularly how changing the nls_date_format can change the result of a query! (There’s another simple example on the referenced blog note.)

What’s going on? to_date(date_col) is equivalent to to_date(to_char(date_col)) using the nls_date_format to do the two conversions, and the most common default format is one that truncates the date column to date-only. So in may systems  to_date(date_col) is nearly (though doing it an expensive way) the same as trunc(date_col).

But let’s go further and see how we wreck the benefit of an index, even if we’ve made sure that we still get the correct results. The following is a minimalist model of a common billing requirement: conversion between currencies:


rem
rem     Script:         date_to_date.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem 

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        'GBP'                           from_currency,
        'USD'                           to_currency,
        trunc(sysdate - 1000) + rownum  conversion_date,
        'Corporate'                     conversion_type,
        round(
                1.25 + dbms_random.value/4,
                6
        )                               conversion_rate,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1
where
        rownum <= 1000 -- > comment to avoid WordPress format issue
;

create unique index t1_i1 on t1(from_currency, to_currency, conversion_date, conversion_type)
/
alter table t1 add constraint t1_pk primary key(
        from_currency, to_currency, conversion_date, conversion_type
)
/

create table driver(
        invoice_currency        varchar2(3),
        billing_currency        varchar2(3),
        client_type             varchar2(10),
        invoice_date            date
);

insert into driver values(
        'GBP', 'USD','Corporate',trunc(sysdate)
);

commit;

execute dbms_stats.gather_table_stats(null,'driver');

I’ve created table t1 to model the exchange rates between US dollars and UK pounds over a range of about three years ending “today”. There’s also a conversion_type column in the unique key to this table that allows us to have multiple reasons for exchanges, allowing multiple exchange rates on the same day. I really ought to have a check constraint on this table that says something like: check (conversion_date = trunc(conversion_date)).

I’ve also created a “driver” table that holds the data that might be exactly the data we need to extract an exchange rate for a single invoice. So let’s run the SQL that gets the appropriate exchange rate for this one invoice:


set serveroutput off
alter session set statistics_level = all;

prompt  =====================================
prompt  First run with simple date comparison
prompt  =====================================

select
        /*+
                leading(driver t1)
                use_nl_with_index(t1)
        */
        driver.*,
        t1.conversion_rate
from
        driver, t1
where
        t1.from_currency = driver.invoice_currency
and     t1.to_currency = driver.billing_currency
and     t1.conversion_type = driver.client_type
and     t1.conversion_date = driver.invoice_date
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  =================================
prompt  Now run with to_date(date_column)
prompt  =================================

select
        /*+
                leading(driver t1)
                use_nl_with_index(t1)
        */
        driver.*,
        t1.conversion_rate
from
        driver, t1
where
        t1.from_currency = driver.invoice_currency
and     t1.to_currency = driver.billing_currency
and     t1.conversion_type = driver.client_type
and     to_date(t1.conversion_date) = to_date(driver.invoice_date)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

In the first case I’ve used the correct join predicate between these tables; in the second I’ve put in a redundant to_date() function call at both ends of the predicate. (If you think this is unrealistic – it’s an exact match for the production code I reported in the blog note I cited above).

Here are the two execution plans – with their rowsource execution stats:

----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      1 |00:00:00.01 |      10 |      8 |
|   1 |  NESTED LOOPS                |        |      1 |      1 |      1 |00:00:00.01 |      10 |      8 |
|   2 |   NESTED LOOPS               |        |      1 |      1 |      1 |00:00:00.01 |       9 |      8 |
|   3 |    TABLE ACCESS FULL         | DRIVER |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |
|*  4 |    INDEX UNIQUE SCAN         | T1_I1  |      1 |      1 |      1 |00:00:00.01 |       2 |      8 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."FROM_CURRENCY"="DRIVER"."INVOICE_CURRENCY" AND
              "T1"."TO_CURRENCY"="DRIVER"."BILLING_CURRENCY" AND "T1"."CONVERSION_DATE"="DRIVER"."INVOICE_DATE"
              AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE")


----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |      1 |00:00:00.01 |      15 |      4 |
|   1 |  NESTED LOOPS                |        |      1 |      1 |      1 |00:00:00.01 |      15 |      4 |
|   2 |   NESTED LOOPS               |        |      1 |      1 |      1 |00:00:00.01 |      14 |      4 |
|   3 |    TABLE ACCESS FULL         | DRIVER |      1 |      1 |      1 |00:00:00.01 |       7 |      0 |
|*  4 |    INDEX RANGE SCAN          | T1_I1  |      1 |      1 |      1 |00:00:00.01 |       7 |      4 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1     |      1 |      1 |      1 |00:00:00.01 |       1 |      0 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."FROM_CURRENCY"="DRIVER"."INVOICE_CURRENCY" AND
              "T1"."TO_CURRENCY"="DRIVER"."BILLING_CURRENCY" AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE")
       filter((TO_DATE(INTERNAL_FUNCTION("T1"."CONVERSION_DATE"))=TO_DATE(INTERNAL_FUNCTION("DRIVE
              R"."INVOICE_DATE")) AND "T1"."CONVERSION_TYPE"="DRIVER"."CLIENT_TYPE"))

Three things to take note of:

  1. The index unique scan at operation 4 has changed to an index range scan.
  2. The predicate information for operation 4 has changed from a pure access predicate to an access predicate plus a very messy filter predicate
  3. The range scan that now appears at operation 4 gets 7 buffers (that’s one root block and 6 leaf blocks) to find the one rowid we need, and will have applied the messy filter predicate to all 1,000 index entries where the two currency codes were GBP/USD.

In the case of the older article the problem query was spending almost all of its time in a join like this, doing 59 buffer gets to find a single rowid for every invoice presented – possibly doing thousands of comparisons as it went.

 

April 22, 2020

Eureka!

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 1:08 pm BST Apr 22,2020

I woke up last night with a brilliant solution to a problem that’s been bugging me for more than a year. How does a call to report_sql_monitor() manage to produce output like this:

SQL Plan Monitoring Details (Plan Hash Value=4262489872)
======================================================================================================================================================
| Id |            Operation            | Name  |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read  | Read  | Activity | Activity Detail |
|    |                                 |       | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs  | Bytes |   (%)    |   (# samples)   |
======================================================================================================================================================
|  0 | SELECT STATEMENT                |       |         |      |        13 |     +2 |     1 |        1 |       |       |          |                 |
|  1 |   SORT AGGREGATE                |       |       1 |      |        13 |     +2 |     1 |        1 |       |       |          |                 |
|  2 |    NESTED LOOPS                 |       |      5M |  14M |        13 |     +2 |     1 |       4M |       |       |          |                 |
|  3 |     NESTED LOOPS                |       |      5M |  14M |        13 |     +2 |     1 |       4M |       |       |          |                 |
|  4 |      TABLE ACCESS FULL          | T1    |      5M | 4686 |        13 |     +2 |     1 |       4M |  1050 | 260MB |          |                 |
|  5 |      INDEX RANGE SCAN           | T2_I1 |       1 |    2 |        14 |     +1 |    5M |       4M |  5494 |  83MB |          |                 |
|  6 |     TABLE ACCESS BY INDEX ROWID | T2    |       1 |    3 |        13 |     +2 |    5M |       4M | 17743 | 260MB |          |                 |
======================================================================================================================================================


If you’re wondering why this is a problematic output take a look at the A-Rows column for operation 4 (and 5), and the Execs colukmn for operations 5 (and 6). Given the nature of a nested loop join the number of Execs of operation 5 should match the number of rows (A-rows) generated by operation 4, and the number of Execs of operation 6 should match the number of rows generated by operation 5.

But Oracle claims to have generated 4 million rows then executed 5 million times in both cases. How does that happen?

The very simple thought I had last night was “formatting” – if the actual value is 4,500,000 maybe the A-rows rounds down and the Execs rounds up, so I set up a little test to check the hypothesis. Since the version where I’d first seen the anomaly was 12.1.0.2 that’s the version I tested first.

rem
rem     Script:         report_sql_monitor_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem 

drop table t2 purge;
drop table t1 purge;

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',25,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 4500100 -- > comment to avoid WordPress format issue
/

create index t1_i1 on t1(id);

create table t2 as select * from t1;
create index t2_i1 on t2(id);

alter system flush shared_pool;
alter system flush buffer_cache;

select  /*+
                monitor
                qb_name(main)
                leading(@inline t1@inline t2@inline)
                use_nl_with_index(@inline t2@inline)
        */
        count(n2)
from    (
        select
                /*+
                        qb_name(inline)
                */
                t1.n1, t2.n1 n2
        from    t1, t2
        where   t1.id <= 4499999
--      where   t1.id <= 4500000
--      where   t1.id <= 4500001
        and     t2.id =  t1.n1
        )
;

prompt  =======================================
prompt  Now run report_sql_monitor.sql from SYS
prompt  =======================================

You’ll note that I’ve allowed three possible values for the number of rows I get from t1 to drive the index probe into t2 – I thought I’d have to do several experiments to make the oddity appear. In fact it appeared on the very first attempt: A-Rows reported 4M, and Execs report a clearly incorrect 5M. (Repeating the test on 12.2.0.1 the anomaly disappeared – all the output reported 4M for 4,499,999; and 5M for 4,500,000.)

You’ll notice the “flush buffer_cache” command in this script, this came about because of some secondary testing. Because I wanted to check a few variations I commented out the “drop table” commands. The consequence of not dropping the tables was that the anomaly disappeared – in some way it was connected to reading blocks from disc and didn’t appear once the table and index were cached. In fact when I set the driving count to 4,998,000 in one of my tests I got a result that looked slightly more counter-intuitive than the original:

=====================================================================================================================================================
| Id |            Operation            | Name  |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity | Activity Detail |
|    |                                 |       | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |   (# samples)   |
=====================================================================================================================================================
|  0 | SELECT STATEMENT                |       |         |      |        11 |     +2 |     1 |        1 |      |       |          |                 |
|  1 |   SORT AGGREGATE                |       |       1 |      |        11 |     +2 |     1 |        1 |      |       |          |                 |
|  2 |    NESTED LOOPS                 |       |      4M |  14M |        11 |     +2 |     1 |       4M |      |       |          |                 |
|  3 |     NESTED LOOPS                |       |      4M |  14M |        11 |     +2 |     1 |       4M |      |       |          |                 |
|  4 |      TABLE ACCESS FULL          | T1    |      4M | 4686 |        11 |     +2 |     1 |       4M | 1050 | 260MB |          |                 |
|  5 |      INDEX RANGE SCAN           | T2_I1 |       1 |    2 |        12 |     +1 |    4M |       4M |  345 |  83MB |          |                 |
|  6 |     TABLE ACCESS BY INDEX ROWID | T2    |       1 |    3 |        11 |     +2 |    5M |       4M | 1048 | 262MB |          |                 |
=====================================================================================================================================================


Notice how I’ve (“correctly”) acquried 4M index rowids at operation 5, but (apparently) managed to access the table 5M times!

As I noted above, the oddity seems to have disappeared in 12.2.0.1 so it’s not worth pursuing further. I think it might relate either to Oracle double counting the Exec when its first access is a phyiscal read, or maybe it’s managing to increment the wrong counter when it visits a space management block to find the next batch of blocks to read. The detail is really not very important.

What might be important, though, is that sometimes the number of Execs you see is truthfully greater than ought to be possible because of the strange effects that can appear with read-consistency. For a very long time I believed that the oddity I’ve show here was something to do with read-consistency and query restart, but I couldn’t come up with a hypothesis to explain how that could be happening. So I was very relieved last night to come up with a simple explanation (which actually turned out to be wrong, but did at least help me recognise how the anomaly could happen in perfectly harmless circumstances).

Footnote:

What’s the difference between 4 million and 5 million – sometimes it’s one (just one, not one million). When testing under 12.2.0.1 I only had to add one row to the driving rowsource (changing 4,499,999 to 4,500,000) to see the A-rows and Execs change from 4M to 5M. If you forget the implicit rounding errors in reports like this you can end up chasing a tiny anomaly because it looks like a huge anomaly.

 

April 20, 2020

Execution Plans

Filed under: Execution plans,Oracle — Jonathan Lewis @ 3:37 pm BST Apr 20,2020

One of the most important skills needed when investigating badly performing SQL is the ability to read Execution Plans. It’s a topic I’ve written and spoken about frequently – even to the extent of doing full-day seminars – but there’s always scope for finding another way of presenting the method.

This is a note based on a few introductory Powerpoint slides I created for the (sadly cancelled) Polish OUG Workshop and Tanel Poder’s Virtual Conference taking a slightly different approach from one I normally use to get people stated on interpreting (serial) execution plans.

I want to begin with a query, rather than a plan:

rem
rem     Script:         poug_plan_01a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem

explain plan for
select
        /*+
                qb_name(main)
        */
        ord.id,
        ord.valuation,
        ord.status,
        (select /*+ qb_name(company) */ max(com.name)  from companies  com  where com.id  = ord.id_company) company,
        (select /*+ qb_name(product) */ max(prd1.name) from products   prd1 where prd1.id = orl.id_product) product,
        orl.quantity
from
        orders          ord,
        order_lines     orl
where
        ord.date_placed > trunc(sysdate) - 7
and     orl.id_ord = ord.id
and     orl.id_product in (
                select  /*+ qb_name(class) */
                        prd2.id
                from    products prd2
                where   prd2.class = 'Group25'
        )
/

I’ve got a query with 4 “query blocks” – every time you see the key words “select”, “insert”, “merge”, etc. that’s a query block, and it’s very helpful to adopt a habit of naming the query blocks in your SQL statements, as I have done here, using the the qb_name() hint.

What’s the optimizer going to do with this query, and how is Oracle going to run the query?

There are a number of possibilities – some of them version dependent, some dictated by the statistics Oracle has about the data, some dictated by the available indexes and constraints, but there’s one key feature that will come into play: the optimizer “likes” statements that consist of a single query block because it’s easy to optimize statements of that shape, so the optimizer will attempt to transform this query in ways that will reduce the number of query blocks it has to handle.

In newer versions of Oracle the optimizer uses “cost based query transformation” almost everywhere, but in older versions of Oracle a number of the transformation were “heuristic” (i.e.: “if I see an X I’m going to transform it into a Y”). Cost based query transformation means the optimizer will try to  work out the efficiency of applying a transformation and may then decide not to do it.

Heuristic Transformations: Sometimes the optimizer trace file (10053 trace) will tell you that it has performed a heuristic transformation by printing  a message like (e.g.):  SU: Performing unnesting that does not require costing.

I’m going to do two things with this query – first I’ll tell the optimizer that it should not do any cost-based transformations, then I’ll give it free rein to do whatever it fancies. All I have to do for the first case is add the hint /*+ no_query_transformation */ to the query, which gets me following execution plan under 12.2.0.1:


select * from table(dbms_xplan.display(null,null,'alias'));

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     1 |    34 |  3567   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE              |             |     1 |    35 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| COMPANIES   |     1 |    35 |     2   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | COM_PK      |     1 |       |     1   (0)| 00:00:01 |
|   4 |  SORT AGGREGATE              |             |     1 |    35 |            |          |
|   5 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS    |     1 |    35 |     2   (0)| 00:00:01 |
|*  6 |    INDEX UNIQUE SCAN         | PRD_PK      |     1 |       |     1   (0)| 00:00:01 |
|*  7 |  FILTER                      |             |       |       |            |          |
|*  8 |   HASH JOIN                  |             |  3500 |   116K|    52  (24)| 00:00:01 |
|*  9 |    TABLE ACCESS FULL         | ORDERS      |   700 | 16100 |    12  (34)| 00:00:01 |
|  10 |    TABLE ACCESS FULL         | ORDER_LINES | 50000 |   537K|    37  (14)| 00:00:01 |
|* 11 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS    |     1 |    12 |     2   (0)| 00:00:01 |
|* 12 |    INDEX UNIQUE SCAN         | PRD_PK      |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - COMPANY
   2 - COMPANY / COM@COMPANY
   3 - COMPANY / COM@COMPANY
   4 - PRODUCT
   5 - PRODUCT / PRD1@PRODUCT
   6 - PRODUCT / PRD1@PRODUCT
   7 - MAIN
   9 - MAIN    / ORD@MAIN
  10 - MAIN    / ORL@MAIN
  11 - CLASS   / PRD2@CLASS
  12 - CLASS   / PRD2@CLASS

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("COM"."ID"=:B1)
   6 - access("PRD1"."ID"=:B1)
   7 - filter( EXISTS (SELECT /*+ QB_NAME ("CLASS") */ 0 FROM "PRODUCTS" "PRD2"
              WHERE "PRD2"."ID"=:B1 AND "PRD2"."CLASS"='Group25'))
   8 - access("ORL"."ID_ORD"="ORD"."ID")
   9 - filter("ORD"."DATE_PLACED">TRUNC(SYSDATE@!)-7)
  11 - filter("PRD2"."CLASS"='Group25')
  12 - access("PRD2"."ID"=:B1)

I’ll say more about this plan later, but for the moment I want to pick up two key points.

  • First, you can almost see the 4 query blocks in the body of execution plan. The join between orders and order_lines appears in operations 8 – 10; the query for max(companies.name) is highly visible in operations 1 – 3, as is the query for max(products.name) in operations 4 – 6; but the “IN” subquery is slightly less obvious in the combination of the FILTER operation at operation 7 and the indexed access in operations 11 and 12. Looking at operation 7 in the Predicate Information section of the plan you will note that the “IN” subquery has changed to an “EXISTS” subquery despite the hint that the optimizer shouldn’t do any cost based query transformation – and that’s because this was a heuristic transformation, not a costed transformation, the optimizer did it because it could!
  • The second thing I particularly want to draw your attention to is the section labelled Query Block Name / Object Alias – which appears as the result of including the ‘alias’ format option in my call to dbms_xplan.display(). This gives us two bits of information: for each operation in the plan (except for a few gaps) it has told us which “final” (also referred to as an “outline_leaf”) query block owns that operation; similarly, for any operation involving an object name in the plan it has given us the “fully qualified” alias, i.e. the underlying object alias combined with the query block where the alias was introduced.

For complex plans this alias information is so useful that it’s a bit of a shame that it doesn’t appear in the body of the plan when you ask for it – so before I go on I’m going to edit the output to show it the way I’d like to see it:


-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Query Block | Alias        |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     1 |    34 |  3567   (1)| 00:00:01 |             |              |
|   1 |  SORT AGGREGATE              |             |     1 |    35 |            |          | COMPANY     |              |
|   2 |   TABLE ACCESS BY INDEX ROWID| COMPANIES   |     1 |    35 |     2   (0)| 00:00:01 | COMPANY     | COM@COMPANY  |
|*  3 |    INDEX UNIQUE SCAN         | COM_PK      |     1 |       |     1   (0)| 00:00:01 | COMPANY     | COM@COMPANY  |
|   4 |  SORT AGGREGATE              |             |     1 |    35 |            |          | PRODUCT     |              |
|   5 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS    |     1 |    35 |     2   (0)| 00:00:01 | PRODUCT     | PRD1@PRODUCT |
|*  6 |    INDEX UNIQUE SCAN         | PRD_PK      |     1 |       |     1   (0)| 00:00:01 | PRODUCT     | PRD1@PRODUCT |
|*  7 |  FILTER                      |             |       |       |            |          |             |              |
|*  8 |   HASH JOIN                  |             |  3500 |   116K|    52  (24)| 00:00:01 | MAIN        |              |
|*  9 |    TABLE ACCESS FULL         | ORDERS      |   700 | 16100 |    12  (34)| 00:00:01 | MAIN        | ORD@MAIN     |
|  10 |    TABLE ACCESS FULL         | ORDER_LINES | 50000 |   537K|    37  (14)| 00:00:01 | MAIN        | ORL@MAIN     |
|* 11 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS    |     1 |    12 |     2   (0)| 00:00:01 | CLASS       | PRD2@CLASS   |
|* 12 |    INDEX UNIQUE SCAN         | PRD_PK      |     1 |       |     1   (0)| 00:00:01 | CLASS       | PRD2@CLASS   |
-------------------------------------------------------------------------------------------------------------------------

In this initial example the extra columns don’t really add much value – although my alias column does show us very clearly that PRODUCTS (the Name) at operation 11 comes from the CLASS subquery while PRODUCTS at operation 5 comes from the scalar subquery named PRODUCT. Such things are not always so obvious (some readers may immediately think of problems seeing multiple copies of table gl_code_combinations in very long execution plans).

Let’s move on to the second call to the query where we let the 12.2 optimizer do whatever it thinks best with my query – this time I’ll go straight to the edited version of the plan (and also leave the Query Block/Alias section in place):


-------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Query Block  | Alias        |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |    71 |  8236 |    89  (18)| 00:00:01 |              |              |
|*  1 |  HASH JOIN OUTER      |             |    71 |  8236 |    89  (18)| 00:00:01 | SEL$040EE093 |              |
|*  2 |   HASH JOIN OUTER     |             |    71 |  5751 |    75  (19)| 00:00:01 |              |              |
|*  3 |    HASH JOIN          |             |    71 |  3266 |    65  (20)| 00:00:01 |              |              |
|*  4 |     TABLE ACCESS FULL | ORDERS      |   700 | 16100 |    12  (34)| 00:00:01 | SEL$040EE093 | ORD@MAIN     |
|*  5 |     HASH JOIN         |             |  1008 | 23184 |    53  (17)| 00:00:01 |              |              |
|*  6 |      TABLE ACCESS FULL| PRODUCTS    |   200 |  2400 |    13   (8)| 00:00:01 | SEL$040EE093 | PRD2@CLASS   |
|   7 |      TABLE ACCESS FULL| ORDER_LINES | 50000 |   537K|    37  (14)| 00:00:01 | SEL$040EE093 | ORL@MAIN     |
|   8 |    TABLE ACCESS FULL  | COMPANIES   | 10000 |   341K|    10  (10)| 00:00:01 | SEL$040EE093 | COM@COMPANY  |
|   9 |   TABLE ACCESS FULL   | PRODUCTS    | 10000 |   341K|    13   (8)| 00:00:01 | SEL$040EE093 | PRD1@PRODUCT |
-------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$040EE093
   4 - SEL$040EE093 / ORD@MAIN
   6 - SEL$040EE093 / PRD2@CLASS
   7 - SEL$040EE093 / ORL@MAIN
   8 - SEL$040EE093 / COM@COMPANY
   9 - SEL$040EE093 / PRD1@PRODUCT

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("PRD1"."ID"(+)="ORL"."ID_PRODUCT")
   2 - access("COM"."ID"(+)="ORD"."ID_COMPANY")
   3 - access("ORL"."ID_ORD"="ORD"."ID")
   4 - filter("ORD"."DATE_PLACED">TRUNC(SYSDATE@!)-7)
   5 - access("ORL"."ID_PRODUCT"="PRD2"."ID")
   6 - filter("PRD2"."CLASS"='Group25')

Note
-----
   - this is an adaptive plan

It’s a little unsatisfactory that some of the lines in the plan don’t record a query block name at all – however it is still fairly obvious that the entire query has been collapsed into a single query block that is nothing more than a five-table hash join. To get this single query block the optimizer has “unnested” three subqueries and eliminated two ‘group by’s. It’s when the optimzer transforms the query this much that the alias information can be so useful – how easy would it be to decide which occurrence of PRODUCTS at operations 6 and 9 corresponded to which part of the original query?

First Child First – Recursive Descent (FCF)

This is where we finally get to the key rule for reading a (serial) execution plan. A (section of a) plan involving a single query block obeys the sound-bite: “first child first – recursive descent”. (or FCF for short)

For a plan involving multiple query blocks – each query block individually follows the rule, and then you have to learn specific rules for stitching together multiple query blocks. I’ll supply examples of possible stitch-ups over the next few weeks. For now I’ll just show “FCF” applied to the second plan, then take a slightly closer look at the first plan.

Each operation in a query block produces (or “is”, or “represents”) a rowsource. When we talk about “the order of execution” of a plan there’s an implicit interpretation of the phrase to mean “the order in which the operations produce their rowsource”.

FCF encapsulates the idea that to produce a rowsource an operation calls each of its child operations in turn to produce their rowsources then takes some action to combine the child rowsources. The order in which the child operations are called is, in the first case, exactly the order in which the child operations are printed in the execution plan.

There are two important details to add to this.

  • First, although a parent operation will call its child operations in turn, the parent may cycle through the child operations multiple times. The obvious example of this is the nested loop join where the parent will fetch a row from its first child then fetch (zero or more) rows from its second child, then fetch the next row from its first child then call its second child again, and repeat until is has consumed the entire rowsource from the first child.
  • Secondly, an operation may be blocking or non-blocking, and it’s worth remembering this when thinking about whether you want to tune a query for latency (time to first row(s) returned) or throughput (time to last row returned). Some operations will have to create their entire rowsource before they pass anything up to their parent (blocking); some operations will create and pass their rowsource piece by piece on demand.

Let’s apply FCF to the single block plan – which I’ll repeat here without the predicate and alias information:

-------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |    71 |  8236 |    89  (18)| 00:00:01 |
|*  1 |  HASH JOIN OUTER      |             |    71 |  8236 |    89  (18)| 00:00:01 |
|*  2 |   HASH JOIN OUTER     |             |    71 |  5751 |    75  (19)| 00:00:01 |
|*  3 |    HASH JOIN          |             |    71 |  3266 |    65  (20)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL | ORDERS      |   700 | 16100 |    12  (34)| 00:00:01 |
|*  5 |     HASH JOIN         |             |  1008 | 23184 |    53  (17)| 00:00:01 |
|*  6 |      TABLE ACCESS FULL| PRODUCTS    |   200 |  2400 |    13   (8)| 00:00:01 |
|   7 |      TABLE ACCESS FULL| ORDER_LINES | 50000 |   537K|    37  (14)| 00:00:01 |
|   8 |    TABLE ACCESS FULL  | COMPANIES   | 10000 |   341K|    10  (10)| 00:00:01 |
|   9 |   TABLE ACCESS FULL   | PRODUCTS    | 10000 |   341K|    13   (8)| 00:00:01 |
-------------------------------------------------------------------------------------

We start from the top – ignoring the “operation 0 SELECT” which isn’t really a plan operation.

  • Operation 1 is a “hash join outer” – we expect exactly two child operations, the first will supply the “build” table to the hash join, the second will supply the “probe” table. We will have to get the entire rowsource from the first child before calling the second child to start supplying its rowsource. Looking at the text indents, the first child is operation 2, the second is operation 9 so we have to start by calling operation 2.
  • Operation 2 is a “hash join outer” – so, again, we expect it to have two child operations that it will use to construct a rowsource to pass up to its parent. Its first child is operation 3, its second child is operation 8 so we have to start by calling operation 3.
  • Operation 3 is a “hash join” – so, again, we expect it to have two child operations that it will use to construct a rowsource to pass up to its parent. Its first child is operation 4, its second child is operation 5 so we have to start by calling operation 4.
  • Operation 4 is a “table access full” of orders – which is an operation that has no child operations and can immediately start to return a rowsource to its parent. Operation 4 is the first thing that “happens” in this plan.
  • After acquiring the entire rowsource from operation 4 and building an in-memory hash table from it, operation 3 can now call its second child (operation 5) to supply a rowsource to probe the hash table.
  • But operation 5 is a “hash join” – so, yet again, we expect to see two child operations that it will use to construct the rowsource it has to pass up to its parent. Its first child is operation 6, its second child is operation 7 so we have to start by calling operation 6
  • Operation 6 is a “table access full” of products – which is an operation that has no child operations and can immediately start to return a rowsource to its parent. Operation 6 is the second thing that “happens” in this plan.
  • After acquiring the entire rowsource from operation 6 and creating a build table from it, operation 5 can now call its second child (operation 7) to supply a rowsource to use as the probe table.
  • Operation 7 is a “table access full” of order_lines – which is an operation that has no child operations and can immediately start to return a rowsource to its parent. Operation 7 is the third thing that “happens” in this plan
  • Operation 5 can now use the rowsource from operation 7 to probe its build table and start passing the results up to its parent (operation 3) So operation 5 is the fourth thing that “happens” in this plan.
  • Operation 3 can now use the rowsource from operation 5 (its second child) to probe the build table it created from operation 4 and start passing the results up to its parent (operation 2). So operation 3 is the fifth thing that “happens” in this plan..
  • Operation 2 can now absorb the entire row source from operation 3 (its first child) to construct its build table, after which it has to call its second child (operation 8) to acquire a rowsource to use as its probe table.
  • Operation 8 is a “table access full” of companies – which is an operation that has no child operations and can immediately start to return a rowsource to its parent (operation 2). Operation 8 is the sixth thing that “happens” in this plan.
  • Operation 2 can now use the rowsource from operation 8 to probe its build table and start passing the results up to its parent (operation 1) So operation 2 is the seventh thing that “happens” in this plan.
  • Operation 1 can now absorb the entire row source from operation 2 (its first child) to construct its build table, after which it has to call its second child (operation 9) to acquire a rowsource to use as its probe table.
  • Operation 9 is a “table access full” of products – which is an operation that has no child operations and can immediately start to return a rowsource to its parent (operation 1). Operation 9 is the eighth thing that “happens” in this plan.
  • Operation 1 can now use the rowsource from operation 9 to probe its build table and start passing the results up to its parent (“operation 0″/ end-user) So operation 1 is the ninth thing that “happens” in this plan.

Extracting the ordering information the order of operation is:

  • Operation 4 is the first thing that happens
  • Operation 6 is the second thing that happens
  • Operation 7 is the third thing that happens
  • Operation 5 is the fourth thing that happens
  • Operation 3 is the fifth thing that happens
  • Operation 8 is the sixth thing that happens
  • Operation 2 is the seventh thing that happens
  • Operation 9 is the eighth thing that happens
  • Operation 1 is the ninth thing that happens

The rule is very simple but the process seems very long-winded when you have to write down every step completely (fortunately it doesn’t take much practice for this to become a rapid mental exercise in many cases). Start at the top and keep repeating (recursive descent) “what’s the first child” until you get to the operation that doesn’t have any child operations – that’s the first operation in the order of execution. Then back up one step to see if there’s a second (then third, then …) child to consider. and repeat the process of descending and ascending.

There’s one further important observation to make about this particular case.  The order of operation started with the orders table. The first two tables to be joined were the products and order_lines tables. The optimizer has produced a plan where the order of operation does not match the join order. This is an artefact of the optimizer’s ability to “swap join inputs” for hash joins when evaluating a join order.

Multiple Query Blocks

I’ll finish with a brief look at the plan I got when I blocked cost-based query transformation.  This time I’ll show the plan with the edited query block names still in place:


-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     | Query Block | Alias        |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     1 |    34 |  4017   (1)| 00:00:01 |             |              |
|   1 |  SORT AGGREGATE              |             |     1 |    35 |            |          | COMPANY     |              |
|   2 |   TABLE ACCESS BY INDEX ROWID| COMPANIES   |     1 |    35 |     2   (0)| 00:00:01 | COMPANY     | COM@COMPANY  |
|*  3 |    INDEX UNIQUE SCAN         | COM_PK      |     1 |       |     1   (0)| 00:00:01 | COMPANY     | COM@COMPANY  |
|   4 |  SORT AGGREGATE              |             |     1 |    35 |            |          | PRODUCT     |              |
|   5 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS    |     1 |    35 |     2   (0)| 00:00:01 | PRODUCT     | PRD1@PRODUCT |
|*  6 |    INDEX UNIQUE SCAN         | PRD_PK      |     1 |       |     1   (0)| 00:00:01 | PRODUCT     | PRD1@PRODUCT |
|*  7 |  FILTER                      |             |       |       |            |          |             |              |
|*  8 |   HASH JOIN                  |             |  3950 |   131K|    52  (24)| 00:00:01 | MAIN        |              |
|*  9 |    TABLE ACCESS FULL         | ORDERS      |   790 | 18170 |    12  (34)| 00:00:01 | MAIN        | ORD@MAIN     |
|  10 |    TABLE ACCESS FULL         | ORDER_LINES | 50000 |   537K|    37  (14)| 00:00:01 | MAIN        | ORL@MAIN     |
|* 11 |   TABLE ACCESS BY INDEX ROWID| PRODUCTS    |     1 |    12 |     2   (0)| 00:00:01 | CLASS       | PRD2@CLASS   |
|* 12 |    INDEX UNIQUE SCAN         | PRD_PK      |     1 |       |     1   (0)| 00:00:01 | CLASS       | PRD2@CLASS   |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("COM"."ID"=:B1)
   6 - access("PRD1"."ID"=:B1)
   7 - filter( EXISTS (SELECT /*+ QB_NAME ("CLASS") */ 0 FROM "PRODUCTS" "PRD2"
              WHERE "PRD2"."ID"=:B1 AND "PRD2"."CLASS"='Group25'))
   8 - access("ORL"."ID_ORD"="ORD"."ID")
   9 - filter("ORD"."DATE_PLACED">TRUNC(SYSDATE@!)-7)
  11 - filter("PRD2"."CLASS"='Group25')
  12 - access("PRD2"."ID"=:B1)

Under the Query Block column you can see the four separate blocks clearly labelled, and if you use the column to split the plan into 4 separate pieces (which will lose you operation 0 and operation 7) then you should be able to work out what’s happening in the 4 separare sub-plans quite easily.

After that it’s necessary to work out how these 4 subplans should be connected, and in almost all cases you’ll find that you need to look at the query itself to get some clues about the connections. There are a couple of rules that I’ll mention for this example, though.

a) Scalar subqueries in the select list appear as separate sub-plans ABOVE the driving plan. So we know that the COMPANY and PRODUCT sub-plans represent the two scalar subqueries in our select list which will (notionally) run once for every row returned by the main query. What we don’t know is how many times those subqueries will run (partly because we don’t know how many distinct products and companies will be reported by the main query, and partly because we don’t know how effective the “scalar subquery caching” is going to be.

b) When we see a FILTER operation there’s a good chance that there’s going to be a subquery in the “where” clause of the main query – though there are other interpretations.  In our case we know we have an IN subquery in the original text, and can see that this has turned to an EXISTS in the predicate information section so we can connect together the MAIN and CLASS sub-plans using the “filter-subquery” mechanism which behaves a little like a nested loop join, viz: for each row supplied by the first child call the second child (and subquent children in turn) to see if the row should be passed up to the parent operation. We have only two child operations in this case, but if there were several child operations Oracle would call them in order, stopping at the earliest possible moment.

The FILTER operation covers a multitude of sins, so I’ll be working through more examples in greater detail in future notes.

Costing

A final comment to make about this example, and other plans involving multiple query blocks. The optimizer really has very little idea of how many times any sub-plan / subquery will operate and for a very long time the final Cost (operation 0) of the query would cater only for the cost of the main query block – which meant that sometime you could get a plan where the total cost was lower than the cost of one of the sub-plans. As time passed various algorithms were introduced that resulted in costs that appearing that attempted to estimate the number of times that some of the sub-plan / subquery components were likely to run.

In this example the cost of the main query (operation 8) is 52 with a predicted rowsource of 3,500 rows. The total cost of the plan is shown as 3,567 – which means the increment cost of running the existence subquery 3,500 times (at a cost of something between 1 and 2 each time) has been added, and (possibly) some multiple (the number of rows surviving the FILTER which, unfortunately, is not reported) of the cost of running the other two subqueries has been added to that. A quick check of the plan with the inline scalar subqueries removed indicates that their cost may have been added just once each, and the cost of the filter subquery (slightly over 1) was added 3,500 times.

Clearly there’s plenty of scope for the optimizer to produce poor estimates of run-time costs as it compares different cost-based transformations of a query, so it’s important to be able to recognise the patterns caused by different transformations and taking advantage of hinting to block particular transformations if the optimizer has fooled by its arithmetic into making a bad choice.

 

 

 

 

April 10, 2020

raw timestamp

Filed under: Histograms,Oracle,Troubleshooting — Jonathan Lewis @ 6:59 pm BST Apr 10,2020

Many years ago I wrote a note with some comments about conveting the low_value and high_value columns of user_tab_columns to see what values they actually held. At the time the dbms_stats package held a few procedures to convert raw values and it was necessary to wrap these procedures in user-defined functions if you wanted a function to do the job.

In recent versions of Oracle (18c onwards) the package has finally acquired several functions to do a proper job, these are:

  • convert_raw_to_bin_double()
  • convert_raw_to_bin_float()
  • convert_raw_to_date()
  • convert_raw_to_number()
  • convert_raw_to_nvarchar()
  • convert_raw_to_rowid()
  • convert_raw_to_varchar()

You might note that these functions do not appear in the PL/SQL Packages and Types Reference, but they are documented in the script $ORACLE_HOME/rdbms/admin/dbmsstat.sql that creates the package:


-- Convert the internal representation of a minimum or maximum value
-- into a datatype-specific value. The minval and maxval fields
-- of the StatRec structure as filled in by get_column_stats or
-- prepare_column_values are appropriate values for input.

One thing you’ll notice about the list is that there’s no convert_raw_to_timestamp(), and a question came up recently on the oracle-l list server asking how to do this. This note answers that question and, in passing, demonstrates the typical use of the other functions.

As ever we start by creating some data:


rem
rem     Script:         raw_to_timestamp.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem

create table t1(
        t0 timestamp(0),
        t3 timestamp(3),
        t6 timestamp(6),
        t9 timestamp(9),
        ts timestamp
);

insert into t1
with t_tab as (
        select to_timestamp('01-Apr-2020 21:15:10.123456789','dd-mon-yyyy hh24:mi:ss.ff9') ts
        from dual
)
select  t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts
from    t_tab
;

insert into t1
with t_tab as (
        select to_timestamp('01-Apr-2020 21:15:10.987654321','dd-mon-yyyy hh24:mi:ss.ff9') ts
        from dual
)
select  t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts, t_tab.ts
from    t_tab
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 2'
        );
end;
/

I’ve creaed a table to hold timestamps constrained to different levels of precision. The maximum precision allowed is 9 decimal places for the seconds, the default (column ts) is 6. I’ve then created two rows in the table using slightly different timestamps for the rows but giving all the columns in a single row the same value. Then I’ve gathered stats – including a histogram – on the table and all its columns.

I can now query user_tab_cols to pick up the low ahd high values:


select
        column_name, low_value, high_value
from
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
/

COLUMN_NAME          LOW_VALUE                  HIGH_VALUE
-------------------- -------------------------- --------------------------
T0                   7878040116100B             7878040116100C
T3                   7878040116100B0754D4C0     7878040116100B3AE3AF00
T6                   7878040116100B075BCDE8     7878040116100B3ADE6770
T9                   7878040116100B075BCD15     7878040116100B3ADE68B1
TS                   7878040116100B075BCDE8     7878040116100B3ADE6770


And here’s a dump of the columns so that you can see the actual values held in the table in their internal representation.


select
        dump(t0,16), 
        dump(t3,16), 
        dump(t6,16), 
        dump(t9,16), 
        dump(ts,16) 
from 
        t1;


DUMP(T0,16)
-------------------------------------------------------------------------------------
DUMP(T3,16)
-------------------------------------------------------------------------------------
DUMP(T6,16)
-------------------------------------------------------------------------------------
DUMP(T9,16)
-------------------------------------------------------------------------------------
DUMP(TS,16)
-------------------------------------------------------------------------------------
Typ=180 Len=7: 78,78,4,1,16,10,b
Typ=180 Len=11: 78,78,4,1,16,10,b,7,54,d4,c0
Typ=180 Len=11: 78,78,4,1,16,10,b,7,5b,cd,e8
Typ=180 Len=11: 78,78,4,1,16,10,b,7,5b,cd,15
Typ=180 Len=11: 78,78,4,1,16,10,b,7,5b,cd,e8

Typ=180 Len=7: 78,78,4,1,16,10,c
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,e3,af,0
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,de,67,70
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,de,68,b1
Typ=180 Len=11: 78,78,4,1,16,10,b,3a,de,67,70

You’ll notice that, despite the attempt to limit the precision, all the columns other than the first use 11 bytes. The first column is different, with zero precision we use only 7 bytes and if you look closely (and know your internal storage formats) you’ll realise that those 7 bytes are using the standard format for stored dates. The last 4 bytes hold the fraction of the second stored as a number of nano-seconds as a simple 32-bit binary number. (Warning: it’s possible that you will see a difference in this set of bytes if your machine uses a different endianness from mine – I can’t test that for myself at present.)

So let’s see what we get if we try to convert the raw values using the call to dbms_stats.convert_raw_to_date(). I’ll start by setting the nls_XXX formats to get the full conntent of a date column or a timestamp column reported from SQL*Plus.


alter session set nls_date_format = 'dd-Mon-yyyy hh24:mi:ss';
alter session set nls_timestamp_format = 'dd-Mon-yyyy hh24:mi:ss.ff9';

set linesize 165

column t0 format a32
column t3 format a32
column t6 format a32
column t9 format a32
column ts format a32

select  t0, t3, t6, t9, ts from t1;


select
        column_name, 
        dbms_stats.convert_raw_to_date(low_value)       date_low,
        dbms_stats.convert_raw_to_date(high_value)      date_high
from 
        user_tab_cols
where
        table_name = 'T1'
order by 
        column_name
/



T0                               T3                               T6                               T9                               TS
-------------------------------- -------------------------------- -------------------------------- -------------------------------- --------------------------------
01-Apr-2020 21:15:10.000000000   01-Apr-2020 21:15:10.123000000   01-Apr-2020 21:15:10.123457000   01-Apr-2020 21:15:10.123456789   01-Apr-2020 21:15:10.123457000
01-Apr-2020 21:15:11.000000000   01-Apr-2020 21:15:10.988000000   01-Apr-2020 21:15:10.987654000   01-Apr-2020 21:15:10.987654321   01-Apr-2020 21:15:10.987654000

2 rows selected.


COLUMN_NAME          DATE_LOW             DATE_HIGH
-------------------- -------------------- --------------------
T0                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:11
T3                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10
T6                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10
T9                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10
TS                   01-Apr-2020 21:15:10 01-Apr-2020 21:15:10

5 rows selected.

Since I’ve inserted only two rows, with each row holding a single value with different precision, we expect those values to be reflected in the low_value and high_value for the columns – and we’re not disappointed (probably) but, as you might have expected, Oracle has processed the timestamp data type into a date data type by simply ignoring the last 4 bytes and dealing with the first 7 bytes as if they were a date. We need to do a little more check that we can convert the low and high values into exactly the values that appear in the table itself.

So we can write a messy bit of SQL that extracts the first 7 bytes, converts them using the convert_to_date() call, then extracts the last 4 bytes and converts them to a number using a simple to_number() call (with an nvl() thrown in to deal with the special case of there being no nanosecond component), divides by 1e9, converts the result to an interval in seconds using the numtointervalds() function, and adds that to the date. To keep things clean I’ll only apply the mess to the low_value column. You’ll note that I’ve


select 
        column_name, 
        low_value,
--
--      convert bytes 8 onwards to numeric with an nvl()
--      to handle the case of zero precision when there 
--      are no bytes to show the number of nanoseconds
--
--      Note - I've used substr() to bytes 8 onwards turns
--      into characters 15 - 22.
--
        nvl(
                to_number(
                        hextoraw(substr(low_value,15,8)),'XXXXXXXX'
                ),
                0
        ) nanoseconds,
--
--      repeat the conversion, and wrap with numtodsinterval()
--      to show the resulting interval in seconds.
--
        numtodsinterval(
                nvl(
                        to_number(
                                hextoraw(substr(low_value,15,8)),'XXXXXXXX'
                        )/1e9,
                        0
                )+0,
                'SECOND'
        ) interval_val,
--
--      Converted the 1st 7 bytes (14 characters) to a date,
--      coerce the result to a timestamp, then convert the 
--      nanoseconds bytes to an interval and add
--
        to_timestamp(
                dbms_stats.convert_raw_to_date(hextoraw(substr(low_value,1,14)))
        ) + 
        numtodsinterval(
                nvl(
                        to_number(
                                hextoraw(substr(low_value,15,8)),'XXXXXXXX'
                        )/1e9,
                        0
                ),
                'SECOND'
        ) timestamp_val
from 
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
/

COLUMN_NAME          LOW_VALUE                  NANOSECONDS INTERVAL_VAL                     TIMESTAMP_VAL
-------------------- -------------------------- ----------- -------------------------------- --------------------------------
T0                   7878040116100B                       0 +000000000 00:00:00.000000000    01-Apr-2020 21:15:10.000000000
T3                   7878040116100B0754D4C0       123000000 +000000000 00:00:00.123000000    01-Apr-2020 21:15:10.123000000
T6                   7878040116100B075BCDE8       123457000 +000000000 00:00:00.123457000    01-Apr-2020 21:15:10.123457000
T9                   7878040116100B075BCD15       123456789 +000000000 00:00:00.123456789    01-Apr-2020 21:15:10.123456789
TS                   7878040116100B075BCDE8       123457000 +000000000 00:00:00.123457000    01-Apr-2020 21:15:10.123457000

5 rows selected.

If you go back to the original output we got from selecting the actual values from the table you can see that the timestamp_val column we’ve generated correctly matches the column values for the lower of the two original values we inserted into the table.

Q.E.D.

March 22, 2020

ANSI hinting

Filed under: ANSI Standard,Execution plans,Hints,Oracle — Jonathan Lewis @ 8:32 pm GMT Mar 22,2020

I’ve made casual remarks in the past about how “ANSI”-style SQL introduces extra complications in labelling or identifying query blocks – which means it’s harder to hint correctly. This is a note to show how the optimizer first transforms “ANSI” SQL into “Oracle” syntax. I’m going to write a simple 4-table join in classic Oracle form and check the execution plan with its query block names and fully qualified table aliases; then I’ll translate to the ANSI equivalent and repeat the check for query block names and aliases , finally I’ll rewrite the query in classic Oracle syntax that reproduces the query block names and fully qualified table aliases that we got from the ANSI form.

We start by creating and indexing 4 tables (with a script that I’ve been using for various tests for several years, but the results I’ll show come from 19c):

rem
rem     Script:         ansi_hint_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2014
rem

create table t1
as
select 
        trunc((rownum-1)/4)     t1_n1,
        trunc((rownum-1)/4)     t1_n2,
        case mod(rownum,20) when 0 then rownum else -1 end      flagged1,
        rpad(rownum,180)        t1_v1
from all_objects 
where rownum <= 3000 --> comment to avoid wordpress format issue
;

create table t2
as
select 
        mod(rownum,200)         t2_n1,
        mod(rownum,200)         t2_n2,
        case mod(rownum,20) when 0 then rownum else -1 end      flagged2,
        rpad(rownum,180)        t2_v1
from all_objects 
where rownum <= 3000 --> comment to avoid wordpress format issue
;

create table t3
as
select 
        trunc((rownum-1)/4)     t3_n1,
        trunc((rownum-1)/4)     t3_n2,
        case mod(rownum,20) when 0 then rownum else -1 end      flagged3,
        rpad(rownum,180)        t3_v1
from all_objects 
where rownum <= 3000 --> comment to avoid wordpress format issue
;

create table t4
as
select 
        trunc((rownum-1)/4)     t4_n1,
        trunc((rownum-1)/4)     t4_n2,
        case mod(rownum,20) when 0 then rownum else -1 end      flagged4,
        rpad(rownum,180)        t4_v1
from all_objects 
where rownum <= 3000 --> comment to avoid wordpress format issue
;

create index t1_i1 on t1(t1_n1);
create index t2_i1 on t2(t2_n1);
create index t3_i1 on t3(t3_n1);
create index t4_i1 on t4(t4_n1);

Then we check the execution plan for a simple statement with what looks like a single named query block:


explain plan for
select
        /*+ qb_name(main) */
        *
from
        t1, t2, t3, t4
where
        t2.t2_n1 = t1.t1_n2
and     t3.t3_n1 = t2.t2_n2
and     t4.t4_n1 = t3.t3_n2
;

select * from table(dbms_xplan.display(null,null,'outline alias'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 3619951144

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   192K|   140M|    61  (22)| 00:00:01 |
|*  1 |  HASH JOIN           |      |   192K|   140M|    61  (22)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | T4   |  3000 |   565K|    13   (8)| 00:00:01 |
|*  3 |   HASH JOIN          |      | 48000 |    26M|    41  (13)| 00:00:01 |
|   4 |    TABLE ACCESS FULL | T3   |  3000 |   565K|    13   (8)| 00:00:01 |
|*  5 |    HASH JOIN         |      | 12000 |  4500K|    26   (8)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |  3000 |   559K|    13   (8)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| T1   |  3000 |   565K|    13   (8)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - MAIN
   2 - MAIN / T4@MAIN
   4 - MAIN / T3@MAIN
   6 - MAIN / T2@MAIN
   7 - MAIN / T1@MAIN

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      SWAP_JOIN_INPUTS(@"MAIN" "T4"@"MAIN")
      SWAP_JOIN_INPUTS(@"MAIN" "T3"@"MAIN")
      SWAP_JOIN_INPUTS(@"MAIN" "T2"@"MAIN")
      USE_HASH(@"MAIN" "T4"@"MAIN")
      USE_HASH(@"MAIN" "T3"@"MAIN")
      USE_HASH(@"MAIN" "T2"@"MAIN")
      LEADING(@"MAIN" "T1"@"MAIN" "T2"@"MAIN" "T3"@"MAIN" "T4"@"MAIN")
      FULL(@"MAIN" "T4"@"MAIN")
      FULL(@"MAIN" "T3"@"MAIN")
      FULL(@"MAIN" "T2"@"MAIN")
      FULL(@"MAIN" "T1"@"MAIN")
      OUTLINE_LEAF(@"MAIN")
      ALL_ROWS
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T4"."T4_N1"="T3"."T3_N2")
   3 - access("T3"."T3_N1"="T2"."T2_N2")
   5 - access("T2"."T2_N1"="T1"."T1_N2")

Note in the Query Block Name / Object Alias information that all 4 tables were “sourced from”, or have aliases qualified by, “@MAIN”, and in the final plan all the tables are used in a query block called MAIN.

Now look at the basic ANSI equivalent:


explain plan for
select 
        /*+ qb_name(main) */
        *
from
        t1
join 
        t2
on      t2.t2_n1 = t1.t1_n2
join 
        t3
on      t3.t3_n1 = t2.t2_n2
join 
        t4
on      t4.t4_n1 = t3.t3_n2
;

select * from table(dbms_xplan.display(null,null,'outline alias'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 3619951144

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   192K|   140M|    61  (22)| 00:00:01 |
|*  1 |  HASH JOIN           |      |   192K|   140M|    61  (22)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | T4   |  3000 |   565K|    13   (8)| 00:00:01 |
|*  3 |   HASH JOIN          |      | 48000 |    26M|    41  (13)| 00:00:01 |
|   4 |    TABLE ACCESS FULL | T3   |  3000 |   565K|    13   (8)| 00:00:01 |
|*  5 |    HASH JOIN         |      | 12000 |  4500K|    26   (8)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |  3000 |   559K|    13   (8)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| T1   |  3000 |   565K|    13   (8)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$43767242
   2 - SEL$43767242 / T4@SEL$3
   4 - SEL$43767242 / T3@SEL$2
   6 - SEL$43767242 / T2@SEL$1
   7 - SEL$43767242 / T1@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      SWAP_JOIN_INPUTS(@"SEL$43767242" "T4"@"SEL$3")
      SWAP_JOIN_INPUTS(@"SEL$43767242" "T3"@"SEL$2")
      SWAP_JOIN_INPUTS(@"SEL$43767242" "T2"@"SEL$1")
      USE_HASH(@"SEL$43767242" "T4"@"SEL$3")
      USE_HASH(@"SEL$43767242" "T3"@"SEL$2")
      USE_HASH(@"SEL$43767242" "T2"@"SEL$1")
      LEADING(@"SEL$43767242" "T1"@"SEL$1" "T2"@"SEL$1" "T3"@"SEL$2" "T4"@"SEL$3")
      FULL(@"SEL$43767242" "T4"@"SEL$3")
      FULL(@"SEL$43767242" "T3"@"SEL$2")
      FULL(@"SEL$43767242" "T2"@"SEL$1")
      FULL(@"SEL$43767242" "T1"@"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$2")
      MERGE(@"SEL$1" >"SEL$2")
      OUTLINE(@"SEL$58A6D7F6")
      OUTLINE(@"SEL$3")
      MERGE(@"SEL$58A6D7F6" >"SEL$3")
      OUTLINE(@"SEL$9E43CB6E")
      OUTLINE(@"MAIN")
      MERGE(@"SEL$9E43CB6E" >"MAIN")
      OUTLINE_LEAF(@"SEL$43767242")
      ALL_ROWS
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T4"."T4_N1"="T3"."T3_N2")
   3 - access("T3"."T3_N1"="T2"."T2_N2")
   5 - access("T2"."T2_N1"="T1"."T1_N2")

Check the Plan Hash Value – it gives you a strong clue that the execution plans are the same, and a close examination of the body of the plan and the Predicate information confirm that the two queries operate in exactly the same way at exactly the same cost. But there’s a significant difference in the query blocks and table aliases.

The Query Block Name / Alias Alias information tells us that query block “main” has disappeared and the query operates completely from a query block with the internally generated name SEL$43767242; moreover we can see that tables t1 and t2 appear to be sourced from a query block called sel$1, while t3 comes from sel$2 and t4 comes from sel$3.

Finally here’s a messy Oracle form to reproduce the ANSI query block names and table aliases:


explain plan for
select  /*+ qb_name(main) */
        *
from    (
        select  /*+ qb_name(sel$3) */
                *
        from
                (
                select  /*+ qb_name(sel$2) */
                        *
                from    (
                        select 
                                /*+ qb_name(sel$1) */
                                *
                        from    
                                t1,
                                t2
                        where   t2.t2_n1 = t1.t1_n2
                        ) v1,
                        t3
                where   t3.t3_n1 = v1.t2_n2
                )       v2,
                t4
        where   t4.t4_n1 = v2.t3_n2
        )
;

select * from table(dbms_xplan.display(null,null,'outline alias'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------
Plan hash value: 3619951144

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   192K|   140M|    61  (22)| 00:00:01 |
|*  1 |  HASH JOIN           |      |   192K|   140M|    61  (22)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | T4   |  3000 |   565K|    13   (8)| 00:00:01 |
|*  3 |   HASH JOIN          |      | 48000 |    26M|    41  (13)| 00:00:01 |
|   4 |    TABLE ACCESS FULL | T3   |  3000 |   565K|    13   (8)| 00:00:01 |
|*  5 |    HASH JOIN         |      | 12000 |  4500K|    26   (8)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   |  3000 |   559K|    13   (8)| 00:00:01 |
|   7 |     TABLE ACCESS FULL| T1   |  3000 |   565K|    13   (8)| 00:00:01 |
-----------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$43767242
   2 - SEL$43767242 / T4@SEL$3
   4 - SEL$43767242 / T3@SEL$2
   6 - SEL$43767242 / T2@SEL$1
   7 - SEL$43767242 / T1@SEL$1

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      SWAP_JOIN_INPUTS(@"SEL$43767242" "T4"@"SEL$3")
      SWAP_JOIN_INPUTS(@"SEL$43767242" "T3"@"SEL$2")
      SWAP_JOIN_INPUTS(@"SEL$43767242" "T2"@"SEL$1")
      USE_HASH(@"SEL$43767242" "T4"@"SEL$3")
      USE_HASH(@"SEL$43767242" "T3"@"SEL$2")
      USE_HASH(@"SEL$43767242" "T2"@"SEL$1")
      LEADING(@"SEL$43767242" "T1"@"SEL$1" "T2"@"SEL$1" "T3"@"SEL$2"
              "T4"@"SEL$3")
      FULL(@"SEL$43767242" "T4"@"SEL$3")
      FULL(@"SEL$43767242" "T3"@"SEL$2")
      FULL(@"SEL$43767242" "T2"@"SEL$1")
      FULL(@"SEL$43767242" "T1"@"SEL$1")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$2")
      MERGE(@"SEL$1" >"SEL$2")
      OUTLINE(@"SEL$58A6D7F6")
      OUTLINE(@"SEL$3")
      MERGE(@"SEL$58A6D7F6" >"SEL$3")
      OUTLINE(@"SEL$9E43CB6E")
      OUTLINE(@"MAIN")
      MERGE(@"SEL$9E43CB6E" >"MAIN")
      OUTLINE_LEAF(@"SEL$43767242")
      ALL_ROWS
      DB_VERSION('19.1.0')
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T4"."T4_N1"="T3"."T3_N2")
   3 - access("T3"."T3_N1"="T2"."T2_N2")
   5 - access("T2"."T2_N1"="T1"."T1_N2")

Again a quick check of the Plan Hash Value confirms that the messier query is a match for the previous query with its ANSI transformation, and the plan body and Query Block Name / Object Alias information confirm the match throughout in the naming.

Any time you write ANSI syntax this layering of nested inline views is what happens to your query before any other transformation is applied – and sometimes (though very rarely in recent versions of Oracle) this can result in unexpected limitations in the way the optimizer subsequently transforms the query.

Apart from “accidents”, though, the big issue with the “ANSI rewrite” comes from the side effects of all the extra query blocks. In anything but the simplest cases you have to work a little harder to figure out the query block names you need to use if you want to apply hints to fix an optimizer problem – you can’t create your own meaningful names for every query block in the query you wrote. Fortunately this task is made a little easier if you check the execution plan of the query after adding the hint /*+ no_query_transformation */, as this tends to produce a plan that looks like a step by step “translation” of the way the query was written (apart from the ANSI transformation, of course). This might be enough to identify the base-level query blocks that the optimizer starts with when you use ANSI syntax.

 

« Previous PageNext Page »

Powered by WordPress.com.