Oracle Scratchpad

April 29, 2020

Execution Plans

Filed under: Execution plans,Hints,Oracle — Jonathan Lewis @ 1:52 pm BST Apr 29,2020

A couple of days ago I discussed an execution plan that displayed some variation in the way it handled subqueries and even threw in a little deception by displaying an anti-join that was the result of transforming a “not exists” subquery and a semi-join that looked at first sight as if it were going to be the result of transforming an “exists” subquery.

As part of the dissection I reverse engineered the query into a set of tables that would allow me to reproduce the execution plan so that I could report the “final query blocks” (outline_leafs). As a brief appendix to that blog note I’m publishing here the script to create those tables and three plans that I went through to get to the plan I needed.


rem
rem     Script:         anti_semi.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.1.0.2
rem

create table ip_spells 
partition by range (admission_dttm) interval (numtoyminterval(1,'MONTH')) (
        partition p_start values less than (to_date('01-Jan-2020'))
)
as
with g as (
        select rownum id from dual
        connect by level <= 1e4 -- > avoid wordpress format issue
)
select
        rownum                                  spell_id,
        to_char(mod(rownum,75),'FM00')          admission_method_code,
        to_date('01-jan_2020') + rownum / 1000  admission_dttm,
        to_char(mod(rownum,57),'FM00')          administrative_category_code,
        lpad(rownum,10)                         v1,
        rpad('x',100)                           padding
from
        g,g
where
        rownum <= 365000
;

alter table ip_spells add constraint sp_pk primary key(spell_id);

create table ip_episodes
as
select
        spell_id,
        spell_id        episode_id,
        1+mod(rownum,6) episode_order,
        to_char(mod(rownum,125)+ 500,'FM999')   main_specialty_code,
        v1,
        padding
from
        ip_spells
order by
        dbms_random.value
;

alter table ip_episodes add constraint ep_pk primary key(episode_id);
alter table ip_episodes add constraint ep_fk_sp foreign key (spell_id) references ip_spells;
create index idx02_ip_episodes on ip_episodes(spell_id);


create table ip_diagnosis
as
select
        episode_id,
        chr(mod(rownum,25) + 65) ||
                to_char(dbms_random.value(30,512),'FM999')      diagnosis_code,
        mod(rownum,5)                                           diagnosis_sequence,
        lpad(rownum,10)                 v1,
        rpad('x',100)                   padding
from
        (select rownum id from dual connect by level <= 5),
        ip_episodes
;

alter table ip_diagnosis add constraint di_fk_ep foreign key(episode_id) references ip_episodes;
create index idx01_ip_diagnosis on ip_diagnosis(episode_id);

The original ip_spells table was range partitioned and there was a date-based predicate in the query that encouraged me to use the date column as the partitioning column. I also decided to generate data for one year at 1,000 rows per day from the start of a year, with interval partitioning of one month to get a small number of partitions.

The original plan suggested that the number of ip_episodes was similar to the number of ip_spells, so I just used a copy of the rows from ip_spells to create ip_epsisodes, and then gave it some appropriate primary and foreign key constraints and indexes.

Finally, the ip_diagnosis table looked as if it held an average of 5 rows per ip_episodes, so I generated it from ip_episodes by joining to a 5-row set generated by the usual “connect by” trick with dual.

I’ve only got a small data set, and most of the indexes are sequence based with excellent clustering_factors, so I wasn’t going to be surprised if my data and stats didn’t immediately produce the execution plan of the original query.

Here’s the original query (just as a reminder), and the first plan I got with no hinting (running 12.1.0.2):


select
        * 
from 
        ip_spells a
where 
        not exists (
                select
                        1
                from
                        ip_episodes e
                inner join 
                        ip_diagnosis d 
                on
                        d.episode_id = e.episode_id
                where
                        a.spell_id = e.spell_id
                and     (
                           substr(d.diagnosis_code,1,1) = 'C'
                        or substr(d.diagnosis_code,1,3) between 'D37' and 'D48'
                        or substr(d.diagnosis_code,1,1)  = 'V'
                        or d.diagnosis_code = 'Z511'
                )
        )
and     exists (
                select
                        1
                from
                        ip_episodes e
                left join 
                        ip_diagnosis d 
                on
                        d.episode_id = e.episode_id
                where
                        a.spell_id = e.spell_id
                and     (
                            e.episode_order = '1'
                        and e.main_specialty_code not in ('501','560','610')
                        and d.diagnosis_sequence = 1
                        and substr(d.diagnosis_code,1,1) <> 'O'
                        )
                )
and     substr(a.admission_method_code,1,1) = '2'                       -- 1% selectivity on substr()
and     a.admission_dttm >= to_date('01-jan-2011', 'dd-mon-yyyy')
and     a.administrative_category_code = '01'                           -- 1 / 57 by definition
;


Plan hash value: 1492475845

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |     1 |   131 |  1299   (5)| 00:00:06 |       |       |
|*  1 |  FILTER                               |                    |       |       |            |          |       |       |
|   2 |   PARTITION RANGE ALL                 |                    |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|*  3 |    TABLE ACCESS FULL                  | IP_SPELLS          |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|   4 |   NESTED LOOPS SEMI                   |                    |     1 |    20 |    11   (0)| 00:00:01 |       |       |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        |     1 |    10 |     4   (0)| 00:00:01 |       |       |
|*  6 |     INDEX RANGE SCAN                  | IDX02_IP_EPISODES  |     1 |       |     3   (0)| 00:00:01 |       |       |
|*  7 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       | 40931 |   399K|     7   (0)| 00:00:01 |       |       |
|*  8 |     INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS |     5 |       |     2   (0)| 00:00:01 |       |       |
|   9 |   NESTED LOOPS SEMI                   |                    |     1 |    30 |    11   (0)| 00:00:01 |       |       |
|* 10 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        |     1 |    17 |     4   (0)| 00:00:01 |       |       |
|* 11 |     INDEX RANGE SCAN                  | IDX02_IP_EPISODES  |     1 |       |     3   (0)| 00:00:01 |       |       |
|* 12 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       | 18250 |   231K|     7   (0)| 00:00:01 |       |       |
|* 13 |     INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS |     5 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------

As you can see, both subqueries ran as FILTER subqueries. If I were to include the query block information you would see that operations 4 to 8 are the (transformed) “not exists” subquery, operations 9 to 13 are the transformed “exists” subquery, and operations 1 to 3 represent the main body of the query.

I needed to see the “not exists” subquery unnested and transformed into a hash anti-join, so my first attempt at hinting was to add an /*+ unnest */ hint to that subquery, producing the following plan:


Plan hash value: 147447036
------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                    |     1 |   133 |  1652   (4)| 00:00:07 |       |       |
|*  1 |  FILTER                                 |                    |       |       |            |          |       |       |
|   2 |   NESTED LOOPS ANTI                     |                    |     1 |   133 |  1641   (4)| 00:00:07 |       |       |
|   3 |    PARTITION RANGE ALL                  |                    |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|*  4 |     TABLE ACCESS FULL                   | IP_SPELLS          |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|   5 |    VIEW PUSHED PREDICATE                | VW_SQ_1            |     1 |     2 |    11   (0)| 00:00:01 |       |       |
|   6 |     NESTED LOOPS SEMI                   |                    |     1 |    20 |    11   (0)| 00:00:01 |       |       |
|   7 |      TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        |     1 |    10 |     4   (0)| 00:00:01 |       |       |
|*  8 |       INDEX RANGE SCAN                  | IDX02_IP_EPISODES  |     1 |       |     3   (0)| 00:00:01 |       |       |
|*  9 |      TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       | 40931 |   399K|     7   (0)| 00:00:01 |       |       |
|* 10 |       INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS |     5 |       |     2   (0)| 00:00:01 |       |       |
|  11 |   NESTED LOOPS SEMI                     |                    |     1 |    30 |    11   (0)| 00:00:01 |       |       |
|* 12 |    TABLE ACCESS BY INDEX ROWID BATCHED  | IP_EPISODES        |     1 |    17 |     4   (0)| 00:00:01 |       |       |
|* 13 |     INDEX RANGE SCAN                    | IDX02_IP_EPISODES  |     1 |       |     3   (0)| 00:00:01 |       |       |
|* 14 |    TABLE ACCESS BY INDEX ROWID BATCHED  | IP_DIAGNOSIS       | 18250 |   231K|     7   (0)| 00:00:01 |       |       |
|* 15 |     INDEX RANGE SCAN                    | IDX01_IP_DIAGNOSIS |     5 |       |     2   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------------

You can see the unnested subquery in the name vw_sq_1 at operation 5; but we have a nested loop anti at operation 2 when I want a hash join anti, and we also see the option “pushed predicate” for the view at operation 5.

If I published the query block information in this case you would still see 3 “final” query blocks. Operations 11 to 15 would be the “exists” subquery; operation 5 to 10 would be the query block for the non-mergeable view that the optimizer produced by unnesting the “not exists” subquery; and operations 1 to 4 represent the overall parent query block.

This example highlights a detail which is a little easy to miss in earlier discussions of the query and its plans. Some operations in a plan look as if they could be associated with two query block names – the query block for which they are the top line, and the query block by which they are used.

Taking this plan as an example, operation 5 is clearly the starting point of the query block from operations 5 to 10, and operation 11 is clearly the starting point for the query block from operations 11 to 15. On the other hand when we collapse query blocks to understand the overall structure of how the query operates we get the following plan – in which we view “Rowsource 2” and “Rowsource 3” as simple data sources in the main query block – and it would be nice to see operations 5 and 11 with the same query block name as operations 1 to 4.

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name            | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                 |     1 |   133 |  1652   (4)| 00:00:07 |       |       |
|*  1 |  FILTER                                 |                 |       |       |            |          |       |       |
|   2 |   NESTED LOOPS ANTI                     |                 |     1 |   133 |  1641   (4)| 00:00:07 |       |       |
|   3 |    PARTITION RANGE ALL                  |                 |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|*  4 |     TABLE ACCESS FULL                   | IP_SPELLS       |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|   5 |    Rowsource 2 : VIEW PUSHED PREDICATE  | VW_SQ_1         |     1 |     2 |    11   (0)| 00:00:01 |       |       |
|  11 |   Rowsource 3 : NESTED LOOPS SEMI       |                 |     1 |    30 |    11   (0)| 00:00:01 |       |       |
---------------------------------------------------------------------------------------------------------------------------

We still have a little work to do to get where we want to be: the /*+ unnest */ hint has got us part way to the plan we want –  so (taking my cue from the  pushed predicate option) to get from the nested loop anti join to the hash anti join I decided to try changing the hint in the “not exists” subquery to /*+ unnest no_push_pred */ – and this is the plan that appeared as a result:


Plan hash value: 2721384176
----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |     1 |   144 |  6633  (11)| 00:00:26 |       |       |
|*  1 |  FILTER                               |                    |       |       |            |          |       |       |
|*  2 |   HASH JOIN ANTI                      |                    |     1 |   144 |  6622  (11)| 00:00:26 |       |       |
|   3 |    PARTITION RANGE ALL                |                    |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|*  4 |     TABLE ACCESS FULL                 | IP_SPELLS          |    64 |  8384 |   935   (6)| 00:00:04 |     1 |1048575|
|   5 |    VIEW                               | VW_SQ_1            | 40931 |   519K|  5685  (12)| 00:00:23 |       |       |
|*  6 |     HASH JOIN                         |                    | 40931 |   799K|  5685  (12)| 00:00:23 |       |       |
|*  7 |      TABLE ACCESS FULL                | IP_DIAGNOSIS       | 40931 |   399K|  4761  (13)| 00:00:19 |       |       |
|   8 |      TABLE ACCESS FULL                | IP_EPISODES        |   365K|  3564K|   906   (5)| 00:00:04 |       |       |
|   9 |   NESTED LOOPS SEMI                   |                    |     1 |    30 |    11   (0)| 00:00:01 |       |       |
|* 10 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_EPISODES        |     1 |    17 |     4   (0)| 00:00:01 |       |       |
|* 11 |     INDEX RANGE SCAN                  | IDX02_IP_EPISODES  |     1 |       |     3   (0)| 00:00:01 |       |       |
|* 12 |    TABLE ACCESS BY INDEX ROWID BATCHED| IP_DIAGNOSIS       | 18250 |   231K|     7   (0)| 00:00:01 |       |       |
|* 13 |     INDEX RANGE SCAN                  | IDX01_IP_DIAGNOSIS |     5 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------

It looks the right shape, it’s got the right mechanisms in place, and (very importantly – though not a 100% guarantee) it’s got the same plan_hash_value as the orginally published plan. So at this point I felt the final query block names it reported would reflect the ones that would have been used in the original plan.

In a production system, of course, you don’t just stick a couple of hints into a query and hope it will be enough to stabilise the plan. Here’s the full set of hints that appeared in the Outline Data when I added my two hints to the query:


Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      PARTIAL_JOIN(@"SEL$2B0A33EA" "D"@"SEL$2")
      USE_NL(@"SEL$2B0A33EA" "D"@"SEL$2")
      LEADING(@"SEL$2B0A33EA" "E"@"SEL$3" "D"@"SEL$2")
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$2B0A33EA" "D"@"SEL$2")
      INDEX_RS_ASC(@"SEL$2B0A33EA" "D"@"SEL$2" ("IP_DIAGNOSIS"."EPISODE_ID"))
      BATCH_TABLE_ACCESS_BY_ROWID(@"SEL$2B0A33EA" "E"@"SEL$3")
      INDEX_RS_ASC(@"SEL$2B0A33EA" "E"@"SEL$3" ("IP_EPISODES"."SPELL_ID"))
      USE_HASH(@"SEL$8D33959D" "E"@"SEL$1")
      LEADING(@"SEL$8D33959D" "D"@"SEL$1" "E"@"SEL$1")
      FULL(@"SEL$8D33959D" "E"@"SEL$1")
      FULL(@"SEL$8D33959D" "D"@"SEL$1")
      PQ_FILTER(@"SEL$2B969259" SERIAL)
      USE_HASH(@"SEL$2B969259" "VW_SQ_1"@"SEL$F49409E0")
      LEADING(@"SEL$2B969259" "A"@"SEL$4" "VW_SQ_1"@"SEL$F49409E0")
      NO_ACCESS(@"SEL$2B969259" "VW_SQ_1"@"SEL$F49409E0")
      FULL(@"SEL$2B969259" "A"@"SEL$4")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$4")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$5")
      MERGE(@"SEL$2")
      OUTLINE(@"SEL$64EAE176")
      OUTLINE(@"SEL$6")
      OUTLINE(@"SEL$F49409E0")
      MERGE(@"SEL$1")
      OUTLINE(@"SEL$8C3A16E3")
      MERGE(@"SEL$64EAE176")
      OUTLINE(@"SEL$4B169FC8")
      UNNEST(@"SEL$8C3A16E3")
      OUTLINE_LEAF(@"SEL$2B969259")
      OUTLINE_LEAF(@"SEL$8D33959D")
      OUTER_JOIN_TO_INNER(@"SEL$4B169FC8" "D"@"SEL$2")
      OUTLINE_LEAF(@"SEL$2B0A33EA")
      ALL_ROWS
      DB_VERSION('12.1.0.2')
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

When I put my two hints into the query text and reran the test under 19.3 the FILTER operation disappeared and the “exists” subquery also unnested (to become vw_sq_2), turning into a nested loop semi-join. With the full set of 40 hints in place the plan from 12.1.0.2 re-appeared.

I did actually have an alternative strategy for the manual hint test. The plan with the nested loop anti join reported the following query block information:


Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$2B969259
   4 - SEL$2B969259 / A@SEL$4
   5 - SEL$D276C01C / VW_SQ_1@SEL$F49409E0
   6 - SEL$D276C01C
   7 - SEL$D276C01C / E@SEL$1
   8 - SEL$D276C01C / E@SEL$1
   9 - SEL$D276C01C / D@SEL$1
  10 - SEL$D276C01C / D@SEL$1
  11 - SEL$2B0A33EA
  12 - SEL$2B0A33EA / E@SEL$3
  13 - SEL$2B0A33EA / E@SEL$3

This prompted me to tell Oracle to do a hash join in query block SEL$2B969259 between A@SEL$4 and VW_SQ_1@SEL$F49409E0 (in that order) by adding a simple set of hints to the start of the query while leaving (just) the /*+ unnest */ hint in the “not exists” subquery.


select
        /*+ 
                leading (@sel$2b969259  a@sel$4  vw_sq_1@sel$f49409e0)
                use_hash(@sel$2b969259  vw_sq_1@sel$f49409e0)
                no_swap_join_inputs(@sel$2b969259 vw_sq_1@sel$f49409e0)
        */
        * 
from 
 

Again, this produced the required execution path.

It’s not always this easy to reproduce an execution plan by looking at a query text – sometimes the actual plan depends on knowing about things like unique constraints, foreign key constraints, and not null constraints. But if you can get your hands on the SQL to create the objects involved it’s usually possible to re-create the plan by hinting, which then allows you to discover how you change the hints to modify the plan.

 

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.