Oracle Scratchpad

January 10, 2017

Join Elimination 12.2

Filed under: 12c,Bugs,Execution plans,Oracle — Jonathan Lewis @ 1:03 pm GMT Jan 10,2017

From time to time someone comes up with the question about whether or not the order of tables in the from clause of a SQL statement should make a difference to execution plans and performance. Broadly speaking the answer is no, although there are a couple of boundary cases were a difference can appear unexpectedly.

When considering join permutations the optimizer has a few algorithms for picking an initial join order and then deciding how to permute from that order, and one of the criteria with the very lowest priority (i.e. when all other factors are equal) is dictated by the order the tables appear in the from clause so if you have enough tables in the from clause it’s possible for the subset of join orders considered to change if you change the from clause in a way that causes the initial join order to change.

It’s been over 11 years since I wrote the article I’ve linked to in the previous paragraph and in that time no-one has yet approached me with other examples of a plan changing due to a change in the from clause order (though, with all the transformations now available to the optimizer, I wouldn’t be surprised if a few cases have appeared occasionally, so feel free to let me know if you think you’ve got an interesting example that I can experiment on).

A little while ago, though, while testing a feature enhancement in 12.2, I finally came across a case where a real difference appeared. Here’s the query I was using – I’ll give you the SQL to reproduce the tables at the end of the article:


select 
        count(c.small_vc_c)
from 
        grandparent     g, 
        parent          p,
        child           c
where
        c.small_num_c between 200 and 215
and     p.id   = c.id_p
and     p.id_g = c.id_g
and     g.id   = p.id_g
;

As you will see later on the three tables grandparent, parent, child have the obvious primary keys and referential integrity constraints. This means that grandparent has a single-column primary key, parent has a two-column primary key, and child has a three-column primary key. The query joins the three tables along their primary keys and then selects data only from the child table, so it’s a good candidate for join elimination.

In earlier versions of Oracle join elimination could take place only if the primary key you joined to was a single column key, so 12.1 and earlier would be able to eliminate just the grandparent from this three-table join; but in 12.2 multi-column primary keys also allow join elimination to take place, so we might hope that the plan we get from this query would eliminate both the grandparent and parent tables. Here’s the plan (pulled from memory after execution):

SQL_ID  8hdybjpn2884b, child number 0
-------------------------------------
select  count(c.small_vc_c) from  grandparent g,  parent  p,  child  c
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 4120004759

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE     |       |     1 |    23 |            |          |
|   2 |   NESTED LOOPS      |       |    85 |  1955 |    26   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| CHILD |    85 |  1615 |    26   (4)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| G_PK  |     1 |     4 |     0   (0)|          |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))
   4 - access("G"."ID"="C"."ID_G")

It didn’t work quite as expected. The optimizer has managed to eliminate table parent – so that looks like “single column primary key” join elimination has worked, but “multi-column” join elimination hasn’t appeared. On the other hand, I’ve not followed my usual rules for writing SQL so let’s try again. If I follow the pattern I usually follow, my from clause would have been in the order child  -> parent -> grandparent – listing the tables in the order I expect to visit them. Here’s the plan – again pulled from memory – after making this visual change the SQL:


SQL_ID  1uuq5vf4bq0gt, child number 0
-------------------------------------
select  count(c.small_vc_c) from  child  c,  parent  p,  grandparent g
where  c.small_num_c between 200 and 215 and p.id   = c.id_p and p.id_g
= c.id_g and g.id   = p.id_g

Plan hash value: 1546491375

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |       |       |    26 (100)|          |
|   1 |  SORT AGGREGATE    |       |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL| CHILD |    85 |  1275 |    26   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))

So join elimination based on multi-column primary keys does work – but you might have to get a bit lucky in the order you list the tables in the from clause.

Footnote.

If you’re wondering whether or not switching from Oracle syntax to ANSI syntax would make a difference, it does: with ANSI syntax both grandparent and parent are eliminated if the SQL lists the tables in the order grandparent -> parent -> child (i.e. the order which doesn’t work properly for Oracle syntax) and only the parent is eliminated for the order child -> parent -> grandparent. In other words, both syntax options have a point of failure but they fail the opposite way around.

Code:


rem
rem     Script:         join_elimination_12c2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2017
rem     

-- Environment details eliminated

define m_pad=100

/*
        IDs will be 1 to 1000
*/

create table grandparent 
as
select 
        rownum                  id,
        trunc((rownum-1)/5)     small_num_g,
        rpad(rownum,10)         small_vc_g,
        rpad(rownum,&m_pad)     padding_g
from 
        all_objects 
where 
        rownum <= 1000 -- > comment to avoid wordpress format issue
;

/*
        Each GP has two (scattered) children here
        Parent IDs are 1 to 2,000
*/

create table parent 
as
select 
        1+mod(rownum,1000)      id_g,
        rownum                  id,
        trunc((rownum-1)/5)     small_num_p,
        rpad(rownum,10)         small_vc_p,
        rpad(rownum,&m_pad)     padding_p
from 
        all_objects 
where 
        rownum <= 2000 -- > comment to avoid wordpress format issue
;

/*
        Simple trick to get 5 (clustered) children per parent
        Child IDs are 1 to 12,000
*/

create table child 
as
select 
        id_g,
        id                      id_p,
        rownum                  id,
        trunc((rownum-1)/5)     small_num_c,
        rpad(rownum,10)         small_vc_c,
        rpad(rownum,&m_pad)     padding_c
from 
        parent  p,
        (
                select /*+ no_merge */ 
                        rownum 
                from    parent p 
                where   rownum <= 5 -- > comment to avoid wordpress format issue
        )       d
;


create unique index g_pk on grandparent(id);
create unique index p_pk on parent     (id_g, id)       compress 1;
create unique index c_pk on child      (id_g, id_p, id) compress 2;

alter table grandparent add constraint g_pk primary key (id);
alter table parent      add constraint p_pk primary key (id_g, id);
alter table child       add constraint c_pk primary key (id_g, id_p, id);

alter table parent add constraint p_fk_g foreign key (id_g)       references grandparent;
alter table child  add constraint c_fk_p foreign key (id_g, id_p) references parent;

rem
rem     Don't need to collect stats because it's 12c
rem

prompt  ===============================================================
prompt  Join all three tables with the FROM clause ordered gp -> p -> c
prompt  The final plan is GP->C, The optimizer eliminated P before
prompt  considering GP
prompt  ===============================================================

select 
        count(c.small_vc_c)
from 
        grandparent     g, 
        parent          p,
        child           c
where
        c.small_num_c between 200 and 215
and     p.id   = c.id_p
and     p.id_g = c.id_g
and     g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt  ===============================================================
prompt  Join all three tables with the FROM clause ordered c -> p -> gp
prompt  The final plan is a tablescan of C only. The optimizer managed 
prompt  to eliminate GP first and P second
prompt  ===============================================================

select 
        count(c.small_vc_c)
from 
        child           c,
        parent          p,
        grandparent     g 
where
        c.small_num_c between 200 and 215
and     p.id   = c.id_p
and     p.id_g = c.id_g
and     g.id   = p.id_g
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt  ==================================================
prompt  Convert to ANSI standard in the order gp -> p -> c
prompt  and both gp and p eliminated.
prompt  ==================================================

select 
        count(c.small_vc_c)
from 
        grandparent     g
join
        parent          p
on      p.id_g = g.id
join
        child           c
on      c.id_p = p.id
and     c.id_g = p.id_g
where
        c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));

prompt  ===================================================
prompt  Convert to ANSI standard in the order c -> p -> gp
prompt  and only p is eliminated. 
prompt  ===================================================

select 
        count(c.small_vc_c)
from 
        child           c
join
        parent          p
on      p.id   = c.id_p 
and     p.id_g = c.id_g 
join
        grandparent     g
on      g.id = p.id_g 
where
        c.small_num_c between 200 and 215
;

select * from table(dbms_xplan.display_cursor(null,null,'outline'));


It’s possible, of course, that with different system stats, or I/O calibration, or extent sizes, or segment space management, or block sizes, sundry other parameter details that you won’t be able to reproduce the results without messing about a little bit, but I don’t think I’ve done anything special in the setup that would make a real difference.

Footnote:

If you’re wondering why the “traditional” and “ANSI” syntax should exhibit this flaw for joins in the opposite direction – remember that ANSI SQL is first transformed into an equivalent Oracle form and – in the simple cases – the first two tables form the first query block then each table after that introduces a new query block, so the optimizer strategy does (approximately) the following translation:


select ... from grandparent join parent join child

==>

select ... from (select ... from grandparent join parent) join child

The optimizer then optimizes the inline query, which eliminates grandparent leaving a join between parent and child, which then allows parent to be eliminated.

Conversely we get:

select ... form child join parent join grandparent 

==>

select ... from (select ... from child join parent) join grandparent

In this form the optimizer eliminates parent from the inline view and is left with a join between child and grandparent – so no further elimination.

Update Dec 2019

This bug is still present in 19.3.0.0 (and 18.3.0.0).

Dani Schnider has published a blog note about the benefits this feature (multi-column join elimination) offers when using Data Vault and suggests that anyone using the product should upgrade to at least 12.2: Data Vault Queries and Join Elimination

Update July 2021

The bug is still present in 19.11.0.0 and 21.3.0.0

Responding to a comment on twitter about “ANSI”style and join order, here are the execution plans from the two ANSI-style queries in the supplied code, as reported in 19.11.0.0, first with grandparent join parent join child:

 
----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |       |       |    27 (100)|          |
|   1 |  SORT AGGREGATE    |       |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL| CHILD |    85 |  1275 |    27   (8)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))

Then with child join parent join grandparent

-----------------------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |       |       |    27 (100)|          |
|   1 |  SORT AGGREGATE     |       |     1 |    23 |            |          |
|   2 |   NESTED LOOPS      |       |    85 |  1955 |    27   (8)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| CHILD |    85 |  1615 |    27   (8)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN| G_PK  |     1 |     4 |     0   (0)|          |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("C"."SMALL_NUM_C"<=215 AND "C"."SMALL_NUM_C">=200))
   4 - access("G"."ID"="C"."ID_G")

The changes in plan aren’t the best possible examples for refuting the suggestion that ANSI syntax dictates the run-time join order; but the fact that Oracle has managed to discard the table in the middle (for the child -> parent -> grandparent example) does show that the optimizer doesn’t really care very much what your query looked like if it can find a way to rewrite it.

Update June 2023

The bug is still present in 23.1 (FREE). The bug referenced on comment #1 by Mark Jefferys is not visible (or no longer exists), but the symptoms are described in a document created in December 2018 “ANSI SQL And Oracle Native SQL With 3-way Join Giving Different Explain Plans (Doc ID 2483430.1)”.

8 Comments »

  1. For reference, this was filed as (currently unpublished) Bug 22228669.

    Mark Jefferys – Oracle Support

    Comment by Mark Jefferys — January 11, 2017 @ 1:24 am GMT Jan 11,2017 | Reply

  2. Mark,
    Thanks for the reference.
    Much appreciated.

    Comment by Jonathan Lewis — January 11, 2017 @ 7:01 am GMT Jan 11,2017 | Reply

  3. […] two-column example in 11g won’t do join elimination while a two-column example in 12.2 can (though it doesn’t always) – so upgrading to 12.2 MIGHT cause more people to see this anomaly […]

    Pingback by Join Elimination | Oracle Scratchpad — March 9, 2017 @ 6:39 pm GMT Mar 9,2017 | Reply

  4. I had got to the point where I had thought it is generally better to have single column primary and foreign keys in a data warehouse, at least defined, so that you get join elimination. You could use multi-column FKs from 12.2, but then you have the additional hurdle of getting the from clause in the right order.

    Then I came across your post (albeit from 2006) on Meaningless Keys https://jonathanlewis.wordpress.com/2006/12/29/meaningless-keys/. I wasn’t sure how much this was aimed at OLTP rather than data warehouse.

    It seemed to me that if you go for single-column PKs and FKs you get meaningless keys with one set of problems, but if (from 12.2) you use multi-column PKs you can get a different set of problems. Is this a fair assessment, or do you have a different view?

    Secondly, FKs can be used to enforce referential integrity and lead to foreign key join elimination. I haven’t been able to produce an example where their presence changes the execution plan of a query (other than by join elimination). I wondered if you have such an example?

    Comment by David Kurtz — November 12, 2018 @ 4:04 pm GMT Nov 12,2018 | Reply

    • David,

      My view would be that using a natural key is almost always desirable. The fact that join elimination works at all is a bonus, and the fact that it doesn’t always work unless you tweak the SQL (at present) is a minor detail.

      I can’t find an example of a foreign key changing an execution plan in my library, but that might be a failure to do the right search rather than the absence of an example. If I’d written up an example it would probably have been from a version prior to table elimination i.e. a long time ago.

      Comment by Jonathan Lewis — November 12, 2018 @ 9:20 pm GMT Nov 12,2018 | Reply

  5. […] of the way the optimizer works with simple query blocks, in the other case the note was about an anomaly with table elimination that could appear with both “ANSI” and “traditional” Oracle […]

    Pingback by Table order | Oracle Scratchpad — November 19, 2018 @ 1:30 pm GMT Nov 19,2018 | Reply

  6. […] multi-column join elimination is supported, as Jonathan Lewis wrote 3 years ago in his blog post Join Elimination 12.2. The benefit of this feature for Data Vault I want to explain in this blog post in […]

    Pingback by Data Vault Queries and Join Elimination | Data Warehousing with Oracle — December 31, 2019 @ 12:04 pm GMT Dec 31,2019 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.