Oracle Scratchpad

February 11, 2010

Ignoring Hints – 2

Filed under: CBO,Execution plans,Hints,Ignoring Hints — Jonathan Lewis @ 7:28 pm GMT Feb 11,2010

Here’s a little puzzle that someone sent to me a couple of days ago – it’s a case where the optimizer seems to be ignoring a hint.

create table t1
as
select
	rownum id,
	object_name name
from
	all_objects
where
	rownum <= 1000
;

create table t2
as
select
	mod(rownum,20)+1 id
from
	all_objects
where
	rownum <= 1000
;

alter table t1 add constraint t1_pk primary key(id);
create index ind_t2 on t2(id);

-- gather stats at this point

set autotrace traceonly explain

select
	/*+ ordered use_hash(t2) */ *
from
	t1, t2
where
	t1.id = t2.id
and	t1.id = 1
;

select
	/*+ ordered use_hash(t2) */ *
from
	t1, t2
where
	t1.id = t2.id
and	t2.id = 1
;

set autotrace off

Notice that the only difference between these two queries is the filter predicate: t1.id = 1 in the first statement and t2.id = 1 in the second. Given the fact that the join predicate is then: t1.id = t2.id, these two queries are logically identical. So why does one of them “ignore” the use_hash() hint and do a nested loop ? Remember: the optimizer does not ignore hints (with the usual caveats about legality, spelling, syntax and bugs).

Here are the two execution plans:

------------------------------------------------+-----------------------------------+
| Id  | Operation                     | Name    | Rows  | Bytes | Cost  | Time      |
------------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT              |         |       |       |     3 |           |
| 1   |  NESTED LOOPS                 |         |    50 |  1050 |     3 |  00:00:01 |
| 2   |   TABLE ACCESS BY INDEX ROWID | T1      |     1 |    18 |     2 |  00:00:01 |
| 3   |    INDEX UNIQUE SCAN          | T1_PK   |     1 |       |     1 |  00:00:01 |
| 4   |   INDEX RANGE SCAN            | IND_T2  |    50 |   150 |     1 |  00:00:01 |
------------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
3 - access("T1"."ID"=1)
4 - access("T2"."ID"=1)

------------------------------------------------+-----------------------------------+
| Id  | Operation                     | Name    | Rows  | Bytes | Cost  | Time      |
------------------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT              |         |       |       |     4 |           |
| 1   |  HASH JOIN                    |         |    50 |  1050 |     4 |  00:00:01 |
| 2   |   TABLE ACCESS BY INDEX ROWID | T1      |     1 |    18 |     2 |  00:00:01 |
| 3   |    INDEX UNIQUE SCAN          | T1_PK   |     1 |       |     1 |  00:00:01 |
| 4   |   INDEX RANGE SCAN            | IND_T2  |    50 |   150 |     1 |  00:00:01 |
------------------------------------------------+-----------------------------------+
Predicate Information:
----------------------
1 - access("T1"."ID"="T2"."ID")
3 - access("T1"."ID"=1)
4 - access("T2"."ID"=1)

Have you spotted the critical clue ? It’s in the predicate sections.

Notice how the first plan has two predicates and the second plan has three – and the “missing” predicate is the join predicate. You’re looking at an example of transitive closure:  if t1.id1 = 1 and t2.id1 = t1.id1 then t2.id1 = 1 So Oracle has taken the predicate from one table and copied to the other table – but in just one of the cases it has also eliminated the join. Because there is no longer a join predicate it’s not possible for the optimizer to do a hash join (in earlier versions of Oracle you might have seen a cartesian merge join at this point) so the use_hash() hint has to be ignored – it’s out of context.

The question, though, is why are the two cases not symmetrical – the queries are logically equivalent (and to the human eye very obviously so) so they should have been transformed in the same way. My best guess is that the optimzer checks the significance of the existing predicate as it generates the  new predicate – and if it sees that it’s guaranteed to supply a single row it discards the join condition. But when it’s working the other way round (i.e. starting with the non-unique predicate on t2) it doesn’t check the significance of the predicates that it has generated.

It’s an odd little detail, but I can’t think of  a case where it will have any noticeable impact on performance – though someone may prove me wrong on that.

Footnote:

I’ve been reminded by a couple of the comments that I should have mentioned that I ran this example on 10.2.0.3. Others have now run it on other versions of Oracle. If you go back to 9i you’ll see that the join predicate disappears in both cases and 9i will do a cartesian merge join).

[Further reading on “ignoring hints”]

20 Comments »

  1. Oracle version 10?

    Another point for having the right version of the explain-plan-table.

    Comment by lascoltodelvenerdi — February 12, 2010 @ 10:11 am GMT Feb 12,2010 | Reply

  2. I tested this against 10.2.0.4 and found working.

    Comment by Asif Momen — February 12, 2010 @ 11:15 am GMT Feb 12,2010 | Reply

  3. Hi Nice findings,

    Why can’t you ask ASKTom regarding this, as he will be able to answer this. This is an interesting find as it can change the execution plan in some kind of cases where we chose which is the first filtering condition

    Thanks,
    Sarath

    Comment by Sarath — February 12, 2010 @ 1:31 pm GMT Feb 12,2010 | Reply

  4. I agree with your results for Oracle Versions 10.2.0.3 and 11.1.0.6.

    Comment by Tony Sleight — February 12, 2010 @ 1:46 pm GMT Feb 12,2010 | Reply

  5. I ran the test on Oracle Database 11.2.0.1 and received the same results as those posted above. I also generated a 10053 trace in the script, and the resulting trace file possibly shows what is happening.

    For the first query:

    ...
    **************************
    Predicate Move-Around (PM)
    **************************
    PM:     PM bypassed: Outer query contains no views.
    PM:     PM bypassed: Outer query contains no views.
    query block SEL$1 (#0) unchanged
    FPD: Considering simple filter push in query block SEL$1 (#0)
    "T1"."ID"="T2"."ID" AND "T1"."ID"=1
    try to generate transitive predicate from check constraints for query block SEL$1 (#0)
    finally: "T2"."ID"=1 AND "T1"."ID"=1
    ...
    

    For the second query:

    ...
    **************************
    Predicate Move-Around (PM)
    **************************
    PM:     PM bypassed: Outer query contains no views.
    PM:     PM bypassed: Outer query contains no views.
    query block SEL$1 (#0) unchanged
    FPD: Considering simple filter push in query block SEL$1 (#0)
    "T1"."ID"="T2"."ID" AND "T2"."ID"=1
    try to generate transitive predicate from check constraints for query block SEL$1 (#0)
    finally: "T1"."ID"="T2"."ID" AND "T2"."ID"=1 AND "T1"."ID"=1
     
    FPD:   transitive predicates are generated in query block SEL$1 (#0)
    "T1"."ID"="T2"."ID" AND "T2"."ID"=1 AND "T1"."ID"=1
    ...
    

    From the above, I would suggest that the optimizer checks for the presence of a unique key constraint on the predicates already in the WHERE clause to see if the join may be removed before the optimizer generates the “T2”.”ID”=1 predicate, although more testing might be needed.

    What is possibly interesting is that in both cases, the 10053 trace shows that both hints are actually used.

    For the first query:

    Dumping Hints
    =============
      atom_hint=(@=0x2b32e92e60f8 err=0 resol=1 used=1 token=807 org=1 lvl=2 txt=ORDERED )
      atom_hint=(@=0x2b32e92e5dc8 err=0 resol=1 used=1 token=922 org=1 lvl=3 txt=USE_HASH ("T2") )
    ====================== END SQL Statement Dump ======================
    

    For the second query:

    Dumping Hints
    =============
      atom_hint=(@=0x2b32e92e60f8 err=0 resol=1 used=1 token=807 org=1 lvl=2 txt=ORDERED )
      atom_hint=(@=0x2b32e92e5dc8 err=0 resol=1 used=1 token=922 org=1 lvl=3 txt=USE_HASH ("T2") )
    ====================== END SQL Statement Dump ======================
    

    Comment by Charles Hooper — February 12, 2010 @ 11:25 pm GMT Feb 12,2010 | Reply

  6. Interesting test case, incidentally I’ve done some research on the subject recently too.

    This transitivity knowledge “retention” is controlled by _optimizer_transitivity_retain parameter, which changed to true in 10.2.0.1.

    Even in past Oracle versions, when you see MERGE JOIN CARTESIAN, the join is really just a NESTED LOOP join as cartesiam merge join is internally implemented as nested loop join without filtering.

    I guess the human-readable execution plan text generation functions fail to recognize it sometimes…

    Comment by Tanel Poder — February 13, 2010 @ 10:07 am GMT Feb 13,2010 | Reply

    • Tanel,

      “the join is really just a NESTED LOOP join as cartesiam merge join is internally implemented as nested loop join without filtering”

      I’ve got a draft note pointing out that all three join mechanisms are basically nested loop joins – except the merge and hash joins have different strategies for moving data from the buffer cache to local memory (or temp) before starting the loop.

      Even so, there is an important difference between the (Cartesian) Merge and the Nested Loop – the second data set is always copied into a local buffer by the “buffer sort” operation before the “nested loop” bit begins.

      It seems that Oracle from 10g2 recognises that this copy is a waste of resources when there is only one row in the first data set – and so switches to a “non-copied” nested loop.

      There’s a link to a couple of articles on the buffer sort in the article I linked to above about Cartesian Merge joins.

      Comment by Jonathan Lewis — February 14, 2010 @ 6:28 pm GMT Feb 14,2010 | Reply

      • Cool!

        Where I was getting at is that the same rowsource function (qerjotFetch) is used for both driving a nested loop join and a cartesian merge join, while hash join has its of set of functions (starting with qerhj). Of course what happens in the child rowsources under the join branch in the exec plan tree also makes a difference (like the buffering you mentioned etc).

        Comment by Tanel Poder — February 15, 2010 @ 8:03 am GMT Feb 15,2010 | Reply

  7. Thank you, Jonathan & Tanel, both. You have, in 7 sentences, crushed my Oracle knowledge ego like an 18-wheel, fully-laden, interstate juggernaut driving over a chicken egg, with horn blaring and CB radio gobbledygook carried in your “We Brake for Nobody” wake.

    Comment by Nigel — February 15, 2010 @ 2:13 pm GMT Feb 15,2010 | Reply

  8. […] Jonathan Lewis-Ignoring Hints-2 […]

    Pingback by Blogroll Report 05/02/2009 – 12/02/2010 « Coskan’s Approach to Oracle — March 10, 2010 @ 2:28 am GMT Mar 10,2010 | Reply

  9. […] a 10053 trace might help solve part of the mystery.  An interesting comment by Tanel Poder in this blog article suggests that a MERGE JOIN CARTESIAN is similar to a NESTED LOOP operation, just without filtering, […]

    Pingback by Improving Performance by Using a Cartesian Join 2 « Charles Hooper's Oracle Notes — May 18, 2010 @ 12:11 pm BST May 18,2010 | Reply

  10. Jonathan,
    I was going through your blog by clicking the Random Page, then i landed on this article and thought of giving it a try on my Oracle versions and found the results not at all resembling to that of yours.
    In 10.2.0.1.0 – NESTED LOOPs and TABLE ACCESS FULL for t1 not using the index at although i have created the primary key and collected the stats. and the join filter is also not there in the plan. and it is the same for both the runs.
    In 11.2.0.1.0 and 12.1.0.1.0 – MERGE JOIN CARTESIAN and TABLE ACCESS FULL for t1 and INDEX RANGE SCAN for t2’s index followed by BUFFER short and the join filter is also not there in the plan. and it is the same for both the runs
    I have checked the plans using the display_cursor with ALLSTATS option. I wonder why my results not matching with yours. first thought is that i am using completely different versions than of yours. But still i am i thinking to me somewhere near to yours result but i am far-off do not know why. Secondly why the index is not getting used for t1. I tried using the hint also by “index(t1 t1_pk)” but to no avail. I have collected the statistics also before running the query as you have mentioned.
    Thanks and appreciate your contribution towards Oracle Community

    Comment by jagdeepsangwan — January 24, 2014 @ 11:31 am GMT Jan 24,2014 | Reply

    • Jagdeep,

      I don’t have all your versions available to test, but I’ve just tried 10.2.0.5, 11.2.0.4, and 12.1.0.1 and they all behave as shown in the example.

      I’d probably have to look at the 10053 traces from your tests to find out why your results are different (but please don’t post them), however, a starting guess would be that you have a parameter setting that affects the transitive closure effects. The simplest guess I can make is that your optimizer_features_enable set to some earlier version – if that’s the case then using the ‘outline’ option with dbms_xplan would show it – it’s possible some other parameters would show up in the same way.

      Comment by Jonathan Lewis — January 24, 2014 @ 12:16 pm GMT Jan 24,2014 | Reply

      • Jonathan,
        Sorry for the delay in reply, i was not having access to the internet for sometime at home due to some technical issues. I will check/try your suggestion so that my tests resembles yours. to be honest i have not till tried 10053 traces, i should now be looking into that now. and i will check your other suggestions also
        Thanks again for quick reply

        Comment by jagdeepsangwan — January 25, 2014 @ 7:29 am GMT Jan 25,2014 | Reply

        • Jonathan,
          I tried your suggestion of checking ‘outline’ using dbms_xplan to check optimizer_features_enable but was not going nowhere to get the results similar to you although OFE was set to current version only. And i thought to start from the scratch and look, and then i found that the tables i created with a prefix of “jt_” (to identify that these are the table that i have created and for maintaining my db).
          In the query while hinting i was the using table name with the prefix while i was using the aliases below in the query. after providing the alias in hints i got the results which are same as that of yours.
          Lesson Learned: A little discrepancy/mistake can make your query plan/execution a lot different
          Note: I have checked this on 10.2.0.1.0 only as this is the only version on my personal laptop at home.
          Thanks again and appreciate you for taking time out to look into this.

          Comment by jagdeepsangwan — January 25, 2014 @ 10:57 am GMT Jan 25,2014

        • Jagdeep,

          It’s amazing how easy it is to miss a small text error when thinking about (what appears to be) a large logical error.

          Comment by Jonathan Lewis — January 26, 2014 @ 12:14 pm GMT Jan 26,2014

    • Jagdeep,

      It’s nice to hear that someone’s using the random page link, by the way. I do it myself from time to time and surprise myself with what I rediscover.

      Comment by Jonathan Lewis — January 24, 2014 @ 12:18 pm GMT Jan 24,2014 | Reply

      • Jonathan,
        i use the random page link to surprise myself what i have been missing about Oracle processing, untill i do not have some specific plan to read/learn a thing
        I am randomizing now :)

        Comment by jagdeepsangwan — January 25, 2014 @ 7:33 am GMT Jan 25,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.