Oracle Scratchpad

December 13, 2006

Cartesian Merge Join

Filed under: CBO, Execution plans, Troubleshooting — Jonathan Lewis @ 8:42 pm UTC Dec 13,2006

Have you ever had an execution plan which gave you a Cartesian join that you knew just couldn’t be happening ?

Surely you put a join condition in for every table ! Well maybe you did, and maybe the optimizer took some of your join conditions away. When the (not always) terrible Cartesian join appears, make sure you check the predicates section of your execution plan.  Here’s a demonstration of why:


create table t1 as
select
        trunc((rownum-1)/15)	n1,
        trunc((rownum-1)/15)	n2,
        rpad(rownum,215)	v1
from    all_objects
where   rownum <= 3000
;          

create index t_i1 on t1(n1);

After gathering stats on the table and index, look at what 9i gives you for the execution plan of the following query (using dbms_xplan.display):


select
        tA.n2,
        tB.n2
from
        t1	tA,
        t1	tB
where
        tA.n1 = 15
and	tB.n1 = tA.n1
;           

/-----------------------------------------------------------------------------
| Id  | Operation                     |  Name       | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |   225 |  3600 |    32 |
|   1 |  MERGE JOIN CARTESIAN         |             |   225 |  3600 |    32 |
|   2 |   TABLE ACCESS BY INDEX ROWID | T1          |    15 |   120 |     2 |
|*  3 |    INDEX RANGE SCAN           | T_I1        |    15 |       |     1 |
|   4 |   BUFFER SORT                 |             |    15 |   120 |    30 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T1          |    15 |   120 |     2 |
|*  6 |     INDEX RANGE SCAN          | T_I1        |    15 |       |     1 |
-----------------------------------------------------------------------------           

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("TA"."N1"=15)
   6 - access("TB"."N1"=15)

Note how the join predicate has disappeared, and we are doing a (perfectly reasonable) Cartesian merge join. The problem is transtive closure.

If ta.n1 = 15, and tb.n1 = ta.n1, then the optimizer can infer that tb.n1 = 15. And in 9i it throws away the middle (join) step, leaving you with the predicates listed above. (Note - there are cases where the join cannot be thrown away, so the behaviour is not entirely consistent).

 WHen you get to 10g, things are different. There is a (hidden) parameter _optimizer_transitivity_retain with the description: “retain equi-join pred upon transitive equality pred generation”, and this is set to true. As a consequence, this is the execution plan I got from 10g (autotrace in this case).


---------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |   225 |  3600 |     6 |
|*  1 |  HASH JOIN                   |      |   225 |  3600 |     6 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1   |    15 |   120 |     2 |
|*  3 |    INDEX RANGE SCAN          | T_I1 |    15 |       |     1 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T1   |    15 |   120 |     2 |
|*  5 |    INDEX RANGE SCAN          | T_I1 |    15 |       |     1 |
---------------------------------------------------------------------        

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TB"."N1"="TA"."N1")
   3 - access("TA"."N1"=15)
   5 - access("TB"."N1"=15)

In this case, we have an improvement - the join condition has been kept so Oracle is able to do a much more efficient hash join. Bear in mind, though, you win some, you lose some. When Oracle generates a new predicate and keeps the join predicate it may, in effect, double-count the impact of the generated predicate - leaving you with a join cardinality that is much too low and leading to an inappropriate execution plan from that point onwards.

Footnote:  You’ll notice the “buffer sort” that appeared in the 9i Cartesian join. Since Oracle is joining every row to every row, a sort isn’t really necessary; but I think Oracle is taking advantage of the buffer sort mechanism to copy the second row source out of its data buffers so that the run-time engine doesn’t have to keep hitting latches and re-visiting data buffers for the second row source as it processes each row from the first row source.

Despite this optimisation, the arithmetic for the cost still seems to be the traditional nested loop calculation: cost of 1st rowsource + (cardinality of 1st rowsource * cost of 2nd rowsource).

26 Comments »

  1. Jonathan,
    I was wondering if event 10076 would help in the first case.On 9206 though I was unable to see the MERGE JOIN CARTESIAN and I see the 10g plan. Which 9i version are you checking this against

    Thanks
    Fairlie

    Comment by Fairlie Rego — December 14, 2006 @ 12:43 am UTC Dec 14,2006

  2. Fairlie, I’m using 9.2.0.6 and 9.2.0.8

    Event 10076: “CBO Enable cartesian product join costing” changes the arithmetic for the cost so that it switches from the nested loop costing method to the merge join costing method (in my case dropping to 19). The merge cost - for very small merges - being “cost of getting first sorted data set” + “cost of getting second sorted data set” + “a little bit”.

    Event 10179: “CBO turn off transitive predicate replacement” is the one that takes this example back to a hash join, generating the extra predicate by transitive closure but leaving the join predicate in place.

    However, I would guess that your system has parameter query_rewrite_enabled set to true - which also changes the optimizer’s treatment of this case in 9i. (It’s mentioned in my book, chapter 6: page 144, in fact).

    Comment by Jonathan Lewis — December 14, 2006 @ 8:03 am UTC Dec 14,2006

  3. Thanks, Jonathan.

    Comment by Fairlie — December 14, 2006 @ 10:50 am UTC Dec 14,2006

  4. I still remember the problem which I faced due to transitive closure behavior when I moved (migrated) data warehouse db from AIX to HP-Superdom, version 9205.
    You (Jonathan) hinted me to look for the predicators of the query. I have discussed this on my blog, ‘transitive closure behavior when query_rewrite_enabled parameter set to true/false’.

    Thanks once again Jonathan for bringing to our view of 10g transitive closure behavior changes.

    Jaffar

    Comment by Syed Jaffar Hussain — December 14, 2006 @ 2:04 pm UTC Dec 14,2006

  5. Could it be that the “BUFFER SORT” is really, internally, a NOSORT operation, even if not reflected in the plan ?

    Comment by Alberto Dell'Era — December 14, 2006 @ 10:29 pm UTC Dec 14,2006

  6. Alberto, I don’t think so. To me the NOSORT operation is intend to show that the data is ordered as if sorted, even when the sorting mechanism has not been invoked. I haven’t checked it, but I’ll bet you don’t get a 10032 output for a sort (order by nosort) operation.
    On the other hand, the buffer sort seems to invoke the sort operation (with a special flag, I guess) but leaves the data unsorted.

    Comment by Jonathan Lewis — December 15, 2006 @ 7:43 am UTC Dec 15,2006

  7. [...] In an earlier article I mentioned the buffer sort in a footnote; I thought I would expand a little more on what I think it does and why it appears as a buffer sort in an execution plan rather than the more traditional sort (join). [...]

    Pingback by Buffer Sorts « Oracle Scratchpad — December 18, 2006 @ 11:01 pm UTC Dec 18,2006

  8. [...] Constraints Filed under: Infrastructure, Execution plans, Cost Based Optimizer — Jonathan Lewis @ 8:17 am UTC Dec 21,2006 A little while ago I discussed one side-effect of transitive closure and predicate generation. Coincidentally, David Aldridge and Jeff Hunter have both come up with further examples of predicate generation - in their examples generated from check constraints - that can cause problems. [...]

    Pingback by Constraints « Oracle Scratchpad — December 30, 2006 @ 8:10 pm UTC Dec 30,2006

  9. [...] Transitive Closure Filed under: Uncategorized, Tuning, Execution plans, CBO — Jonathan Lewis @ 6:52 pm UTC Jan 1,2007 I have mentioned transitive closure in the past, and the comments on this blog item go into some depth about some of the effects of transitive closure.  However, a recent incoming link has prompted me to point out a couple of further details about the appearance (or non-appearance) of transitive closure. [...]

    Pingback by Transitive Closure « Oracle Scratchpad — January 1, 2007 @ 6:52 pm UTC Jan 1,2007

  10. Hello Mr Lewis
    After a database upgrade from 9.2 to Oracle 10.2, we came across the opposite problem - in Oracle 9i, we had nested loop join but in 10G we are having a costly Merge Join Cartesian. By supplying hints like /*+ ordered index() */ managed to get a better explain plan having a combination of Nested Loops & Hash Join and sql performed well. Also tried without hints when just changed an equality join condition with an outer join and it produced a good explain plan with Nested Loop Outer and sql performed well. My question why is this change in behaviour in Oracle 10G because our developers are worried that the whole application has to go through full regression testing and also Oracle 9iAS Forms have dynamic sqls. DBMS_STATS was run with estimate_percent=20, method_opt=’for all indexed columns size 1′, granularity=’all’. The largest table in join is partitioned.

    Comment by TJ Mitra — March 15, 2007 @ 12:35 am UTC Mar 15,2007

  11. Also, I wish to mention another thing which you have highlighted at the very beginning. For explain plan with Nested Loop Outer, I see in the Predicate information of the Plan Table Output the join condition appearing as access(”MC.DISTRICT_ID”=”D”.”DISTRICT_ID(+)) which disappears in the explain plan with Cartesian Merge Join and it appears as filter
    filter(”MC.DISTRICT_ID”=”D”.”DISTRICT_ID)

    Comment by TJ Mitra — March 15, 2007 @ 6:17 am UTC Mar 15,2007

  12. TJ Mitra, I can’t answer your question - there are too many unknowns - but I would guess that the join cardinalities have changed and the join order may have changed, leading to an odd join option.

    Your developers should be worried - the whole application really ought to go through a full regression test before upgrading from 9i to 10g. But you might be able to avoid this by setting optimizer_features_enable to 9.2.0 and only using the upgrade for code-path improvements whilst avoiding optimizer changes.

    Comment by Jonathan Lewis — March 15, 2007 @ 9:09 pm UTC Mar 15,2007

  13. [...] point that I alway stress in my tutorial on explain plan, and which I also made in a recent blog entry: ” make sure you check the predicates section of your execution plan”. It’s [...]

    Pingback by ORA-01722: upgrade error « Oracle Scratchpad — April 9, 2007 @ 9:01 am UTC Apr 9,2007

  14. Hi Jonathan,

    Merge Cartesian Join issue is coming in 9i intermittently. So how to override this operation. Which hint I need to use so that I can avoid this Merge Cartesian join.

    Thanks
    Dee

    Comment by Deepank — September 17, 2007 @ 7:44 am UTC Sep 17,2007

  15. Dee,

    There is no silver bullet for this issue. It’s the way the optimizer happens to operate, and sometimes it does it in inappropriate situations - perhaps in association with a statistics issue. If you can identify the few statements that are going wrong one possible workaround is to change the join predicate that is disappearing from something like:
            t2.colX = t1.colY
    to something like
            t2.colX = t1.colY + 0

    Obviously you will need to work out if there are any side effects of this code change - and you will probably want to remove the hack on the upgrade to 10g.

    Comment by Jonathan Lewis — September 17, 2007 @ 10:58 am UTC Sep 17,2007

  16. Thanks Jonathan for you feedback. I will try that. One thing more when this issue occurred in our database. I dropped and re-created one of the table on which Cartesian-Merge happening and the query started taking normal execution plan with Hash Join which its taking earlier. I could drop this table because it was staging/temporary kind of table. I don’t know how it resolved this issue.

    I will send test cases also if you want to have a look.

    Comment by Deepank — September 19, 2007 @ 9:23 am UTC Sep 19,2007

  17. Deepank,

    Dropping and recreating a table will probably change the statistics - which can lead to changes in execution plan. One possibility is that the statistics before the rebuild gave the optimizer an estimated cardinality of one from that table - and a Cartesian merge join against a single row is not a parformance issue.

    Bear in mind that a Cartesian merge join is not a problem anyway if both the data set are very small - and what counts is the optimizer estimate of the data size, not what’s really there.

    Comment by Jonathan Lewis — September 19, 2007 @ 11:36 am UTC Sep 19,2007

  18. Yeah changing the statistics might have done the trick. In my case one table is small and other table is really big containing several millions rows.

    Comment by Deepank — September 19, 2007 @ 12:14 pm UTC Sep 19,2007

  19. Could set _optimizer_cartesian_enabled to false to avoid Cartesian Merge Join? Are there any side effects for this setting?
    In our database (10.2.0.2.0), few statements sometimes go wrong with Cartesian-Merge instead of taking normal execution plan with Hash Join. The tables involved in the statements are on commit delete rows temporary tables.

    Comment by Ying — October 11, 2007 @ 7:16 am UTC Oct 11,2007

  20. As a general rule I don’t fiddle with parameters to fix SQL statements - unless there’s a basic configuration issue and lots of statements are performing badly as a consequence.

    In your case, I would try to understand why GTTs (global temporary tables) are associated with inappropriate Cartesian merge joins. As a suggestion, it’s probably because a cardinality estimate comes up with the value 1 for the number of rows in one of the tables involved - at which point a Cartesian Merge Join is often a good idea.

    Perhaps you need to adopt a strategy of using dbms_stats.set_table_stats etc. to create some ‘representative’ statistics on the GTT definition.

    Comment by Jonathan Lewis — October 12, 2007 @ 9:44 pm UTC Oct 12,2007

  21. Thanks much Jonathan for your feedback. Set table statistics on GTT will make Cartesian Merge join go away. Our application are heavily using GTT and relying on dynamic sampling to generate accurate statistics. We are trying to understand what the root cause of the issue is.

    Yeah in all cases when Cartesian join is involved, the GTT table cardinality showed 1. We don’t think value 1 is real number of rows in GTT table. But what make optimizer comes with cardinality estimate 1? Is that because the data in GTT has less than certain blocks? We noticed that dynamic sampling being used.

    We are implementing DIY parallelism using scheduler for the batch. Perhaps Cartesian Merge join is good for some jobs (sessions), but it becomes more expensive and inappropriate when the data volume in GTT increased. We notice dynamic sampling is not taking for the consecutive jobs and execution plan is not changed (Cartesian Merge join is used and cardinality estimate for the GTT is 1) even significantly larger numbers of records are used in GTT. So some jobs completed quickly, while others run forever until eating up all temp space and failed with ORA-01652. Try to seek if a plan can be recalculated when the data volume in GTT has changed?

    Comment by Ying — October 15, 2007 @ 6:34 pm UTC Oct 15,2007

  22. Hello Jonathan:

    In one of my system, I try to run ASH report but it is doing so many long operations. Then I got 10053 trace and check what it is doing. I am never able to receive ASH report. I found it is doing lot of MERGE JOIN CARTES
    IAN. Is there any way to avoid MERGE JOIN CARTESIAN in 10g ? Please provide any clue.

    Thanks

    ~ Keyur

    Comment by Keyur — December 11, 2007 @ 10:43 pm UTC Dec 11,2007

  23. Keyur,

    I haven’t seen this problem happening - but taking a clue from Ying in comment #21, perhaps the problem is occurring because of a statistics issue. Maybe you need to investigate the option for collecting statistics on the “hidden tables”.

    Before you do that, though, you might check that you haven’t set some non-standard optimizer parameters in a way that restricts the optimizer’s ability to do something sensible. You can always use a call to ‘alter session’ before you run the ASH report if you need some special setting for other parts of your system.

    If you think that block cartesian merge joins may help, there seem to be two related parameters that you might want to experiment with at the session level:

    _optimizer_cartesian_enabled
    _optimizer_mjc_enabled

    Comment by Jonathan Lewis — December 12, 2007 @ 10:43 am UTC Dec 12,2007

  24. [...] popular page, with 6,100 views, is : Cartesian Merge Join  pushing July’s most popular posting, Bind Variables, into second place with 5,900 [...]

    Pingback by Targets « Oracle Scratchpad — January 11, 2008 @ 10:41 pm UTC Jan 11,2008

  25. Hi Jonathan,

    We have several environments which we are getting no optimal execution plans with MERGE JOIN CARTESIAN.

    Do you know if there is a similar parameter to ‘_optimizer_transitivity_retain ‘ for Oracle9i.

    This is an typical example the execution plans we are getting:

    Paso Número Nombre del Paso
    14 SELECT STATEMENT
    13 NESTED LOOPS
    10 NESTED LOOPS
    7 MERGE JOIN [CARTESIAN]
    4 . VIEW
    3 HASH JOIN
    1 CISADM.XT265S2 INDEX [FAST FULL SCAN]
    2 CISADM.XT265S4 INDEX [FAST FULL SCAN]
    6 BUFFER [SORT]
    5 CISADM.XC029S2 INDEX [FULL SCAN]
    9 CISADM.CI_APAY_CLR_STG TABLE ACCESS [BY INDEX ROWID]
    8 CISADM.XT003S2 INDEX [RANGE SCAN]
    12 CISADM.CI_ACCT_APAY TABLE ACCESS [BY INDEX ROWID]
    11 CISADM.XT002P0 INDEX [UNIQUE SCAN]

    Thank you very much in advance
    Santi

    Comment by Santi — May 14, 2008 @ 3:07 pm UTC May 14,2008

  26. Santi,
    I don’t know of any such parameter, but the cartesian merge join can appear when the two data sets are very small - so perhaps you need to check whether you are seeing a cardinality problem, rather than a problem caused by predicate changes.

    Check the rest of the blog for any comments about using dbms_xplan to generate plans with predicates.

    Comment by Jonathan Lewis — May 14, 2008 @ 4:17 pm UTC May 14,2008

RSS feed for comments on this post. TrackBack URI

Leave a comment

Blog at WordPress.com.