Cartesian Merge Join

December 13, 2006

Cartesian Merge Join

Filed under: CBO,Execution plans,Troubleshooting — Jonathan Lewis @ 8:42 pm GMT Dec 13,2006

Have you ever had an execution plan which gave you a Cartesian join that you knew just couldn’t be happening ?

Surely you included a join condition for every table! Well maybe you did and maybe the optimizer took some of your join conditions away. When the (not always) terrible Cartesian join appears, make sure you check the predicates section of your execution plan. Here’s a demonstration of why:

create table t1 as
select
        trunc((rownum-1)/15)	n1,
        trunc((rownum-1)/15)	n2,
        rpad(rownum,215)	v1
from    all_objects
where   rownum <= 3000    -- > comment to avoid wordpress format issue
;          

create index t_i1 on t1(n1);

After gathering stats on the table and index, look at what 9i gives you for the execution plan of the following query (using dbms_xplan.display):

select
        tA.n2,
        tB.n2
from
        t1	tA,
        t1	tB
where
        tA.n1 = 15
and     tB.n1 = tA.n1
;           

------------------------------------------------------------------------------
| Id  | Operation                     |  Name       | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |   225 |  3600 |    32 |
|   1 |  MERGE JOIN CARTESIAN         |             |   225 |  3600 |    32 |
|   2 |   TABLE ACCESS BY INDEX ROWID | T1          |    15 |   120 |     2 |
|*  3 |    INDEX RANGE SCAN           | T_I1        |    15 |       |     1 |
|   4 |   BUFFER SORT                 |             |    15 |   120 |    30 |
|   5 |    TABLE ACCESS BY INDEX ROWID| T1          |    15 |   120 |     2 |
|*  6 |     INDEX RANGE SCAN          | T_I1        |    15 |       |     1 |
-----------------------------------------------------------------------------           

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("TA"."N1"=15)
   6 - access("TB"."N1"=15)

Note how the join predicate has disappeared, and we are doing a (perfectly reasonable) Cartesian merge join. The problem is transtive closure.

If ta.n1 = 15, and tb.n1 = ta.n1, then the optimizer can infer that tb.n1 = 15. And in 9i it throws away the middle (join) step, leaving you with the predicates listed above. (Note – there are cases where the join cannot be thrown away, so the behaviour is not entirely consistent).

WHen you get to 10g, things are different. There is a (hidden) parameter _optimizer_transitivity_retain with the description: “retain equi-join pred upon transitive equality pred generation”, and this is set to true. As a consequence, this is the execution plan I got from 10g (autotrace in this case).

---------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost  |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |   225 |  3600 |     6 |
|*  1 |  HASH JOIN                   |      |   225 |  3600 |     6 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1   |    15 |   120 |     2 |
|*  3 |    INDEX RANGE SCAN          | T_I1 |    15 |       |     1 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T1   |    15 |   120 |     2 |
|*  5 |    INDEX RANGE SCAN          | T_I1 |    15 |       |     1 |
---------------------------------------------------------------------        

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("TB"."N1"="TA"."N1")
   3 - access("TA"."N1"=15)
   5 - access("TB"."N1"=15)

In this case, we have an improvement – the join condition has been kept so Oracle is able to do a much more efficient hash join. Bear in mind, though, you win some, you lose some. When Oracle generates a new predicate and keeps the join predicate it may, in effect, double-count the impact of the generated predicate – leaving you with a join cardinality that is much too low and leading to an inappropriate execution plan from that point onwards.

Footnote:

You’ll notice the “buffer sort” that appeared in the 9i Cartesian join. Since Oracle is joining every row to every row, a sort isn’t really necessary; but I think Oracle is taking advantage of the buffer sort mechanism to copy the second row source out of its data buffers so that the run-time engine doesn’t have to keep hitting latches and re-visiting data buffers for the second row source as it processes each row from the first row source.

Despite this optimisation, the arithmetic for the cost still seems to be the traditional nested loop calculation: cost of 1st rowsource + (cardinality of 1st rowsource * cost of 2nd rowsource).

[More on Buffer Sorts]

Comments (53)

53 Comments »

Jonathan,
I was wondering if event 10076 would help in the first case.On 9206 though I was unable to see the MERGE JOIN CARTESIAN and I see the 10g plan. Which 9i version are you checking this against

Thanks
Fairlie

Comment by Fairlie Rego — December 14, 2006 @ 12:43 am GMT Dec 14,2006 | Reply
Fairlie, I’m using 9.2.0.6 and 9.2.0.8

Event 10076: “CBO Enable cartesian product join costing” changes the arithmetic for the cost so that it switches from the nested loop costing method to the merge join costing method (in my case dropping to 19). The merge cost – for very small merges – being “cost of getting first sorted data set” + “cost of getting second sorted data set” + “a little bit”.

Event 10179: “CBO turn off transitive predicate replacement” is the one that takes this example back to a hash join, generating the extra predicate by transitive closure but leaving the join predicate in place.

However, I would guess that your system has parameter query_rewrite_enabled set to true – which also changes the optimizer’s treatment of this case in 9i. (It’s mentioned in my book, chapter 6: page 144, in fact).

Comment by Jonathan Lewis — December 14, 2006 @ 8:03 am GMT Dec 14,2006 | Reply
Thanks, Jonathan.

Comment by Fairlie — December 14, 2006 @ 10:50 am GMT Dec 14,2006 | Reply
I still remember the problem which I faced due to transitive closure behavior when I moved (migrated) data warehouse db from AIX to HP-Superdom, version 9205.
You (Jonathan) hinted me to look for the predicators of the query. I have discussed this on my blog, ‘transitive closure behavior when query_rewrite_enabled parameter set to true/false’.

Thanks once again Jonathan for bringing to our view of 10g transitive closure behavior changes.

Jaffar

Comment by Syed Jaffar Hussain — December 14, 2006 @ 2:04 pm GMT Dec 14,2006 | Reply
Could it be that the “BUFFER SORT” is really, internally, a NOSORT operation, even if not reflected in the plan ?

Comment by Alberto Dell'Era — December 14, 2006 @ 10:29 pm GMT Dec 14,2006 | Reply
Alberto, I don’t think so. To me the NOSORT operation is intend to show that the data is ordered as if sorted, even when the sorting mechanism has not been invoked. I haven’t checked it, but I’ll bet you don’t get a 10032 output for a sort (order by nosort) operation.
On the other hand, the buffer sort seems to invoke the sort operation (with a special flag, I guess) but leaves the data unsorted.

Comment by Jonathan Lewis — December 15, 2006 @ 7:43 am GMT Dec 15,2006 | Reply
[…] In an earlier article I mentioned the buffer sort in a footnote; I thought I would expand a little more on what I think it does and why it appears as a buffer sort in an execution plan rather than the more traditional sort (join). […]

Pingback by Buffer Sorts « Oracle Scratchpad — December 18, 2006 @ 11:01 pm GMT Dec 18,2006 | Reply
[…] Constraints Filed under: Infrastructure, Execution plans, Cost Based Optimizer — Jonathan Lewis @ 8:17 am UTC Dec 21,2006 A little while ago I discussed one side-effect of transitive closure and predicate generation. Coincidentally, David Aldridge and Jeff Hunter have both come up with further examples of predicate generation – in their examples generated from check constraints – that can cause problems. […]

Pingback by Constraints « Oracle Scratchpad — December 30, 2006 @ 8:10 pm GMT Dec 30,2006 | Reply
[…] Transitive Closure Filed under: Uncategorized, Tuning, Execution plans, CBO — Jonathan Lewis @ 6:52 pm UTC Jan 1,2007 I have mentioned transitive closure in the past, and the comments on this blog item go into some depth about some of the effects of transitive closure. However, a recent incoming link has prompted me to point out a couple of further details about the appearance (or non-appearance) of transitive closure. […]

Pingback by Transitive Closure « Oracle Scratchpad — January 1, 2007 @ 6:52 pm GMT Jan 1,2007 | Reply
Hello Mr Lewis
After a database upgrade from 9.2 to Oracle 10.2, we came across the opposite problem – in Oracle 9i, we had nested loop join but in 10G we are having a costly Merge Join Cartesian. By supplying hints like /*+ ordered index() */ managed to get a better explain plan having a combination of Nested Loops & Hash Join and sql performed well. Also tried without hints when just changed an equality join condition with an outer join and it produced a good explain plan with Nested Loop Outer and sql performed well. My question why is this change in behaviour in Oracle 10G because our developers are worried that the whole application has to go through full regression testing and also Oracle 9iAS Forms have dynamic sqls. DBMS_STATS was run with estimate_percent=20, method_opt=’for all indexed columns size 1′, granularity=’all’. The largest table in join is partitioned.

Comment by TJ Mitra — March 15, 2007 @ 12:35 am GMT Mar 15,2007 | Reply
Also, I wish to mention another thing which you have highlighted at the very beginning. For explain plan with Nested Loop Outer, I see in the Predicate information of the Plan Table Output the join condition appearing as access(“MC.DISTRICT_ID”=”D”.”DISTRICT_ID(+)) which disappears in the explain plan with Cartesian Merge Join and it appears as filter
filter(“MC.DISTRICT_ID”=”D”.”DISTRICT_ID)

Comment by TJ Mitra — March 15, 2007 @ 6:17 am GMT Mar 15,2007 | Reply
TJ Mitra, I can’t answer your question – there are too many unknowns – but I would guess that the join cardinalities have changed and the join order may have changed, leading to an odd join option.

Your developers should be worried – the whole application really ought to go through a full regression test before upgrading from 9i to 10g. But you might be able to avoid this by setting optimizer_features_enable to 9.2.0 and only using the upgrade for code-path improvements whilst avoiding optimizer changes.

Comment by Jonathan Lewis — March 15, 2007 @ 9:09 pm GMT Mar 15,2007 | Reply
[…] point that I alway stress in my tutorial on explain plan, and which I also made in a recent blog entry: ” make sure you check the predicates section of your execution plan”. It’s […]

Pingback by ORA-01722: upgrade error « Oracle Scratchpad — April 9, 2007 @ 9:01 am BST Apr 9,2007 | Reply
Hi Jonathan,

Merge Cartesian Join issue is coming in 9i intermittently. So how to override this operation. Which hint I need to use so that I can avoid this Merge Cartesian join.

Thanks
Dee

Comment by Deepank — September 17, 2007 @ 7:44 am BST Sep 17,2007 | Reply
Dee,

There is no silver bullet for this issue. It’s the way the optimizer happens to operate, and sometimes it does it in inappropriate situations – perhaps in association with a statistics issue. If you can identify the few statements that are going wrong one possible workaround is to change the join predicate that is disappearing from something like:
t2.colX = t1.colY
to something like
t2.colX = t1.colY + 0

Obviously you will need to work out if there are any side effects of this code change – and you will probably want to remove the hack on the upgrade to 10g.

Comment by Jonathan Lewis — September 17, 2007 @ 10:58 am BST Sep 17,2007 | Reply
Thanks Jonathan for you feedback. I will try that. One thing more when this issue occurred in our database. I dropped and re-created one of the table on which Cartesian-Merge happening and the query started taking normal execution plan with Hash Join which its taking earlier. I could drop this table because it was staging/temporary kind of table. I don’t know how it resolved this issue.

I will send test cases also if you want to have a look.

Comment by Deepank — September 19, 2007 @ 9:23 am BST Sep 19,2007 | Reply
Deepank,

Dropping and recreating a table will probably change the statistics – which can lead to changes in execution plan. One possibility is that the statistics before the rebuild gave the optimizer an estimated cardinality of one from that table – and a Cartesian merge join against a single row is not a parformance issue.

Bear in mind that a Cartesian merge join is not a problem anyway if both the data set are very small – and what counts is the optimizer estimate of the data size, not what’s really there.

Comment by Jonathan Lewis — September 19, 2007 @ 11:36 am BST Sep 19,2007 | Reply
Yeah changing the statistics might have done the trick. In my case one table is small and other table is really big containing several millions rows.

Comment by Deepank — September 19, 2007 @ 12:14 pm BST Sep 19,2007 | Reply
Could set _optimizer_cartesian_enabled to false to avoid Cartesian Merge Join? Are there any side effects for this setting?
In our database (10.2.0.2.0), few statements sometimes go wrong with Cartesian-Merge instead of taking normal execution plan with Hash Join. The tables involved in the statements are on commit delete rows temporary tables.

Comment by Ying — October 11, 2007 @ 7:16 am BST Oct 11,2007 | Reply
As a general rule I don’t fiddle with parameters to fix SQL statements – unless there’s a basic configuration issue and lots of statements are performing badly as a consequence.

In your case, I would try to understand why GTTs (global temporary tables) are associated with inappropriate Cartesian merge joins. As a suggestion, it’s probably because a cardinality estimate comes up with the value 1 for the number of rows in one of the tables involved – at which point a Cartesian Merge Join is often a good idea.

Perhaps you need to adopt a strategy of using dbms_stats.set_table_stats etc. to create some ‘representative’ statistics on the GTT definition.

Comment by Jonathan Lewis — October 12, 2007 @ 9:44 pm BST Oct 12,2007 | Reply
Thanks much Jonathan for your feedback. Set table statistics on GTT will make Cartesian Merge join go away. Our application are heavily using GTT and relying on dynamic sampling to generate accurate statistics. We are trying to understand what the root cause of the issue is.

Yeah in all cases when Cartesian join is involved, the GTT table cardinality showed 1. We don’t think value 1 is real number of rows in GTT table. But what make optimizer comes with cardinality estimate 1? Is that because the data in GTT has less than certain blocks? We noticed that dynamic sampling being used.

We are implementing DIY parallelism using scheduler for the batch. Perhaps Cartesian Merge join is good for some jobs (sessions), but it becomes more expensive and inappropriate when the data volume in GTT increased. We notice dynamic sampling is not taking for the consecutive jobs and execution plan is not changed (Cartesian Merge join is used and cardinality estimate for the GTT is 1) even significantly larger numbers of records are used in GTT. So some jobs completed quickly, while others run forever until eating up all temp space and failed with ORA-01652. Try to seek if a plan can be recalculated when the data volume in GTT has changed?

Comment by Ying — October 15, 2007 @ 6:34 pm BST Oct 15,2007 | Reply
Hello Jonathan:

In one of my system, I try to run ASH report but it is doing so many long operations. Then I got 10053 trace and check what it is doing. I am never able to receive ASH report. I found it is doing lot of MERGE JOIN CARTES
IAN. Is there any way to avoid MERGE JOIN CARTESIAN in 10g ? Please provide any clue.

Thanks

~ Keyur

Comment by Keyur — December 11, 2007 @ 10:43 pm GMT Dec 11,2007 | Reply
Keyur,

I haven’t seen this problem happening – but taking a clue from Ying in comment #21, perhaps the problem is occurring because of a statistics issue. Maybe you need to investigate the option for collecting statistics on the “hidden tables”.

Before you do that, though, you might check that you haven’t set some non-standard optimizer parameters in a way that restricts the optimizer’s ability to do something sensible. You can always use a call to ‘alter session’ before you run the ASH report if you need some special setting for other parts of your system.

If you think that block cartesian merge joins may help, there seem to be two related parameters that you might want to experiment with at the session level:
```
_optimizer_cartesian_enabled
_optimizer_mjc_enabled
```
Comment by Jonathan Lewis — December 12, 2007 @ 10:43 am GMT Dec 12,2007 | Reply
[…] popular page, with 6,100 views, is : Cartesian Merge Join pushing July’s most popular posting, Bind Variables, into second place with 5,900 […]

Pingback by Targets « Oracle Scratchpad — January 11, 2008 @ 10:41 pm GMT Jan 11,2008 | Reply
Hi Jonathan,

We have several environments which we are getting no optimal execution plans with MERGE JOIN CARTESIAN.

Do you know if there is a similar parameter to ‘_optimizer_transitivity_retain ‘ for Oracle9i.

This is an typical example the execution plans we are getting:

Paso Número Nombre del Paso
14 SELECT STATEMENT
13 NESTED LOOPS
10 NESTED LOOPS
7 MERGE JOIN [CARTESIAN]
4 . VIEW
3 HASH JOIN
1 CISADM.XT265S2 INDEX [FAST FULL SCAN]
2 CISADM.XT265S4 INDEX [FAST FULL SCAN]
6 BUFFER [SORT]
5 CISADM.XC029S2 INDEX [FULL SCAN]
9 CISADM.CI_APAY_CLR_STG TABLE ACCESS [BY INDEX ROWID]
8 CISADM.XT003S2 INDEX [RANGE SCAN]
12 CISADM.CI_ACCT_APAY TABLE ACCESS [BY INDEX ROWID]
11 CISADM.XT002P0 INDEX [UNIQUE SCAN]

Thank you very much in advance
Santi

Comment by Santi — May 14, 2008 @ 3:07 pm BST May 14,2008 | Reply
Santi,
I don’t know of any such parameter, but the cartesian merge join can appear when the two data sets are very small – so perhaps you need to check whether you are seeing a cardinality problem, rather than a problem caused by predicate changes.

Check the rest of the blog for any comments about using dbms_xplan to generate plans with predicates.

Comment by Jonathan Lewis — May 14, 2008 @ 4:17 pm BST May 14,2008 | Reply
Why is the hash join “much more efficient” than the Cartesian one? The hash join involves additional work – computing and comparing hash values.

Comment by Todor Botev — September 10, 2008 @ 12:48 pm BST Sep 10,2008 | Reply
Todor,
With the Cartesian join, we have to load the entire second data set into a pga workarea, and then scan it once for every row in the first data set.

This does suggest a increased use of CPU, but you are right to point out that this isn’t necessarily the case. The sizes of the two data sets (and the setting for the pga_aggregate_target) could be a relevant factor in the relative efficiency of the two mechanisms.

Comment by Jonathan Lewis — September 11, 2008 @ 11:30 am BST Sep 11,2008 | Reply
Jonathan,
Thank you for your answer!

My knowledge of the internal mechanics of the hash join comes from your book – I found the explanation there very detailed and understandable. But I suppose I overlook something because I still do not understand:

– performing a hash join, we still need to load one of the data sets in the pga in a form of a hash table (with the extra work of computing the hash values)

– the loaded data set will be scanned for every row of the other data set in a form of walking through the hash chain (with the extra work of comparing every single row).

So where do we save anything?

Comment by Todor Botev — September 11, 2008 @ 12:16 pm BST Sep 11,2008 | Reply
Todor,

Sorry it’s taken so long to reply to this.
The saving is because we DON’T scan the loaded data set.

As we read a row from the probe (second) table, we do some arithmetic to the join columns that let’s us say: if there is a matching row from the build (first) table it will be in cell 99 of the hash table. So we only check any rows we find in the relevant cell, not every row in the table.

Interestingly, you can sometimes run into performance problems (excessive CPU) with hash joins because the distribution of data across the hash table is very uneven and a large number of rows end up in one cell.

Comment by Jonathan Lewis — September 26, 2008 @ 2:06 pm BST Sep 26,2008 | Reply
Hi Jonathan,

As from your last comment regarding probing for hash table. Can you please explain more about what do you mean by cell. As, general understanding is each row for parent table scans whole hash data which is there in memory or temp tablespace for child table. Please correct me if i am wrong on above sentence. Also, can you please give some more information.

Comment by Taral Desai — February 15, 2010 @ 6:11 pm GMT Feb 15,2010 | Reply
- Taral,
  
  Read note 30 – where I point out that we don’t scan the whole hash data set for each row in the parent set.
  
  Think of a hash table as a large set of buckets (and I mean water carrying buckets) arranged in a large square pattern – say 64 buckets in an 8 by 8 square.
  
  To use the hash table you have some piece of arithmetic which says: if I want the number 1,436 it will be in the bucket in row 3 column 8; if I want the number 9,132 it will be in the bucket in row 8 column 2. (That’s all that a “hash function” does.)
  
  So if you have a number I know where to look for a match. And Oracle tries to use a lot of buckets in a very big square – so there aren’t many “wrong” rows in a bucket when you look in it.
  
  Comment by Jonathan Lewis — February 15, 2010 @ 7:59 pm GMT Feb 15,2010 | Reply
I am encountering the CARTESIAN MERGE JOIN on 10.2.0.4 with a query that references an inline view in it’s WHERE clause as. This inline view queries an actual view itself.

select …
where col1 in ()

The inline view can range from 1 to 50 results and the performance is terrible with a few dozen results in the inline view. The optimizer is estimating only 1 row from the inline view.

Setting the parameter, _optimizer_mjc_enabled to false forces a NESTED LOOP instead of the CMJ.

I tried with setting the CARDINALITY and DYNAMIC_SAMPLING hint, but neither one of these prevented the CMJ and still showed an estimation of 1.

Would there be anything else possible to preclude the CMJ from occurring other than setting the hidden parameter?

Comment by Brian — August 4, 2010 @ 5:13 pm BST Aug 4,2010 | Reply
- Brian,
  
  You might consider using the opt_param() hint to set _optimizer_mjc_enabled to false for the duration of the query – but it’s probably an unsupported method (unless you get approval through an SR to Oracle).
  
  The cardinality of 1 MIGHT be a side effect of the “in subquery” applying a selectivity of 5% to the target of the “IN” so a cardinality hint that scales up the single table cardinality of the target table by a factor of 20 (or more) might help – but it’s hard to say without a detailed investigation of the query, and the cardinality hint is a little subtle in anything other than very trivial cases.
  
  Is it possible that you’re seeing a bug which I thought I’d described somewhere on the blog but can’t find at present … there are cases in 10g where the cardinality of a query with subquery is ALWAYS 1 thanks to an error in a bug-fix relating to when the normal 5% selectivity is applied. If this bug appears as the last step of a query then it doesn’t really matter, but if it appears as the last step of a non-mergeable inline view then it can have a huge impact. A workaround that may apply is to put a push_subq hint into the subquery in the inline view.
  
  Comment by Jonathan Lewis — August 9, 2010 @ 12:33 pm BST Aug 9,2010 | Reply
Jonathan,

If the cursor_sharing parameter is set to SIMILAR at the instance level, do you think that it can cause Cartesian merge join for many queries? I see a lot of Cartesian joins in one such database(11G – 11.1.0.7.0). I am investigating on the root cause. If it is not the reason, what else can lead to Cartesian joins? Most of the time, the query executes fast (using nested loop join), but at times it takes a long time to execute (using Cartesian merge join).

Thanks,
Ajith

Comment by Ajith — August 17, 2010 @ 12:01 pm BST Aug 17,2010 | Reply
- Ajith,
  
  This seems a little unlikely because you still get bind variable peeking that results (bugs excepted) in optimization against known constants. I suppose it’s possible that you could get more Cartesian Merge Joins on SIMILAR than you would on FORCED because the former allows re-optimization in various circumstances – so you get through more plans.
  
  You may be getting CMJs because the estimated cardinality of the driving table is much less than one. It the estimate is correct then the CMJ is reasonable – if the estimate is wrong then the query will take longer than expected. When you find a CMJ in memory, use the “peeked_binds” option in dbms_xplan.display_cursor ( http://optimizermagic.blogspot.com/2008/02/displaying-and-reading-execution-plans.html ) to see if you can find the bind variables and reproduce the plan using constants.
  
  Comment by Jonathan Lewis — August 18, 2010 @ 9:13 pm BST Aug 18,2010 | Reply
Thank you Jonathan,

I will do that when we get CMJs in memory next time and will use “+peeked_binds” option with display_cursor.

Thank you again for your time,
Ajith

Comment by Ajith — August 19, 2010 @ 2:16 pm BST Aug 19,2010 | Reply
I have query as below, both tables are partitioned (partition by range of data_desc, one day one partition.)

select count(1) from
( select ‘2010-08-02’ as data_desc,
t2.lev,
count(1)
from dn_coinlog_his t1,
dn_character_mes1 t2
where t2.area_id = t1.area_id
and t2.character_id = t1.characterid
and t2.data_desc = t1.data_desc
and t1.data_desc = ‘2010-08-02’
group by t2.lev);

However, it did NOT use transtive closure to generate execution plan with “partition range single”.
I traced it with 10053 and found it passed the CBQT validity, however, Oracle did NOT rewrite it
with t1.data_desc = ‘2010-08-02′ and t2.data_desc =’2010-08-02’.

Below is abstract from 10053 event.

Predicate Information:

**************************
Predicate Move-Around (PM)
**************************
PM: Considering predicate move-around in SEL$1 (#0).
PM: Checking validity of predicate move-around in SEL$1 (#0).
CBQT: Validity checks passed for afg5yjfy225db.
Query block (0x2a973e5398) before join elimination:
SQL:******* UNPARSED QUERY IS *******
SELECT ‘2010-08-02′ “DATA_DESC”,”T2″.”LEV” “LEV”,COUNT(1) “COUNT(1)” FROM “DI_USER”.”DN_COINLOG_HIS” “T1″,”DI_USER”.”DN_CHARACTER_MES1″ “T2”
WHERE “T2″.”AREA_ID”=”T1″.”AREA_ID”
AND “T2″.”CHARACTER_ID”=”T1″.”CHARACTERID”
AND “T2″.”DATA_DESC”=”T1″.”DATA_DESC”
AND “T1”.”DATA_DESC”=’2010-08-02’
GROUP BY “T2″.”LEV”
Query block (0x2a973e5398) unchanged

Could you show me why Oracle rewrite it with PM, sometime it not?
By the way, I have wrote you about this issue with trace file. Sorry for disturb you again.

Thank you again for your time,
Amao

Comment by amao — August 20, 2010 @ 5:44 am BST Aug 20,2010 | Reply
Jonathan sir,

I extracted an execution plan from AWR, and am a bit puzzled on reading the following bit:
```
------------------------------------------------------------------------------------
| Id  | Operation                        | Name       | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------
|  17 |             MERGE JOIN CARTESIAN |            |     1 |    62 | 10837   (2)|
|  18 |              NESTED LOOPS        |            |     1 |    44 |   249   (1)|
|  19 |               FAST DUAL          |            |     1 |       |     2   (0)|
|  20 |               PARTITION RANGE ALL|            |     1 |    44 |   247   (1)|
|  21 |                TABLE ACCESS FULL | TBL_BONUSES|     1 |    44 |   247   (1)|
|  22 |              BUFFER SORT         |            |   508K|  8940K| 10590   (2)|
|  23 |               TABLE ACCESS FULL  | TBL_PLAYERS|   508K|  8940K| 10588   (2)|
```
It’s a cartesian join and so the estimated cardinality here should be 1*508K = 508K. But it is showing up as 1!!

Is this a bug or is there something else going on?

Version is 10.2.0.5 EE

Many thanks.

Comment by CJ — June 2, 2011 @ 5:20 pm BST Jun 2,2011 | Reply
- CJ,
  
  There are all sorts of little changes (and bugs) that appear in point releases; but it’s possible that the calculated cardinality from line 21 (which will end up in line 18) is very much smaller than one, even though it has to be reported as one. Then (very small * 508K) can come to one. Simple rounding can cause some confusion. This is only a guess, of course; you might be able to confirm it by looking at the 10053 trace.
  
  Comment by Jonathan Lewis — June 2, 2011 @ 8:52 pm BST Jun 2,2011 | Reply
Hi Jonathan

What do you think about this strange behavior I noticed?
If the list of tables in the FROM part of the SELECT statement contains tables that are not needed (none of their columns are selected nor used in the WHERE part of the statement), the CBO starts increasing the cardinality and introduces merge cartesian joins (MCJ) in the execution plan.

see the clip on my blog for an example:

http://db.parvu.org/?p=66

Thanks

Radu

Comment by Radu Parvu — October 5, 2013 @ 4:46 pm BST Oct 5,2013 | Reply
- Radu,
  
  I can’t get your video to expand to full screen, and it’s too small for me to read what’s going on.
  
  As it is your comments above sound like expected behaviour. If you include a table in the FROM clause with no join condition then SQL does what you tell it – it multiplies the previous result set by the number of rows in that table and has to use a Cartesian merge join to do so.
  
  Comment by Jonathan Lewis — October 5, 2013 @ 7:21 pm BST Oct 5,2013 | Reply
  - Note: please click on the screenshots to get them in larger size
    
    Comment by Radu Parvu — October 5, 2013 @ 7:35 pm BST Oct 5,2013 | Reply
Hi

I added four screenshots that hopefully you can see better.

I am not so sure that this is expected behavior: the result set is not modified at all by adding more tables to the list of tables. Especially the way cardinality increases looks really strange.

http://db.parvu.org/?p=66

Thanks

Radu

Comment by Radu Parvu — October 5, 2013 @ 7:35 pm BST Oct 5,2013 | Reply
- Your screen shots don’t show all the cardinalities clearly, but the ones that are clear show the estimates going up as each table is added. Whether or not they are good estimates I can’t tell because I can’t see your data; and you haven’t shown me that the result set doesn’t change because you’re only showing me the execution plans. However, there’s nothing in the display that suggests the optimizer isn’t doing exactly what I would expect.
  
  Comment by Jonathan Lewis — October 5, 2013 @ 7:50 pm BST Oct 5,2013 | Reply
The case was reproduced using the unmodified HR schema of a default Oracle installation.

Please access this linked sql file in order to see the details:

https://docs.google.com/file/d/0B_8ZazLGVPv3dDdSN3dDUU1aUXM/edit?usp=sharing

It is obvious that the result set of the four queries is the same but Oracle CBO chooses to do MCJ when not needed tables are added to the FROM list.

Reproduced on 11.2.0.3 and 10g R1

Your opinion very much valued!

Thanks!

Comment by Radu Parvu — October 6, 2013 @ 11:22 am BST Oct 6,2013 | Reply
- As a general rule I don’t download things just because someone can’t manage to publish them in a readable form. In your case, though I have made an exception – and this means I can now read that …
  
  a) your query is a “select DISTINCT” – which explains why the actual result set doesn’t change
  
  b) Neither “THE” cardinality nor “THE” estimate of cardinality goes up, they stay the same (106) by the time you get to the final step of the plan
  
  c) The intermediate cardinalities keep going up, but that’s because the optimizer is doing what you told it to do – which is to use Cartesian merge joins to scale up the current result set by the cardinality of each table you add to the from clause without a join condition. The database isn’t programmed to do what you intended to do, it’s programmed to do what you told it to do.
  
  Technically there may be an execution path where the optimizer could aggregate after each redundant join, keeping the intermediate result sets as small as possible – but you can’t expect the optimizer to deal with every single case of silly SQL perfectly.
  
  Comment by Jonathan Lewis — October 6, 2013 @ 12:56 pm BST Oct 6,2013 | Reply
  - OK, thanks (looks like the wordpress blogs make it difficult to publish code …)
    
    a & b agreed
    
    c agreed the SQL is silly but in real life we do encounter situations when silly code is generated by other software or humans. But I would debate that it was indicated anything to optimizer: nothing stops the optimizer to totally exclude the not-needed tables from the plan (as they have no selected columns nor any related conditions in the WHERE clause coupled with the DISTINCT operator). No hint is used at all.
    
    Thanks for your opinion.
    
    Comment by Radu Parvu — October 6, 2013 @ 1:37 pm BST Oct 6,2013 | Reply
    - Radu,
      
      I think that I understand the behavior that you are reporting. With Oracle Database you must always tell the optimizer how tables should be joined, otherwise you should expect Oracle to generate a cross-product, returning x * y rows, where x is the number of rows matched (based on the WHERE clause) in the first table, and y is the number of rows matched (based on the WHERE clause) in the second table (or a previously generated resultset). So, if you have three tables, each with 100 rows, you list the tables in the FROM clause, but do not join the tables in the WHERE clause, Oracle will return 100 * 100 * 100 = 1,000,000 rows even if there are declared foreign keys between the tables.
      
      Jonathan’s example in this blog article is significantly different from the example that you built. In his SQL statement he specified all of the correct joins in the WHERE clause, yet the query optimizer still built an execution plan that included a Cartesian join. I tried to provide you with a much more detailed explanation that explains how your example is different, in a comment attached to your blog article.
      
      As a final thought, try not to use the DISTINCT clause – there may be a few times when it is needed, but most of the time when it is added to a SQL statement so that the correct results are returned, that is an indication that the WHERE clause is missing one or more join predicates.
      
      Comment by Charles Hooper — October 6, 2013 @ 10:47 pm BST Oct 6,2013
    - Charles,
      
      Thanks for responding to Radu’s comments.
      
      Your suggestions (on his blog) that he may be extrapolating incorrectly from Oracle’s ability to do table elimination on referential integrity, or that he is expecting Oracle and SQL Server behave in a similar fashion look like viable explanations for his surprise.
      
      Comment by Jonathan Lewis — October 7, 2013 @ 9:01 am BST Oct 7,2013
    - Thank both for the useful comments! Especially to Charles for taking time to understand and address my concern.
      
      Have an excellent day!
      
      Comment by Radu Pârvu — October 7, 2013 @ 9:18 am BST Oct 7,2013
MJC in this case of transitive closure looks fine. What made oracle to retain the join via _optimizer_transitivity_retain ?

Comment by Anonymous — December 14, 2018 @ 2:33 am GMT Dec 14,2018 | Reply
- Anonymous,
  
  I can’t answer questions of the form: “Why is Oracle coded to do X ?”, I don’t have any inside information and the best I could do would be a guess.
  
  Comment by Jonathan Lewis — December 15, 2018 @ 6:51 pm GMT Dec 15,2018 | Reply