The first thing to look at is the meaning of “num_distinct” in the join selectivity formula. To understand what this is, you need to read the work on “selection without replacement” done by Alberto Dell’Era.
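Here is a rough illustration of what “selection without replacement” means for num_distinct. This minimal Python sketch makes some simplifying assumptions: there are no nulls, and every distinct value occurs equally often (num_rows/num_distinct times). It computes the textbook (hypergeometric) expected number of distinct values that survive when rows_selected rows are picked at random without replacement. That is not necessarily the exact arithmetic Oracle's optimizer applies; Dell’Era's work is the reference for that. The function name is hypothetical.

```python
import math


def expected_ndv_without_replacement(num_rows: int, num_distinct: int, rows_selected: int) -> float:
    """Expected number of distinct values among rows_selected rows drawn at random
    WITHOUT replacement from num_rows rows.

    ASSUMPTIONS (not from the original text): no nulls, num_distinct > 0, and each
    distinct value occurs num_rows / num_distinct times (uniform distribution).
    """
    if rows_selected <= 0:
        return 0.0
    rows_per_value = num_rows / num_distinct
    rows_without_value = num_rows - rows_per_value  # rows NOT holding a given value
    if rows_selected > rows_without_value:
        # Too many rows picked to avoid any value entirely: every value survives.
        return float(num_distinct)
    # P(a given value is missed) = C(rows_without_value, n) / C(num_rows, n),
    # evaluated with log-gamma so that large row counts do not overflow.
    log_p_missed = (
        math.lgamma(rows_without_value + 1)
        - math.lgamma(rows_without_value - rows_selected + 1)
        - math.lgamma(num_rows + 1)
        + math.lgamma(num_rows - rows_selected + 1)
    )
    return num_distinct * (1.0 - math.exp(log_p_missed))


# 2 rows picked from 4 rows holding 2 values (2 rows each): 2 * (1 - 1/6)
print(expected_ndv_without_replacement(4, 2, 2))
```

Take that small case: picking 2 of 4 rows that hold 2 values gives an expected 2 × (1 − 1/6) ≈ 1.67 distinct values, not the raw num_distinct of 2. This is why the figure that goes into the join formula after filtering is not simply the column's num_distinct.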

Once you can calculate the correct selectivity for a single predicate, the selectivity for multiple predicates ANDed together in a single join is the product of the separate selectivities. There are, however, several special cases, including the multi-column sanity check, the index sanity check, a limit of 1/num_rows, the effects of histograms, and the effects of Oracle transforming your query through transitive closure. So you do need to check that you are applying the calculations to the right columns.
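Here is a minimal Python sketch of the naive multi-column case: the commonly quoted basic single-predicate formula, with the per-predicate selectivities multiplied together. It deliberately ignores every special case listed above (histograms, sanity checks, the 1/num_rows limit, transitive closure) and uses raw num_distinct values without any selection-without-replacement adjustment. The figures are the NDVs and row counts quoted in the question below. Null counts were not given, so they are assumed to be zero, and the plan's rounded “785K” is taken as 785,000. JoinColumn, predicate_selectivity and join_cardinality are hypothetical names.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class JoinColumn:
    """Statistics for one side of a join predicate t1.col = t2.col (hypothetical helper)."""
    num_rows: int      # rows in the table
    num_nulls: int     # nulls in the join column
    num_distinct: int  # NDV of the join column


def predicate_selectivity(left: JoinColumn, right: JoinColumn) -> float:
    """Basic single-predicate join selectivity: no histograms, no sanity checks."""
    left_fraction = (left.num_rows - left.num_nulls) / left.num_rows
    right_fraction = (right.num_rows - right.num_nulls) / right.num_rows
    return left_fraction * right_fraction / max(left.num_distinct, right.num_distinct)


def join_cardinality(pairs, filtered_left: float, filtered_right: float) -> float:
    """Naive multi-column case: multiply the per-predicate selectivities together,
    then scale by the filtered cardinalities of the two row sources."""
    selectivity = prod(predicate_selectivity(left, right) for left, right in pairs)
    return selectivity * filtered_left * filtered_right


# Figures from the question: (PR NDV, PRT NDV) for each of the six join columns.
PR_ROWS, PRT_ROWS = 2_902_548, 299_160
NDVS = [(219, 177), (308_320, 139_136), (142, 133), (24, 23), (56, 42), (3, 2)]

# ASSUMPTION: null counts were not given, so they are taken as zero.
pairs = [(JoinColumn(PR_ROWS, 0, pr), JoinColumn(PRT_ROWS, 0, prt)) for pr, prt in NDVS]

# ASSUMPTION: the plan's rounded "785K" for PR is taken as 785,000; 96,199 is PRT's figure.
card = join_cardinality(pairs, filtered_left=785_000, filtered_right=96_199)
print(f"naive join cardinality: {card:.4f}")
```

The straight product comes out at roughly 0.002 rows, nowhere near the 96,199 rows the plan reports for the hash join. So the simple product is clearly not what the optimizer used here, and presumably one of the special cases above has taken over.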

Depending on the version of Oracle there are also a couple of subtle bugs in the multi-column calculations that can make a (probably) small difference to the results.

Referring to your book CBO, page 269: “Unfortunately there are problems that still need addressing—lots of them. Let’s try to find a few questions about the limitations of the formulae.

• What are you supposed to do if you have two or more join columns?”

I would like to know whether the standard join cardinality formula is applicable if I have more than one join column. Could you please explain join cardinality for three or more join columns?

When I tried with the following SQL, I couldn’t do it.

SQL> @xplan

Plan hash value: 4119620020

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                    | 96199 |   53M |       |    114K (1)| 00:08:48 |
|*  1 |  HASH JOIN                     |                    | 96199 |   53M |  4984K|    114K (1)| 00:08:48 |
|*  2 |   TABLE ACCESS BY INDEX ROWID  | PS_PC_RES_PA_TA121 | 96199 | 3851K |       |    6517 (1)| 00:00:31 |
|*  3 |    INDEX RANGE SCAN            | PSAPC_RES_PA_TA121 |  299K |       |       |     227 (1)| 00:00:02 |
|*  4 |   TABLE ACCESS BY INDEX ROWID  | PS_PC_RES_PA_TA021 |  785K |  403M |       |   72751 (1)| 00:05:36 |
|*  5 |    INDEX RANGE SCAN            | PSAPC_RES_PA_TA021 | 2902K |       |       |    2308 (1)| 00:00:11 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PR"."BUSINESS_UNIT_PO"="PRT"."BUSINESS_UNIT_PO" AND "PR"."PO_ID"="PRT"."PO_ID" AND
              "PR"."LINE_NBR"="PRT"."LINE_NBR" AND "PR"."SCHED_NBR"="PRT"."SCHED_NBR" AND
              "PR"."DISTRIB_LINE_NUM"="PRT"."DISTRIB_LINE_NUM" AND "PR"."DST_ACCT_TYPE"="PRT"."DST_ACCT_TYPE")
   2 - filter("PRT"."ANALYSIS_TYPE"='CCR' OR "PRT"."ANALYSIS_TYPE"='CRV')
   3 - access("PRT"."PROCESS_INSTANCE"=28022762)
   4 - filter("PR"."ANALYSIS_TYPE"='CCR' OR "PR"."ANALYSIS_TYPE"='CRV')
   5 - access("PR"."PROCESS_INSTANCE"=28022762)

Join Predicates & NDVs

PR.BUSINESS_UNIT_PO = 219
PR.PO_ID = 308320
PR.LINE_NBR = 142
PR.SCHED_NBR = 24
PR.DISTRIB_LINE_NUM = 56
PR.DST_ACCT_TYPE = 3

PRT.BUSINESS_UNIT_PO = 177
PRT.PO_ID = 139136
PRT.LINE_NBR = 133
PRT.SCHED_NBR = 23
PRT.DISTRIB_LINE_NUM = 42
PRT.DST_ACCT_TYPE = 2

Filter Predicates & NDVs

PRT.ANALYSIS_TYPE = 3
PR.ANALYSIS_TYPE = 5

Table row counts

PS_PC_RES_PA_TA121 (PRT): 299160 rows
PS_PC_RES_PA_TA021 (PR): 2902548 rows

Thanks

Your earlier objection was that:

“Where there is correlation between columns, probably indicates some inadequately modelled functional dependency that possibly should have been normalised.”

Now that I’ve pointed out that it is perfectly feasible to find correlation between measurements that have a common cause (which isn’t a logical fallacy, by the way), your argument is that the business definition may change. But if you change the business definition, you should go back and review the data model, and that review is where any impact on the previously recognised correlation would be reconsidered.
