Stefan,

That’s the arithmetic.

At the moment I’m prepared to put the difference of 1 down to a choice of rounding until I run an automated range test that suggests otherwise. It may be that the optimizer always rounds the adjusted num_distinct up to the next integer value.
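For what it’s worth, that hypothesis is easy to check on paper with the figures from Stefan’s worked example further down (a cartesian product of 6250000 rows and an adjusted num_distinct of 2446.00097). This is just a sketch of the rounding hypothesis, not the optimizer’s actual code:

```python
import math

cartesian = 6250000
adjusted_ndv = 2446.00097   # SWRU result from the worked example

# Dividing by the raw adjusted num_distinct, then rounding the cardinality:
card_raw = round(cartesian / adjusted_ndv)                 # 2555

# Rounding the adjusted num_distinct up to the next integer first:
card_ceiled = round(cartesian / math.ceil(adjusted_ndv))   # 2554

print(card_raw, card_ceiled)   # the two conventions differ by exactly 1
```

So a ceiling applied to the adjusted num_distinct would account for a difference of exactly 1 in this example.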

That looks promising: I get convincing results for my test cases. Alberto Dell’Era’s SWRU function shows me once again that a solid mathematical background has some value when you try to answer this kind of question…

Regards

Martin

Hi Martin,

I don’t want to pre-empt Jonathan’s blog series about upgrades, but you may want to try this algorithm.

cardinality(T1) = 1/400 (= selectivity of N_400) x 1000000 = 2500 rows (= ID line 2)

cardinality(T2) = 1/400 (= selectivity of N_400) x 1000000 = 2500 rows (= ID line 3)

cartesian_join(T1, T2) = cardinality(T1) x cardinality(T2) = 2500 rows x 2500 rows = 6250000 rows

Multi-column sanity check with the larger selectivity taken from one side (does not apply in this case)

NUM_DIST(T2side) = NUM_DIST(T2.N_72) x NUM_DIST(T2.N_750) = 72 x 750 = 54000

NUM_DIST(T1side) = NUM_DIST(T1.N_90) x NUM_DIST(T1.N_600) = 90 x 600 = 54000

NUM_DIST(JOIN) = SWRU(54000, 1000000, 2500) = 54000 x (1 - power(1 - 2500/1000000, 1000000/54000)) = 2446.00097

JOIN_CARD = cartesian_join(T1, T2) / NUM_DIST(JOIN) = 6250000 / 2446.00097 = 2555.19114
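The steps above can be sketched in a few lines of Python. This just re-implements the arithmetic, with the SWRU formula written exactly as in the line above; it is not Oracle’s actual code:

```python
def swru(num_distinct, num_rows, sample_size):
    """Expected number of distinct values seen when picking `sample_size`
    rows from `num_rows` rows that carry `num_distinct` distinct values
    (Alberto Dell'Era's "select without replacement" approximation)."""
    return num_distinct * (1 - (1 - sample_size / num_rows) ** (num_rows / num_distinct))

num_rows = 1000000
card_t1 = num_rows / 400          # filter on N_400 -> 2500 rows
card_t2 = num_rows / 400          # filter on N_400 -> 2500 rows
cartesian = card_t1 * card_t2     # 6250000 rows

# NDV of the two-column join key is the same on both sides:
# 72 x 750 = 90 x 600 = 54000
ndv_join = swru(54000, num_rows, 2500)   # ~2446.001
join_card = cartesian / ndv_join         # ~2555.19
print(round(ndv_join, 5), round(join_card, 5))
```

Rounding the final figure gives 2555, one away from the 2554 the optimizer reports in the example.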

Not quite sure about the difference of one, but it may be caused by some simplification or rounding issue. I am pretty sure Jonathan knows the answer to that :-))

Regards

Stefan

Martin,

Let me just point you in the critical direction – **a paper** (dated 2007, so Oracle’s application of the algorithm may have changed) by Alberto Dell’Era.

Playing with the post-11g example and its multi-column cardinality sanity check, it seems that:

– the cardinality is still determined by one side of the join

– the cardinality does not get smaller than 2500 in the given example with different join columns

– the cardinality above 2500 (54 in the given example) is somehow related to 1/NDV (54000 for the example, according to a CBO trace created with dbms_sqldiag.dump_trace); but I am not able to pin down this “somehow”.

So I will have to wait for the following articles…

Martin,

That is correct – but it’s an error, and the error has been addressed (arguably incorrectly) in the 11g example.

Since Stefan has already started to treat it as a quiz night, I would guess that the calculation in 10.2.0.5 uses the standard formula, taking the values from only one side (t2) to calculate the join selectivity: 1/72 * 1/750 * 2500 * 2500 = 115.74…
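A quick check of the arithmetic behind that guess (just reproducing the numbers, not claiming this is what 10.2.0.5 really does):

```python
# Standard join selectivity with the NDVs taken from the t2 side only:
# 1/72 (N_72) * 1/750 (N_750), applied to the 2500 x 2500 filtered rows.
join_card = (1 / 72) * (1 / 750) * 2500 * 2500
print(round(join_card, 2))   # 115.74
```

Note this is the same as dividing the 6250000-row cartesian product by 54000, the product of the two t2-side NDVs.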

Martin

Stefan,

Correct – that explains the relatively small change in this example from 9i to 10g; and the larger change from 10g to 11g is explained by a change in the way the sanity check is applied.
