A recent comment on a note I wrote some time ago about* faking histograms* asked about the calculations of

*in the latest versions of Oracle. As I read the question, I realised that I had originally supplied a formula for calculating*

**selectivity***, rather than*

**cardinality***, so I thought I’d supply a proper example.*

**selectivity**We’ll start with a script to create some data and stats – and I’m going to start with a script I wrote in Jan 2001 (which is why it happens to use the * analyze *command rather than

*, even though this example comes from an instance of 11.2.0.2).*

**dbms_stats.gather_table_stats**create table t1 ( skew, skew2, padding ) as select r1, r2, rpad('x',400) from ( select /*+ no_merge */ rownum r1 from all_objects where rownum <= 80 ) v1, ( select /*+ no_merge */ rownum r2 from all_objects where rownum <= 80 ) v2 where r2 <= r1 order by r2,r1 ; alter table t1 modify skew not null; alter table t1 modify skew2 not null; create index t1_skew on t1(skew); analyze table t1 compute statistics for all indexed columns size 75;

The way I’ve created the data set the column **skew** has one row with the value 1, two rows with the value 2, and so on up to 80 rows with the value 80. I’ve put in a bid to collect a histogram of 75 buckets – which the default, by the way, for the **analyze **command - on any indexed columns . (Interestingly the resulting histogram on column * skew *held on 74 buckets.)

To demonstrate the calculation of selectivity, I then enabled the 10053 trace and ran a query to select one of the *“non-popular”* values (i.e. a value with a fairly small number of duplicates). The section of the trace file I want to talk about appears as the “Single Table Access Path”.

SINGLE TABLE ACCESS PATH Single Table Cardinality Estimation for T1[T1] Column (#1): NewDensity:0.008940, OldDensity:0.006757 BktCnt:74, PopBktCnt:31, PopValCnt:15, NDV:80 Column (#1): SKEW( AvgLen: 2 NDV: 80 Nulls: 0 Density: 0.008940 Min: 1 Max: 80 Histogram: HtBal #Bkts: 74 UncompBkts: 74 EndPtVals: 59 Table: T1 Alias: T1 Card: Original: 3240.000000 Rounded: 29 Computed: 28.96 Non Adjusted: 28.96 Access Path: TableScan Cost: 32.00 Resp: 32.00 Degree: 0 Cost_io: 32.00 Cost_cpu: 0 Resp_io: 32.00 Resp_cpu: 0 Access Path: index (AllEqRange) Index: T1_SKEW resc_io: 30.00 resc_cpu: 0 ix_sel: 0.008940 ix_sel_with_filters: 0.008940 Cost: 30.00 Resp: 30.00 Degree: 1 Best:: AccessPath: IndexRange Index: T1_SKEW Cost: 30.00 Degree: 1 Resp: 30.00 Card: 28.96 Bytes: 0

A few points to notice on line 4:

The

NDV(number of distinct values) is 80; this means that the Density, in the absence of a histogram, would be 1/80 = 0.0125

We have aNewDensityand anOldDensity. TheOldDensityis the value that would have been reported simply asDensityin 10g or 9i, and is derived using the mechanism I described in“Cost Based Oracle – Fundamentals”a few years ago. Informally this was:sum of the square of the frequency of the non-popular values /

(number of non-null rows * number of non-popular non-null rows)

This formula for **OldDensity** is (I assume) a fairly subtle formula based on expected number of rows for a randomly selected non-popular value in the presence of popular values. The **NewDensity**, however, seems to take the much simpler approach of “factoring out” the popular values. There are two ways you can approach the arithmetic – one is by thinking of the number of rows you expect for the query *“column = ‘non-popular value’”,* the other is by thinking of the number of non-popuar values and then adjusting for the relative volume of non-popular value in the table.

### Option 1: Cardinality.

Line 9 tells us there are 3240 rows in the table.

Line 4 tells us there are 80 (NDV) distinct values of which 15 (PopValCnt) are popular, and 74 (BktCnt) buckets of which 31 (PopBktCnt) contain popular values.

From this we determine that there are (80 – 15 = ) 65 non-popular values and (3240 * (74-31)/74 = ) 1883 non-popular rows.

Hence we infer that a typical non-popular value will report (1883 / 65 = ) 29 rows – which is the rounded cardinality we see in line 9.

### Option 2: Selectivity.

If we consider only non-popular values, then the selectivity is 1/(number of non-popular values) = 1/65.

But this selectivity applies to only 43 buckets of the 74 total bucket count.

To generate a selectivity that can be applied to the original cardinality of the table we have to scale it accordingly.

The selectivity, labelled the **NewDensity**, is (1/65) * (43/74) = 0.00894 – which is the value we see in line 4.

(Following this, of course, the cardinality for *‘column = constant’* would be 3,240 * 0.00894 = 29

**Footnote 1:**

I have been a little cavalier with rounding throughout the example, just to keep the numbers looking a little tidier.

**Footnote 2:**

If the column is allowed to be null then our calculation of cardinality would use (3,240 – number of nulls) instead of 3,240. The method for calculating the selectivity would not change, but the resulting figure would be applied to (3,240 – number of nulls).

wow – thanks for the detailed info

Comment by oracleman consulting — January 3, 2012 @ 6:17 pm GMT Jan 3,2012 |

Hello Jonathan,

What could be the possible reason of COST = 0 for unique index scan operation of the following query and plan. I am not able to understand the reason.

Could you please help.

Comment by Ajeet — March 9, 2012 @ 12:59 pm GMT Mar 9,2012 |

Ajeet,

I think this is an example covered in my book

, but I can’t be completely certain since I don’t have all the information about your tables and indexes in front of me. The basic points, however, are this:“Cost Based Oracle – Fundamentals”I suspect from this that your index has a blevel of 1: so the cost of “select o_orderkey from orders where o_orderkey = {constant}” has a cost of 1, and the cost of accessing just the order key in the nested loop is zero.

Comment by Jonathan Lewis — March 12, 2012 @ 10:36 am GMT Mar 12,2012 |

Hello Jonathan,

Thanks so much for explaining .I do have your book – cost based oracle -fundamentals, but I could not find the discussion on cost of unique index access ,but I do see some sort of discussion on it in chapter 11 . (nested loops).

Yes the blevel of the unique index which is a PK is 1 . you are absolutely correct .

Kindly point out where in the cost based oracle -fundamentals, I could see more details on this.

I have a hard copy and I might have skipped that section inadvertantely.

Regards.

Ajeet

Comment by Ajeet — March 12, 2012 @ 11:40 am GMT Mar 12,2012 |

Hello Jonathan,

Would you able to help me in understanding the cardinality from the following output, how was 1642 derived?

Comment by suresh — July 17, 2012 @ 6:11 am GMT Jul 17,2012 |

Suresh,

It can take some time to work these things out, and it’s not a particularly interesting or educational task; so I only do it when I’m being paid to do it or puzzled by an example that’s appeared in a forum.

In your example there are two things that would make it harder to do: (a) every column has a histogram – so you might have to look at the histogram data in great detail, and (b) you haven’t told me what the predicates are that the optimizer will be evaluating at this point – and not that they may NOT be exactly the predicates you wrote in your initial query.

Comment by Jonathan Lewis — July 19, 2012 @ 10:57 am GMT Jul 19,2012 |

Hi Jonathan,

Thanks for your time and Thanks for the comments too.

There’s lot of predicates on the given table and not particularly using any index and doing a full table scan. To add more, the query in the question is having problem only on monday after a table re-org (ofcourse stats has been gathered after rebuild) and rest of the days its back to normal. No change in plans and nothing, hence trying to cornering how the cost has been derived. Will further dig more.

I must say, I have been learning a lot with your blog and your nice book “CBO – fundamentals”, Thank you…

Regards

Suresh

Comment by suresh — July 19, 2012 @ 3:21 pm GMT Jul 19,2012 |

First time I’m able to compute New Density. Thanks for a simple and detailed explanation.

However, cardinality estimate is off. My column is not null column with FBI on it (upper(column)) and no popular values.

Predicate is “col like :bindvar||’%’”. Bind value is within range.

Cardinality = 0.000015 * 309157 = 4.637355 vs 15458 (computed by CBO)

Not sure what type of math I should do for a unbounded range…

Will try to figure out by going through your book. But, do you by any chance think this is a bug?

Appreciate your contribution to Oracle community,

Thanks!

Comment by PerfNewbie — December 14, 2012 @ 12:36 am GMT Dec 14,2012 |

It’s always worth spending a couple of minutes working backwards just in case it gives you a clue:

Oracle’s cardinality = 15458

Oracle’s number of rows input = 309157

So Oracle’s selectivity = 15458 / 309157 = 0.05000049

Oracle has used the standard 5% for “column > unknown value” in this case.

Comment by Jonathan Lewis — December 14, 2012 @ 6:48 am GMT Dec 14,2012 |

Oops! Sorry (embarrassed). I got hung up with doing math to derive cardinality…I should have simply checked for default. I realized only when driving back home (just need some diversion I guess).

Thanks for your reply.

Comment by PerfNewbie — December 14, 2012 @ 9:53 pm GMT Dec 14,2012 |