A recent comment on a note I wrote some time ago about faking histograms asked about the calculations of selectivity in the latest versions of Oracle. As I read the question, I realised that I had originally supplied a formula for calculating cardinality, rather than selectivity, so I thought I’d supply a proper example.
We’ll start with a script to create some data and stats – and I’m going to start with a script I wrote in Jan 2001 (which is why it happens to use the analyze command rather than dbms_stats.gather_table_stats, even though this example comes from an instance of 11.2.0.2).
create table t1 (
skew, skew2, padding
)
as
select r1, r2, rpad('x',400)
from
(
select /*+ no_merge */
rownum r1
from all_objects
where rownum <= 80
) v1,
(
select /*+ no_merge */
rownum r2
from all_objects
where rownum <= 80
) v2
where r2 <= r1
order by r2,r1
;
alter table t1 modify skew not null;
alter table t1 modify skew2 not null;
create index t1_skew on t1(skew);
analyze table t1 compute statistics
for all indexed columns size 75;
The way I’ve created the data set the column skew has one row with the value 1, two rows with the value 2, and so on up to 80 rows with the value 80. I’ve put in a bid to collect a histogram of 75 buckets – which the default, by the way, for the analyze command - on any indexed columns . (Interestingly the resulting histogram on column skew held on 74 buckets.)
To demonstrate the calculation of selectivity, I then enabled the 10053 trace and ran a query to select one of the “non-popular” values (i.e. a value with a fairly small number of duplicates). The section of the trace file I want to talk about appears as the “Single Table Access Path”.
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T1[T1]
Column (#1):
NewDensity:0.008940, OldDensity:0.006757 BktCnt:74, PopBktCnt:31, PopValCnt:15, NDV:80
Column (#1): SKEW(
AvgLen: 2 NDV: 80 Nulls: 0 Density: 0.008940 Min: 1 Max: 80
Histogram: HtBal #Bkts: 74 UncompBkts: 74 EndPtVals: 59
Table: T1 Alias: T1
Card: Original: 3240.000000 Rounded: 29 Computed: 28.96 Non Adjusted: 28.96
Access Path: TableScan
Cost: 32.00 Resp: 32.00 Degree: 0
Cost_io: 32.00 Cost_cpu: 0
Resp_io: 32.00 Resp_cpu: 0
Access Path: index (AllEqRange)
Index: T1_SKEW
resc_io: 30.00 resc_cpu: 0
ix_sel: 0.008940 ix_sel_with_filters: 0.008940
Cost: 30.00 Resp: 30.00 Degree: 1
Best:: AccessPath: IndexRange
Index: T1_SKEW
Cost: 30.00 Degree: 1 Resp: 30.00 Card: 28.96 Bytes: 0
A few points to notice on line 4:
The NDV (number of distinct values) is 80; this means that the Density, in the absence of a histogram, would be 1/80 = 0.0125
We have a NewDensity and an OldDensity. The OldDensity is the value that would have been reported simply as Density in 10g or 9i, and is derived using the mechanism I described in “Cost Based Oracle – Fundamentals” a few years ago. Informally this was:sum of the square of the frequency of the non-popular values /
(number of non-null rows * number of non-popular non-null rows)
This formula for OldDensity is (I assume) a fairly subtle formula based on expected number of rows for a randomly selected non-popular value in the presence of popular values. The NewDensity, however, seems to take the much simpler approach of “factoring out” the popular values. There are two ways you can approach the arithmetic – one is by thinking of the number of rows you expect for the query “column = ‘non-popular value’”, the other is by thinking of the number of non-popuar values and then adjusting for the relative volume of non-popular value in the table.
Option 1: Cardinality.
Line 9 tells us there are 3240 rows in the table.
Line 4 tells us there are 80 (NDV) distinct values of which 15 (PopValCnt) are popular, and 74 (BktCnt) buckets of which 31 (PopBktCnt) contain popular values.
From this we determine that there are (80 – 15 = ) 65 non-popular values and (3240 * (74-31)/74 = ) 1883 non-popular rows.
Hence we infer that a typical non-popular value will report (1883 / 65 = ) 29 rows – which is the rounded cardinality we see in line 9.
Option 2: Selectivity.
If we consider only non-popular values, then the selectivity is 1/(number of non-popular values) = 1/65.
But this selectivity applies to only 43 buckets of the 74 total bucket count.
To generate a selectivity that can be applied to the original cardinality of the table we have to scale it accordingly.
The selectivity, labelled the NewDensity, is (1/65) * (43/74) = 0.00894 – which is the value we see in line 4.
(Following this, of course, the cardinality for ‘column = constant’ would be 3,240 * 0.00894 = 29
Footnote 1:
I have been a little cavalier with rounding throughout the example, just to keep the numbers looking a little tidier.
Footnote 2:
If the column is allowed to be null then our calculation of cardinality would use (3,240 – number of nulls) instead of 3,240. The method for calculating the selectivity would not change, but the resulting figure would be applied to (3,240 – number of nulls).

wow – thanks for the detailed info
Comment by oracleman consulting — January 3, 2012 @ 6:17 pm UTC Jan 3,2012 |
Hello Jonathan,
What could be the possible reason of COST = 0 for unique index scan operation of the following query and plan. I am not able to understand the reason.
Could you please help.
sql_id=52a0skswvaqvk. Current SQL statement for this session: explain plan for select count(*) from orders, lineitem where o_orderkey = l_orderkey ============ Plan Table ============ ---------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------------+-----------------------------------+ | 0 | SELECT STATEMENT | | | | 343 | | | 1 | SORT AGGREGATE | | 1 | 10 | | | | 2 | NESTED LOOPS | | 583K | 5830K | 343 | 00:00:05 | | 3 | INDEX FAST FULL SCAN | PK_LINEITEM| 586K | 2932K | 294 | 00:00:05 | | 4 | INDEX UNIQUE SCAN | PK_ORDERS | 1 | 5 | 0 | | ---------------------------------------------+-----------------------------------+ Predicate Information: ---------------------- 4 - access("O_ORDERKEY"="L_ORDERKEY") Content of other_xml column =========================== db_version : 10.2.0.1 parse_schema : EMUDSS plan_hash : 2594599139 Outline Data: /*+ BEGIN_OUTLINE_DATA IGNORE_OPTIM_EMBEDDED_HINTS OPTIMIZER_FEATURES_ENABLE('10.2.0.1') ALL_ROWS OUTLINE_LEAF(@"SEL$1") INDEX_FFS(@"SEL$1" "LINEITEM"@"SEL$1"("LINEITEM"."L_ORDERKEY" "LINEITEM"."L_LINENUMBER")) INDEX(@"SEL$1" "ORDERS"@"SEL$1" ("ORDERS"."O_ORDERKEY")) LEADING(@"SEL$1" "LINEITEM"@"SEL$1" "ORDERS"@"SEL$1") USE_NL(@"SEL$1" "ORDERS"@"SEL$1") END_OUTLINE_DATA */Comment by Ajeet — March 9, 2012 @ 12:59 pm UTC Mar 9,2012 |
Ajeet,
I think this is an example covered in my book “Cost Based Oracle – Fundamentals”, but I can’t be completely certain since I don’t have all the information about your tables and indexes in front of me. The basic points, however, are this:
I suspect from this that your index has a blevel of 1: so the cost of “select o_orderkey from orders where o_orderkey = {constant}” has a cost of 1, and the cost of accessing just the order key in the nested loop is zero.
Comment by Jonathan Lewis — March 12, 2012 @ 10:36 am UTC Mar 12,2012 |
Hello Jonathan,
Thanks so much for explaining .I do have your book – cost based oracle -fundamentals, but I could not find the discussion on cost of unique index access ,but I do see some sort of discussion on it in chapter 11 . (nested loops).
Yes the blevel of the unique index which is a PK is 1 . you are absolutely correct .
Kindly point out where in the cost based oracle -fundamentals, I could see more details on this.
I have a hard copy and I might have skipped that section inadvertantely.
Regards.
Ajeet
Comment by Ajeet — March 12, 2012 @ 11:40 am UTC Mar 12,2012 |
Hello Jonathan,
Would you able to help me in understanding the cardinality from the following output, how was 1642 derived?
SINGLE TABLE ACCESS PATH Single Table Cardinality Estimation for TRADE[T] Column (#2): NewDensity:0.000001, OldDensity:0.000001 BktCnt:5402, PopBktCnt:5402, PopValCnt:4, NDV:4 Column (#105): NewDensity:0.000001, OldDensity:0.000001 BktCnt:5402, PopBktCnt:5402, PopValCnt:2, NDV:2 Column (#55): NewDensity:0.000002, OldDensity:0.000002 BktCnt:2835, PopBktCnt:2835, PopValCnt:2, NDV:2 Column (#1): NewDensity:0.000002, OldDensity:0.000002 BktCnt:254, PopBktCnt:0, PopValCnt:0, NDV:443488 Column (#72): NewDensity:0.000001, OldDensity:0.000001 BktCnt:5402, PopBktCnt:5402, PopValCnt:2, NDV:4 Column (#6): NewDensity:0.001419, OldDensity:0.002012 BktCnt:254, PopBktCnt:10, PopValCnt:5, NDV:682 Column (#159): NewDensity:0.000003, OldDensity:0.000038 BktCnt:254, PopBktCnt:2, PopValCnt:1, NDV:310528 Column (#16): NewDensity:0.000002, OldDensity:0.000005 BktCnt:254, PopBktCnt:10, PopValCnt:1, NDV:426496 Column (#157): NewDensity:0.000001, OldDensity:0.000001 BktCnt:5402, PopBktCnt:5402, PopValCnt:3, NDV:4 Column (#155): NewDensity:0.000215, OldDensity:0.001799 BktCnt:254, PopBktCnt:81, PopValCnt:13, NDV:3175 Column (#4): NewDensity:0.000001, OldDensity:0.000001 BktCnt:5178, PopBktCnt:5178, PopValCnt:6, NDV:7 ColGroup (#1, Index) IDX_TRD_DVP_FEX Col#: 1 2 5 8 53 60 73 105 108 CorStregth: -1.00 ColGroup (#2, Index) IDXTRDTAX Col#: 1 2 55 68 72 105 135 CorStregth: 52038.16 ColGroup (#8, Index) IDX_TRADE_FX Col#: 1 2 5 8 105 108 CorStregth: -1.00 ColGroup (#5, Index) IDX$$_4B290002 Col#: 2 6 154 155 156 CorStregth: 2298053781278.98 ColGroup (#7, Index) FEXIGNOREEXCEPTION_712A0001 Col#: 2 5 8 105 108 CorStregth: -1.00 ColGroup (#6, Index) IDX_TRADE_COMM Col#: 2 4 5 153 CorStregth: -1.00 ColGroup (#9, Index) IDX$$_50990001 Col#: 6 7 12 CorStregth: -1.00 ColGroup (#10, Index) TRADE_UK1 Col#: 1 6 46 CorStregth: -1.00 ColGroup (#3, Index) IDX$$_173D0004 Col#: 6 159 CorStregth: 575.99 ColGroup (#4, Index) IDX$$_4B2C0004 Col#: 16 157 CorStregth: 4.00 ColGroup Usage:: PredCnt: 4 Matches Full: Partial: Table: TRADE Alias: T Card: Original: 471074.000000 Rounded: 1642 Computed: 1641.93 Non Adjusted: 1641.93 Access Path: TableScan Cost: 12436.31 Resp: 12436.31 Degree: 0 Cost_io: 12338.00 Cost_cpu: 2113927982 Resp_io: 12338.00 Resp_cpu: 2113927982Comment by suresh — July 17, 2012 @ 6:11 am UTC Jul 17,2012 |
Suresh,
It can take some time to work these things out, and it’s not a particularly interesting or educational task; so I only do it when I’m being paid to do it or puzzled by an example that’s appeared in a forum.
In your example there are two things that would make it harder to do: (a) every column has a histogram – so you might have to look at the histogram data in great detail, and (b) you haven’t told me what the predicates are that the optimizer will be evaluating at this point – and not that they may NOT be exactly the predicates you wrote in your initial query.
Comment by Jonathan Lewis — July 19, 2012 @ 10:57 am UTC Jul 19,2012 |
Hi Jonathan,
Thanks for your time and Thanks for the comments too.
There’s lot of predicates on the given table and not particularly using any index and doing a full table scan. To add more, the query in the question is having problem only on monday after a table re-org (ofcourse stats has been gathered after rebuild) and rest of the days its back to normal. No change in plans and nothing, hence trying to cornering how the cost has been derived. Will further dig more.
I must say, I have been learning a lot with your blog and your nice book “CBO – fundamentals”, Thank you…
Regards
Suresh
Comment by suresh — July 19, 2012 @ 3:21 pm UTC Jul 19,2012 |
First time I’m able to compute New Density. Thanks for a simple and detailed explanation.
However, cardinality estimate is off. My column is not null column with FBI on it (upper(column)) and no popular values.
Predicate is “col like :bindvar||’%'”. Bind value is within range.
NewDensity:0.000015, OldDensity:0.000239 BktCnt:254, PopBktCnt:0, PopValCnt:0, NDV:67072 Column (#41): SYS_NC00041$( AvgLen: 50 NDV: 67072 Nulls: 0 Density: 0.000015 Histogram: HtBal #Bkts: 254 UncompBkts: 254 EndPtVals: 255 Card: Original: 309157.000000 Rounded: 15458 Computed: 15457.85 Non Adjusted: 15457.85Cardinality = 0.000015 * 309157 = 4.637355 vs 15458 (computed by CBO)
Not sure what type of math I should do for a unbounded range…
Will try to figure out by going through your book. But, do you by any chance think this is a bug?
Appreciate your contribution to Oracle community,
Thanks!
Comment by PerfNewbie — December 14, 2012 @ 12:36 am UTC Dec 14,2012 |
It’s always worth spending a couple of minutes working backwards just in case it gives you a clue:
Oracle’s cardinality = 15458
Oracle’s number of rows input = 309157
So Oracle’s selectivity = 15458 / 309157 = 0.05000049
Oracle has used the standard 5% for “column > unknown value” in this case.
Comment by Jonathan Lewis — December 14, 2012 @ 6:48 am UTC Dec 14,2012 |
Oops! Sorry (embarrassed). I got hung up with doing math to derive cardinality…I should have simply checked for default. I realized only when driving back home (just need some diversion I guess).
Thanks for your reply.
Comment by PerfNewbie — December 14, 2012 @ 9:53 pm UTC Dec 14,2012 |