Oracle Scratchpad

January 3, 2012

NewDensity

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 5:56 pm BST Jan 3,2012

A recent comment on a note I wrote some time ago about faking histograms asked about the calculations of selectivity in the latest versions of Oracle. As I read the question, I realised that I had originally supplied a formula for calculating cardinality, rather than selectivity, so I thought I’d supply a proper example.

We’ll start with a script to create some data and stats – and I’m going to start with a script I wrote in Jan 2001 (which is why it happens to use the analyze command rather than dbms_stats.gather_table_stats, even though this example comes from an instance of 11.2.0.2).

create table t1 (
	skew,	skew2,	padding
)
as
select r1, r2, rpad('x',400) 
from
(
	select /*+ no_merge */
		rownum r1
	from all_objects 
	where rownum <= 80
)	v1,	
(
	select /*+ no_merge */
		rownum r2
	from all_objects 
	where rownum <= 80
)	v2
where r2 <= r1
order by r2,r1
;

alter table t1 modify skew not null;
alter table t1 modify skew2 not null;

create index t1_skew on t1(skew);

analyze table t1 compute statistics
	for all indexed columns size 75;

The way I’ve created the data set the column skew has one row with the value 1, two rows with the value 2, and so on up to 80 rows with the value 80. I’ve put in a bid to collect a histogram of 75 buckets – which the default, by the way, for the analyze command – on any indexed columns . (Interestingly the resulting histogram on column skew held on 74 buckets.)

To demonstrate the calculation of selectivity, I then enabled the 10053 trace and ran a query to select one of the “non-popular” values (i.e. a value with a fairly small number of duplicates). The section of the trace file I want to talk about appears as the “Single Table Access Path”.

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Column (#1):
    NewDensity:0.008940, OldDensity:0.006757 BktCnt:74, PopBktCnt:31, PopValCnt:15, NDV:80
  Column (#1): SKEW(
    AvgLen: 2 NDV: 80 Nulls: 0 Density: 0.008940 Min: 1 Max: 80
    Histogram: HtBal  #Bkts: 74  UncompBkts: 74  EndPtVals: 59
  Table: T1  Alias: T1
    Card: Original: 3240.000000  Rounded: 29  Computed: 28.96  Non Adjusted: 28.96
  Access Path: TableScan
    Cost:  32.00  Resp: 32.00  Degree: 0
      Cost_io: 32.00  Cost_cpu: 0
      Resp_io: 32.00  Resp_cpu: 0
  Access Path: index (AllEqRange)
    Index: T1_SKEW
    resc_io: 30.00  resc_cpu: 0
    ix_sel: 0.008940  ix_sel_with_filters: 0.008940
    Cost: 30.00  Resp: 30.00  Degree: 1
  Best:: AccessPath: IndexRange
  Index: T1_SKEW
         Cost: 30.00  Degree: 1  Resp: 30.00  Card: 28.96  Bytes: 0

A few points to notice on line 4:

The NDV (number of distinct values) is 80; this means that the Density, in the absence of a histogram, would be 1/80 = 0.0125
We have a NewDensity and an OldDensity. The OldDensity is the value that would have been reported simply as Density in 10g or 9i, and is derived using the mechanism I described in “Cost Based Oracle – Fundamentals” a few years ago. Informally this was:

sum of the square of the frequency of the non-popular values /
(number of non-null rows * number of non-popular non-null rows)

This formula for OldDensity is (I assume) a fairly subtle formula based on expected number of rows for a randomly selected non-popular value in the presence of popular values. The NewDensity, however, seems to take the much simpler approach of “factoring out” the popular values. There are two ways you can approach the arithmetic – one is by thinking of the number of rows you expect for the query “column = ‘non-popular value'”, the other is by thinking of the number of non-popuar values and then adjusting for the relative volume of non-popular value in the table.

Option 1: Cardinality.

Line 9 tells us there are 3240 rows in the table.
Line 4 tells us there are 80 (NDV) distinct values of which 15 (PopValCnt) are popular, and 74 (BktCnt) buckets of which 31 (PopBktCnt) contain popular values.
From this we determine that there are (80 – 15 = ) 65 non-popular values and (3240 * (74-31)/74 = ) 1883 non-popular rows.
Hence we infer that a typical non-popular value will report (1883 / 65 = ) 29 rows – which is the rounded cardinality we see in line 9.

Option 2: Selectivity.

If we consider only non-popular values, then the selectivity is 1/(number of non-popular values) = 1/65.
But this selectivity applies to only 43 buckets of the 74 total bucket count.
To generate a selectivity that can be applied to the original cardinality of the table we have to scale it accordingly.
The selectivity, labelled the NewDensity, is (1/65) * (43/74) = 0.00894 – which is the value we see in line 4.
(Following this, of course, the cardinality for ‘column = constant’ would be 3,240 * 0.00894 = 29

Footnote 1:
I have been a little cavalier with rounding throughout the example, just to keep the numbers looking a little tidier.
Footnote 2:
If the column is allowed to be null then our calculation of cardinality would use (3,240 – number of nulls) instead of 3,240. The method for calculating the selectivity would not change, but the resulting figure would be applied to (3,240 – number of nulls).

10 Comments »

  1. wow – thanks for the detailed info

    Comment by oracleman consulting — January 3, 2012 @ 6:17 pm BST Jan 3,2012 | Reply

  2. Hello Jonathan,

    What could be the possible reason of COST = 0 for unique index scan operation of the following query and plan. I am not able to understand the reason.
    Could you please help.

    sql_id=52a0skswvaqvk.
    Current SQL statement for this session:
    explain plan for select
    count(*)
    from
    	orders,
            lineitem
    where
    	o_orderkey = l_orderkey
     
    ============
    Plan Table
    ============
    ---------------------------------------------+-----------------------------------+
    | Id  | Operation               | Name       | Rows  | Bytes | Cost  | Time      |
    ---------------------------------------------+-----------------------------------+
    | 0   | SELECT STATEMENT        |            |       |       |   343 |           |
    | 1   |  SORT AGGREGATE         |            |     1 |    10 |       |           |
    | 2   |   NESTED LOOPS          |            |  583K | 5830K |   343 |  00:00:05 |
    | 3   |    INDEX FAST FULL SCAN | PK_LINEITEM|  586K | 2932K |   294 |  00:00:05 |
    | 4   |    INDEX UNIQUE SCAN    | PK_ORDERS  |     1 |     5 |     0 |           |
    ---------------------------------------------+-----------------------------------+
    Predicate Information:
    ----------------------
    4 - access("O_ORDERKEY"="L_ORDERKEY")
     
    Content of other_xml column
    ===========================
      db_version     : 10.2.0.1
      parse_schema   : EMUDSS
      plan_hash      : 2594599139
      Outline Data:
      /*+
        BEGIN_OUTLINE_DATA
          IGNORE_OPTIM_EMBEDDED_HINTS
          OPTIMIZER_FEATURES_ENABLE('10.2.0.1')
          ALL_ROWS
          OUTLINE_LEAF(@"SEL$1")
          INDEX_FFS(@"SEL$1" "LINEITEM"@"SEL$1"("LINEITEM"."L_ORDERKEY" "LINEITEM"."L_LINENUMBER"))
          INDEX(@"SEL$1" "ORDERS"@"SEL$1" ("ORDERS"."O_ORDERKEY"))
          LEADING(@"SEL$1" "LINEITEM"@"SEL$1" "ORDERS"@"SEL$1")
          USE_NL(@"SEL$1" "ORDERS"@"SEL$1")
        END_OUTLINE_DATA
      */
    
    

    Comment by Ajeet — March 9, 2012 @ 12:59 pm BST Mar 9,2012 | Reply

    • Ajeet,

      I think this is an example covered in my book “Cost Based Oracle – Fundamentals”, but I can’t be completely certain since I don’t have all the information about your tables and indexes in front of me. The basic points, however, are this:

      a) the orders_pk index looks like it might be the index on a primary key column
      b) for index-only access on unique indexes the optimizer subtracts 1 from the normal index-only cost
      c) for index access by nested loop the optimizeir subtracts 1 from the expected cost

      I suspect from this that your index has a blevel of 1: so the cost of “select o_orderkey from orders where o_orderkey = {constant}” has a cost of 1, and the cost of accessing just the order key in the nested loop is zero.

      Comment by Jonathan Lewis — March 12, 2012 @ 10:36 am BST Mar 12,2012 | Reply

  3. Hello Jonathan,

    Thanks so much for explaining .I do have your book – cost based oracle -fundamentals, but I could not find the discussion on cost of unique index access ,but I do see some sort of discussion on it in chapter 11 . (nested loops).
    Yes the blevel of the unique index which is a PK is 1 . you are absolutely correct .

    Kindly point out where in the cost based oracle -fundamentals, I could see more details on this.
    I have a hard copy and I might have skipped that section inadvertantely.

    Regards.
    Ajeet

    Comment by Ajeet — March 12, 2012 @ 11:40 am BST Mar 12,2012 | Reply

  4. Hello Jonathan,

    Would you able to help me in understanding the cardinality from the following output, how was 1642 derived?

    SINGLE TABLE ACCESS PATH
      Single Table Cardinality Estimation for TRADE[T]
      Column (#2):
        NewDensity:0.000001, OldDensity:0.000001 BktCnt:5402, PopBktCnt:5402, PopValCnt:4, NDV:4
      Column (#105):
        NewDensity:0.000001, OldDensity:0.000001 BktCnt:5402, PopBktCnt:5402, PopValCnt:2, NDV:2
      Column (#55):
        NewDensity:0.000002, OldDensity:0.000002 BktCnt:2835, PopBktCnt:2835, PopValCnt:2, NDV:2
      Column (#1):
        NewDensity:0.000002, OldDensity:0.000002 BktCnt:254, PopBktCnt:0, PopValCnt:0, NDV:443488
      Column (#72):
        NewDensity:0.000001, OldDensity:0.000001 BktCnt:5402, PopBktCnt:5402, PopValCnt:2, NDV:4
      Column (#6):
        NewDensity:0.001419, OldDensity:0.002012 BktCnt:254, PopBktCnt:10, PopValCnt:5, NDV:682
      Column (#159):
        NewDensity:0.000003, OldDensity:0.000038 BktCnt:254, PopBktCnt:2, PopValCnt:1, NDV:310528
      Column (#16):
        NewDensity:0.000002, OldDensity:0.000005 BktCnt:254, PopBktCnt:10, PopValCnt:1, NDV:426496
      Column (#157):
        NewDensity:0.000001, OldDensity:0.000001 BktCnt:5402, PopBktCnt:5402, PopValCnt:3, NDV:4
      Column (#155):
        NewDensity:0.000215, OldDensity:0.001799 BktCnt:254, PopBktCnt:81, PopValCnt:13, NDV:3175
      Column (#4):
        NewDensity:0.000001, OldDensity:0.000001 BktCnt:5178, PopBktCnt:5178, PopValCnt:6, NDV:7
      ColGroup (#1, Index) IDX_TRD_DVP_FEX
        Col#: 1 2 5 8 53 60 73 105 108    CorStregth: -1.00
      ColGroup (#2, Index) IDXTRDTAX
        Col#: 1 2 55 68 72 105 135    CorStregth: 52038.16
      ColGroup (#8, Index) IDX_TRADE_FX
        Col#: 1 2 5 8 105 108    CorStregth: -1.00
      ColGroup (#5, Index) IDX$$_4B290002
        Col#: 2 6 154 155 156    CorStregth: 2298053781278.98
      ColGroup (#7, Index) FEXIGNOREEXCEPTION_712A0001
        Col#: 2 5 8 105 108    CorStregth: -1.00
      ColGroup (#6, Index) IDX_TRADE_COMM
        Col#: 2 4 5 153    CorStregth: -1.00
      ColGroup (#9, Index) IDX$$_50990001
        Col#: 6 7 12    CorStregth: -1.00
      ColGroup (#10, Index) TRADE_UK1
        Col#: 1 6 46    CorStregth: -1.00
      ColGroup (#3, Index) IDX$$_173D0004
        Col#: 6 159    CorStregth: 575.99
      ColGroup (#4, Index) IDX$$_4B2C0004
        Col#: 16 157    CorStregth: 4.00
      ColGroup Usage:: PredCnt: 4  Matches Full:  Partial:
      Table: TRADE  Alias: T
        Card: Original: 471074.000000  Rounded: 1642  Computed: 1641.93  Non Adjusted: 1641.93
      Access Path: TableScan
        Cost:  12436.31  Resp: 12436.31  Degree: 0
          Cost_io: 12338.00  Cost_cpu: 2113927982
          Resp_io: 12338.00  Resp_cpu: 2113927982
    

    Comment by suresh — July 17, 2012 @ 6:11 am BST Jul 17,2012 | Reply

    • Suresh,

      It can take some time to work these things out, and it’s not a particularly interesting or educational task; so I only do it when I’m being paid to do it or puzzled by an example that’s appeared in a forum.

      In your example there are two things that would make it harder to do: (a) every column has a histogram – so you might have to look at the histogram data in great detail, and (b) you haven’t told me what the predicates are that the optimizer will be evaluating at this point – and not that they may NOT be exactly the predicates you wrote in your initial query.

      Comment by Jonathan Lewis — July 19, 2012 @ 10:57 am BST Jul 19,2012 | Reply

      • Hi Jonathan,

        Thanks for your time and Thanks for the comments too.

        There’s lot of predicates on the given table and not particularly using any index and doing a full table scan. To add more, the query in the question is having problem only on monday after a table re-org (ofcourse stats has been gathered after rebuild) and rest of the days its back to normal. No change in plans and nothing, hence trying to cornering how the cost has been derived. Will further dig more.

        I must say, I have been learning a lot with your blog and your nice book “CBO – fundamentals”, Thank you…

        Regards
        Suresh

        Comment by suresh — July 19, 2012 @ 3:21 pm BST Jul 19,2012 | Reply

  5. First time I’m able to compute New Density. Thanks for a simple and detailed explanation.

    However, cardinality estimate is off. My column is not null column with FBI on it (upper(column)) and no popular values.

    Predicate is “col like :bindvar||’%'”. Bind value is within range.

        NewDensity:0.000015, OldDensity:0.000239 BktCnt:254, PopBktCnt:0, PopValCnt:0, NDV:67072
      Column (#41): SYS_NC00041$(
        AvgLen: 50 NDV: 67072 Nulls: 0 Density: 0.000015
        Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255
        Card: Original: 309157.000000  Rounded: 15458  Computed: 15457.85  Non Adjusted: 15457.85
    

    Cardinality = 0.000015 * 309157 = 4.637355 vs 15458 (computed by CBO)

    Not sure what type of math I should do for a unbounded range…

    Will try to figure out by going through your book. But, do you by any chance think this is a bug?

    Appreciate your contribution to Oracle community,
    Thanks!

    Comment by PerfNewbie — December 14, 2012 @ 12:36 am BST Dec 14,2012 | Reply

    • It’s always worth spending a couple of minutes working backwards just in case it gives you a clue:

      Oracle’s cardinality = 15458
      Oracle’s number of rows input = 309157
      So Oracle’s selectivity = 15458 / 309157 = 0.05000049

      Oracle has used the standard 5% for “column > unknown value” in this case.

      Comment by Jonathan Lewis — December 14, 2012 @ 6:48 am BST Dec 14,2012 | Reply

      • Oops! Sorry (embarrassed). I got hung up with doing math to derive cardinality…I should have simply checked for default. I realized only when driving back home (just need some diversion I guess).

        Thanks for your reply.

        Comment by PerfNewbie — December 14, 2012 @ 9:53 pm BST Dec 14,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,173 other followers