Oracle Scratchpad

November 29, 2009

Cardinality

Filed under: CBO — Jonathan Lewis @ 7:49 pm BST Nov 29,2009

I received an email recently which said:

I have a question about Page 54 of Cost Based Oracle.

10g Update
As with equalities, 10.1.0.4 suddenly changes what happens when you fall outside the low/high limits. The arithmetic, or picture, used for ranges outside the limits is just the same as the new mechanism we saw for equalities.

I can’t work out how the formula provides a cardinality of 82 for the predicate “month_no between 14 and 17”. Can you please elaborate on the out-of-bounds formula?

The example in the book was only intended to show the general pattern of behaviour, and I didn’t attempt to explain what I thought the optimizer was doing – which is an odd little oversight.

The code to create the table, and the example I ran were as follows:

create table audience as
select
	trunc(dbms_random.value(1,13))	month_no
from
	all_objects
where
	rownum <= 1200
;
-- Collect statistics

select	count(*)
from	audience
where	month_no between 14 and 17;

---------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost  |
---------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |     3 |     2 |
|   1 |  SORT AGGREGATE    |          |     1 |     3 |       |
|*  2 |   TABLE ACCESS FULL| AUDIENCE |    82 |   246 |     2 |
---------------------------------------------------------------

Now run the query with the following predicates:

month_no between 14 and 16
month_no between 14 and 17
month_no between 14 and 18
month_no between 14 and 21
month_no between 14 and 27
month_no = 14

You will see that the cardinality doesn’t change. It is the cardinality of month_no = 14 … the value at the lower end of the range.

To see why this cardinality is 82: it takes 11 steps to get from the known low value (1) to the known high value (12), so to extrapolate past the high value the optimizer steps from 100 down to zero in 11 more steps – which means 9.0909 per step.

Since 14 is two steps from 12, the cardinality is (100 – 2 * 9.0909), which rounds to 82.
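That arithmetic can be sketched in a few lines of Python. This is only a model of the description above, not the optimizer’s actual code, and it assumes the figures from the example: 1,200 rows, 12 distinct values, known low/high of 1 and 12, no histogram.

```python
def out_of_range_cardinality(value, low=1, high=12, num_rows=1200, num_distinct=12):
    """Model of the linear decay for 'month_no = value' above the high value."""
    base = num_rows / num_distinct    # "column = constant" estimate: 1200/12 = 100
    span = high - low                 # 11 steps from the low value to the high value
    distance = value - high           # how far past the high value we have gone
    # Step the estimate from 100 down towards zero, 9.0909 per step:
    return round(base * (1 - distance / span))

print(out_of_range_cardinality(14))   # 2 steps past 12: 100 - 2 * 9.0909 -> 82
print(out_of_range_cardinality(15))   # 3 steps past 12 -> 73
```

The second figure (73) also matches the estimate for month_no = 15 that appears in the comments below.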

17 Comments »

  1. To see why this cardinality is 82: it takes 11 steps to get from the known low value (1) to the known high value (12), so to extrapolate past the high value the optimizer steps from 100 down to zero in 11 more steps – which means 9.0909 per step.

    Since 14 is two steps from 12, the cardinality is (100 – 2 * 9.0909) which rounds to 82.

    Jonathan,

    I did not get that (at all). After collecting statistics for columns (which do contain the low and high values), why does the CBO still try to “guess” the cardinality for values that are out of range? The EXPLAIN PLAN shows the number of rows produced by each step, right?

    Comment by Narendra — November 30, 2009 @ 9:51 am BST Nov 30,2009 | Reply

  2. Here is a page in the book which describes such behavior. Most likely the change was made to cope with constantly increasing values and not-so-frequently collected statistics.

    Comment by Timur Akhmadeev — November 30, 2009 @ 10:09 am BST Nov 30,2009 | Reply

    Narendra,

    Don’t ask me WHY Oracle uses a given strategy – I didn’t write the specification or code; however I think that Timur’s suggestion is probably the right one.

    The number of rows reported after using explain plan is the optimizer’s estimate of the number that will be produced (for each call to that line of the plan). It’s a figure that may be completely wrong.

    Comment by Jonathan Lewis — November 30, 2009 @ 10:41 am BST Nov 30,2009 | Reply

    • Jonathan,

      Thanks. I know you did not write the specification or code. The only reason for asking the WHY question was that I have seen you demonstrate how the CBO works (or might work) in many other cases.

      Comment by Narendra — November 30, 2009 @ 1:42 pm BST Nov 30,2009 | Reply

  4. BTW, it seems like things have changed in 11gR2:

    explain plan set statement_id='s14' for
    select count(*) from audience where month_no = 14;
    
    explain plan set statement_id='s15' for
    select count(*) from audience where month_no = 15;
    
    explain plan set statement_id='s14-17' for
    select count(*) from audience where month_no between 14 and 17;
    
    explain plan set statement_id='s14-16' for
    select count(*) from audience where month_no between 14 and 16;
    
    explain plan set statement_id='s14-18' for
    select count(*) from audience where month_no between 14 and 18;
    
    explain plan set statement_id='s14-21' for
    select count(*) from audience where month_no between 14 and 21;
    
    explain plan set statement_id='s14-27' for
    select count(*) from audience where month_no between 14 and 27;
    
    select statement_id, cardinality
      from plan_table
     where statement_id like 's1%' and id = 2;
    
    STATEMENT_ID                   CARDINALITY
    ------------------------------ -----------
    s14                                     82
    s15                                     73
    s14-17                                 100
    s14-16                                 100
    s14-18                                 100
    s14-21                                 100
    s14-27                                 100
    

    Or this is a bug, since the 10053 trace reports “Using prorated density: 0.068182”, which should result in a cardinality of 82, but the cardinality is reported just as for a MONTH_NO=CONST predicate.

    Comment by Timur Akhmadeev — November 30, 2009 @ 10:44 am BST Nov 30,2009 | Reply

    • Timur,

      Thanks for that tip. I still haven’t got around to installing 11.2 so can’t investigate at present.

      It’s possible that it’s deliberate – for example the out-of-range queries may be limited to the maximum single value in range.

      Do you get 100 for a range of (say) 21 – 22, then 21 – 23, then 21 – 24, or do you get an increasing set of values ? Can you pick another set of ranges and see a set of results that increase until they hit 100 ?

      Comment by Jonathan Lewis — December 12, 2009 @ 11:33 am BST Dec 12,2009 | Reply

      • The behaviour is the same on 10.2.0.4 as well. Jonathan, which version are your results coming from?

        s14 82
        s15 73
        s14-17 100
        s14-16 100
        s14-18 100
        s14-21 100
        s14-27 100
        s21-22 100
        s21-23 100
        s21-24 100

        Comment by coskan — December 13, 2009 @ 7:38 pm BST Dec 13,2009 | Reply

        • Coskan,

          Thanks for that – I’m running on 10.2.0.3 at the moment.
          I wonder if the change is hidden somewhere in the code relating to frequency histograms that appeared in that upgrade.

          Comment by Jonathan Lewis — December 16, 2009 @ 10:07 am BST Dec 16,2009

      • >Do you get 100 for a range of (say) 21 – 22, then 21 – 23, then 21 – 24
        Yes. If the low bound is out of range, then the cardinality equals 100.

        >I still haven’t got around to installing 11.2 so can’t investigate at present.
        If you need a VMware image of OEL5u4 with 11.2.0.1 EE, I can upload it (I think it’s about 8GB in compressed form).

        Comment by Timur Akhmadeev — December 14, 2009 @ 2:27 pm BST Dec 14,2009 | Reply

        • Timur,

          I was wondering whether the 100 was an upper bound for range predicates that were out of range – but it seems not.

          Thanks for the offer – I finally got around to installing OEL and 11.2 on my daughter’s old laptop last night.

          Comment by Jonathan Lewis — December 16, 2009 @ 10:33 am BST Dec 16,2009

  5. Jonathan,

    Do you know why the optimizer uses 100 as the point to count back from? Does this change with the size of the table?

    Thanks,

    John

    Comment by John — December 4, 2009 @ 6:50 pm BST Dec 4,2009 | Reply

    • John,

      There are a couple of minor variations on the theme – but the basic answer is that that’s the selectivity in this case for “column = (non-popular) constant”.

      Comment by Jonathan Lewis — December 12, 2009 @ 11:18 am BST Dec 12,2009 | Reply
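      [To put a number on that reply: with no histogram, the selectivity of “column = (non-popular) constant” is 1/num_distinct. A quick sketch, using the figures assumed from the worked example rather than queried from a real data dictionary:]

```python
# Hypothetical figures carried over from the post's example, not real dictionary stats.
num_rows = 1200       # rows in AUDIENCE
num_distinct = 12     # distinct values of month_no
selectivity = 1 / num_distinct

print(round(num_rows * selectivity))   # -> 100, the value the decay counts back from
```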

      • The cardinality of an index is the number of unique values in the indexed field. For example, if you were a car manufacturer and you had a database containing one row for every car that rolls off the assembly line, the primary key might be the VIN, and a secondary key might be the color. If you make cars that are red, blue, or green, then the cardinality of the color index would be three. If you are Henry Ford building model Ts, then the cardinality would be one, since any customer can have a car painted any colour that he wants so long as it is black.

        Comment by Aatif — February 2, 2012 @ 6:20 am BST Feb 2,2012 | Reply

  6. Going with the same example in the book, when I run it on 11.2.0.3 I am not sure why I am getting a cardinality of 101 with a bind variable.

    SQL> variable b1 number;
    SQL> set autotrace traceonly explain
    SQL> select count(*) from audience where month_no > :b1;
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3337892515
    
    -------------------------------------------------------------------------------
    | Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |          |     1 |     3 |     3   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE    |          |     1 |     3 |            |          |
    |*  2 |   TABLE ACCESS FULL| AUDIENCE |   101 |   303 |     3   (0)| 00:00:01 |
    -------------------------------------------------------------------------------
    
    
    When I check the 10053 trace for the same SQL, I get the correct 5% of 1200 rows, i.e. 60, for the cardinality.

    I am not sure if I am doing something wrong or if the behaviour changed in 11gR2.

    From the 10053 trace:
    
    OPTIMIZER INFORMATION
    
    ******************************************
    --------------------------------------+-----------------------------------+
    | 0   | SELECT STATEMENT    |         |       |       |     3 |           |
    | 1   |  SORT AGGREGATE     |         |     1 |     3 |       |           |
    | 2   |   TABLE ACCESS FULL | AUDIENCE|    60 |   180 |     3 |  00:00:01 |
    --------------------------------------+-----------------------------------+
    Predicate Information:
    ----------------------
    2 - filter("MONTH_NO">:B1)
    
    Content of other_xml column
    ===========================
      db_version     : 11.2.0.3
      parse_schema   : A
      plan_hash      : 3337892515
      plan_hash_2    : 1029935770
    Peeked Binds
    ============
      Bind variable information
        position=1
        datatype(code)=2
        datatype(string)=NUMBER
        precision=0
        scale=0
        max length=22
    

    Comment by Amar — July 29, 2014 @ 4:15 pm BST Jul 29,2014 | Reply

    • Amar,

      I’ve checked the example, and the change from 60 to 101 for the cardinality estimate when using explain plan is just what happens – somewhere around the page the example comes from I did make the comment that “things change”, and this seems to be one of them. (In fact the change to 101 also shows up in 9.2.0.8). With rounding errors it looks like the selectivity has turned into the selectivity of “column = {in range constant}”, just as it is for “column > {out of range constant}”.

      As for the anomaly in the second part of your question, without knowing exactly what steps you took to generate the trace file I can’t work out why you’re seeing 60 as the estimate. Remember that autotrace (which uses explain plan) doesn’t look at the bind variables and assumes that they’re character strings of unknown value. In the versions I’ve just tested I don’t get a “Peeked Binds” section in the 10053 when I use “autotrace traceonly explain”, and I get an actual value displayed (which hasn’t appeared in your output) when I run the query.

      Comment by Jonathan Lewis — August 5, 2014 @ 12:32 am BST Aug 5,2014 | Reply
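      [For reference, the two competing estimates in this exchange work out as follows. A sketch using the example’s 1,200 rows and 12 distinct values; the 5% figure is the fixed selectivity traditionally used for an open range against a bind of unknown value:]

```python
num_rows = 1200
num_distinct = 12

# Fixed 5% guess for "column > :bind" when the bind value is unknown:
print(round(num_rows * 0.05))            # -> 60, the figure in the 10053 trace

# "column = constant" style estimate, selectivity 1/num_distinct:
print(round(num_rows / num_distinct))    # -> 100, close to the 101 explain plan reports
```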

      • Hi Jonathan,
        The value of 101 (leaving aside round-off errors) seems to be close to following the {column = constant} formula, which gives a cardinality of 100 for a table of 1200 rows with 12 distinct column values.

        So I was just curious to know whether you had come across the behaviour I mentioned in the first part (getting a cardinality of 101 instead of 60). Does that mean that in 11g and later the 5% selectivity no longer holds for unbounded range predicates, and we should substitute the formula we use for calculating {column = constant}?

        Comment by Amar — August 5, 2014 @ 3:04 pm BST Aug 5,2014 | Reply

        • Amar,

          Your first paragraph is simply repeating my first paragraph – and clearly I have seen the behaviour you’ve described … even in 9.2.0.8.

          My second paragraph emphasises the word EXACTLY – I can’t reproduce your result because you haven’t described your experiment; I’ve also warned that autotrace and explain plan don’t behave like runtime. Obviously I’m not going to give you a wide-ranging strategy in response to a single example which hasn’t been described properly and which I can’t reproduce.

          Comment by Jonathan Lewis — August 15, 2014 @ 11:31 am BST Aug 15,2014

