Oracle Scratchpad

November 29, 2009

Cardinality

Filed under: CBO — Jonathan Lewis @ 7:49 pm BST Nov 29,2009

I received an email recently which said:

I have a question about Page 54 of Cost Based Oracle.

10g Update
As with equalities, 10.1.0.4 suddenly changes what happens when you fall outside the low/high limits. The arithmetic, or picture, used for ranges outside the limits is just the same as the new mechanism we saw for equalities.

I can’t work out how the formula provides a cardinality of 82 for the values ” month_no between 14 and 17″. Can you please elaborate on the out of bounds formula ?

The example in the book was only intended to show the general pattern of behaviour, and I didn’t attempt to explain what I thought the optimizer was doing – which is an odd little oversight.

The code to create the table, and the example I ran were as follows:

create table audience as
select
	trunc(dbms_random.value(1,13))	month_no
from
	all_objects
where
	rownum <= 1200
;
-- Collect statistics

select	count(*)
from	audience
where	month_no between 14 and 17;

---------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost  |
---------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |     3 |     2 |
|   1 |  SORT AGGREGATE    |          |     1 |     3 |       |
|*  2 |   TABLE ACCESS FULL| AUDIENCE |    82 |   246 |     2 |
---------------------------------------------------------------

Now run the query with the following predicates:

month_no between 14 and 16
month_no between 14 and 17
month_no between 14 and 18
month_no between 14 and 21
month_no between 14 and 27
month_no = 14

You will see that the cardinality doesn't change. It is the cardinality of month_no = 14 ... the value at the lower end of the range.

To see why this cardinality is 82: it takes 11 steps to get from the known low value (1) to the known high value (12), so to extrapolate past the high value the optimizer steps from 100 down to zero in 11 more steps - which means 9.0909 per step.

 Since 14 is two steps from 12, the cardinality is (100 - 2 * 9.0909) which rounds to 82.

13 Comments »

  1. To see why this cardinality is 82: it takes 11 steps to get from the known low value (1) to the known high value (12), so to extrapolate past the high value the optimizer steps from 100 down to zero in 11 more steps – which means 9.0909 per step.

    Since 14 is two steps from 12, the cardinality is (100 – 2 * 9.0909) which rounds to 82.

    Jonathan,

    Did not get that (at all). After collecting statistics (which do contain low and high values) for columns, why does CBO still try to “guess” the cardinality for the values that are out of range? The EXPLAIN PLAN shows number of rows produced by the step, right?

    Comment by Narendra — November 30, 2009 @ 9:51 am BST Nov 30,2009 | Reply

  2. Here is a page in the book which describes such behavior. Most likely the change was made to cope with constantly increasing values and not-so-frequently collected statistics.

    Comment by Timur Akhmadeev — November 30, 2009 @ 10:09 am BST Nov 30,2009 | Reply

  3. Narenda,

    Don’t ask me WHY Oracle uses a given strategy – I didn’t write the specification or code; however I think that Timur’s suggestion is probably the right one.

    The number of rows reported after using explain plan is the optimizer’s estimate of the number that will be produced (for each call to that line of the plan). It’s a figure that may be completely wrong.

    Comment by Jonathan Lewis — November 30, 2009 @ 10:41 am BST Nov 30,2009 | Reply

    • Jonathan,

      Thanks. I know you did not write specification or code. The only reason for asking the WHY question was I have seen you demonstrate how CBO works (or might be working) in many other cases.

      Comment by Narendra — November 30, 2009 @ 1:42 pm BST Nov 30,2009 | Reply

  4. BTW, it seems like things has changed in 11gR2:

    explain plan set statement_id='s14' for
    select count(*) from audience where month_no = 14;
    
    explain plan set statement_id='s15' for
    select count(*) from audience where month_no = 15;
    
    explain plan set statement_id='s14-17' for
    select count(*) from audience where month_no between 14 and 17;
    
    explain plan set statement_id='s14-16' for
    select count(*) from audience where month_no between 14 and 16;
    
    explain plan set statement_id='s14-18' for
    select count(*) from audience where month_no between 14 and 18;
    
    explain plan set statement_id='s14-21' for
    select count(*) from audience where month_no between 14 and 21;
    
    explain plan set statement_id='s14-27' for
    select count(*) from audience where month_no between 14 and 27;
    
    select statement_id, cardinality
      from plan_table
     where statement_id like 's1%' and id = 2;
    
    STATEMENT_ID                   CARDINALITY
    ------------------------------ -----------
    s14                                     82
    s15                                     73
    s14-17                                 100
    s14-16                                 100
    s14-18                                 100
    s14-21                                 100
    s14-27                                 100
    

    or this is a bug, since 10053 reports "Using prorated density: 0.068182" which should result in cardinality of 82, but reports cardinality just like for a MONTH_NO=CONST predicate.

    Comment by Timur Akhmadeev — November 30, 2009 @ 10:44 am BST Nov 30,2009 | Reply

    • Timur,

      Thanks for that tip. I still haven’t got around to installing 11.2 so can’t investigate at present.

      It’s possible that it’s deliberate – for example the out-of-range queries may be limited to the maximum single value in range.

      Do you get 100 for a range of (say) 21 – 22, then 21 – 23, then 21 – 24, or do you get an increasing set of values ? Can you pick another set of ranges and see a set of results that increase until they hit 100 ?

      Comment by Jonathan Lewis — December 12, 2009 @ 11:33 am BST Dec 12,2009 | Reply

      • behaviour is same on 10.2.0.4 as well. Jonathan which version are your results coming from ?

        s14 82
        s15 73
        s14-17 100
        s14-16 100
        s14-18 100
        s14-21 100
        s14-27 100
        s21-22 100
        s21-23 100
        s21-24 100

        Comment by coskan — December 13, 2009 @ 7:38 pm BST Dec 13,2009 | Reply

        • Coskan,

          Thanks for that – I’m running on 10.2.0.3 at the moment.
          I wonder if the change is hidden somewhere in the code relating to frequency histograms that appeared in that upgrade.

          Comment by Jonathan Lewis — December 16, 2009 @ 10:07 am BST Dec 16,2009

      • >Do you get 100 for a range of (say) 21 – 22, then 21 – 23, then 21 – 24
        Yes. If low bound is out of range, then cardinality equals to 100.

        >I still haven’t got around to installing 11.2 so can’t investigate at present.
        If you need a VMware image of OEL5u4 with 11.2.0.1 EE, I can upload it (I think it’s about 8GB in compressed form).

        Comment by Timur Akhmadeev — December 14, 2009 @ 2:27 pm BST Dec 14,2009 | Reply

        • Timur,

          I was wondering whether the 100 was an upper bound for range predicates that were out of range – but it seems not.

          Thanks for the offer – I finally got around to installing OEL and 11.2 on my daughter’s old laptop last night.

          Comment by Jonathan Lewis — December 16, 2009 @ 10:33 am BST Dec 16,2009

  5. Jonathan,

    Do you know why the optimizer uses 100 as the point to count back from? Does this change given the size of the table?

    Thanks,

    John

    Comment by John — December 4, 2009 @ 6:50 pm BST Dec 4,2009 | Reply

    • John,

      There are a couple of minor variations on the theme – but the basic answer is that that’s the selectivity in this case for “column = (non-popular) constant”.

      Comment by Jonathan Lewis — December 12, 2009 @ 11:18 am BST Dec 12,2009 | Reply

      • The cardinality of an index is the number of unique values in the indexed field.For example, if you were a car manufacturer and you had a database containing one row for every car that rolls off the assembly line, the primary key might be the VIN, and a secondary key might be the color. If you make cars that are red, blue, or green, then the cardinality of the color index would be three. If you are Henry Ford building model Ts, then the cardinality would be one, since any customer can have a car painted any colour that he wants so long as it is black.

        Comment by Aatif — February 2, 2012 @ 6:20 am BST Feb 2,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,873 other followers