I received an email recently which said:
I have a question about Page 54 of Cost Based Oracle.
10g Update
As with equalities, 10.1.0.4 suddenly changes what happens when you fall outside the low/high limits. The arithmetic, or picture, used for ranges outside the limits is just the same as the new mechanism we saw for equalities.
I can’t work out how the formula provides a cardinality of 82 for the values ” month_no between 14 and 17″. Can you please elaborate on the out of bounds formula ?
The example in the book was only intended to show the general pattern of behaviour, and I didn’t attempt to explain what I thought the optimizer was doing – which is an odd little oversight.
The code to create the table, and the example I ran were as follows:
create table audience as select trunc(dbms_random.value(1,13)) month_no from all_objects where rownum <= 1200 ; -- Collect statistics select count(*) from audience where month_no between 14 and 17; --------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 2 | | 1 | SORT AGGREGATE | | 1 | 3 | | |* 2 | TABLE ACCESS FULL| AUDIENCE | 82 | 246 | 2 | ---------------------------------------------------------------
Now run the query with the following predicates:
month_no between 14 and 16
month_no between 14 and 17
month_no between 14 and 18
month_no between 14 and 21
month_no between 14 and 27
month_no = 14
You will see that the cardinality doesn’t change. It is the cardinality of month_no = 14 … the value at the lower end of the range.
To see why this cardinality is 82: it takes 11 steps to get from the known low value (1) to the known high value (12), so to extrapolate past the high value the optimizer steps from 100 down to zero in 11 more steps – which means 9.0909 per step.
Since 14 is two steps from 12, the cardinality is (100 – 2 * 9.0909) which rounds to 82.

To see why this cardinality is 82: it takes 11 steps to get from the known low value (1) to the known high value (12), so to extrapolate past the high value the optimizer steps from 100 down to zero in 11 more steps – which means 9.0909 per step.
Since 14 is two steps from 12, the cardinality is (100 – 2 * 9.0909) which rounds to 82.
Jonathan,
Did not get that (at all). After collecting statistics (which do contain low and high values) for columns, why does CBO still try to “guess” the cardinality for the values that are out of range? The EXPLAIN PLAN shows number of rows produced by the step, right?
Comment by Narendra — November 30, 2009 @ 9:51 am UTC Nov 30,2009 |
Here is a page in the book which describes such behavior. Most likely the change was made to cope with constantly increasing values and not-so-frequently collected statistics.
Comment by Timur Akhmadeev — November 30, 2009 @ 10:09 am UTC Nov 30,2009 |
Narenda,
Don’t ask me WHY Oracle uses a given strategy – I didn’t write the specification or code; however I think that Timur’s suggestion is probably the right one.
The number of rows reported after using explain plan is the optimizer’s estimate of the number that will be produced (for each call to that line of the plan). It’s a figure that may be completely wrong.
Comment by Jonathan Lewis — November 30, 2009 @ 10:41 am UTC Nov 30,2009 |
Jonathan,
Thanks. I know you did not write specification or code. The only reason for asking the WHY question was I have seen you demonstrate how CBO works (or might be working) in many other cases.
Comment by Narendra — November 30, 2009 @ 1:42 pm UTC Nov 30,2009 |
BTW, it seems like things has changed in 11gR2:
or this is a bug, since 10053 reports “Using prorated density: 0.068182″ which should result in cardinality of 82, but reports cardinality just like for a MONTH_NO=CONST predicate.
Comment by Timur Akhmadeev — November 30, 2009 @ 10:44 am UTC Nov 30,2009 |
Timur,
Thanks for that tip. I still haven’t got around to installing 11.2 so can’t investigate at present.
It’s possible that it’s deliberate – for example the out-of-range queries may be limited to the maximum single value in range.
Do you get 100 for a range of (say) 21 – 22, then 21 – 23, then 21 – 24, or do you get an increasing set of values ? Can you pick another set of ranges and see a set of results that increase until they hit 100 ?
Comment by Jonathan Lewis — December 12, 2009 @ 11:33 am UTC Dec 12,2009 |
behaviour is same on 10.2.0.4 as well. Jonathan which version are your results coming from ?
s14 82
s15 73
s14-17 100
s14-16 100
s14-18 100
s14-21 100
s14-27 100
s21-22 100
s21-23 100
s21-24 100
Comment by coskan — December 13, 2009 @ 7:38 pm UTC Dec 13,2009 |
Coskan,
Thanks for that – I’m running on 10.2.0.3 at the moment.
I wonder if the change is hidden somewhere in the code relating to frequency histograms that appeared in that upgrade.
Comment by Jonathan Lewis — December 16, 2009 @ 10:07 am UTC Dec 16,2009
>Do you get 100 for a range of (say) 21 – 22, then 21 – 23, then 21 – 24
Yes. If low bound is out of range, then cardinality equals to 100.
>I still haven’t got around to installing 11.2 so can’t investigate at present.
If you need a VMware image of OEL5u4 with 11.2.0.1 EE, I can upload it (I think it’s about 8GB in compressed form).
Comment by Timur Akhmadeev — December 14, 2009 @ 2:27 pm UTC Dec 14,2009 |
Timur,
I was wondering whether the 100 was an upper bound for range predicates that were out of range – but it seems not.
Thanks for the offer – I finally got around to installing OEL and 11.2 on my daughter’s old laptop last night.
Comment by Jonathan Lewis — December 16, 2009 @ 10:33 am UTC Dec 16,2009
Jonathan,
Do you know why the optimizer uses 100 as the point to count back from? Does this change given the size of the table?
Thanks,
John
Comment by John — December 4, 2009 @ 6:50 pm UTC Dec 4,2009 |
John,
There are a couple of minor variations on the theme – but the basic answer is that that’s the selectivity in this case for “column = (non-popular) constant”.
Comment by Jonathan Lewis — December 12, 2009 @ 11:18 am UTC Dec 12,2009 |
The cardinality of an index is the number of unique values in the indexed field.For example, if you were a car manufacturer and you had a database containing one row for every car that rolls off the assembly line, the primary key might be the VIN, and a secondary key might be the color. If you make cars that are red, blue, or green, then the cardinality of the color index would be three. If you are Henry Ford building model Ts, then the cardinality would be one, since any customer can have a car painted any colour that he wants so long as it is black.
Comment by Aatif — February 2, 2012 @ 6:20 am UTC Feb 2,2012 |