Here’s an interesting point that’s just been raised on the Oracle-L list-server:

Metalink bug: 5483301.8

If a column has a frequency histogram and a query uses an equality predicate over the column looking for a value that does not exist in the histogram then the CBO was estimating a cardinality of 1. This could favor Nested Loops too much.

The fix introduces a notable change in behavior: instead of estimating a cardinality of 1 then with this fix CBO estimates a cardinality of (0.5 * the lowest cardinality found in the histogram). This can result in plan changes.

Funnily enough, I had a client who ran into a problem assoiated with this cardinality effect just a couple of weeks ago – although their “real” problem was that the automatic stats collection routine in 10g had created a histogram that wasn’t needed. So I was nearly pleased to see the change – except that (a) it spoils the solution I’ve previously used for this problem and (b) any change is likely to be bad for some people.

The latter problem (one man’s fix is another man’s bug) appears as bug 6082745.8 – which says that if the patch for 5483301 causes you problems, you either have to drop the frequency histogram, or you could “unfix” the patch by setting parameter ** “_fix_control”** to

*’5483301:off’*

Hi Jonathan, that new behaviour (5483301 is supposed to be included in 10.2.0.4) is quite interesting when skew data distribution and when for large volume – or in any case where stats gathering with high sampling rate is prohibitive -.

But it seems to fail: look at below testcase there is a frequency histogram for column SKEW and when an out of range value for SKEW is used in the equality predicate (the last query), Oracle estimates cardinality of 1; due to included in 10.2.0.4 5483301 patch I would expect a cardinality of about (2491 * 10) / 2 = 12500 !

So what is wrong ?

Thank you for your comment.

Connected to:

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

drop table t;

create table t as

select

case when mod(rownum, 4) = 1 then 1 else 2 end as skew

, cast('x' as char(500)) as x

from dual

connect by rownum , estimate_percent => 10 -

> , method_opt => 'for columns size skewonly skew')

col column_name format a8

col low_value format 99999999

col high_value format 99999999

col endpoint_actual_value format a10

------------------------------------------------------------------------------

select sample_size, blocks, num_rows, last_analyzed

from user_tab_statistics

where table_name = 'T'

;

SAMPLE_SIZE BLOCKS NUM_ROWS LAST_ANAL

----------- ---------- ---------- ---------

10025 3634 100250 19-AUG-08

select column_name, sample_size, num_distinct, num_nulls, num_buckets

, histogram

, utl_raw.CAST_TO_NUMBER(low_value) as low_value

, utl_raw.CAST_TO_NUMBER(high_value) as high_value

from user_tab_columns

where table_name = 'T'

and column_name = 'SKEW'

;

COLUMN_N SAMPLE_SIZE NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM LOW_VALUE HIGH_VALUE

-------- ----------- ------------ ---------- ----------- --------------- --------- ----------

SKEW 10025 2 0 2 FREQUENCY 1 2

select column_name, endpoint_number, endpoint_value

from user_tab_histograms

where table_name = 'T'

and column_name = 'SKEW'

;

COLUMN_N ENDPOINT_NUMBER ENDPOINT_VALUE

-------- --------------- --------------

SKEW 2491 1

SKEW 10025 2

-- In range cardinality on skew distribution

explain plan for

select * from t where skew = 2;

select * from table(dbms_xplan.display(null, null, 'BASIC ROWS'));

PLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------

Plan hash value: 1601196873

------------------------------------------

| Id | Operation | Name | Rows |

------------------------------------------

| 0 | SELECT STATEMENT | | 75335 |

| 1 | TABLE ACCESS FULL| T | 75335 |

------------------------------------------

rollback;

-- Out of range cardinality on skew distribution

explain plan for

select * from t where skew = 4;

select * from table(dbms_xplan.display(null, null, 'BASIC ROWS'));

PLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------

Plan hash value: 1601196873

`------------------------------------------`

| Id | Operation | Name | Rows |

------------------------------------------

| 0 | SELECT STATEMENT | | 1 |

| 1 | TABLE ACCESS FULL| T | 1 |

------------------------------------------

rollback;

Comment by Yves Debizet — August 19, 2008 @ 8:34 am BST Aug 19,2008 |

Yves,

I haven’t upgraded any of my 10g databases to 10.2.0.4 yet, so I can’t test any ideas; but one possibility is that there are two conflicting strategies here – one is the “not present” strategy for frequency histograms, the other is the normal “out of bounds” strategy for dealing with values outside the low/high values.

The cardinality of one for the input of 4 could be because the legal range is 1 – 2, with a range of 1; and the value 4 is outside the legal range by twice the size of the range.

Try repeating the experiment with legal values 1 and 10, and query for values inside the range (2,5,9 say), then a couple of value outside the range (12, 15, 21, 40 say) to see if the “half the min” rule is obeyed when you are inside the range, and some form of ‘tapering’ effect appears as you go outside the range.

Comment by Jonathan Lewis — August 19, 2008 @ 5:56 pm BST Aug 19,2008 |

You are right: when the queried value is within bounds, cardinality is as expected by patch 5483301 description.

This patch seems quite uncomplete to me; behaviour should also apply to out of bound values !

I explain:

* you have a large table and cannot use 100% sampling rate for stats gathering; you then use a small sampling rate (10% for example)

* due to using a small sampling rate you will probably miss some distinct values when the cardinality of these values are small and there is no reason why the missing distinct values are within the bounds of gathered bounds !

drop table t;

create table t as

select

case when mod(rownum, 4) = 1 then 1 else 3 end as skew

, cast(‘x’ as char(500)) as x

from dual

connect by rownum >>>>>

– Html conversion does not seem to reproduce correctly the above query. Read T is populated with 100000 rows

exec dbms_stats.gather_table_stats(null, ‘T’, estimate_percent => 10 -, method_opt => ‘for columns size skewonly skew’)

col column_name format a8

col low_value format 99999999

col high_value format 99999999

col endpoint_actual_value format a10

——————————————————————————

select sample_size, blocks, num_rows, last_analyzed

from user_tab_statistics

where table_name = ‘T’

;

SAMPLE_SIZE BLOCKS NUM_ROWS LAST_ANAL

———– ———- ———- ———

10344 3634 103440 20-AUG-08

select column_name, sample_size, num_distinct, num_nulls, num_buckets

, histogram

, utl_raw.CAST_TO_NUMBER(low_value) as low_value

, utl_raw.CAST_TO_NUMBER(high_value) as high_value

from user_tab_columns

where table_name = ‘T’

and column_name = ‘SKEW’

;

COLUMN_N SAMPLE_SIZE NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM LOW_VALUE HIGH_VALUE

——– ———– ———— ———- ———– ————— ——— ———-

SKEW 10344 2 0 2 FREQUENCY 1 3

select column_name, endpoint_number, endpoint_value

from user_tab_histograms

where table_name = ‘T’

and column_name = ‘SKEW’

;

COLUMN_N ENDPOINT_NUMBER ENDPOINT_VALUE

——– ————— ————–

SKEW 2562 1

SKEW 10344 3

– In range cardinality on skew distribution

explain plan for

select * from t where skew = 3;

select * from table(dbms_xplan.display(null, null, ‘BASIC ROWS’));

PLAN_TABLE_OUTPUT

———————————————————————————————

Plan hash value: 1601196873

——————————————

| Id | Operation | Name | Rows |

——————————————

| 0 | SELECT STATEMENT | | 77815 |

| 1 | TABLE ACCESS FULL| T | 77815 |

——————————————

– Out of range cardinality on skew distribution

explain plan for

select * from t where skew = 2;

select * from table(dbms_xplan.display(null, null, ‘BASIC ROWS’));

PLAN_TABLE_OUTPUT

———————————————————————————————

Plan hash value: 1601196873

——————————————

| Id | Operation | Name | Rows |

——————————————

| 0 | SELECT STATEMENT | | 12810 |

| 1 | TABLE ACCESS FULL| T | 12810 |

——————————————

Comment by Yves Debizet — August 20, 2008 @ 3:26 pm BST Aug 20,2008 |