Oracle Scratchpad

April 25, 2008

Cardinality Change

Filed under: CBO,Execution plans,Histograms,Statistics,Troubleshooting — Jonathan Lewis @ 8:13 pm BST Apr 25,2008

Here’s an interesting point that’s just been raised on the Oracle-L list-server:

Metalink bug: 5483301.8

If a column has a frequency histogram and a query uses an equality predicate over the column looking for a value that does not exist in the histogram then the CBO was estimating a cardinality of 1. This could favor Nested Loops too much.

The fix introduces a notable change in behavior: instead of estimating a cardinality of 1 then with this fix CBO estimates a cardinality of (0.5 * the lowest cardinality found in the histogram). This can result in plan changes.

Funnily¬†enough, I had a client who ran into a problem assoiated with this cardinality effect just a couple of weeks ago – although their “real” problem was that the automatic stats collection routine in 10g had created a histogram that wasn’t needed. So I was nearly pleased to see the change – except that (a) it spoils the solution I’ve previously used for this problem and (b) any change is likely to be bad for some people.

The latter problem (one man’s fix is another man’s bug) appears as bug 6082745.8 – which says that if the patch for 5483301 causes you problems, you either have to drop the frequency histogram, or you could “unfix” the patch by setting parameter “_fix_control” to ’5483301:off’

[Further reading on Histograms]

3 Comments »

  1. Hi Jonathan, that new behaviour (5483301 is supposed to be included in 10.2.0.4) is quite interesting when skew data distribution and when for large volume – or in any case where stats gathering with high sampling rate is prohibitive -.

    But it seems to fail: look at below testcase there is a frequency histogram for column SKEW and when an out of range value for SKEW is used in the equality predicate (the last query), Oracle estimates cardinality of 1; due to included in 10.2.0.4 5483301 patch I would expect a cardinality of about (2491 * 10) / 2 = 12500 !

    So what is wrong ?

    Thank you for your comment.


    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options

    drop table t;

    create table t as
    select
           case when mod(rownum, 4) = 1 then 1 else 2 end as skew
    ,       cast('x' as char(500)) as x
    from dual
    connect by rownum  , estimate_percent => 10 -
    > , method_opt => 'for columns size skewonly skew')

    col column_name format a8
    col low_value format 99999999
    col high_value format 99999999
    col endpoint_actual_value format a10

    ------------------------------------------------------------------------------
    select sample_size, blocks, num_rows, last_analyzed
    from user_tab_statistics
    where table_name = 'T'
    ;

    SAMPLE_SIZE     BLOCKS   NUM_ROWS LAST_ANAL
    ----------- ---------- ---------- ---------
         10025       3634     100250 19-AUG-08

    select column_name, sample_size, num_distinct, num_nulls, num_buckets
    , histogram
    , utl_raw.CAST_TO_NUMBER(low_value) as low_value
    , utl_raw.CAST_TO_NUMBER(high_value) as high_value
    from user_tab_columns
    where table_name = 'T'
    and column_name = 'SKEW'
    ;

    COLUMN_N SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS NUM_BUCKETS HISTOGRAM       LOW_VALUE HIGH_VALUE
    -------- ----------- ------------ ---------- ----------- --------------- --------- ----------
    SKEW           10025            2          0           2 FREQUENCY               1          2

    select column_name, endpoint_number, endpoint_value
    from user_tab_histograms
    where table_name = 'T'
    and column_name = 'SKEW'
    ;

    COLUMN_N ENDPOINT_NUMBER ENDPOINT_VALUE
    -------- --------------- --------------
    SKEW                2491              1
    SKEW               10025              2

    -- In range cardinality on skew distribution
    explain plan for
    select * from t where skew = 2;

    select * from table(dbms_xplan.display(null, null, 'BASIC ROWS'));

    PLAN_TABLE_OUTPUT
    ---------------------------------------------------------------------------------------------
    Plan hash value: 1601196873

    ------------------------------------------
    | Id  | Operation         | Name | Rows  |
    ------------------------------------------
    |   0 | SELECT STATEMENT  |      | 75335 |
    |   1 |  TABLE ACCESS FULL| T    | 75335 |
    ------------------------------------------
    rollback;

    -- Out of range cardinality on skew distribution
    explain plan for
    select * from t where skew = 4;

    select * from table(dbms_xplan.display(null, null, 'BASIC ROWS'));

    PLAN_TABLE_OUTPUT
    ---------------------------------------------------------------------------------------------
    Plan hash value: 1601196873

    ------------------------------------------
    | Id  | Operation         | Name | Rows  |
    ------------------------------------------
    |   0 | SELECT STATEMENT  |      |     1 |
    |   1 |  TABLE ACCESS FULL| T    |     1 |
    ------------------------------------------
    rollback;

    Comment by Yves Debizet — August 19, 2008 @ 8:34 am BST Aug 19,2008 | Reply

  2. Yves,

    I haven’t upgraded any of my 10g databases to 10.2.0.4 yet, so I can’t test any ideas; but one possibility is that there are two conflicting strategies here – one is the “not present” strategy for frequency histograms, the other is the normal “out of bounds” strategy for dealing with values outside the low/high values.

    The cardinality of one for the input of 4 could be because the legal range is 1 – 2, with a range of 1; and the value 4 is outside the legal range by twice the size of the range.

    Try repeating the experiment with legal values 1 and 10, and query for values inside the range (2,5,9 say), then a couple of value outside the range (12, 15, 21, 40 say) to see if the “half the min” rule is obeyed when you are inside the range, and some form of ‘tapering’ effect appears as you go outside the range.

    Comment by Jonathan Lewis — August 19, 2008 @ 5:56 pm BST Aug 19,2008 | Reply

  3. You are right: when the queried value is within bounds, cardinality is as expected by patch 5483301 description.
    This patch seems quite uncomplete to me; behaviour should also apply to out of bound values !
    I explain:
    * you have a large table and cannot use 100% sampling rate for stats gathering; you then use a small sampling rate (10% for example)
    * due to using a small sampling rate you will probably miss some distinct values when the cardinality of these values are small and there is no reason why the missing distinct values are within the bounds of gathered bounds !

    drop table t;

    create table t as
    select
    case when mod(rownum, 4) = 1 then 1 else 3 end as skew
    , cast(‘x’ as char(500)) as x
    from dual
    connect by rownum >>>>>
    – Html conversion does not seem to reproduce correctly the above query. Read T is populated with 100000 rows

    exec dbms_stats.gather_table_stats(null, ‘T’, estimate_percent => 10 -, method_opt => ‘for columns size skewonly skew’)

    col column_name format a8
    col low_value format 99999999
    col high_value format 99999999
    col endpoint_actual_value format a10

    ——————————————————————————
    select sample_size, blocks, num_rows, last_analyzed
    from user_tab_statistics
    where table_name = ‘T’
    ;

    SAMPLE_SIZE BLOCKS NUM_ROWS LAST_ANAL
    ———– ———- ———- ———
    10344 3634 103440 20-AUG-08

    select column_name, sample_size, num_distinct, num_nulls, num_buckets
    , histogram
    , utl_raw.CAST_TO_NUMBER(low_value) as low_value
    , utl_raw.CAST_TO_NUMBER(high_value) as high_value
    from user_tab_columns
    where table_name = ‘T’
    and column_name = ‘SKEW’
    ;

    COLUMN_N SAMPLE_SIZE NUM_DISTINCT NUM_NULLS NUM_BUCKETS HISTOGRAM LOW_VALUE HIGH_VALUE
    ——– ———– ———— ———- ———– ————— ——— ———-
    SKEW 10344 2 0 2 FREQUENCY 1 3

    select column_name, endpoint_number, endpoint_value
    from user_tab_histograms
    where table_name = ‘T’
    and column_name = ‘SKEW’
    ;

    COLUMN_N ENDPOINT_NUMBER ENDPOINT_VALUE
    ——– ————— ————–
    SKEW 2562 1
    SKEW 10344 3

    – In range cardinality on skew distribution
    explain plan for
    select * from t where skew = 3;

    select * from table(dbms_xplan.display(null, null, ‘BASIC ROWS’));

    PLAN_TABLE_OUTPUT
    ———————————————————————————————
    Plan hash value: 1601196873

    ——————————————
    | Id | Operation | Name | Rows |
    ——————————————
    | 0 | SELECT STATEMENT | | 77815 |
    | 1 | TABLE ACCESS FULL| T | 77815 |
    ——————————————

    – Out of range cardinality on skew distribution
    explain plan for
    select * from t where skew = 2;

    select * from table(dbms_xplan.display(null, null, ‘BASIC ROWS’));

    PLAN_TABLE_OUTPUT
    ———————————————————————————————
    Plan hash value: 1601196873

    ——————————————
    | Id | Operation | Name | Rows |
    ——————————————
    | 0 | SELECT STATEMENT | | 12810 |
    | 1 | TABLE ACCESS FULL| T | 12810 |
    ——————————————

    Comment by Yves Debizet — August 20, 2008 @ 3:26 pm BST Aug 20,2008 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,514 other followers