Oracle Scratchpad

October 23, 2015

Histogram Limit

Filed under: Bugs,Histograms,Oracle,Statistics — Jonathan Lewis @ 8:03 pm GMT Oct 23,2015

A surprising question came up on OTN a couple of days ago:

Why does a query for “column = 999999999999999999” run slower than a query for “column > 999999999999999998” (that’s 18 digit numbers, if you don’t want to count them). With the equality predicate the query is very slow, with the range-based predicate perfomance is good.

In the absence of further information there are various reasons why this is possible – but the example in question was about a “versioning” table where the single very large value was used as the “not yet ended” value for the history of an id so, at a minimum, the table held columns (id, nstart, nend, other), and each id could appear many times with pairs of start and end values that supplied non-overlapping, covering ranges and one row that had the very large number as the end value.

Let’s jot down a few ideas about what the data (and stats) might look like.

Assuming every id appears “a few” time and every id has to have a “still valid” row this means that a very large fraction (say 10% to 25%, if “a few” means 4 to 9) of the rows hold the value 999999999999999999.

If you gather stats without a histogram then you should get the low and high, finding that the high is 999999999999999999 and that the range is enormous, and so the predicates “column = {high value}” and “column > {high value} – 1” should give very similar cardinalities.

If you collect stats with a histogram you should find the very popular high value even in a very small sample set (which is what happens with histogram collection in 11g, and even in 12c for hybrid histograms). In this case the histogram should spot the significance of the high value and again the two predicates should have very similar cardinalities.

At first sight there doesn’t seem to be a feasible way that the two cardinalities could be sufficiently different to cause a problem – so maybe there’s something about character conversion or maybe bind variable usage that hasn’t been mentioned. So to test a couple of the less likely ideas I built a data set using 11.2.0.4 – and found a bug:


create table t1 (
        id              number(18),
        nend            number(18),
        n1              number(18),
        small_vc        varchar2(10)
)
nologging
;

insert /*+ append */ into t1
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum -1               id,
        case mod(rownum - 1,4)
                when 0  then 999999999999999999
--              when 0  then 999999999999999
                        else mod(rownum - 1, 250000)
        end                     nend,
        rownum - 1              n1,
        rpad('x',10,'x')        small_vc
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 ; commit; select * from t1 where nend = 0; begin dbms_stats.gather_table_stats( ownname => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size auto'
        );
end;
/

I have an nend column that is set to 999999999999999999 every 4th row in the table and otherwise has 4 rows per value for 187,500 other (relatively low) values. It’s probably a reasonable initial model of the original data. I’ve run a query with a predicate referencing nend before gathering stats so that the (default) auto option will build a histogram for nend. Then I’ve checked the execution plans for two critical queries:


explain plan for
select  *
from    t1
where   nend = 999999999999999999
;

select * from table(dbms_xplan.display);

explain plan for
select  *
from    t1
where   nend > 999999999999999998
;

select * from table(dbms_xplan.display);

And here’s the surprise – the two plans, in order:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     4 |   108 |   625   (9)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   |     4 |   108 |   625   (9)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NEND"=999999999999999999)


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   250K|  6591K|   627   (9)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   |   250K|  6591K|   627   (9)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NEND">999999999999999998)


Clearly this should not happen – the equality test is way off, the inequality test is correct. The obvious first guess is that something funny has happened with the statistics so let’s see what they look like – the column stats (user_tab_cols) and the histogram stats (user_tab_histograms) seem like a good starting point:


select
        sample_size, num_distinct, histogram, num_buckets,
        substr(low_value,1,26) low_value, substr(high_value,1,26) high_value
from
        user_tab_cols
where
        table_name = 'T1'
and     column_name = 'NEND'
;

column endpoint_value format 999,999,999,999,999,999,999

select
        endpoint_number, endpoint_value -- , endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'NEND'
order by
        endpoint_number
;

And here are the results (with a couple of hundred uninteresting rows eliminated from the histogram):


      Sample     Distinct HISTOGRAM          Buckets LOW_VALUE                  HIGH_VALUE
------------ ------------ --------------- ---------- -------------------------- --------------------------
       5,541      190,176 HEIGHT BALANCED        254 C102                       C9646464646464646464


ENDPOINT_NUMBER               ENDPOINT_VALUE
--------------- ----------------------------
              0                           19
              1                        1,225
              2                        2,503
              3                        3,911
              4                        4,806
...
            188                      247,479
            189                      248,754
            190                      249,862
            254    1,000,000,000,000,000,000

Oracle will have started with a 100% sample to collect stats on all the columns, but taken a small sample to test the need for a histogram on the nend column – and that’s why the sample size of 5,541 has appeared, but that’s not relevant to the problem in hand. The big question comes from endpoint_number 254 – why is the highest value in the histogram 1e19 when we know (and the column stats show) that the highest value is actually 999999999999999999 ?!

It’s a question to which I don’t have an answer – but I do know that

  • if your high value is 15 digits long (all 9s) then the histogram shows the right high value
  • if your high value is more that 15 9s then the histogram shows the high value plus 1
  • the value collected by the query that Oracle runs is the actual value (i.e. 18 9s)
  • if you use set_column stats to set 18 9s as the high value you still get 1e19 in the histogram

Once you see the stats you can understand why the OP sees the odd performance problem. If the histogram identifies 1e19 as a (very) popular value, leaving 999999999999999 as an “average” value with only 4 rows; on the other hand the query for greater than 999999999999999998 can see that there really are 250K rows with higher values.

Footnote:

Interestingly 12c does the same with the stats – introducing the 1e19 in the histogram – but still manages, somehow, to calculate the correct cardinality in the equality case. (There is one slight difference in 12c, the histogram is a hybrid histogram, not a height-balanced histogram).

Update 24th Oct 2015:

I realised late last night that I had written about this behaviour before – though possible just as a response on OTN or in some unpublished notes. According to the notes in a script called histogram_numeric_bug.sql that I found on my laptop this is “Bug 18514507 : WRONG CARDINALITY ESTIMATES WHEN NUMERIC VALUE IS LONGER THAN 15 BYTES”. (except that the base bugs that that one is linked to are probably not the same bug).

I suspect the problem is related to the way that character histograms are built based on a numeric representation of the string that takes the first few characters of the string, treats that resulting N bytes as a hex number, converts to decimal and then applies round(N,-14) to restrict the precision stored. The effect with strings is that (broadly speaking) you get fifteen digits precision – which is exactly what I seem to be seeing with numbers.

The reason that 12c can get the right answers despite storing the wrong endpoint_value is that it’s also storing to_char() of the right value as the endpoint_actual_value – possibly doing this any time it has had to round the endpoint_value (for character strings the endpoint_actual_value was only populated if two entries in the endpoint_value were the same).

 

 

 

 

3 Comments »

  1. Jonathan,

    I think that Oracle 12c (12.1.0.2.0 ) has succeeded to get the correct estimation for the equality query because it has succeeded(in contrast to the 11g where the same value is non-popular) to capture the extreme value 999999999999999999 as a popular HYBRID histogram value.

    What do you think?

    SQL> SELECT
                     ept_act_val,
                     ept_nbr,
                     ept_nbr - ept_lag   count,
                     rpt_cnt
            FROM
                    (
                     select
                         endpoint_actual_value      ept_act_val
                     ,endpoint_number            ept_nbr
                    ,lag(endpoint_number,1,0) over(order by endpoint_number) ept_lag
                    ,endpoint_repeat_count rpt_cnt
                  from
                    all_tab_histograms
                 where
                    table_name  = 'T1'
                 and
                    column_name = 'NEND'
                  )
          where ept_act_val = 999999999999999999;
    
    EPT_ACT_VAL             EPT_NBR      COUNT    RPT_CNT
    -------------------- ---------- ---------- ----------
    999999999999999999         5600       1464       1463
    

    And having captured it, the estimation for the equality is simply an answer to the following formula

    E-Rows = ENDPOINT_REPEAT_COUNT * bucket size = 1463 * 1000000/5600 = 261250 = 261K
    
    explain plan for
    select  *
    from    t1
    where   nend = 999999999999999999
    ;
    
    select * from table(dbms_xplan.display);
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |   261K|  6888K|  1249   (1)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| T1   |   261K|  6888K|  1249   (1)| 00:00:01 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - filter("NEND"=999999999999999999)
    

    In contrast to Oracle 11g release where this value has not been captured and, as such, its estimation is calculated using the NewDensity

    E-ROWS = num_rows *NewDensity =  1000000 * 0.00000389198096655691142327466724080286598854  = 3.89198097 ~4
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |     4 |   108 |  1250   (1)| 00:00:15 |
    |*  1 |  TABLE ACCESS FULL| T1   |     4 |   108 |  1250   (1)| 00:00:15 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - filter("NEND"=999999999999999999)
    

    Best regards

    Comment by hourim — October 27, 2015 @ 9:23 am GMT Oct 27,2015 | Reply

  2. Mohamed,

    Thanks for the write-up, but I think you must have done the work after a first reading and missed the update that I did on the following day. You’re exactly correct – it’s a feature of 12c and the change in the way Oracle stores the endpoint_actual_value. If you force 11g to generate a height-balanced histogram it will still make the same mistake because it won’t record any endpoint_actual values.

    Comment by Jonathan Lewis — October 27, 2015 @ 10:55 pm GMT Oct 27,2015 | Reply

  3. Jonathan,

    Thanks for your answer and the suggestion to force 12c generating Height Balanced histogram by collecting stats using a non default estimate_percent (for example estimate_percent => 100)

    SQL> exec dbms_stats.gather_table_stats(user ,'t1', estimate_percent => 100);
    

    Interestingly, the forced 12c Height Balanced histogram has captured this special value (of course there is no corresponding endpoint_repeat_count)

    SQL> select
    column_name
    , num_distinct
    , num_buckets
    , sample_size
    , histogram
    , last_analyzed
    from
    all_tab_col_statistics
    where table_name = ‘T1’
    and column_name in ( ‘NEND’);

    COLUMN_NAME NUM_DISTINCT NUM_BUCKETS SAMPLE_SIZE HISTOGRAM LAST_ANALYZ
    ————- ———— ———– ———– ————— ———–
    NEND 187501 254 1000000 HEIGHT BALANCED 28-OCT-2015

    SQL> SELECT
    ept_act_val,
    ept_nbr,
    ept_nbr – ept_lag count,
    rpt_cnt
    FROM
    (
    select
    endpoint_actual_value ept_act_val
    ,endpoint_number ept_nbr
    ,lag(endpoint_number,1,0) over(order by endpoint_number) ept_lag
    ,endpoint_repeat_count rpt_cnt
    from
    all_tab_histograms
    where
    table_name = ‘T1’
    and
    column_name = ‘NEND’
    )
    where ept_act_val = 999999999999999999;

    EPT_ACT_VAL EPT_NBR COUNT RPT_CNT
    ————– —- ——————————
    999999999999999999 254 64 0
    [/code]

    And the estimation for the equality is acceptable

    SQL> explain plan for
        select  *
        from    t1
        where   nend = 999999999999999999
       ;
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |   250K|  6591K|  1249   (1)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| T1   |   250K|  6591K|  1249   (1)| 00:00:01 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - filter("NEND"=999999999999999999)
    

    I don’t know how exactly Oracle is calculating this 250K estimation. This is a special 12c case where this extreme value has not been captured when considered from the endpoint_repeat_count while it can be considered as popular when seen from the 11g (forced 12c) Height Balanced Histogram where its frequency is 64 suggesting that it has been seen at the end point of several bucket number.

    The 10053 trace file show this

    SINGLE TABLE ACCESS PATH 
      Single Table Cardinality Estimation for T1[T1] 
      SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
      Column (#2): 
        NewDensity:0.000004, OldDensity:0.000004 BktCnt:254.000000, PopBktCnt:64.000000, PopValCnt:1, NDV:187501
      Column (#2): NEND(NUMBER)
        AvgLen: 7 NDV: 187501 Nulls: 0 Density: 0.000004 Min: 1.000000 Max: 999999999999999872.000000
        Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 192  ActualVal: yes
      Table: T1  Alias: T1
        Card: Original: 1000000.000000  Rounded: 250000  Computed: 250000.000000  Non Adjusted: 250000.000000
    

    Where we see that the Computed cardinality equals the Rounded one. It is suggesting that Oracle is using the NewDensity. But that exact computed cardinality is irritating me as far as I still have not figured out how Oracle gets it.

    Best regards
    Mohamed

    Comment by hourim — October 29, 2015 @ 9:26 am GMT Oct 29,2015 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.