Oracle Scratchpad

December 2, 2016

Histogram Upgrade

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 3:02 pm GMT Dec 2,2016

I’ve written notes in the past about the improvements 12c introduces for histograms – particularly the frequency and top-N histograms which can be generated “free of charge” while the basic “approximate NDV” scans are taking place to gather stats. Gathering histograms in 12c is much safer than it used to be in earlier versions of Oracle even in the case of the new hybrid histograms (which are still sampled on a very small sample and therefore still a stability risk).

There is a threat, though, recently highlighted by Franck Pachot, that sneaked in at 11.2.0.4 – the way in which the calculation of endpoint values for histograms on char() and nchar() columns has changed. When you upgrade from anything prior to 11.2.0.4 to either 11.2.0.4 or 12c you need to recreate any histograms on those columns; note that this does not apply to varchar2() and nvarchar2() columns, just the fixed length character types. If you fail to do this then you may find that Oracle produces some very silly estimates of cardinality, which could result in some very inefficient tablescans – in particular you are likely to find (as Franck showed) cases where you “know” that a particular value is IN the histogram but the optimizer behaves as if it isn’t – which means it uses the “half the least popular” estimate for the cardinality.

Here’s a little demo to show the underlying difference:


rem
rem     Script:         histogram_change_11204.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2016
rem     Purpose:
rem

create table t1 (v1 varchar2(32), c1 char(32));
insert into t1
select
        case when rownum <= 100 then 'N' else 'Y' end,
        case when rownum <= 100 then 'N' else 'Y' end
from
        all_objects
where
        rownum <= 1000    -- > "gt symbol to bypass WordPress formatting issue"
;

begin
        dbms_stats.gather_table_stats( ownname => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size 10'
        );
end;
/

column endpoint_value format 999,999,999,999,999,999,999,999,999,999,999,999
break on column_name skip 1

select
        column_name, endpoint_number, endpoint_value, to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
from
        user_tab_histograms
where
        table_name = 'T1'
order by
        column_name,
        endpoint_number
;

Here are the results from an instance of 11.1.0.7 (though anything up to 11.2.0.3 should produce the same), and 11.2.0.4 (and later – including 12.2):

Results 11.1.0.7
----------------
COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX
-------------------- --------------- ------------------------------------------------ ---------------------------------
C1                               100  405,650,737,316,592,000,000,000,000,000,000,000    4e20202020203a7bb119d5f6000000
                                1000  462,766,002,760,475,000,000,000,000,000,000,000    59202020202034d998ff0b5ae00000

V1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

Results 11.2.0.4
----------------
COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX
-------------------- --------------- ------------------------------------------------ ---------------------------------
C1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

V1                               100  404,999,154,965,717,000,000,000,000,000,000,000    4e0000000000181f436c7bbb200000
                                1000  462,114,420,409,600,000,000,000,000,000,000,000    590000000000127d2b51b120000000

Look particularly at the first 6 bytes of the Hex version of the endpoint values for the char() column c1. In 11.1.0.7 you see “4e2020202020”, “592020202020” – that’s ASCII ‘N’ and ‘Y’ respectively, padded to 6 characters with spaces. In 11.2.0.4 the spaces have disappeared – the char() columns are now padded to 6 characters with zeros (which is how varchar2() columns have always been treated).

In 11.1.0.7 the optimizer will find a histogram entry for c1 = ‘Y’ and produce a cardinality of 900; if you upgrade the database to 11.2.0.4 without recreating the histograms the optimizer won’t find a histogram entry for the predicate and will produce a cardinality of 50 (i.e. 100 / 2).

Footnote

There’s a brief summary of the algorithm Oracle uses to generate values for character-based histograms at this URL.

4 Comments »

  1. Hi, Jonathan:

    Per Note 1645862.1, applying the fix for Bug 18255105 to Rel. 11.2.0.4, prior to executing the pre-upgrade installation script, will recommend that you rebuild these particular histograms. (The fix is included in Rel. 12.1.0.2 and Rel. 12.2.)

    Sincerely,

    Ted

    Comment by Ted Persky — January 9, 2017 @ 4:08 pm GMT Jan 9,2017 | Reply

    • Ted,

      Thanks for the information.

      For other readers this note is: “Things to Consider to Avoid Poor Performance or Wrong Results on 11.2.0.4 (Doc ID 1645862.1)” and includes a number of other bits of information that might be useful.

      Following the link about this histogram problem in particular I see that it was actually addressed some time early in 2015, so my report is just a bit late :(

      Comment by Jonathan Lewis — January 9, 2017 @ 4:25 pm GMT Jan 9,2017 | Reply

  2. Thanks for the share

    Comment by fouedgray — February 14, 2017 @ 11:15 am GMT Feb 14,2017 | Reply

  3. […] need to refer to a note I wrote a few years ago to help us on our way (with a little caveat due to a change that appeared in 11.2.0.4) – what number would Oracle use to represent ‘mm’ and the other three strings we […]

    Pingback by Character selectivity | Oracle Scratchpad — April 14, 2017 @ 12:40 pm GMT Apr 14,2017 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Powered by WordPress.com.