I’ve written notes in the past about the improvements 12c introduces for histograms – particularly the frequency and top-N histograms which can be generated “free of charge” while the basic “approximate NDV” scans are taking place to gather stats. Gathering histograms in 12c is much safer than it used to be in earlier versions of Oracle even in the case of the new hybrid histograms (which are still sampled on a very small sample and therefore still a stability risk).
There is a threat, though, recently highlighted by Franck Pachot, that sneaked in at 11.2.0.4 – the way in which the calculation of endpoint values for histograms on char() and nchar() columns has changed. When you upgrade from anything prior to 11.2.0.4 to either 11.2.0.4 or 12c you need to recreate any histograms on those columns; note that this does not apply to varchar2() and nvarchar2() columns, just the fixed length character types. If you fail to do this then you may find that Oracle produces some very silly estimates of cardinality, which could result in some very inefficient tablescans – in particular you are likely to find (as Franck showed) cases where you “know” that a particular value is IN the histogram but the optimizer behaves as if it isn’t – which means it uses the “half the least popular” estimate for the cardinality.
Here’s a little demo to show the underlying difference:
rem rem Script: histogram_change_11204.sql rem Author: Jonathan Lewis rem Dated: Nov 2016 rem Purpose: rem create table t1 (v1 varchar2(32), c1 char(32)); insert into t1 select case when rownum <= 100 then 'N' else 'Y' end, case when rownum <= 100 then 'N' else 'Y' end from all_objects where rownum <= 1000 -- > "gt symbol to bypass WordPress formatting issue" ; begin dbms_stats.gather_table_stats( ownname => user, tabname => 'T1', method_opt => 'for all columns size 10' ); end; / column endpoint_value format 999,999,999,999,999,999,999,999,999,999,999,999 break on column_name skip 1 select column_name, endpoint_number, endpoint_value, to_char(endpoint_value,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx') from user_tab_histograms where table_name = 'T1' order by column_name, endpoint_number ;
Here are the results from an instance of 11.1.0.7 (though anything up to 11.2.0.3 should produce the same), and 11.2.0.4 (and later – including 12.2):
Results 11.1.0.7 ---------------- COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX -------------------- --------------- ------------------------------------------------ --------------------------------- C1 100 405,650,737,316,592,000,000,000,000,000,000,000 4e20202020203a7bb119d5f6000000 1000 462,766,002,760,475,000,000,000,000,000,000,000 59202020202034d998ff0b5ae00000 V1 100 404,999,154,965,717,000,000,000,000,000,000,000 4e0000000000181f436c7bbb200000 1000 462,114,420,409,600,000,000,000,000,000,000,000 590000000000127d2b51b120000000 Results 11.2.0.4 ---------------- COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE TO_CHAR(ENDPOINT_VALUE,'XXXXXXXXX -------------------- --------------- ------------------------------------------------ --------------------------------- C1 100 404,999,154,965,717,000,000,000,000,000,000,000 4e0000000000181f436c7bbb200000 1000 462,114,420,409,600,000,000,000,000,000,000,000 590000000000127d2b51b120000000 V1 100 404,999,154,965,717,000,000,000,000,000,000,000 4e0000000000181f436c7bbb200000 1000 462,114,420,409,600,000,000,000,000,000,000,000 590000000000127d2b51b120000000
Look particularly at the first 6 bytes of the Hex version of the endpoint values for the char() column c1. In 11.1.0.7 you see “4e2020202020”, “592020202020” – that’s ASCII ‘N’ and ‘Y’ respectively, padded to 6 characters with spaces. In 11.2.0.4 the spaces have disappeared – the char() columns are now padded to 6 characters with zeros (which is how varchar2() columns have always been treated).
In 11.1.0.7 the optimizer will find a histogram entry for c1 = ‘Y’ and produce a cardinality of 900; if you upgrade the database to 11.2.0.4 without recreating the histograms the optimizer won’t find a histogram entry for the predicate and will produce a cardinality of 50 (i.e. 100 / 2).
Footnote
There’s a brief summary of the algorithm Oracle uses to generate values for character-based histograms at this URL.
Hi, Jonathan:
Per Note 1645862.1, applying the fix for Bug 18255105 to Rel. 11.2.0.4, prior to executing the pre-upgrade installation script, will recommend that you rebuild these particular histograms. (The fix is included in Rel. 12.1.0.2 and Rel. 12.2.)
Sincerely,
Ted
Comment by Ted Persky — January 9, 2017 @ 4:08 pm GMT Jan 9,2017 |
Ted,
Thanks for the information.
For other readers this note is: “Things to Consider to Avoid Poor Performance or Wrong Results on 11.2.0.4 (Doc ID 1645862.1)” and includes a number of other bits of information that might be useful.
Following the link about this histogram problem in particular I see that it was actually addressed some time early in 2015, so my report is just a bit late :(
Comment by Jonathan Lewis — January 9, 2017 @ 4:25 pm GMT Jan 9,2017 |
Thanks for the share
Comment by fouedgray — February 14, 2017 @ 11:15 am GMT Feb 14,2017 |
[…] need to refer to a note I wrote a few years ago to help us on our way (with a little caveat due to a change that appeared in 11.2.0.4) – what number would Oracle use to represent ‘mm’ and the other three strings we […]
Pingback by Character selectivity | Oracle Scratchpad — April 14, 2017 @ 12:40 pm BST Apr 14,2017 |
Hello!
Very useful piece of information.
Note that the above implicitly means also that by changing optimizer_features_enable from 11.2.0.3 to anything higher and not recollecting statistics your SQLs may start behaving badly.
In other words upgrade of the CBO (optimizer_features_enable) and not just the database triggers this potential threat.
Regards,
RobK
Comment by vaurob — October 9, 2018 @ 1:37 pm BST Oct 9,2018 |
[…] Upgrade threat – a step you need to take to upgrade from 11.2.0.3 if you have histograms on char() columns […]
Pingback by Faking Histograms | Oracle Scratchpad — October 15, 2018 @ 2:14 pm BST Oct 15,2018 |