When you upgrade from 10.2.0.3 to 10.2.0.4 or 11g, watch out for SQL that depends on the existence of frequency histograms.
In 10.2.0.3 (and earlier versions of Oracle) if you have a frequency histogram on a column, and then use a predicate of the form:
where colX = {value not found in histogram}
then the optimizer would calculate a cardinality of one for this predicate (having recorded a density of 1/(2 * num_rows) in the data dictionary (see comments 21 and 22 of this note on dbms_xplan).
In 10.2.0.4, the cardinality will change in one of two possible ways. If the value you supply does not appear in the histogram, but is inside the low/high range of the histogram then the cardinality will be half the cardinality of the least frequently occurring value that is in the histogram. (A change first brought to my attention by Alberto Dell’Era).
If the value you supply is outside the low/high range of the histogram Oracle starts with the “half the least popular” value, then applies the normal (for 10g) linear decay estimate so that the cardinality drops the further outside the known range your requested value falls.
For example, consider a column with these five values and the following frequencies:
99 10M rows
88 1M rows
81 400K rows
91 20 rows
75 20 rows
Imagine you gathered table stats using the automatic sample size and Oracle created a frequency histogram that failed to capture any information about the rather rare values 75 and 91.
In 10.2.0.3 a query for 75 or 91 would result in the optimizer saying “only one row” and it would probably produce a reasonable execution plan.
In 10.2.0.4 the optimizer would say 200K rows (half of 400K) for the value 91 which is “missing” but inside the (captured) range 81 to 99 – and this could easily lead to a bad execution plan.
For the value 75 the cardinality would be about 140,000 (roughly two-thirds of 200,000) because it is missing – hence the starting point of 200,000 which then has to be scaled down because the value is outside the observed range by a gap that is one-third of the size of that range. [This is an approximation, I don't have a 10.2.0.4 on hand to check the exact arithmetic].
The upshot of this is that if you have frequency histograms with some very extreme behaviour (like this example) where ‘rare but important’ values could be missed when you use a sample to gather stats, you should be very keen to write a little program using dbms_stats.set_column_stats() to create a fake but realistic and complete frequency histogram rather than letting Oracle work one out for itself.

Just a tiny nitpick, looks like you’ve used 79 in place of 75 in the post a couple of times :)
Comment by Tubby — April 23, 2009 @ 7:54 pm UTC Apr 23,2009 |
Jonathan,
I’ve covered this changed behaviour in a post inspired by a note from Riyaj Shamsudeen (http://orainternals.wordpress.com/2008/12/19/correlation-nocorrelation-and-extended-stats/)
The issue is documented in Metalink Doc ID 5483301.8, and “Bug 5483301: Cardinality of 1 when predicate value non-existent in frequency histogram”.
You can control this by using the new fix_control feature introduced in 10.2.0.2: “alter session set “_fix_control”=’5483301:off’;” switches back to pre-10.2.0.4 and 11.1.0.6 behaviour.
The complete note can be found here: http://oracle-randolf.blogspot.com/2009/01/correlation-nocorrelation-and-extended.html
Regards,
Randolf
Oracle related stuff blog:
http://oracle-randolf.blogspot.com/
SQLTools++ for Oracle (Open source Oracle GUI for Windows):
http://www.sqltools-plusplus.org:7676/
http://sourceforge.net/projects/sqlt-pp/
Comment by Randolf Geist — April 24, 2009 @ 6:41 am UTC Apr 24,2009 |
Wouldn’t you say that the cardinality estimate of “only one row” in 10.2.0.3 is actually Bug# 5483301 ?
http://jonathanlewis.wordpress.com/2008/04/25/cardinality-change/
and
http://hemantoracledba.blogspot.com/2009/04/incorrect-cardinality-estimate-of-1-bug.html
So the change in behaviour in 10.2.0.4 is a Bug Fix ?
Hemant K Chitale
Comment by Hemant K Chitale — April 24, 2009 @ 7:57 am UTC Apr 24,2009 |
Hi Jonathan,
this change was made in fix for Bug 5483301, and it can be turned off with _fix_control. Just an example (I had to run it several times to get desired results, because auto estimate_percent and method_opt are not consistent):
drop table t cascade constraints purge; create table t (id int); insert into t with t1 as (select null from all_objects where rownum <= 5000) select 99 from t1,t1 where rownum <= 10000000 union all select 88 from t1,t1 where rownum <= 1000000 union all select 81 from t1,t1 where rownum <= 400000 union all select 91 from t1 where rownum <= 20 union all select 75 from t1 where rownum <= 20 ); execute dbms_stats.gather_table_stats(user,'t',method_opt => 'for all columns size auto') select endpoint_number,endpoint_value from user_tab_histograms where table_name = 'T'; truncate table plan_table; alter session set "_fix_control"='5483301:ON'; explain plan set statement_id 'on_91' for select * from t where id = 91; explain plan set statement_id 'on_100' for select * from t where id = 100; explain plan set statement_id 'on_105' for select * from t where id = 105; explain plan set statement_id 'on_110' for select * from t where id = 110; alter session set "_fix_control"='5483301:OFF'; explain plan set statement_id 'off_91' for select * from t where id = 91; explain plan set statement_id 'off_100' for select * from t where id = 100; explain plan set statement_id 'off_105' for select * from t where id = 105; explain plan set statement_id 'off_110' for select * from t where id = 110; select statement_id, cardinality, filter_predicates from plan_table where (statement_id like 'on%' or statement_id like 'off%') and depth = 1;SQL> select endpoint_number,endpoint_value from user_tab_histograms where table_name = 'T'; ENDPOINT_NUMBER ENDPOINT_VALUE --------------- -------------- 191 81 648 88 5469 99 SQL> select statement_id, cardinality, filter_predicates from plan_table 2 where (statement_id like 'on%' or statement_id like 'off%') and depth = 1; STATEMENT_ID CARDINALITY FILTER_PREDICATES --------------- -------------------- ---------------------------------------------------------- on_91 197983 "ID"=91 on_100 186984 "ID"=100 on_105 131989 "ID"=105 on_110 76993 "ID"=110 off_91 1 "ID"=91 off_100 1 "ID"=100 off_105 1 "ID"=105 off_110 1 "ID"=110Comment by Timur Akhmadeev — April 24, 2009 @ 9:51 am UTC Apr 24,2009 |
Tubby,
Thanks for the note – corrections now in place. I did post an acknowledgement last night when I made the change, but it seems to have disappeared down /dev/null !
Randolf,
Thanks for the links – your post was initially marked as spam because it had too many links in it. (By the way, Alberto sent me an email last night questioning whether he was the one that had mentioned this change to me. I was fairly sure it was, but now I think it might have been you that first brought it to my attention. Apologies for the mis-attribution if that’s the case).
Hemant,
Thanks for the comment, your post also got marked as spam.
Timur,
Thanks for posting the example – a bit got lost because you missed one of the “less thans” when editing the text. I’ve edited it with my best guess about the text that got dropped.
All –
Is this the quoted bug ? Yes and No.
I don’t think the “bug” was really a bug, just an optimizer assumption of reasonable behaviour that nobody questioned for several years. Personally I was quite amused when another bug number appeared about one week after the 10.2.0.4 patchset came out explaining how to reverse the effect – so both behaviours appear as bugs on Metalink.
Comment by Jonathan Lewis — April 24, 2009 @ 3:19 pm UTC Apr 24,2009 |
[...] Jonathan Lewis @ 7:34 pm UTC May 28,2009 In a recent article on changes to the calculation of cardinality with frequency histograms I made a comment about writing programs to create fake but realistic and complete frequency [...]
Pingback by Frequency Histograms « Oracle Scratchpad — May 28, 2009 @ 7:36 pm UTC May 28,2009 |
Hi Jonathan,
I noticed that the dbms_stats is influenced someway by how columns are used in where clause predicates.
I did the following
1. create table
2. gather stats : distinct keys and density set to 1
3. query the table with col ‘N’ and col := :b1 and col2 =
4. gather stats : distinct key set to 1 and density 1/*num_rows
I must have missed that in the documents and on the net somehow.
Regards Hans-Peter
Comment by Hans-Peter Sloot — August 21, 2009 @ 1:09 pm UTC Aug 21,2009 |
The 3rd line of the steps should read:
3. query the table with col ‘N’ and col := :b1 and col2 = “some very selective value”
I think that is essential.
Regards Hans-Peter
Comment by Hans-Peter Sloot — August 24, 2009 @ 9:14 am UTC Aug 24,2009 |
[...] that the most important change is the introduction of the “half the least popular" rule (see the "Histogram change" post by Jonathan Lewis, which distills the findings of Randolf Geist and Riyaj Shamsudeen) – a [...]
Pingback by Alberto Dell’Era’s Oracle blog » CBO: NewDensity for Frequency Histograms,11g-10.2.0.4 (densities part IV) — October 23, 2009 @ 4:07 pm UTC Oct 23,2009 |
[...] posting: Change in use of Frequency Histogram at [...]
Pingback by Frequency Histograms 2 « Oracle Scratchpad — September 21, 2010 @ 12:18 pm UTC Sep 21,2010 |
[...] Neither of the two strings in my predicate appear in the data or in the histogram – but the first string (after applying the histogram algorithm) is a match for an entry in the histogram so Oracle reports the histogram endpoint_number as the cardinality. The second string doesn’t appear even after applying the algorithm – so Oracle has supplied a cardinality of 50, which is half the cardinality of the least frequently occuring value that it can find in the histogram. (Note – if you are running 10.2.0.3 or earlier the cardinality for this case would be 1; the optimizer changed in 10.2.0.4). [...]
Pingback by Frequency Histogram 5 « Oracle Scratchpad — October 13, 2010 @ 9:23 am UTC Oct 13,2010 |
[...] 10.2.0.3 it has set the resulting cardinality to 1 (this would be 75 in 10.2.0.4 or 11g – half the lowest known value in the histogram). But in the second case (‘priority low’) the optimizer has done its [...]
Pingback by Frequency Histograms – 6 « Oracle Scratchpad — October 20, 2010 @ 8:12 am UTC Oct 20,2010 |
I derived a small routine to automate the creation of frequency histogram limited to the 254 more skewed columns. Within the boundaries of the histogram, all value were the one given either by the histogram or by the density. Outside, it was 1.
Oracle version 10.2.0.5 on HP-UX.
SQL: alter session set "_fix_control"='5483301:off' ; Session altered. declare srec dbms_stats.statrec; a_bucket dbms_stats.numarray; v_tot_rows_not_in_freq number; v_tot number; v_density number; v_distinct_key_not_in_freq number; v_cutoff number :=150 -- set here your variable begin -- create the histogram, maximum 254 buckets but may be lower following v_cutoff_value select cpt, USERIDENTITY_ID bulk collect into srec.bkvals, a_bucket from ( select cpt, USERIDENTITY_ID from ( select count(*) cpt, USERIDENTITY_ID from IBS6_EB_OWNER.USERMESSAGE group by USERIDENTITY_ID order by count(*) desc ) where rownum 150 order by USERIDENTITY_ID ); -- Calculate now the density for all values which are not in the histo. -- the density is calculated from all rows not in the frequency histogram -- so density = (tot rows not in freq / distinct key not in freq)/tot row in table) -- the cut off value to be taken into the histogram is set by default to 150, adapt following needs -- total rows not in frequency: with v as (select useridentity_id from ( select useridentity_id from MYTABLE_OWNER.CUSTOMER group by USERIDENTITY_ID having count(*) > v_cutoff order by count(*) desc) where rownum 'MYTABLE_OWNER', tabname => 'CUSTOMER', colname => 'CUST_ID', density => v_density, srec => srec ); end; / Rows in table : 459699 distint Keys in histogram : 74 Tot Rows keys covered by histo : 454026 Distinct Keys not in histogram : 250 Rows not in histogram : 5673 Density for keys not in histo : 0.0000493627Produces this histogram:
TABLE_NAME COLUMN_NAME Bck nbr ENDPOINT_VALUE Frequency ------------ --------------- ---------- -------------- ---------- USERMESSAGE USERIDENTITY_ID 303 11 303 1990 12 1687 2443 19 453 5165 21 2722 5417 24 252 5945 27 528 6105 33 160 6381 85 276 23950 148 17569 <-- See SQL1 24371 149 421 <-- see SQL2 <-- see SQL3 value 169 not in histogram 27792 180 3421 107833 201 80041 108440 216 607 109053 221 613 110623 223 1570 110833 422 210 111030 429 197 111222 440 192 151751 441 40529 . . (74 rows) SQL_ID g3b5j7qfy7u5d, child number 0 ------------------------------------- SELECT /*+ gather_plan_statistics */ COUNT(*) FROM usermessage um, useridentity ui , usermessagetype umt WHERE um.isconsumed = 0 AND ui.useridentity_id = um.useridentity_id AND umt.usermessagetype_id = um.usermessagetype_id AND ui.useridentity_id = 148 Plan hash value: 3616009165 ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.07 | 2362 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.07 | 2362 | | 2 | NESTED LOOPS | | 1 | 7505 | 8559 |00:00:00.06 | 2362 | |* 3 | INDEX UNIQUE SCAN | USERIDENTITY_PK | 1 | 1 | 1 |00:00:00.01 | 2 | |* 4 | TABLE ACCESS BY INDEX ROWID| USERMESSAGE | 1 | 7505 | 8559 |00:00:00.06 | 2360 | |* 5 | INDEX RANGE SCAN | USERMESSAGE_IDX | 1 | 17792 | 17569 |00:00:00.01 | 46 | ----------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("UI"."USERIDENTITY_ID"=148) 4 - filter("UM"."ISCONSUMED"=0) 5 - access("UM"."USERIDENTITY_ID"=148)Expected cardinality is very close. I wonder why it modified the 17569 to 17792
COUNT(*) ---------- 33 SQL_ID 81dtb814kvvy9, child number 0 ------------------------------------- SELECT /*+ gather_plan_statistics */ COUNT(*) FROM usermessage um, useridentity ui , usermessagetype umt WHERE um.isconsumed = 0 AND ui.useridentity_id = um.useridentity_id AND umt.usermessagetype_id = um.usermessagetype_id AND ui.useridentity_id = 149 Plan hash value: 3616009165 ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 231 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 231 | | 2 | NESTED LOOPS | | 1 | 180 | 33 |00:00:00.01 | 231 | |* 3 | INDEX UNIQUE SCAN | USERIDENTITY_PK | 1 | 1 | 1 |00:00:00.01 | 2 | |* 4 | TABLE ACCESS BY INDEX ROWID| USERMESSAGE | 1 | 180 | 33 |00:00:00.01 | 229 | |* 5 | INDEX RANGE SCAN | USERMESSAGE_IDX | 1 | 426 | 421 |00:00:00.01 | 5 | ----------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("UI"."USERIDENTITY_ID"=149) 4 - filter("UM"."ISCONSUMED"=0) 5 - access("UM"."USERIDENTITY_ID"=149)Not bad
COUNT(*) ---------- 9 SQL_ID 0z8t787a3t8an, child number 0 ------------------------------------- SELECT /*+ gather_plan_statistics */ COUNT(*) FROM usermessage um, useridentity ui , usermessagetype umt WHERE um.isconsumed = 0 AND ui.useridentity_id = um.useridentity_id AND umt.usermessagetype_id = um.usermessagetype_id AND ui.useridentity_id = 169 Plan hash value: 3616009165 ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 11 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 11 | | 2 | NESTED LOOPS | | 1 | 10 | 9 |00:00:00.01 | 11 | |* 3 | INDEX UNIQUE SCAN | USERIDENTITY_PK | 1 | 1 | 1 |00:00:00.01 | 2 | |* 4 | TABLE ACCESS BY INDEX ROWID| USERMESSAGE | 1 | 10 | 9 |00:00:00.01 | 9 | |* 5 | INDEX RANGE SCAN | USERMESSAGE_IDX | 1 | 23 | 9 |00:00:00.01 | 3 | -----------------------------------------------------------------------------------------------------------23 rows this is the default density : 5673/250 = 22.692 round up to 23.
Comment by Bernard Polarski — January 4, 2011 @ 1:44 pm UTC Jan 4,2011 |
Bernard,
Thanks for the contribution. Unfortunately it got messed up by a rogue “less than”.
If you can email me the entire source as a text-file attachment I’ll copy it into your comment.
The variation between your prediction and Oracle’s prediction is due to scaling by Oracle to cater for the difference between “total rows in table” and “total rows covered by histogram”.
Comment by Jonathan Lewis — January 5, 2011 @ 10:16 am UTC Jan 5,2011 |