When you upgrade from 10.2.0.3 to 10.2.0.4 or 11g, watch out for SQL that depends on the existence of frequency histograms.
In 10.2.0.3 (and earlier versions of Oracle) if you have a frequency histogram on a column, and then use a predicate of the form:
where colX = {value not found in histogram}
then the optimizer would calculate a cardinality of one for this predicate (having recorded a density of 1/(2 * num_rows) in the data dictionary (see comments 21 and 22 of this note on dbms_xplan).
In 10.2.0.4, the cardinality will change in one of two possible ways. If the value you supply does not appear in the histogram, but is inside the low/high range of the histogram then the cardinality will be half the cardinality of the least frequently occurring value that is in the histogram. (A change first brought to my attention by Alberto Dell’Era).
If the value you supply is outside the low/high range of the histogram Oracle starts with the “half the least popular” value, then applies the normal (for 10g) linear decay estimate so that the cardinality drops the further outside the known range your requested value falls.
For example, consider a column with these five values and the following frequencies:
99 10M rows 88 1M rows 81 400K rows 91 20 rows 75 20 rows
Imagine you gathered table stats using the automatic sample size and Oracle created a frequency histogram that failed to capture any information about the rather rare values 75 and 91.
In 10.2.0.3 a query for 75 or 91 would result in the optimizer saying “only one row” and it would probably produce a reasonable execution plan.
In 10.2.0.4 the optimizer would say 200K rows (half of 400K) for the value 91 which is “missing” but inside the (captured) range 81 to 99 – and this could easily lead to a bad execution plan.
For the value 75 the cardinality would be about 140,000 (roughly two-thirds of 200,000) because it is missing – hence the starting point of 200,000 which then has to be scaled down because the value is outside the observed range by a gap that is one-third of the size of that range. [This is an approximation, I don’t have a 10.2.0.4 on hand to check the exact arithmetic].
The upshot of this is that if you have frequency histograms with some very extreme behaviour (like this example) where ‘rare but important’ values could be missed when you use a sample to gather stats, you should be very keen to write a little program using dbms_stats.set_column_stats() to create a fake but realistic and complete frequency histogram rather than letting Oracle work one out for itself.
Just a tiny nitpick, looks like you’ve used 79 in place of 75 in the post a couple of times :)
Comment by Tubby — April 23, 2009 @ 7:54 pm BST Apr 23,2009 |
Jonathan,
I’ve covered this changed behaviour in a post inspired by a note from Riyaj Shamsudeen (http://orainternals.wordpress.com/2008/12/19/correlation-nocorrelation-and-extended-stats/)
The issue is documented in Metalink Doc ID 5483301.8, and “Bug 5483301: Cardinality of 1 when predicate value non-existent in frequency histogram”.
You can control this by using the new fix_control feature introduced in 10.2.0.2: “alter session set “_fix_control”=’5483301:off’;” switches back to pre-10.2.0.4 and 11.1.0.6 behaviour.
The complete note can be found here: http://oracle-randolf.blogspot.com/2009/01/correlation-nocorrelation-and-extended.html
Regards,
Randolf
Oracle related stuff blog:
http://oracle-randolf.blogspot.com/
SQLTools++ for Oracle (Open source Oracle GUI for Windows):
http://www.sqltools-plusplus.org:7676/
http://sourceforge.net/projects/sqlt-pp/
Comment by Randolf Geist — April 24, 2009 @ 6:41 am BST Apr 24,2009 |
Wouldn’t you say that the cardinality estimate of “only one row” in 10.2.0.3 is actually Bug# 5483301 ?
https://jonathanlewis.wordpress.com/2008/04/25/cardinality-change/
and
http://hemantoracledba.blogspot.com/2009/04/incorrect-cardinality-estimate-of-1-bug.html
So the change in behaviour in 10.2.0.4 is a Bug Fix ?
Hemant K Chitale
Comment by Hemant K Chitale — April 24, 2009 @ 7:57 am BST Apr 24,2009 |
Hi Jonathan,
this change was made in fix for Bug 5483301, and it can be turned off with _fix_control. Just an example (I had to run it several times to get desired results, because auto estimate_percent and method_opt are not consistent):
Comment by Timur Akhmadeev — April 24, 2009 @ 9:51 am BST Apr 24,2009 |
Tubby,
Thanks for the note – corrections now in place. I did post an acknowledgement last night when I made the change, but it seems to have disappeared down /dev/null !
Randolf,
Thanks for the links – your post was initially marked as spam because it had too many links in it. (By the way, Alberto sent me an email last night questioning whether he was the one that had mentioned this change to me. I was fairly sure it was, but now I think it might have been you that first brought it to my attention. Apologies for the mis-attribution if that’s the case).
Hemant,
Thanks for the comment, your post also got marked as spam.
Timur,
Thanks for posting the example – a bit got lost because you missed one of the “less thans” when editing the text. I’ve edited it with my best guess about the text that got dropped.
All –
Is this the quoted bug ? Yes and No.
I don’t think the “bug” was really a bug, just an optimizer assumption of reasonable behaviour that nobody questioned for several years. Personally I was quite amused when another bug number appeared about one week after the 10.2.0.4 patchset came out explaining how to reverse the effect – so both behaviours appear as bugs on Metalink.
Comment by Jonathan Lewis — April 24, 2009 @ 3:19 pm BST Apr 24,2009 |
[…] Jonathan Lewis @ 7:34 pm UTC May 28,2009 In a recent article on changes to the calculation of cardinality with frequency histograms I made a comment about writing programs to create fake but realistic and complete frequency […]
Pingback by Frequency Histograms « Oracle Scratchpad — May 28, 2009 @ 7:36 pm BST May 28,2009 |
Hi Jonathan,
I noticed that the dbms_stats is influenced someway by how columns are used in where clause predicates.
I did the following
1. create table
2. gather stats : distinct keys and density set to 1
3. query the table with col ‘N’ and col := :b1 and col2 =
4. gather stats : distinct key set to 1 and density 1/*num_rows
I must have missed that in the documents and on the net somehow.
Regards Hans-Peter
Comment by Hans-Peter Sloot — August 21, 2009 @ 1:09 pm BST Aug 21,2009 |
The 3rd line of the steps should read:
3. query the table with col ‘N’ and col := :b1 and col2 = “some very selective value”
I think that is essential.
Regards Hans-Peter
Comment by Hans-Peter Sloot — August 24, 2009 @ 9:14 am BST Aug 24,2009 |
[…] that the most important change is the introduction of the “half the least popular" rule (see the "Histogram change" post by Jonathan Lewis, which distills the findings of Randolf Geist and Riyaj Shamsudeen) – a […]
Pingback by Alberto Dell’Era’s Oracle blog » CBO: NewDensity for Frequency Histograms,11g-10.2.0.4 (densities part IV) — October 23, 2009 @ 4:07 pm BST Oct 23,2009 |
[…] posting: Change in use of Frequency Histogram at […]
Pingback by Frequency Histograms 2 « Oracle Scratchpad — September 21, 2010 @ 12:18 pm BST Sep 21,2010 |
[…] Neither of the two strings in my predicate appear in the data or in the histogram – but the first string (after applying the histogram algorithm) is a match for an entry in the histogram so Oracle reports the histogram endpoint_number as the cardinality. The second string doesn’t appear even after applying the algorithm – so Oracle has supplied a cardinality of 50, which is half the cardinality of the least frequently occuring value that it can find in the histogram. (Note – if you are running 10.2.0.3 or earlier the cardinality for this case would be 1; the optimizer changed in 10.2.0.4). […]
Pingback by Frequency Histogram 5 « Oracle Scratchpad — October 13, 2010 @ 9:23 am BST Oct 13,2010 |
[…] 10.2.0.3 it has set the resulting cardinality to 1 (this would be 75 in 10.2.0.4 or 11g – half the lowest known value in the histogram). But in the second case (‘priority low’) the optimizer has done its […]
Pingback by Frequency Histograms – 6 « Oracle Scratchpad — October 20, 2010 @ 8:12 am BST Oct 20,2010 |
I derived a small routine to automate the creation of frequency histogram limited to the 254 more skewed columns. Within the boundaries of the histogram, all value were the one given either by the histogram or by the density. Outside, it was 1.
Oracle version 10.2.0.5 on HP-UX.
Produces this histogram:
Expected cardinality is very close. I wonder why it modified the 17569 to 17792
Not bad
23 rows this is the default density : 5673/250 = 22.692 round up to 23.
Comment by Bernard Polarski — January 4, 2011 @ 1:44 pm GMT Jan 4,2011 |
Bernard,
Thanks for the contribution. Unfortunately it got messed up by a rogue “less than”.
If you can email me the entire source as a text-file attachment I’ll copy it into your comment.
The variation between your prediction and Oracle’s prediction is due to scaling by Oracle to cater for the difference between “total rows in table” and “total rows covered by histogram”.
Comment by Jonathan Lewis — January 5, 2011 @ 10:16 am GMT Jan 5,2011 |
[…] cases this makes introduces a slowly increasing error (though for some predicates it can create an immediate catastrophe). There is a special consideration, though, that means you need to be very careful about […]
Pingback by Upgrades | Oracle Scratchpad — December 10, 2015 @ 8:42 am GMT Dec 10,2015 |
[…] last algorithm appeared in 10.2.0.4; prior to that a “value not in frequency histogram” would have been given an estimated […]
Pingback by Extended Histograms | Oracle Scratchpad — August 1, 2018 @ 3:02 pm BST Aug 1,2018 |