A surprising question came up on OTN a couple of days ago:
Why does a query for “column = 999999999999999999” run more slowly than a query for “column > 999999999999999998”? (Those are 18-digit numbers, if you don’t want to count them.) With the equality predicate the query is very slow; with the range-based predicate performance is good.
In the absence of further information there are various reasons why this is possible – but the example in question was about a “versioning” table where the single very large value was used as the “not yet ended” value for the history of an id. At a minimum, then, the table held the columns (id, nstart, nend, other); each id could appear many times with pairs of start and end values supplying non-overlapping, covering ranges, plus one row with the very large number as its end value.
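To make that concrete, here’s a minimal sketch of the sort of table and the two queries involved (the table name and exact column definitions are my guesses from the description):

create table versions (
        id      number(18),
        nstart  number(18),
        nend    number(18),     -- 999999999999999999 for the "not yet ended" row
        other   varchar2(30)
);

-- the slow query
select * from versions where nend = 999999999999999999;

-- the fast query
select * from versions where nend > 999999999999999998;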
Let’s jot down a few ideas about what the data (and stats) might look like.
Assuming every id appears “a few” times and every id has to have one “still valid” row, a very large fraction of the rows must hold the value 999999999999999999: if each id has k rows then exactly one of them is the open-ended row, so “a few” meaning 4 to 9 puts the fraction somewhere between 10% and 25%.
If you gather stats without a histogram then you should get the low and high values, finding that the high is 999999999999999999 and that the range is enormous; the predicates “column = {high value}” and “column > {high value} – 1” should therefore give very similar cardinalities.
If you collect stats with a histogram you should find the very popular high value even in a very small sample set (which is what happens with histogram collection in 11g, and even in 12c for hybrid histograms). In this case the histogram should spot the significance of the high value and again the two predicates should have very similar cardinalities.
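For reference, against the hypothetical versions table above, the two gathering strategies look like this (a sketch – 254 is just the traditional maximum bucket count):

begin
        -- no histograms: just low/high, num_distinct, etc.
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'VERSIONS',
                method_opt => 'for all columns size 1'
        );
end;
/

begin
        -- explicitly request a histogram on the critical column
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'VERSIONS',
                method_opt => 'for columns nend size 254'
        );
end;
/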
At first sight there doesn’t seem to be a feasible way for the two cardinalities to end up sufficiently different to cause a problem – so maybe there’s something about character conversion or bind variable usage that hasn’t been mentioned. To test a couple of the less likely ideas I built a data set using 11.2.0.4 – and found a bug:
rem
rem     Script:         histogram_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2015
rem

create table t1 (
        id              number(18),
        nend            number(18),
        n1              number(18),
        small_vc        varchar2(10)
)
nologging
;

insert /*+ append */ into t1
with generator as (
        select  --+ materialize
                rownum id
        from    dual
        connect by
                level <= 1e4
)
select
        rownum - 1                      id,
        case mod(rownum - 1,4)
                when 0 then 999999999999999999
--              when 0 then 999999999999999
                else        mod(rownum - 1, 250000)
        end                             nend,
        rownum - 1                      n1,
        rpad('x',10,'x')                small_vc
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > "gt" bypasses WordPress formatting issue.
;

commit;

select * from t1 where nend = 0;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size auto'
        );
end;
/
I have an nend column that is set to 999999999999999999 on every 4th row of the table and otherwise holds 4 rows per value for each of 187,500 other (relatively low) values – probably a reasonable initial model of the original data. I’ve run a query with a predicate referencing nend before gathering stats so that the (default) auto option will build a histogram on nend.
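That’s the mechanism that makes “size auto” consider a histogram: the earlier query leaves a record in the column usage information that dbms_stats checks when it gathers. If you can query as SYS you can see that record with a sketch like this (my query, not part of the original test):

select
        o.name          table_name,
        c.name          column_name,
        u.equality_preds,
        u.range_preds
from
        sys.col_usage$  u,
        sys.obj$        o,
        sys.col$        c
where
        o.name    = 'T1'
and     u.obj#    = o.obj#
and     c.obj#    = o.obj#
and     c.intcol# = u.intcol#
;

Then I’ve checked the execution plans for two critical queries: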
explain plan for
select  *
from    t1
where   nend = 999999999999999999
;

select * from table(dbms_xplan.display);

explain plan for
select  *
from    t1
where   nend > 999999999999999998
;

select * from table(dbms_xplan.display);
And here’s the surprise – the two plans, in order:
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     4 |   108 |   625   (9)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   |     4 |   108 |   625   (9)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NEND"=999999999999999999)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   250K|  6591K|   627   (9)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   |   250K|  6591K|   627   (9)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NEND">999999999999999998)
Clearly this should not happen – the equality estimate is way off while the inequality estimate is correct. The obvious first guess is that something funny has happened with the statistics, so let’s see what they look like – the column stats (user_tab_cols) and the histogram stats (user_tab_histograms) seem like a good starting point:
select
        sample_size, num_distinct, histogram, num_buckets,
        substr(low_value,1,26)          low_value,
        substr(high_value,1,26)         high_value
from
        user_tab_cols
where
        table_name  = 'T1'
and     column_name = 'NEND'
;

column endpoint_value format 999,999,999,999,999,999,999

select
        endpoint_number, endpoint_value -- , endpoint_repeat_count
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'NEND'
order by
        endpoint_number
;
And here are the results (with a couple of hundred uninteresting rows eliminated from the histogram):
      Sample     Distinct HISTOGRAM          Buckets LOW_VALUE                  HIGH_VALUE
------------ ------------ --------------- ---------- -------------------------- --------------------------
       5,541      190,176 HEIGHT BALANCED        254 C102                       C9646464646464646464

ENDPOINT_NUMBER               ENDPOINT_VALUE
--------------- ----------------------------
              0                           19
              1                        1,225
              2                        2,503
              3                        3,911
              4                        4,806
...
            188                      247,479
            189                      248,754
            190                      249,862
            254    1,000,000,000,000,000,000
Oracle will have started with a 100% sample to collect stats on all the columns, but taken a small sample to test the need for a histogram on the nend column – which is why the sample size of 5,541 has appeared; that, though, is not relevant to the problem in hand. The big question comes from endpoint_number 254 – why is the highest value in the histogram 1e18 when we know (and the column stats show) that the highest value is actually 999999999999999999?
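As a cross-check that the stored high_value really is the 18 nines, dbms_stats.convert_raw_value will decode the raw value reported by user_tab_cols – a quick sketch:

set serveroutput on
declare
        m_n     number;
begin
        dbms_stats.convert_raw_value(hextoraw('C9646464646464646464'), m_n);
        -- should print 999,999,999,999,999,999
        dbms_output.put_line('high_value = ' || to_char(m_n,'fm999,999,999,999,999,999'));
end;
/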
It’s a question to which I don’t have an answer – but I do know that
- if your high value is 15 digits long (all 9s) then the histogram shows the right high value
- if your high value is more than 15 9s long then the histogram shows the high value plus 1
- the value collected by the query that Oracle runs is the actual value (i.e. 18 9s)
- if you use dbms_stats.set_column_stats to set 18 9s as the high value you still get 1e18 in the histogram (see the sketch below)
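The set_column_stats test in the last bullet would look something like this – a sketch in which the distcnt, density and avgclen values are purely illustrative:

declare
        srec            dbms_stats.statrec;
        n_array         dbms_stats.numarray;
begin
        -- a minimal pair of endpoints: low value 0, high value of 18 nines;
        -- with bkvals left null this sets just the low/high column values
        n_array     := dbms_stats.numarray(0, 999999999999999999);
        srec.epc    := 2;
        srec.bkvals := null;

        dbms_stats.prepare_column_values(srec, n_array);

        dbms_stats.set_column_stats(
                ownname  => user,
                tabname  => 't1',
                colname  => 'nend',
                distcnt  => 187501,             -- illustrative
                density  => 1/187501,           -- illustrative
                nullcnt  => 0,
                srec     => srec,
                avgclen  => 11                  -- illustrative
        );
end;
/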
Once you see the stats you can understand why the OP sees the odd performance problem. The histogram identifies 1e18 as a (very) popular value, which leaves 999999999999999999 looking like an “average” value with only 4 rows; the query for greater than 999999999999999998, on the other hand, can see that there really are 250K rows with higher values.
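As a back-of-envelope check on the 250K (my arithmetic, assuming the classic height-balanced treatment where the range estimate comes from the fraction of buckets above the supplied value):

buckets ending at the 1e18 endpoint:            254 - 190  =  64
selectivity for "nend > 999999999999999998":    64 / 254  ~=  0.252
estimated cardinality:                          1,000,000 * 0.252  ~=  252,000  (displayed as 250K)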
Footnote:
Interestingly 12c does the same with the stats – introducing the 1e18 into the histogram – but still manages, somehow, to calculate the correct cardinality in the equality case. (There is one slight difference in 12c: the histogram is a hybrid histogram, not a height-balanced histogram.)
Update 24th Oct 2015:
I realised late last night that I had written about this behaviour before – though possibly just as a response on OTN or in some unpublished notes. According to some notes that I found on my laptop this is “Bug 18514507 : WRONG CARDINALITY ESTIMATES WHEN NUMERIC VALUE IS LONGER THAN 15 BYTES” (except that the base bug it is linked to is probably not the same bug).
I suspect the problem is related to the way that character histograms are built from a numeric representation of the string: Oracle takes the first few characters of the string, treats the resulting bytes as a hex number, converts to decimal, and then applies round(value, -21) to restrict the precision stored. The effect with strings is that (broadly speaking) you get fifteen digits of precision (sometimes reduced to 14) – which is exactly what I seem to be seeing with numbers.
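As a quick illustration of that hypothesis (my own sketch, not the internal code): restricting 18 nines to 15 significant digits is exactly a round() at the third digit, and the equivalent hex-and-round arithmetic can be applied to a string:

rem     15 significant digits turn 18 nines into 1e18
select round(999999999999999999, -3) rounded_high from dual;

rem     roughly the character mechanism: leading bytes as hex,
rem     converted to decimal, then rounded to restrict precision
select
        round(
                to_number(
                        rpad(rawtohex(utl_raw.cast_to_raw('Aardvark')), 30, '0'),
                        rpad('X', 30, 'X')
                ),
                -21
        )       endpoint_style_value
from    dual
;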
The reason that 12c can get the right answers despite storing the wrong endpoint_value is that it’s also storing to_char() of the right value as the endpoint_actual_value – possibly doing this any time it has had to round the endpoint_value (for character strings the endpoint_actual_value was only populated if two entries in the endpoint_value were the same).
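If you want to check this on a 12c system, the extra column is visible in the standard view – a quick sketch:

select
        endpoint_number,
        endpoint_value,
        endpoint_actual_value
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'NEND'
order by
        endpoint_number
;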
Jonathan,
I think that Oracle 12c (12.1.0.2.0) has succeeded in getting the correct estimate for the equality query because (in contrast to 11g, where the same value is non-popular) it has managed to capture the extreme value 999999999999999999 as a popular HYBRID histogram value.
What do you think?
And having captured it, the estimate for the equality is simply the answer to the following formula.
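Presumably that formula is the standard popular-value arithmetic for hybrid histograms (my reconstruction):

E-Rows = num_rows * (endpoint_repeat_count / sample_size)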
In contrast, in the Oracle 11g release this value has not been captured and, as such, its estimate is calculated using the NewDensity.
Best regards
Comment by hourim — October 27, 2015 @ 9:23 am GMT
Mohamed,
Thanks for the write-up, but I think you must have done the work after a first reading and missed the update that I did the following day. You’re exactly correct – it’s a feature of 12c and the change in the way Oracle stores the endpoint_actual_value. If you force 12c to generate a height-balanced histogram it will still make the same mistake because it won’t record any endpoint_actual_values.
Comment by Jonathan Lewis — October 27, 2015 @ 10:55 pm GMT
Jonathan,
Thanks for your answer and for the suggestion to force 12c to generate a Height Balanced histogram by collecting stats with a non-default estimate_percent (for example estimate_percent => 100).
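A call along these lines would do it (a sketch – the key detail is that any explicitly set estimate_percent disables the new 12c histogram algorithms):

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for columns nend size 254',
                estimate_percent => 100
        );
end;
/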
Interestingly, the forced 12c Height Balanced histogram has captured this special value (though, of course, there is no corresponding endpoint_repeat_count).
And the estimate for the equality is acceptable.
I don’t know exactly how Oracle calculates this 250K estimate. This is a special 12c case: the extreme value is not captured as popular when judged by its endpoint_repeat_count, yet it looks popular when seen through the 11g-style (forced 12c) Height Balanced histogram, where its frequency of 64 suggests it has appeared as the endpoint of several consecutive buckets.
The 10053 trace file shows this:
There we see that the Computed cardinality equals the Rounded one, which suggests that Oracle is using the NewDensity – but that exact computed cardinality is bothering me, as I still have not figured out how Oracle gets it.
Best regards
Mohamed
Comment by hourim — October 29, 2015 @ 9:26 am GMT
[…] Big number problem – older versions of Oracle (pre 12c) can go wrong with data values more than 15 digits long […]
Pingback by Faking Histograms | Oracle Scratchpad — October 15, 2018 @ 1:37 pm BST