This note is a variant of a note I wrote a couple of months ago about the impact of nulls on column groups. The effect showed up recently, though slightly obscured by other issues, on a client site so I thought it would be worth mentioning this indexing variant in its own right. I’ll start with a script to generate some test data:
rem
rem     Script:         pt_hash_cbo_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1 (
        hash_col,
        rare_col,
        n1,
        padding
)
nologging
partition by hash (hash_col) partitions 32
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        mod(rownum,128)                 hash_col,
        case
                when mod(rownum,1021) = 0
                        then rownum + trunc(dbms_random.value(-256, 256))
        end                             rare_col,
        rownum                          n1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1048576 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(hash_col, rare_col)
nologging
local
compress 1
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                granularity => 'ALL',
                method_opt  => 'for all columns size 1'
        );
end;
/
I’ve got a hash-partitioned table with 32 partitions; the partitioning key is called hash_col, and there is another column called rare_col that is almost always null – roughly 1 row in every 1,000 holds a value. I’ve added a local index on (hash_col, rare_col), compressing the leading column since hash_col is very repetitive, and gathered stats on the partitions and table. Here’s a view of the data for a single value of hash_col, and a summary report of the whole data set:
select
        hash_col, rare_col, count(*)
from
        t1
where
        hash_col = 63
group by
        hash_col, rare_col
order by
        hash_col, rare_col
;

  HASH_COL   RARE_COL   COUNT(*)
---------- ---------- ----------
        63     109217          1
        63     240051          1
        63     370542          1
        63     501488          1
        63     631861          1
        63     762876          1
        63     893249          1
        63    1023869          1
        63                 8184

9 rows selected.

select
        count(*), ct
from    (
        select
                hash_col, rare_col, count(*) ct
        from
                t1
        group by
                hash_col, rare_col
        order by
                hash_col, rare_col
        )
group by
        ct
order by
        count(*)
;

  COUNT(*)         CT
---------- ----------
         3       8183
       125       8184
      1027          1
Given the way I’ve generated the data, any one value of hash_col has 8,184 (or 8,183) rows where rare_col is null; but there are 1,027 rows which hold a value for both hash_col and rare_col, with just one row for each combination.
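You can reproduce this arithmetic without running the database at all – here’s a short Python sketch (my addition, not part of the original test) that simulates the two mod() expressions from the script and produces the same distribution:

```python
from collections import Counter

ROWS = 1048576  # rows generated by the script

# Simulate the generated columns: hash_col = mod(rownum,128), and
# rare_col is non-null only when mod(rownum,1021) = 0. The real script
# adds a random offset to rare_col, but the offsets (max +/-256) can't
# make two values 1,021 apart collide, so each non-null value is unique
# and using rownum itself gives the same counts.
pair_counts = Counter(
    (rownum % 128, rownum if rownum % 1021 == 0 else None)
    for rownum in range(1, ROWS + 1)
)

# Count how many (hash_col, rare_col) combinations appear N times
summary = Counter(pair_counts.values())
print(sorted(summary.items()))
# 1,027 unique non-null combinations, and each hash_col value has
# 8,183 or 8,184 rows where rare_col is null
```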
Now we get to the problem. Whenever rare_col is non null the combination of hash_col and rare_col is unique (though this wasn’t quite the case at the client site) so when we query for a given hash_col and rare_col we would hope that the optimizer would be able to estimate a cardinality of one row; but this is what we see:
variable n1 number
variable n2 number

explain plan for
select
        /*+ index(t1) */
        n1
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   908 | 10896 |    76   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   908 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))
The optimizer has predicted a massive 908 rows. A quick check of the object stats shows us that this is “number of rows in table” / “number of distinct keys in index” (1,048,576 / 1,155, rounded up).
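As a quick back-of-envelope check (my arithmetic, using the figures from my data set – 128 hash values with a null rare_col plus 1,027 non-null combinations gives the 1,155 distinct keys):

```python
import math

num_rows = 1048576      # rows in the table
distinct_keys = 1155    # distinct (hash_col, rare_col) index entries:
                        # 128 (hash_col, null) + 1027 non-null combinations

# "number of rows in table" / "number of distinct keys in index", rounded up
estimate = math.ceil(num_rows / distinct_keys)
print(estimate)   # 908 - the Rows figure in the plan
```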
Any row with rare_col set to null cannot match the predicate “rare_col = :n2”, but because the optimizer is looking at the statistics of complete index entries (and there are 1048576 of them, with 1155 distinct combinations, and none that are completely null) it has lost sight of the frequency of nulls for rare_col on its own. (The same problem appears with column groups – which is what I commented on in my previous post on this topic).
I’ve often said in the past that you shouldn’t create histograms on data unless your code is going to use them. In this case I need to stop the optimizer from looking at the index.distinct_keys and one way to do that is to create a histogram on one of the columns that defines the index; and I’ve chosen to do this with a fairly arbitrary size of 10 buckets:
execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns rare_col size 10')

explain plan for
select
        /*+ index(t1) */
        n1
from
        t1
where
        hash_col = :n1
and     rare_col = :n2
;

select * from table(dbms_xplan.display);

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |     1 |    12 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |     1 |    12 |     2   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |     1 |    12 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))
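With the histogram in place the optimizer is back to multiplying the individual column selectivities, each one adjusted for its null fraction. The following sketch uses the standard “selectivity = (1 – num_nulls/num_rows) / num_distinct” model – my reconstruction of the arithmetic rather than a dump of a 10053 trace – to show why the estimate drops to 1:

```python
num_rows = 1048576

# hash_col: 128 distinct values, no nulls
sel_hash = (1 - 0 / num_rows) / 128

# rare_col: 1027 distinct values, all the other rows are null
num_nulls_rare = num_rows - 1027
sel_rare = (1 - num_nulls_rare / num_rows) / 1027

# raw estimate is 1/128 of a row; Oracle reports a minimum of 1
cardinality = max(1, round(num_rows * sel_hash * sel_rare))
print(cardinality)   # 1 - matching the corrected plan
```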
Bonus observation
This problem came to my attention (and I’ve used a partitioned table in my demonstration) because I had noticed an obvious optimizer error in the client’s execution plan for exactly this type of simple query. I can demonstrate the effect the client saw by running the test again, without creating the histogram, but declaring hash_col to be not null. Immediately after creating the index I’m going to add the line:
alter table t1 modify hash_col not null;
(The client’s system didn’t declare the column not null, but their equivalent of hash_col was part of the primary key of the table which meant it was implicitly declared not null). Here’s what my execution plan looked like with this constraint in place:
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   908 | 10896 |    76   (0)| 00:00:01 |       |       |
|   1 |  PARTITION HASH SINGLE                     |       |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   908 | 10896 |    76   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |    28 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("HASH_COL"=TO_NUMBER(:N1) AND "RARE_COL"=TO_NUMBER(:N2))
Spot the difference.
The estimate of index rowids is far smaller than the estimate of the rows that will be fetched using those rowids. This is clearly an error.
If you’re wondering how Oracle got this number divide 908 by 32 (the number of partitions in the table) – the answer is 28.375.
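In other words the displayed index figure is just the table-level estimate scaled down by the partition count:

```python
table_estimate = 908   # Rows for the table access
partitions = 32        # hash partitions in the table

scaled = table_estimate / partitions
print(scaled)          # 28.375
print(round(scaled))   # 28 - the Rows figure shown for the index range scan
```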
Fortunately it’s (probably) an error that doesn’t matter despite looking worryingly wrong. Critically the division hasn’t changed the estimate of the number of table rows (we’ll ignore the fact that the estimate is wrong anyway thanks to a different error), and the cost of the index range scan and table access have not changed. The error is purely cosmetic in effect.
Interestingly if you modify the query to be index-only (i.e. you restrict the select list to columns in the index) this extra division disappears.
Summary
1) If you have a B-tree index where one (or more) of the columns is null for a large fraction of the entries then the optimizer may over-estimate the cardinality of a predicate of the form: “(list of all index columns) = (list of values)” as it will be using the index.distinct_keys in its calculations and ignore the effects of nulls in the individual columns. If you need to work around this issue then creating a histogram on one of the index columns will be sufficient to switch Oracle back to the strategy of multiplying the individual column selectivities.
2) There are cases of plans for accessing partitioned tables where Oracle starts by using table-level statistics to get a suitable set of estimates but then displays a plan with the estimate of rows for an index range scan scaled down by the number of partitions in the table. This results in a visible inconsistency between the index estimate and the table estimate, but it doesn’t affect the cardinality estimate for the table access or either of the associated costs – so it probably doesn’t have a destabilising effect on the plan.
[…] A threat from nulls in indexes (Dec 2018): Sometimes you have to stop the optimizer using index.distinct_keys for cardinality estimates […]
Pingback by Column Group Catalog | Oracle Scratchpad — December 14, 2018 @ 7:06 pm GMT Dec 14,2018 |
Thank you Jonathan. Oracle has created a hybrid histogram in my 12.1 database. Can you please let me know how we calculate column selectivity for the rare_col with the hybrid histogram?
Comment by bhavani — December 16, 2018 @ 2:46 am GMT Dec 16,2018 |
bhavani,
Rather than write a new article to answer your question I did a quick search on Google for oracle hybrid histogram selectivity and the first hit was an article by Mohamed Houri on the redgate website. You might want to start by reading that: https://www.red-gate.com/simple-talk/sql/oracle/12c-hybrid-histogram/
Comment by Jonathan Lewis — December 16, 2018 @ 6:15 pm GMT Dec 16,2018 |