A question came up on the Oracle-L list server a couple of days ago expressing some surprise at the following execution plan:
-------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 845 (100)| | | 1 | SORT AGGREGATE | | 1 | 7 | | | |* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| ANY_TABLE | 84827 | 579K| 845 (1)| 00:00:01 | | 3 | SORT CLUSTER BY ROWID | | 68418 | | 76 (0)| 00:00:01 | |* 4 | INDEX RANGE SCAN | ANY_INDEX | 68418 | | 76 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("X"."ANY_COLUMN1"='J') 4 - access("X"."ANY_COLUMN2"=89155)
You’ll notice that this is a very simple query accessing a table by index, yet the estimated table rows found exceeds the estimated number of index entries used to probe the table. How can this happen. The answer (most frequently) is that there’s a mismatch between the table (or, more commonly, column) statistics and the index statistics. This seems to happen very frequently when you start mixing partitioned tables with global (or globally partitioned) indexes but it can happen in very simple cases, especially since a call to gather_table_stats() with cascade set to true and using the auto_sample_size will take a small sample from the index while using a 100% “sample” from the table.
Here’s an example I engineered very quickly to demonstrate the point. There’s no particular reason for the choice of DML I’ve used on the data beyond a rough idea of setting up a percentage of nulls and deleting a non-uniform pattern of rows.
rem rem Script: table_index_mismatch.sql rem Author: Jonathan Lewis rem Dated: Nov 2019 rem rem Last tested rem 19.3.0.0 rem 12.2.0.1 rem create table t1 as with generator as ( select rownum id from dual connect by level <= 1e4 -- > comment to avoid WordPress format issue ) select rownum id, mod(rownum,1000) n1, mod(rownum,1000) n2, lpad('x',100,'x') padding from generator v1, generator v2 where rownum <= 1e6 -- > comment to avoid WordPress format issue ; begin dbms_stats.gather_table_stats( ownname => null, tabname => 'T1', method_opt => 'for all columns size 1, for columns (n1,n2) size 1' ); end; / create index t1_i1 on t1(n1); delete from t1 where mod(trunc(sqrt(n1)),7) = 0; update t1 set n1 = null where mod(n1,10) = 0; delete from t1 where mod(n1,10) = trunc(dbms_random.value(0,10)); execute dbms_stats.gather_table_stats(user,'t1',estimate_percent=>1) execute dbms_stats.gather_index_stats(null,'t1_i1',estimate_percent=> 0.01)
Of course you’re not supposed to collect stats with arbitrary samples in any recent version of Oracle, so going for a 1% and 0.01% sample seems a little daft but I’m just doing that to demonstrate the problem with a very small data set.
After generating the data and gathering the stats I ran a few queries to pick out some critical numbers.
select table_name, sample_size, num_rows from user_tables where table_name = 'T1' / select index_name, sample_size, num_rows, distinct_keys from user_indexes where table_name = 'T1' and index_name = 'T1_I1' / select column_name, sample_size, num_nulls, num_distinct from user_tab_cols where table_name = 'T1' and ( column_name = 'N1' or virtual_column = 'YES' ) order by column_name /
You’ll notice that I’ve only picked one of my original columns and any virtual columns. My gather_table_stats() call had a method_opt that included the creation of extended stats for the column group (n1, n2) and I want to report the stats on the resulting virtual column.
TABLE_NAME SAMPLE_SIZE NUM_ROWS -------------------- ----------- ---------- T1 7865 786500 INDEX_NAME SAMPLE_SIZE NUM_ROWS DISTINCT_KEYS -------------------- ----------- ---------- ------------- T1_I1 385779 713292 714 COLUMN_NAME SAMPLE_SIZE NUM_NULLS NUM_DISTINCT -------------------------------- ----------- ---------- ------------ N1 7012 85300 771 SYS_STUBZH0IHA7K$KEBJVXO5LOHAS 7865 0 855
A couple of observations on the stats
- the table sample size is, as expected, 1% of the reported num_rows (the actual count is 778,154).
- The index sample size is much bigger than expected – but that’s probably related to the normal “select 1,100 leaf blocks strategy”. Because of the skew in the pattern of deleted values it’s possible for the sample size in this model to vary between 694,154 and something in the region of 380,000.
- The n1 sample size is about 10% smaller than the table sample size – but that’s because I set 10% of the column to null.
- The column group sample size matches the table sample size because column group hash values are never null, even if an underlying column is null.
So let’s check the execution plan for a very simple query:
set autotrace on explain select id from t1 where n1 = 140 and n2 = 140; set autotrace off --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 920 | 11960 | 918 (1)| 00:00:01 | |* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 920 | 11960 | 918 (1)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T1_I1 | 909 | | 5 (0)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("N2"=140) 2 - access("N1"=140)
The estimate for relevant index rowids is smaller than the estimate for the number of table rows! The difference is not as extreme as the case reported on Oracle-l, but I’m only trying to demonstrate a principle, not reproduce the exact results.
There are several ways in which contradictory results like this can appear – but in this case we can see the following:
- For the table access table.num_rows/column.num_distinct = 786,500 / 855 = 919.88 (using the column group num_distinct)
- For the index range scan: (table.num_rows – column.num_nulls) / column.num_distinct = (786500 – 85300) / 771 = 909.47 (using the n1 statistics)
So the change in strategy as it becomes possible for the optimizer to take advantage of the column group means the index and table have been using incompatible sets of stats (in particular there’s that loss of information about NULLs) as their cardinalities are calculated. The question, then, is “how much is that likely to matter”, and the follow-up if it can matter is “in what circumstancs could the effect be large enough to cause problems”. But that’s a topic for another day.
Update / Footnote
In the case of the Oracle-l example, there was no column group, and in some cases the optimizer would produce a plan where the table estimate was much smaller than the index estimate, and in other cases (like the opening plan above) the table estimate was signficantly greater than the index estimate. This was a side effect of adaptive statistics: the low table estimate was due to the basic “multiply separate selectivities”; but the with adaptive statistics enabled Oracle started sampling the table to check the correlation between the two tables, and then produced an SQL Plan Directive to do so and got to the higher (and correct) result.
Comments and related questions are welcome.