I’ve waxed lyrical in the past about creating suitable column group statistics whenever you drop an index because even when the optimizer doesn’t use an index in its execution path it might have used the number of distinct keys of the index (user_indexes.distinct_keys) in its estimates of cardinality.
I’ve also highlighted various warnings (here (with several follow-on URLs) and here) about when the optimizer declines to use column group statistics. One of those cases is when a predicate on one of the columns goes “out of range” – i.e. is below the column low_value or above the column high_value. Last night it suddenly crossed my mind that if we drop an index and replace it with a column group we might see an example of inconsistent behaviour: what happens when the index exists but the predicate is out of range – would you find that dropping the index and replacing it with a column group would give you different cardinality estimates for out of range predicates ?
Here’s the demonstration of what happened when I created a simple test on 12.1.0.2:
rem rem Script: index_v_colgrp.sql rem Author: Jonathan Lewis rem rem Last tested rem 12.1.0.2 rem create table t1 nologging as with generator as ( select rownum id from dual connect by level <= 1e4 ) select rownum id, mod(rownum-1,100) n1, mod(rownum-1,100) n2, lpad('x',100,'x') padding from generator v1, generator v2 where rownum <= 1e6 -- > comment to avoid WordPress format problem ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for all columns size 1' ); end; / create index t1_i1 on t1(n1, n2); set autotrace traceonly explain
I’ve created a table with 1M rows, where n1 and n2 are perfectly correlated – there are 100 distinct pairs of values (ranging from (0,0) to (99,99)). Now with autotrace enabled I’ll execute three queries – two with an index on the table of which one will be the baseline plan for predicates that are “in-range” and the other will take the predicates out of range, and the third after I’ve dropped the index and substituted a matching column group to see what I get for the “out of range” plan. The plans may produce different paths as the index disappears, of course, but what we’re only interested in the cardinality estimates in this experiment.
Here’s the code to run the three queries:
select padding from t1 where n1 = 50 and n2 = 50 ; select padding from t1 where n1 = 110 and n2 = 110 ; drop index t1_i1; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for columns (n1, n2) size 1' ); end; / select padding from t1 where n1 = 110 and n2 = 110 ;
And the three execution plans:
-------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10000 | 1044K| 2142 (4)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T1 | 10000 | 1044K| 2142 (4)| 00:00:01 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("N1"=50 AND "N2"=50) --------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 79 | 8453 | 83 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 79 | 8453 | 83 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T1_I1 | 79 | | 3 (0)| 00:00:01 | --------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("N1"=110 AND "N2"=110) -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 79 | 8453 | 2142 (4)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T1 | 79 | 8453 | 2142 (4)| 00:00:01 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("N1"=110 AND "N2"=110)
In summary:
- With the index in place and the predicates in range the optimizer used user_indexes.distinct_keys to calculate cardinality.
- With the index in place and the predicates (or just one of them, in fact) out of range the optimizer used the individual column selectivities with linear decay.
- With a column group instead of an index the optimizer behaved exactly as it used to with the index in place.
So my concern that substituting column groups for indexes was unfounded – the optimizer was being silly (legal disclaimer: that’s just my opinion) with indexes, and the silly (ditto) behaviour with column groups hasn’t changed anything.
I’ll have to go back a couple of versions of Oracle to repeat these tests – maybe this behaviour with user_indexes.distinct_keys in place is relatively recent, but it’s another reason why execution plans may change suddenly and badly as time passes when “nothing changed”.
Am I missing the create index statement for T1_I1, or?
Comment by rune mørk — March 28, 2017 @ 2:20 pm BST Mar 28,2017 |
Rune,
Thanks for that – now added.
Comment by Jonathan Lewis — March 28, 2017 @ 4:04 pm BST Mar 28,2017 |
Thanks, will test on 11.2, have seen this strange behaviour on 12c, unable to pinpoint the reason, now i might have an explanation.
rgds
Rune
Comment by rune mørk — March 29, 2017 @ 6:57 am BST Mar 29,2017 |
Jonathan,
obviously I am missing the point somewhere, but you write “what happens when the index exists but the predicate is out of range – would you find that dropping the index and replacing it with a column group would give you different cardinality estimates for out of range predicates ?” And to me the result of the second and the third query actually shows that the existence of the index results in another plan (and different cardinalities) than the use of a corresponding column group. So it seems to me that the behaviour of both strategies is not entirely consistent (resulting in a “yes” for the answer to the initial query). Or did I manage to misunderstand the question?
Regards
Martin
Comment by Martin Preiss — March 29, 2017 @ 8:05 am BST Mar 29,2017 |
Martin,
The second and third plans both have a cardinality of 79 (which is the only thing the note was about). Were you looking at the cost, perhaps ?
Comment by Jonathan Lewis — March 29, 2017 @ 9:52 am BST Mar 29,2017 |
Jonathan,
thank you – yes, that’s the point I missed…
Comment by Martin Preiss — March 29, 2017 @ 9:58 am BST Mar 29,2017 |
thanks for sharing this information will definitely test and let you know if it works.
Comment by Sarath Chandra — May 26, 2017 @ 1:18 pm BST May 26,2017 |
[…] Index out of range (Mar 2017): Effects of “out of range” predicates on indexes and column groups […]
Pingback by Column Group Catalog | Oracle Scratchpad — September 27, 2018 @ 5:16 pm BST Sep 27,2018 |