Here’s a question that is NOT a trick question, it was prompted by a question on the OTN database forum and demonstrates an example of optimizer behaviour that might come as a surprise – a clever use of an index skip scan when all the columns in an index appear as predicates in a query.
I have an index (addr_id0050, effective_date), the first column is numeric, the second is a date. Here’s a query with an execution plan that uses that index:
define m_date='30-Jan-2013' select small_vc from t1 where addr_id0050 between 24 and 26 and effective_date = to_date('&m_date', 'dd-mon-yyyy') ; ------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost | ------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 18 | 396 | 23 | | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 18 | 396 | 23 | |* 2 | INDEX SKIP SCAN | T1_I0050 | 18 | | 5 | ------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ADDR_ID050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30' 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID050"<=26) filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
So here’s the question – given that my where clause includes a predicate on the first column of the index that would allow an index range scan to take place, wouldn’t you expect Oracle to do a range scan, and how does a skip scan work in this case ?
To push the point a little further I have another column, also numeric, in the same table which appears in a similar index (addr_id2500, effective_date). Here’s the equivalent query with its execution plan.
select small_vc from t1 where addr_id2500 between 24 and 26 and effective_date = to_date('&m_date', 'dd-mon-yyyy') ; ------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost | ------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 23 | 5 | | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 23 | 5 | |* 2 | INDEX RANGE SCAN | T1_I2500 | 1 | | 4 | ------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26) filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
In this case, with the same starting predicate (but referencing the alternative column), the optimizer produced the index range that we expected.
The index skip scan isn’t just for cases where the first column(s) of an index is/are missing from the list of predicates. The basic principle is that the optimizer has the option to do a small number of tiny range scans on an index when the alternative is to do a tablescan or a very large index range scan.
As you might guess from my column names, addr_id0050 holds 50 distinct values, so the range 24 to 26 actually accounts for 6% of the total volume of the index. On the other hand, there are over 800 distinct values for effective_date. There is even an index on (effective_date) but that’s going to pick up lots of rows that are widely scattered throughout the table then throw away 94% of them so Oracle has decided it’s too expensive to use.
So the optimizer has worked out that it can probe for (24, 30th Jan), (25, 30th Jan), and (26, 30th Jan) as the most efficient access path. Of course, it doesn’t know that it’s probing for exactly those values but statistically it assumes that there are only a few possible values that will show up if it probes the index on the first column – in effect using an “inlist iterator” on the first column without knowing in advance what it’s going to find in the list.
When the optimizer sees the query using addr_id2500 (which holds 2,500 distinct values) the arithmetic faviours the option we are familiar with. A simple range scan based on values between 24 and 26 is going to range through roughly l/800th of the index – which is a tiny number of leaf blocks (just one or two, in my case) so the optimizer decides to do that and check every index entry on the way to see if the effective_date holds a suitable value.
If you want to repeat the experiment, I was using 11.2.0.3 with 1MB uniform extents and freelist management. The SQL to create the table and indexes is as follows:
rem rem Script: skip_scan_anomaly_2.sql rem Author: Jonathan Lewis rem Dated: Jan 2013 rem rem Last tested rem 19.3.0.0 rem 11.2.0.3 rem create table t1 as with generator as ( select --+ materialize rownum id from all_objects where rownum <= 3000 -- > comment to avoid WordPress format issue ) select mod(rownum,2500) addr_id2500, mod(rownum,50) addr_id0050, trunc(sysdate) + trunc(mod(rownum,2501)/3) effective_date, lpad(rownum,10,'0') small_vc, rpad('x',100) padding from generator v1, generator v2 where rownum <= 250000 -- > comment to avoid WordPress format issue ; create index t1_i1 on t1(effective_date); create index t1_i2500 on t1(addr_id2500, effective_date); create index t1_i0050 on t1(addr_id0050, effective_date); begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for all columns size skewonly', cascade => true ); end; / column d1 new_value m_date select to_char(trunc(sysdate)+28,'dd-mon-yyyy') d1 from dual ;
I had disabled CPU costing to get repeatable results – depending on parameter settings and whether you have system stats enabled or not then you may need to tweak the code to change the relative numbers of distinct values in the numeric columns before you see the switch between range scan and skip scan.
Update (Feb 2021)
Since this topic came up on a twitter conversation recently I thought I’d update the tests to 19c (19.3.0.0 specifically). I’ve made three changes to the script – first I’ve allowed CPU costing to come into play with my “standard” values, and I’ve changed (corrected) the call to gather stats which had method_opt set to “size skewonly” to use “size 1”. Finally I’ve add the little bit of code I used to generate the literal date that I used in the queries.
In the output below I’ve included a third execution plan, which is one where I’ve blocked the use of the skip scan on id0050 with the hint /*+ no_index_ss(t1) */ to show that the optimizer had calculated a for the range scan but found it to be more expensive than the skip scan.
Here are the system stats that I set for routine tests:
begin dbms_stats.set_system_stats('MBRC',16); dbms_stats.set_system_stats('MREADTIM',10); dbms_stats.set_system_stats('SREADTIM',5); dbms_stats.set_system_stats('CPUSPEED',500); exception when others then null; end; /
And here are the three plans – with titles, and a little cosmetic editing:
=============================== Default plan with column id0050 =============================== ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 24 | 528 | 34 (6)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 24 | 528 | 34 (6)| 00:00:01 | |* 2 | INDEX SKIP SCAN | T1_I0050 | 24 | | 9 (23)| 00:00:01 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ADDR_ID0050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID0050"<=26) filter("EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) ======================================= no_index_ss(t1) plan with column id0050 ======================================= ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 24 | 528 | 91 (3)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 24 | 528 | 91 (3)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T1_I0050 | 24 | | 66 (4)| 00:00:01 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ADDR_ID0050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID0050"<=26) filter("EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) =============================== Default plan with column id2500 =============================== ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 23 | 5 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 23 | 5 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T1_I2500 | 1 | | 4 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26) filter("EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
As you can see in the first two plan, the change in the call to gather_table_stats has made a bit of a difference to the cardinality (Rows) estimate, but we still have a skip scan that has a much lower cost than the corresponding range scan.
Jonathan,
very interesting.
It seems to me that the predicate section for the skip scan is a little bit misleading. With a NO_INDEX_SS-Hint I get (in 11.1.0.7) the same predicates including the filter on effective_date:
So in this case the engine has to read the index for the range addr_id0050 between 24 and 26 and then only uses the entries with the fitting effective_date to do the table access. Autotrace tells me that the query uses 53 consistent gets – while the skip scan only uses 27 consistent gets. For the skip scan I assume the engine reads only the relevant parts of the index structure (i.e. the leaf_blocks with the fitting effective_date) – and so the filter predicate on effective_date seems to me to be redundant.
When I change your query to use a IN-list instead of the range I get a plan with more plausible predicates and I assume the internal work is very similar to the skip scan case (at least I also get 27 consistent gets):
Regards
Martin
Comment by Martin Preiss — January 4, 2013 @ 9:15 am GMT Jan 4,2013 |
Martin,
Thanks for adding in the extra demonstrations. The “no_index_ss()” is particularly significant – one of my first suggestions for addressing the question “why isn’t the optimizer doing X” is to tell people to put in the hints to make it “do X” and maybe you’ll see what it didn’t like about that plan. Seeing what Oracle does with this example (and the cost) when you block the skip scan is very instructive.
It’s interesting to note, by the way, that the cost of your in-list example is the same as the cost of my skip scan example (now that I’ve posted the plan with CPU costing disabled, which I didn’t originally). This is partly because of perfect match between the optimizer’s model and the data, of course, but mainly it’s because that’s the way it really ought to be costed.
I can’t make up my mind about the suitability of the predicates – I’ve certainly been helped in the past by seeing an “unexpected” predicate in the FILTER section when I’ve only been expecting to see it in the ACCESS section, so I quite like the little warning it gives. As you say, though, it doesn’t really look right; maybe it’s just a side effect of applying the general skip scan presentation rule to a very special boundary case.
Comment by Jonathan Lewis — January 4, 2013 @ 1:18 pm GMT Jan 4,2013 |
Thank You Jonathan.
With Regards Dillip
Comment by Dillip Das — January 5, 2013 @ 4:48 am GMT Jan 5,2013 |
Hi Jonathan
There is no difference between both query and the database is oracle 11gr2.
Comment by Selvam — January 6, 2013 @ 3:30 pm GMT Jan 6,2013 |
Selvam,
The closing comment in the article is: “you may need to tweak the code”.
In your case you’ve got system statistics enabled (which I can tell from the appearance of a time column in the plan).
Your comment did remind me, though, that I hadn’t mentioned anything about collecting stats – which I had with a call to dbms_stats.gather_table_stats() – I’ve add a note to that effect in the code. (Your plans show dynamic sampling which, again, could have affected the outcome to hide the skip scan).
One thing to bear in mind, you’ve said “11.2” – unfortunately little details of the optimizer behaviour can change on point releases, so any time you highlight something that appears to be an inconsistency in behaviour you need to quote the full version number. (There are some interesting changes between 11.2.0.1 and 11.2.0.3)
Comment by Jonathan Lewis — January 6, 2013 @ 3:44 pm GMT Jan 6,2013 |
Comment by Selvam — January 6, 2013 @ 3:31 pm GMT Jan 6,2013 |