Oracle Scratchpad

January 3, 2013

Skip Scan 2

Filed under: CBO,Index skip scan,Indexing,Oracle — Jonathan Lewis @ 6:42 pm GMT Jan 3,2013

Here’s a question that is NOT a trick question, it was prompted by a question on the OTN database forum and demonstrates an example of optimizer behaviour that might come as a surprise – a clever use of an index skip scan when all the columns in an index appear as predicates in a query.

I have an index (addr_id0050, effective_date), the first column is numeric, the second is a date. Here’s a query with an execution plan that uses that index:

define m_date='30-Jan-2013'

select
        small_vc
from    t1
where
        addr_id0050 between 24 and 26
and     effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    18 |   396 |    23 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |    18 |   396 |    23 |
|*  2 |   INDEX SKIP SCAN           | T1_I0050 |    18 |       |     5 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30'
              00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID050"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

So here’s the question – given that my where clause includes a predicate on the first column of the index that would allow an index range scan to take place, wouldn’t you expect Oracle to do a range scan, and how does a skip scan work in this case ?

To push the point a little further I have another column, also numeric, in the same table which appears in a similar index (addr_id2500, effective_date). Here’s the equivalent query with its execution plan.

select
        small_vc
from    t1
where
        addr_id2500 between 24 and 26
and     effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    23 |     5 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |    23 |     5 |
|*  2 |   INDEX RANGE SCAN          | T1_I2500 |     1 |       |     4 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE('
              2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

In this case, with the same starting predicate (but referencing the alternative column), the optimizer produced the index range that we expected.

The index skip scan isn’t just for cases where the first column(s) of an index is/are missing from the list of predicates. The basic principle is that the optimizer has the option to do a small number of tiny range scans on an index when the alternative is to do a tablescan or a very large index range scan.

As you might guess from my column names, addr_id0050 holds 50 distinct values, so the range 24 to 26 actually accounts for 6% of the total volume of the index. On the other hand, there are over 800 distinct values for effective_date. There is even an index on (effective_date) but that’s going to pick up lots of rows that are widely scattered throughout the table then throw away 94% of them so Oracle has decided it’s too expensive to use.

So the optimizer has worked out that it can probe for (24, 30th Jan), (25, 30th Jan), and (26, 30th Jan) as the most efficient access path. Of course, it doesn’t know that it’s probing for exactly those values but statistically it assumes that there are only a few possible values that will show up if it probes the index on the first column – in effect using an “inlist iterator” on the first column without knowing in advance what it’s going to find in the list.

When the optimizer sees the query using addr_id2500 (which holds 2,500 distinct values) the arithmetic faviours the option we are familiar with. A simple range scan based on values between 24 and 26 is going to range through roughly l/800th of the index – which is a tiny number of leaf blocks (just one or two, in my case) so the optimizer decides to do that and check every index entry on the way to see if the effective_date holds a suitable value.

If you want to repeat the experiment, I was using 11.2.0.3 with 1MB uniform extents and freelist management. The SQL to create the table and indexes is as follows:


rem
rem     Script:         skip_scan_anomaly_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2013
rem
rem     Last tested
rem             19.3.0.0
rem             11.2.0.3
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum  id
        from    all_objects
        where   rownum <= 3000 -- > comment to avoid WordPress format issue
)
select
        mod(rownum,2500)                                addr_id2500,
        mod(rownum,50)                                  addr_id0050,
        trunc(sysdate) + trunc(mod(rownum,2501)/3)      effective_date,
        lpad(rownum,10,'0')                             small_vc,
        rpad('x',100)                                   padding
from
        generator       v1,
        generator       v2
where
        rownum <= 250000 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(effective_date);
create index t1_i2500 on t1(addr_id2500, effective_date);
create index t1_i0050 on t1(addr_id0050, effective_date);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size skewonly',
                cascade          => true
        );
end;
/

column  d1      new_value m_date

select
        to_char(trunc(sysdate)+28,'dd-mon-yyyy') d1
from
        dual
;

I had disabled CPU costing to get repeatable results – depending on parameter settings and whether you have system stats enabled or not then you may need to tweak the code to change the relative numbers of distinct values in the numeric columns before you see the switch between range scan and skip scan.

Update (Feb 2021)

Since this topic came up on a twitter conversation recently I thought I’d update the tests to 19c (19.3.0.0 specifically). I’ve made three changes to the script – first I’ve allowed CPU costing to come into play with my “standard” values, and I’ve changed (corrected) the call to gather stats which had method_opt set to “size skewonly” to use “size 1”. Finally I’ve add the little bit of code I used to generate the literal date that I used in the queries.

In the output below I’ve included a third execution plan, which is one where I’ve blocked the use of the skip scan on id0050 with the hint /*+ no_index_ss(t1) */ to show that the optimizer had calculated a for the range scan but found it to be more expensive than the skip scan.

Here are the system stats that I set for routine tests:

begin
        dbms_stats.set_system_stats('MBRC',16);
        dbms_stats.set_system_stats('MREADTIM',10);
        dbms_stats.set_system_stats('SREADTIM',5);
        dbms_stats.set_system_stats('CPUSPEED',500);
exception
        when others then null;
end;
/

And here are the three plans – with titles, and a little cosmetic editing:

===============================
Default plan with column id0050
===============================

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |    24 |   528 |    34   (6)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1       |    24 |   528 |    34   (6)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN                   | T1_I0050 |    24 |       |     9  (23)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID0050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00',
              'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID0050"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))


=======================================
no_index_ss(t1) plan with column id0050
=======================================

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |    24 |   528 |    91   (3)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1       |    24 |   528 |    91   (3)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I0050 |    24 |       |    66   (4)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID0050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00',
              'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID0050"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))


===============================
Default plan with column id2500
===============================

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     1 |    23 |     5   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1       |     1 |    23 |     5   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I2500 |     1 |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00',
              'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2021-03-15 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))


As you can see in the first two plan, the change in the call to gather_table_stats has made a bit of a difference to the cardinality (Rows) estimate, but we still have a skip scan that has a much lower cost than the corresponding range scan.

6 Comments »

  1. Jonathan,

    very interesting.

    It seems to me that the predicate section for the skip scan is a little bit misleading. With a NO_INDEX_SS-Hint I get (in 11.1.0.7) the same predicates including the filter on effective_date:

    select /*+ opt_param('_optimizer_cost_model','io') NO_INDEX_SS(t1)*/
        small_vc
    from    t1
    where
        addr_id0050 between 24 and 26
    and effective_date = to_date('30.01.2013', 'dd.mm.yyyy');
    
    ------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
    ------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |    24 |   528 |    89 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |    24 |   528 |    89 |
    |*  2 |   INDEX RANGE SCAN          | T1_I0050 |    24 |       |    64 |
    ------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("ADDR_ID0050">=24 AND "EFFECTIVE_DATE"=TO_DATE('
                  2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID0050"<=26)
           filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
                  'syyyy-mm-dd hh24:mi:ss'))
    

    So in this case the engine has to read the index for the range addr_id0050 between 24 and 26 and then only uses the entries with the fitting effective_date to do the table access. Autotrace tells me that the query uses 53 consistent gets – while the skip scan only uses 27 consistent gets. For the skip scan I assume the engine reads only the relevant parts of the index structure (i.e. the leaf_blocks with the fitting effective_date) – and so the filter predicate on effective_date seems to me to be redundant.

    When I change your query to use a IN-list instead of the range I get a plan with more plausible predicates and I assume the internal work is very similar to the skip scan case (at least I also get 27 consistent gets):

    select /*+ opt_param('_optimizer_cost_model','io') */
        small_vc
    from    t1
    where
        addr_id0050 in (24, 25, 26)
    and effective_date = to_date('30.01.2013', 'dd.mm.yyyy')
    
    -------------------------------------------------------------------------
    | Id  | Operation                    | Name     | Rows  | Bytes | Cost  |
    -------------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |          |    18 |   396 |    23 |
    |   1 |  INLIST ITERATOR             |          |       |       |       |
    |   2 |   TABLE ACCESS BY INDEX ROWID| T1       |    18 |   396 |    23 |
    |*  3 |    INDEX RANGE SCAN          | T1_I0050 |    18 |       |     5 |
    -------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       3 - access(("ADDR_ID0050"=24 OR "ADDR_ID0050"=25 OR
                  "ADDR_ID0050"=26) AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
                  'syyyy-mm-dd hh24:mi:ss'))
    

    Regards

    Martin

    Comment by Martin Preiss — January 4, 2013 @ 9:15 am GMT Jan 4,2013 | Reply

    • Martin,

      Thanks for adding in the extra demonstrations. The “no_index_ss()” is particularly significant – one of my first suggestions for addressing the question “why isn’t the optimizer doing X” is to tell people to put in the hints to make it “do X” and maybe you’ll see what it didn’t like about that plan. Seeing what Oracle does with this example (and the cost) when you block the skip scan is very instructive.

      It’s interesting to note, by the way, that the cost of your in-list example is the same as the cost of my skip scan example (now that I’ve posted the plan with CPU costing disabled, which I didn’t originally). This is partly because of perfect match between the optimizer’s model and the data, of course, but mainly it’s because that’s the way it really ought to be costed.

      I can’t make up my mind about the suitability of the predicates – I’ve certainly been helped in the past by seeing an “unexpected” predicate in the FILTER section when I’ve only been expecting to see it in the ACCESS section, so I quite like the little warning it gives. As you say, though, it doesn’t really look right; maybe it’s just a side effect of applying the general skip scan presentation rule to a very special boundary case.

      Comment by Jonathan Lewis — January 4, 2013 @ 1:18 pm GMT Jan 4,2013 | Reply

  2. Thank You Jonathan.

    With Regards Dillip

    Comment by Dillip Das — January 5, 2013 @ 4:48 am GMT Jan 5,2013 | Reply

  3. Hi Jonathan
    There is no difference between both query and the database is oracle 11gr2.

    Plan hash value: 600644705
     
    ----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |     7 |   308 |    11   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     7 |   308 |    11   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_I0050 |     5 |       |     6   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       2 - access("ADDR_ID0050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 
                  00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID0050"=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 
                  00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
           filter("EFFECTIVE_DATE"=TO_DATE(" 2013-01-30 00:00:00", "syyyy-mm-dd 
                  hh24:mi:ss"))
     
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    

    Comment by Selvam — January 6, 2013 @ 3:30 pm GMT Jan 6,2013 | Reply

    • Selvam,

      The closing comment in the article is: “you may need to tweak the code”.

      In your case you’ve got system statistics enabled (which I can tell from the appearance of a time column in the plan).

      Your comment did remind me, though, that I hadn’t mentioned anything about collecting stats – which I had with a call to dbms_stats.gather_table_stats() – I’ve add a note to that effect in the code. (Your plans show dynamic sampling which, again, could have affected the outcome to hide the skip scan).

      One thing to bear in mind, you’ve said “11.2” – unfortunately little details of the optimizer behaviour can change on point releases, so any time you highlight something that appears to be an inconsistency in behaviour you need to quote the full version number. (There are some interesting changes between 11.2.0.1 and 11.2.0.3)

      Comment by Jonathan Lewis — January 6, 2013 @ 3:44 pm GMT Jan 6,2013 | Reply

  4. Plan hash value: 3850733051
     
    ----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |     7 |   308 |    11   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     7 |   308 |    11   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_I2500 |     5 |       |     6   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 
                  00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
           filter("EFFECTIVE_DATE"=TO_DATE(" 2013-01-30 00:00:00", "syyyy-mm-dd 
                  hh24:mi:ss"))
     
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    

    Comment by Selvam — January 6, 2013 @ 3:31 pm GMT Jan 6,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.