Oracle Scratchpad

January 3, 2013

Skip Scan 2

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 6:42 pm BST Jan 3,2013

Here’s a question that is NOT a trick question, it’s demonstrating an example of optimizer behaviour that might come as a surprise.
I have an index (addr_id0050, effective_date), the first column is numeric, the second is a date. Here’s a query with an execution plan that uses that index:

define m_date='30-Jan-2013'

select
	small_vc
from	t1
where
	addr_id0050 between 24 and 26
and	effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    18 |   396 |    23 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |    18 |   396 |    23 |
|*  2 |   INDEX SKIP SCAN           | T1_I0050 |    18 |       |     5 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30'
              00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID050"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

So here’s the question – given that my where clause includes a predicate on the first column of the index that would allow an index range scan to take place, wouldn’t you expect Oracle to do a range scan, and how does a skip scan work in this case ?

To push the point a little further I have another column, also numeric, in the same table which appears in a similar index (addr_id2500, effective_date). Here’s the equivalent query with its execution plan.

select
	small_vc
from	t1
where
	addr_id2500 between 24 and 26
and	effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    23 |     5 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |    23 |     5 |
|*  2 |   INDEX RANGE SCAN          | T1_I2500 |     1 |       |     4 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE('
              2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

In this case, with the same starting predicate (but referencing the alternative column), the optimizer produces the index range that we might expect.

The index skip scan isn’t just for cases where the first column of an index is missing from the list of predicates. The basic principle is that the optimizer has the option to do a small number of tiny range scans on an index when the alternative is to do a tablescan or a very large index range scan.

As you might guess from my column names, addr_id0050 holds 50 distinct values, so the range 24 to 26 actually accounts for 6% of the total volume of the index. On the other hand, there are over 800 distinct values for effective_date. There is even an index on (effective_date) but that’s going to pick up lots of rows that are widely scattered throughout the table then throw away 94% of them so Oracle has decided it’s too expensive to use.

So the optimizer has worked out that it can probe for (24, 30th Jan), (25, 30th Jan), and (26, 30th Jan) as the most efficient access path. Of course, it doesn’t know that it’s probing for exactly those values but statistically it assumes that there are only a few possible values that will show up if it probes the index on the first column – in effect using an “inlist iterator” on the first column without knowing in advance what it’s going to find in the list.

When the optimizer sees the query using addr_id2500 (which holds 2,500 distinct values) the arithmetic faviours the option we are familiar with. A simple range scan based on values between 24 and 26 is going to range through roughly l/800th of the index – which is a tiny number of leaf blocks (just one or two, in my case) so the optimizer decides to do that and check every index entry on the way to see if the effective_date holds a suitable value.

If you want to repeat the experiment, I was using 11.2.0.3 with 1MB uniform extents and freelist management. The SQL to create the table and indexes is as follows:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,2500)				addr_id2500,
	mod(rownum,50)					addr_id0050,
	trunc(sysdate) + trunc(mod(rownum,2501)/3)	effective_date,
	lpad(rownum,10,'0')				small_vc,
	rpad('x',100)					padding
from
	generator	v1,
	generator	v2
where
	rownum <= 250000
;

create index t1_i1 on t1(effective_date);
create index t1_i2500 on t1(addr_id2500, effective_date);
create index t1_i0050 on t1(addr_id0050, effective_date);

-- collect stats, no histograms, 11g auto sample size

I had disabled CPU costing to get repeatable results – depending on parameter settings whether you have system stats enabled or not then you may need to tweak the code to change the relative numbers of distinct values in the numeric columns before you see the switch between range scan and skip scan.

6 Comments »

  1. Jonathan,

    very interesting.

    It seems to me that the predicate section for the skip scan is a little bit misleading. With a NO_INDEX_SS-Hint I get (in 11.1.0.7) the same predicates including the filter on effective_date:

    select /*+ opt_param('_optimizer_cost_model','io') NO_INDEX_SS(t1)*/
        small_vc
    from    t1
    where
        addr_id0050 between 24 and 26
    and effective_date = to_date('30.01.2013', 'dd.mm.yyyy');
    
    ------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
    ------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |    24 |   528 |    89 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |    24 |   528 |    89 |
    |*  2 |   INDEX RANGE SCAN          | T1_I0050 |    24 |       |    64 |
    ------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("ADDR_ID0050">=24 AND "EFFECTIVE_DATE"=TO_DATE('
                  2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID0050"<=26)
           filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
                  'syyyy-mm-dd hh24:mi:ss'))
    

    So in this case the engine has to read the index for the range addr_id0050 between 24 and 26 and then only uses the entries with the fitting effective_date to do the table access. Autotrace tells me that the query uses 53 consistent gets – while the skip scan only uses 27 consistent gets. For the skip scan I assume the engine reads only the relevant parts of the index structure (i.e. the leaf_blocks with the fitting effective_date) – and so the filter predicate on effective_date seems to me to be redundant.

    When I change your query to use a IN-list instead of the range I get a plan with more plausible predicates and I assume the internal work is very similar to the skip scan case (at least I also get 27 consistent gets):

    select /*+ opt_param('_optimizer_cost_model','io') */
        small_vc
    from    t1
    where
        addr_id0050 in (24, 25, 26)
    and effective_date = to_date('30.01.2013', 'dd.mm.yyyy')
    
    -------------------------------------------------------------------------
    | Id  | Operation                    | Name     | Rows  | Bytes | Cost  |
    -------------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |          |    18 |   396 |    23 |
    |   1 |  INLIST ITERATOR             |          |       |       |       |
    |   2 |   TABLE ACCESS BY INDEX ROWID| T1       |    18 |   396 |    23 |
    |*  3 |    INDEX RANGE SCAN          | T1_I0050 |    18 |       |     5 |
    -------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       3 - access(("ADDR_ID0050"=24 OR "ADDR_ID0050"=25 OR
                  "ADDR_ID0050"=26) AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
                  'syyyy-mm-dd hh24:mi:ss'))
    

    Regards

    Martin

    Comment by Martin Preiss — January 4, 2013 @ 9:15 am BST Jan 4,2013 | Reply

    • Martin,

      Thanks for adding in the extra demonstrations. The “no_index_ss()” is particularly significant – one of my first suggestions for addressing the question “why isn’t the optimizer doing X” is to tell people to put in the hints to make it “do X” and maybe you’ll see what it didn’t like about that plan. Seeing what Oracle does with this example (and the cost) when you block the skip scan is very instructive.

      It’s interesting to note, by the way, that the cost of your in-list example is the same as the cost of my skip scan example (now that I’ve posted the plan with CPU costing disabled, which I didn’t originally). This is partly because of perfect match between the optimizer’s model and the data, of course, but mainly it’s because that’s the way it really ought to be costed.

      I can’t make up my mind about the suitability of the predicates – I’ve certainly been helped in the past by seeing an “unexpected” predicate in the FILTER section when I’ve only been expecting to see it in the ACCESS section, so I quite like the little warning it gives. As you say, though, it doesn’t really look right; maybe it’s just a side effect of applying the general skip scan presentation rule to a very special boundary case.

      Comment by Jonathan Lewis — January 4, 2013 @ 1:18 pm BST Jan 4,2013 | Reply

  2. Thank You Jonathan.

    With Regards Dillip

    Comment by Dillip Das — January 5, 2013 @ 4:48 am BST Jan 5,2013 | Reply

  3. Hi Jonathan
    There is no difference between both query and the database is oracle 11gr2.

    Plan hash value: 600644705
     
    ----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |     7 |   308 |    11   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     7 |   308 |    11   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_I0050 |     5 |       |     6   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       2 - access("ADDR_ID0050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 
                  00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID0050"=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 
                  00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
           filter("EFFECTIVE_DATE"=TO_DATE(" 2013-01-30 00:00:00", "syyyy-mm-dd 
                  hh24:mi:ss"))
     
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    

    Comment by Selvam — January 6, 2013 @ 3:30 pm BST Jan 6,2013 | Reply

    • Selvam,

      The closing comment in the article is: “you may need to tweak the code”.

      In your case you’ve got system statistics enabled (which I can tell from the appearance of a time column in the plan).

      Your comment did remind me, though, that I hadn’t mentioned anything about collecting stats – which I had with a call to dbms_stats.gather_table_stats() – I’ve add a note to that effect in the code. (Your plans show dynamic sampling which, again, could have affected the outcome to hide the skip scan).

      One thing to bear in mind, you’ve said “11.2″ – unfortunately little details of the optimizer behaviour can change on point releases, so any time you highlight something that appears to be an inconsistency in behaviour you need to quote the full version number. (There are some interesting changes between 11.2.0.1 and 11.2.0.3)

      Comment by Jonathan Lewis — January 6, 2013 @ 3:44 pm BST Jan 6,2013 | Reply

  4. Plan hash value: 3850733051
     
    ----------------------------------------------------------------------------------------
    | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |          |     7 |   308 |    11   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     7 |   308 |    11   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | T1_I2500 |     5 |       |     6   (0)| 00:00:01 |
    ----------------------------------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------
     
       2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 
                  00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
           filter("EFFECTIVE_DATE"=TO_DATE(" 2013-01-30 00:00:00", "syyyy-mm-dd 
                  hh24:mi:ss"))
     
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    

    Comment by Selvam — January 6, 2013 @ 3:31 pm BST Jan 6,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,530 other followers