Oracle Scratchpad

May 11, 2018

Skip Scan 3

Filed under: CBO,Index skip scan,Indexing,Oracle — Jonathan Lewis @ 2:26 pm BST May 11,2018

If you’ve come across any references to the “index skip scan” operation for execution plans you’ve probably got some idea that this can appear when the number of distinct values for the first column (or columns – since you can skip multiple columns) is small. If so, what do you make of this demonstration:


rem
rem     Script:         skip_scan_cunning.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2018
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

begin
        dbms_stats.set_system_stats('MBRC',16);
        dbms_stats.set_system_stats('MREADTIM',10);
        dbms_stats.set_system_stats('SREADTIM',5);
        dbms_stats.set_system_stats('CPUSPEED',1000);
end;
/

create table t1
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        rownum                          id1,
        rownum                          id2,
        lpad(rownum,10,'0')             v1,
        lpad('x',150,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format issue
;

create index t1_i1 on t1(id1, id2);

begin
        dbms_stats.gather_table_stats(
                ownname     => user,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

For repeatability I’ve set some system statistics but if you’ve left the system stats to default you should see the same effect. All I’ve done is create a table and an index on that table. The way I’ve defined the id1 and id2 columns means they could individually support unique constraints and the index clearly has 1 million distinct values for id1 in the million index entries. So what execution plan do you think I’m likely to get from the following simple query:


set serveroutput off
alter session set statistics_level = all;

prompt  =======
prompt  Default
prompt  =======

select  id 
from    t1
where   id2 = 999
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last cost'));

You’re probably not expecting an index skip scan to appear but, given the title of this posting, you may have a suspicion that it will; so here’s the plan I got running this test on 12.2.0.1 [update Feb 2021 – nothing has changed in 19.3.0.0]:


SQL_ID  8r5xghdx1m3hn, child number 0
-------------------------------------
select id from t1 where id2 = 999

Plan hash value: 400488565

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |  2929 (100)|      1 |00:00:00.17 |    2932 |      5 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |  2929   (1)|      1 |00:00:00.17 |    2932 |      5 |
|*  2 |   INDEX SKIP SCAN                   | T1_I1 |      1 |      1 |  2928   (1)|      1 |00:00:00.17 |    2931 |      4 |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID2"=999)
       filter("ID2"=999)


So an index skip scan doesn’t require a small number of distinct values for the first column of the index (unless you’re running a version older than 11.2.0.2 where a code change appeared that could be disabled by setting fix_control 9195582 off).

When the optimizer doesn’t do what you expect it’s always worth hinting the code to follow the plan you were expecting – so here’s the effect of hinting a full tablescan (which happened to do direct path reads – despite the hint from the Buffers column in the rowsource execution stats that the reads were to cache):

SQL_ID  bxqwhsjwqfm7q, child number 0
-------------------------------------
select  /*+ full(t1) */  id from t1 where id2 = 999

Plan hash value: 3617692013

----------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |  3317 (100)|      1 |00:00:00.12 |   25652 |  25635 |
|*  1 |  TABLE ACCESS FULL| T1   |      1 |      1 |  3317   (3)|      1 |00:00:00.12 |   25652 |  25635 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ID2"=999)

Note that the cost is actually more expensive than the cost of the indexed access path.  For reference you need to know that the blocks statistic for the table was 25,842 while the number of index leaf blocks was 2,922. The latter figure (combined with a couple of other details regarding the clustering_factor and undeclared uniqueness of the index) explains why the cost of the skip scan was only 2,928: the change that appeared in 11.2.0.2 limited the I/O cost of an index skip scan to the total number of leaf blocks in the index.  The tablescan cost (with my system stats) was basically dividing my table block count by 16 (to get the number of multi-block reads) and then doubling (because – see the start of the SQL script – I had set the multiblock read time to be twice the single block read time).

As a quick demo of how older versions of Oracle would behave after setting “_fix_control”=’9195582:OFF’:


SQL_ID	bn0p9072w9vfc, child number 1
-------------------------------------
select	/*+ index_ss(t1) */  id from t1 where id2 = 999

Plan hash value: 400488565

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |  1001K(100)|      1 |00:00:00.13 |    2932 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |  1001K  (1)|      1 |00:00:00.13 |    2932 |
|*  2 |   INDEX SKIP SCAN                   | T1_I1 |      1 |      1 |  1001K  (1)|      1 |00:00:00.13 |    2931 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID2"=999)
       filter("ID2"=999)

The cost of the skip scan is now a little over 1,000,000 – corresponding (approximately) to the 1 million index probes that will have to take place. You’ll notice that the number of buffer visits recorded is 2931 for the index operation, though: this is the result of the run-time optimisation that keeps buffers pinned very aggressively for skip scan – you might expect to see a huge number of visits recorded as “buffer is pinned count”, but for some reason that doesn’t happen. The cost is essentially Oracle calculating (with pinned root and branch) the cost of “id1 = {constant} and id2 = 999” and multiplying by ndv(id1).

Footnote:

Ideally, of course, the optimizer ought to work out that an index fast full scan followed by a table access ought to have a lower cost (using multi-block reads rather than walking the index in leaf block order one block at a time (which is what this particular skip scan will have to do) – but that’s not (yet) an acceptable execution plan though in the most recent versions of Oracle it can now appear as a plan for deleting data.

tl;dr

If you have an index that is very much smaller than the table you may find examples where the optimizer does what appears to be an insanely stupid index skip scan when you were expecting a tablescan or, possibly, some other less efficient index to be used. There is a rationale for this, but such a plan may be much more CPU and read intensive than it really ought to be.

 

2 Comments »

  1. Thank you again Jonathan for another wonderful demonstration.
    I have used your test case to check for impact of parallel DML (DELETE) on the table with LOB column….so feel free to ignore as this is not exactly related to this post.
    As expected, DELETE did not run in parallel (am on 11.2.0.4) but oracle did use parallel slaves to access table. However, I was expecting to see corresponding hints in the outline section but could not see it and instead saw something new (to me) called SHARED(12). Any idea about the strange outline hints?

    
    SQL> alter table t1 add (padding_as_clob clob) ;
    
    Table altered.
    
    SQL> alter session enable parallel dml ;
    
    Session altered.
    
    SQL> delete /*+ parallel(12) */ from t1 where id2 = 999 ;
    
    1 row deleted.
    
    SQL> rollback ;
    
    Rollback complete.
    
    SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last advanced')) ;
    
    PLAN_TABLE_OUTPUT
    -------------------------------------
    SQL_ID  0vh9g3317vgv9, child number 1
    -------------------------------------
    delete /*+ parallel(12) */ from t1 where id2 = 999
    
    Plan hash value: 3718066193
    
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation             | Name     | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------
    |   0 | DELETE STATEMENT      |          |      0 |        |       |   649 (100)|          |        |      |            |      0 |00:00:00.01 |       0 |      0 |
    |   1 |  DELETE               | T1       |      0 |        |       |            |          |        |      |            |      0 |00:00:00.01 |       0 |      0 |
    |   2 |   PX COORDINATOR      |          |      0 |        |       |            |          |        |      |            |      0 |00:00:00.01 |       0 |      0 |
    |   3 |    PX SEND QC (RANDOM)| :TQ10000 |      0 |      1 |    10 |   649   (1)| 00:00:08 |  Q1,00 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |
    |   4 |     PX BLOCK ITERATOR |          |      1 |      1 |    10 |   649   (1)| 00:00:08 |  Q1,00 | PCWC |            |      0 |00:00:00.10 |    2505 |   2475 |
    |*  5 |      TABLE ACCESS FULL| T1       |     15 |      1 |    10 |   649   (1)| 00:00:08 |  Q1,00 | PCWP |            |      0 |00:00:00.01 |    2505 |   2475 |
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------
    
       1 - DEL$1
       5 - DEL$1 / T1@DEL$1
    
    Outline Data
    -------------
    
      /*+
          BEGIN_OUTLINE_DATA
          IGNORE_OPTIM_EMBEDDED_HINTS
          OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
          DB_VERSION('11.2.0.4')
          ALL_ROWS
          SHARED(12)
          OUTLINE_LEAF(@"DEL$1")
          FULL(@"DEL$1" "T1"@"DEL$1")
          END_OUTLINE_DATA
      */
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       5 - access(:Z>=:Z AND :Z<=:Z)
           filter("ID2"=999)
    
    Column Projection Information (identified by operation id):
    -----------------------------------------------------------
    
       2 - (cmp=3; cpy=2) "T1".ROWID[ROWID,10], "T1"."ID1"[NUMBER,22], "ID2"[NUMBER,22]
       3 - (#keys=0) "T1".ROWID[ROWID,10], "T1"."ID1"[NUMBER,22], "ID2"[NUMBER,22]
       4 - "T1".ROWID[ROWID,10], "T1"."ID1"[NUMBER,22], "ID2"[NUMBER,22]
       5 - "T1".ROWID[ROWID,10], "T1"."ID1"[NUMBER,22], "ID2"[NUMBER,22]
    
    Note
    -----
       - Degree of Parallelism is 12 because of hint
    
    

    Comment by Narendra — April 23, 2020 @ 9:10 pm BST Apr 23,2020 | Reply

    • Narendra,

      Thanks for the comment.

      I’ve mentioned the shared() hint from time to time (though the only reference I can find on the blog at present is a throwaway comment here). It’s just the form of the parallel() hint that gets into outlines / baselines and the SQL that gets sent to remote databases occasionally. I’ve never looked too closely at when and why it appears.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — April 23, 2020 @ 9:44 pm BST Apr 23,2020 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.