Oracle Scratchpad

June 28, 2016

Index Sanity

Filed under: CBO,extended stats,Indexing,Oracle,Statistics — Jonathan Lewis @ 8:43 am BST Jun 28,2016

By popular demand (well, one person emailed me to ask for it) I’m going to publish the source code for a little demo I’ve been giving since the beginning of the millennium – it concerns indexes and the potential side effects that you can get when you drop an index that you’re “not using”. I think I’ve mentioned the effect several times in the history of this blog, but I can’t find an explicit piece of demo code, so here it is – starting at the conclusion – as a cut and paste from an SQL*Plus session running against an 11g instance:


SQL> set autotrace traceonly explain
select
        t1.small_vc, t2.small_vc, t3.small_vc
from
        t1, t2, t3
where
        t1.n1 between 40 and 50
and     t2.id1 = t1.id1
and     t2.ind_pad = t1.ind_pad
and     t2.id2 = t1.id2
and     t3.id = t1.id1
 11  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 1184213596

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   484 | 64856 |   227   (2)| 00:00:02 |
|*  1 |  HASH JOIN          |      |   484 | 64856 |   227   (2)| 00:00:02 |
|*  2 |   HASH JOIN         |      |   484 | 57596 |    14   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T2   |    20 |  1160 |     4   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T1   |   484 | 29524 |    10   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL | T3   |  5000 | 75000 |   213   (2)| 00:00:02 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T3"."ID"="T1"."ID1")
   2 - access("T2"."ID1"="T1"."ID1" AND "T2"."IND_PAD"="T1"."IND_PAD"
              AND "T2"."ID2"="T1"."ID2")
   4 - filter("T1"."N1"<=50 AND "T1"."N1">=40)

SQL> drop index t2_i1;

Index dropped.

select
        t1.small_vc, t2.small_vc, t3.small_vc
from
        t1, t2, t3
where
        t1.n1 between 40 and 50
and     t2.id1 = t1.id1
and     t2.ind_pad = t1.ind_pad
and     t2.id2 = t1.id2
and     t3.id = t1.id1
 11  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2290830436

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    52 |  6968 |    67   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |    52 |  6968 |    67   (2)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |    52 |  6968 |    67   (2)| 00:00:01 |
|*  3 |    HASH JOIN                 |       |    52 |  6188 |    14   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL        | T2    |    20 |  1160 |     4   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL        | T1    |   484 | 29524 |    10   (0)| 00:00:01 |
|*  6 |    INDEX UNIQUE SCAN         | T3_PK |     1 |       |     0   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID| T3    |     1 |    15 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T2"."ID1"="T1"."ID1" AND "T2"."IND_PAD"="T1"."IND_PAD" AND
              "T2"."ID2"="T1"."ID2")
   5 - filter("T1"."N1"<=50 AND "T1"."N1">=40)
   6 - access("T3"."ID"="T1"."ID1")

Starting from the top – I’ve enabled autotrace which, technically, could mean that the plans are not the ones I’d see at run-time, but you can take my word for it that in 11g they are the run-time plans; then I’ve supplied a query that produces a plan with 3 full tablescans, two hash joins, and no index usage at all.

You’ll notice at operation 3 of the plan that table t2 is very small – only 20 rows selected, with no predicates that could have filtered that result down from a large table (take my word for it the stats have just been collected) so, as the ancient mythology would have it, we don’t really need an index on that table (a quick check tells me that the index wasn’t there to enforce uniqueness). Immediately after the first execution plan you can see that I’ve dropped an index called t2_i1 – trust me that IS the index on table t2.

We “run” the original query again, it gets re-optimised (and there’s no question of cardinality feedback or any other feature coming into play) and we get a different plan.

Dropping, or adding, a multi-column index to a table could change execution plans – even if the index is not present in the plan.

The reason for this is the “index sanity check”. When the optimizer is doing its cardinality estimates, if it see equality conditions on the set of columns that make up an index it can use the distinct_keys statistic from the index in the calculation rather than using the standard calculation of multiplying together the num_distinct of the separate columns. In earlier versions of Oracle there were some restrictions about uniqueness, but the limitations were removed in 11.1.0.7.

In my case there were 10 distinct values for id1, just one value for ind_pad, and 20 distinct values for id2 – but a total of only 20 distinct values for the combination. With an index in place on the combination the optimizer used the value 20 in its calculation, in the absence of the index it used the value 200 – that factor of 10 led to a drop in the join cardinality estimate from 484 rows to 52 rows – at which point the optimizer calculations made the next step in the plan change from a hash join to a nested loop join.

If you want to reproduce the demo, here’s the full script – the data isn’t a realistic data set, and I’ve had to use various non-standard settings to make the script as repeatable as possible – I’ve built the data set in a tablespace using an 8KB block size, 1MB uniform extents and manual (freelist) segment space management.


rem
rem     Script:         index_sanity.sql
rem     Author:         Jonathan Lewis
rem

drop table t3;
drop table t2;
drop table t1;

execute dbms_random.seed(0);

begin   
        begin           execute immediate 'purge recyclebin';
        exception       when others then null;
        end; 

        begin
                dbms_stats.set_system_stats('MBRC',16);
                dbms_stats.set_system_stats('MREADTIM',10);
                dbms_stats.set_system_stats('SREADTIM',5);
                dbms_stats.set_system_stats('CPUSPEED',1000);
        exception
                when others then null;
        end;

end;
/

create table t1
as
select
        mod(rownum,10)          id1,
        mod(rownum,20)          id2,
        rpad('x',40,'x')        ind_pad,
        mod(rownum,100)         n1,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',50)            padding
from
        all_objects
where
        rownum  <= 4000
;

create table t2 
pctfree 99
pctused 1
as
select
        mod(rownum,10)          id1,
        mod(rownum,20)          id2,
        rpad('x',40,'x')        ind_pad,
        mod(rownum,100)         n1, 
        lpad(rownum,10,'0')     small_vc,
        rpad('x',200)           padding
from
        all_objects
where
        rownum <= 20
;

create table t3
pctfree 95
pctused 1
as
select
        rownum          id,
        rpad(rownum,10) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 5000
;
begin
        dbms_stats.gather_table_stats(
                ownname => user,
                tabname => 'T1',
                method_opt => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname => user,
                tabname => 'T2',
                method_opt => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname => user,
                tabname => 'T3',
                method_opt => 'for all columns size 1'
        );

end;
/

create        index t1_i1 on t1(id1, ind_pad, id2) pctfree 91;
create        index t2_i1 on t2(id1, ind_pad, id2) pctfree 91;
alter table t3 add constraint t3_pk primary key (id);

set autotrace traceonly explain

select
        t1.small_vc, t2.small_vc, t3.small_vc
from
        t1, t2, t3
where
        t1.n1 between 40 and 50
and     t2.id1 = t1.id1
and     t2.ind_pad = t1.ind_pad
and     t2.id2 = t1.id2
and     t3.id = t1.id1
;

-- alter index t1_i1 invisible;
-- alter index t2_i1 invisible;

drop index t1_i1;
-- drop index t2_i1;

accept X prompt "Press return to coninue"

select
        t1.small_vc, t2.small_vc, t3.small_vc
from
        t1, t2, t3
where
        t1.n1 between 40 and 50
and     t2.id1 = t1.id1
and     t2.ind_pad = t1.ind_pad
and     t2.id2 = t1.id2
and     t3.id = t1.id1
;

set autotrace off

You’ll notice from the commented lines in the above that the effect appears whether you drop the index or make it invisible, also that there’s a similar index on the t1 table that matches the index on the t2 table – I could get the effect from dropping or making invisible either index.

There is a saving grace in 11g – if I do drop, or make invisible, one of these indexes I can protect myself against the statistical effect by create a column group on the same set of columns, and the num_distinct from the column group would serve the same purpose as the distinct_keys from the index.

12 Comments »

  1. Hi Jonathan,
    interesting article. thanks for it.

    I have one question.
    We have couple of invisible indexes(invisible since Jan this year) in our PROD DB . We plan to drop it in few weeks time, can this dropping of these invisible indexes cause any plans ( for SELECT queries, we are not worried about the DMLs) to change to compare to what the plans are right now?

    I am not talking about comparision with pre-Jan plans.

    Regds,
    Kunwar

    Comment by kunwar — June 28, 2016 @ 8:02 pm BST Jun 28,2016 | Reply

    • Kunwar,

      I have a long list of notes about this dating over several years, and according to the notes 11.1.0.7 WILL still be using the index stats even though the indexes are invisible – this is considered a bug (so maybe there’s a patch for it if you’re on that version).

      More importantly, though not the focus of your question, invisible indexes will protect you from the “foreign key locking” problem, even up to 12.1.0.2

      Comment by Jonathan Lewis — June 28, 2016 @ 8:34 pm BST Jun 28,2016 | Reply

      • We are on 11.2.0.4.0.

        Can you please share which notes are referring to? I will go over them :) for my understanding and knowledge.

        I didnt understand your last comment about foreign key locking problem

        Comment by kunwar — June 28, 2016 @ 9:43 pm BST Jun 28,2016 | Reply

        • Kunwar,

          I don’t publish everything I write, it takes too much time to rationalise, summarise, and make safe for general consumption.

          I’m surprised that I don’t seem to have a note with an explicit description of the foreign key locking issue; I’ve mentioned it in passing a few times, though.

          In outline – if you have a referential integrity constraint on table CHILD that references table PARENT then an update to, or delete of, a parent key will lock the child table in mode 4 or 5 (in order to check that there are no related child rows) unless the child table has an index starting with the foreign key columns.

          Update: I’ve just discovered I published a pointer to some comments I made about this on OTN.

          Comment by Jonathan Lewis — June 28, 2016 @ 10:09 pm BST Jun 28,2016

  2. Hi Jonathan,
    First, I wanted to let you know that I just used this script on 12.1.0.2 and it uses adaptive plans and gets the full table scan plan (as with the index). autotrace showed the same behavior as yours, but dbms_xplan showed the adaptive plans with the full table scans.
    Second, I have no way to test it in 11g but was wondering if extended statistics on these 3 columns will fix it. What do you think?

    Comment by amitzil — June 29, 2016 @ 12:16 am BST Jun 29,2016 | Reply

    • Liron,

      When running this test on 12.1, disabling the adaptive optimization features was one of the necessary steps (not visible in the code above) that I took to reproduce the effect (the whole example is extremely artificial simply to demonstrate the point). 12c would otherwise create an adaptive execution plan because of the multi-column predicates with their threat of correlation between columns in the same table.

      I’ve used the expression “column group” in the final paragraph rather than “extended statistics” but replacing any (multi-column) index with extended stats on the same set of columns addresses the problem.

      I probably should have reminded people at that point, though, that when individual high values go out of range column group stats are ignored – so there’s a stability effect with column groups that wouldn’t be there with indexes.

      Having said that, though, the default gathering of table stats generally uses a small sample to gather stats on the indexes, which introduces a point of instability to the distinct_keys value for the indexes :(

      Heads you win, tails you lose.

      Comment by Jonathan Lewis — June 29, 2016 @ 8:01 am BST Jun 29,2016 | Reply

  3. Hi, Jonathan:

    Just wondering whether or not you have tested this scenario in Rel. 12.1.0.2, which contains the fix for Bug 16544878.

    Sincerely,

    Ted

    Comment by Ted Persky — July 7, 2016 @ 2:52 pm BST Jul 7,2016 | Reply

  4. Oh, sorry. Never mind. (Just read the above comments.)

    Comment by Ted Persky — July 7, 2016 @ 2:55 pm BST Jul 7,2016 | Reply

    • Ted,

      Don’t worry – that bug is worth mentioning because it’s a whole new option for running into surprises.
      I’ve written a quick blog note about it (which has pinged back to here).

      Comment by Jonathan Lewis — July 7, 2016 @ 5:29 pm BST Jul 7,2016 | Reply

  5. […] Oracle Midlands event someone asked me if Oracle would use the statistics on invisible indexes for the index sanity check. I answered that there had been a bug in the very early days of invisible indexes when the […]

    Pingback by Invisible Bug | Oracle Scratchpad — July 7, 2016 @ 5:27 pm BST Jul 7,2016 | Reply

  6. […] There is also a good example of the impact this can have on the query execution plan(switching join method from hash join to nested loop due to the drop of the estimated number of rows returned).Link […]

    Pingback by Run-time side effect of dropping a composite unused index | Hatem Mahmoud Oracle's blog — August 9, 2016 @ 11:24 am BST Aug 9,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.