Oracle Scratchpad

December 15, 2015

Indexing

Filed under: Indexing,Infrastructure,Oracle — Jonathan Lewis @ 11:22 am GMT Dec 15,2015

A recent question on the OTN database forum asked:

I have a table with a,b,c,d,e,f,g,h,i,j,k. columns and I have an index on (a,b) columns. There is a sql statement now with “where a= ?” and we are wondering if it could also be good to add a single index on just (a).

Does it help at all? Does it help in some cases?

This is one of those questions where the answer for a perfectly designed and managed system could easily contradict the pragmatic answer for a live system in its current state. That may mean you have to do the wrong thing in the short term while working (possibly very slowly) towards the right thing.  I gave the following (slightly edited) answer on the forum:

The basic answer is that you do NOT need the single column index if you have the two-column index.

The complex answer is that you may have to spend some time and effort ensuring that the two-column index is used in all cases where it would have been appropriate to use the single column index. This may simple mean ensuring the clustering_factor of the index is adjusted suitably so that the optimizer “likes” the index enough it may mean you (also) have to modify some code to include the cluster_by_rowid hint (when you’re at 12c) so that you don’t suffer a performance impact at run-time.

Key factors to consider: the two-column index will be physically larger than the single column index – this will increase the (optimizer’s estimated) cost of using it; the clustering_factor of the two-column index will almost certainly be larger than the clustering_factor of the single column index – this will also increase the (optimizer’s estimated) cost of using it.

These two points are echoed at run-time: the two column index will be bigger so you will have to do more work (though probably not very much more work) to read the relevant rowids and, if you walk the two-column index in order for a given value of the first column, you will visit the table blocks in a different order compared to the order of visits from the single column index – this may result in the query actually doing noticeably more work at run-time.

The change in the index leaf_block count is often insignificant (especially if, as per your example, the number of rows required – hence blocks visited in the table – is large); the impact of the clustering_factor can make a dramatic difference to the cost calculations; but you can often work around this. In 11.2.0.4, particularly, you can use the dbms_stats.set_table_prefs() call to set the ‘table_cached_blocks’ parameter for a table so that all its indexes look more desirable to the optimizer.

Bottom line: you don’t need the single column index but if you currently have it and want to drop it the human effort required to ensure that it can be dropped without side effects may make you decide to keep it anyway, especially if it doesn’t seem to be causing any concurrency or other performance overheads.  If you don’t have it yet, then you shouldn’t need to create it – though you might have to do some work to make sure that the optimizer takes full advantage of the two-column index.

Since I’m on the topic, I’ll add that the same arguments apply to a pair of indexes like (a, b, c) and (a, b); if you’ve got the longer index you shouldn’t need the shorter one; however, because the shorter index is a multi-column index, you might find that it’s beneficial to create a column group on that column combination so that the optimizer doesn’t lose information about the number of distinct values for the combination when you drop the index.

3 Comments »

  1. Jonathan,

    At one customer site I have been asked to locate redundant indexes and eventually drop them. I have then used the following SQL which is not mine (sorry I don’t remember who wrote it)

    SELECT
      *
    FROM
    (SELECT 
        tab_owner.name owner,
        t.name table_name,
        o1.name contained_index ,
        o2.name container_index
    FROM 
        sys.USER$ tab_owner,
        sys.OBJ$ t,
        sys.IND$ i1,
        sys.OBJ$ o1,
        sys.IND$ i2,
        sys.OBJ$ o2
    WHERE 
       i1.bo# = i2.bo#
      AND i1.obj# <> i2.obj#
      AND i2.cols >= i1.cols
      AND i1.cols  > 0
      AND i1.cols  =
      (SELECT   
        COUNT(1)
      FROM sys.ICOL$ cc1,
           sys.icol$ cc2
      WHERE 
          cc2.obj# = i2.obj#
      AND cc1.obj#   = i1.obj#
      AND cc2.pos#   = cc1.pos#
      AND cc2.COL#   = cc1.COL#
      )
    AND i1.obj#  = o1.obj#
    AND i2.obj#  = o2.obj#
    AND t.obj#   = i1.bo#
    AND t.owner# = tab_owner.USER#
    AND tab_owner.name = 'THE_OWNER'
    )
    WHERE contained_index NOT LIKE '%PK%' -- this is possible because of the client naming standard
    ORDER BY 1,2;
    

    which I have changed it a little bit to exclude primary keys and eventually unique key. The script should in fact be updated to use the uniqueness index property instead of relying on the client naming standard as I did above.

    This script gave us a bunch of contained indexes that we started by setting invisible and waiting a couple of weeks checking a possible collateral effect. When we did such an operation one particular index with a single column ind1(a) which that has been set invisible in profit of a container index of 6 columns (a,b,c,d,e,f), we have seen the CBO opting for a full table scan instead of using the container index occasioning a serious performance issue.

    The bottom line is that a container index might successfully cover a contained index provided the container index doesn’t include several other columns which will certainly damage its clustering factor, its effective selectivity and therefore its desirability

    Best regards
    Mohamed

    Comment by hourim — December 18, 2015 @ 3:53 pm GMT Dec 18,2015 | Reply

    • Mohamed,

      Thanks for describing your experience, and the sample SQL. If you re-read the post, though, you will see that you’re just re-iterating the points I made. Technically you could have done enough work to ensure that the single column index could have been dropped, but it would probably have been easy to decide that it wasn’t worth the effort to do so.

      Further to your point about uniqueness – a further modification (or a second query) would be needed to check whether any unique/PK constraints were protected by a non-unique index.

      “… its effective selectivity …” adding columns to the index won’t affect the selectivity, only the clustering_factor, leaf block count, and (though I didn’t mention it in the original note) the blevel; the step from 1 column to 6 could easily mean a change in height, which would be an undesirable thing to have to work around in terms of costing.

      Comment by Jonathan Lewis — December 18, 2015 @ 4:21 pm GMT Dec 18,2015 | Reply

  2. […] clustering of the data so, if you haven’t come across it before, you should examine the “table_cached_blocks” option that appeared in 11.2.0.4 for the procedure dbms_stats.set_table_prefs() as this allows you […]

    Pingback by Index Usage | Oracle Scratchpad — March 29, 2016 @ 10:53 am GMT Mar 29,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.