Oracle Scratchpad

July 5, 2013

Wrong Index

Filed under: Indexing,Oracle,Statistics — Jonathan Lewis @ 8:00 pm GMT Jul 5,2013

One of the sad things about trying to keep on top of Oracle is that there are so many little things that could go wrong and take a long time to identify. In part this is why I try to accumulate test cases for all the oddities and anomalies I come across as I travel around the world – if I’ve spent the time recreating a problem I’ll probably remember it the next time I see the symptoms.

Here’s a little threat that comes into play when a couple of events occur simultaneously, in this case: automatically selected indexes being rebuilt combined with an unfortunate choice of index definitions. Here’s a demonstration (running 11.2.0.3, 1MB uniform extents, 8KB block size, freelist management – first the symptoms, script, followed by results:


set autotrace traceonly explain

select * from t1 where skew = 100;
alter index t1_skew rebuild;
select * from t1 where skew = 100;

set autotrace off

Execution plan before the rebuild
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |   126 | 58338 |   127   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   126 | 58338 |   127   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_SKEW |   126 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SKEW"=100)

Execution plan after the rebuild
-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   126 | 58338 |   128   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1          |   126 | 58338 |   128   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_SKEW_MOD |   126 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SKEW"=100)

So I have an access path that uses a certain index, I rebuild the because my analytical code tells me it needs to be rebuilt, and the optimizer starts to use a different index as a result. If you’re wondering about the two indexes, and whether the second index is supposed to be the better choice anyway, here are the definitions:

SQL> select
  2  	index_name, column_name
  3  from
  4  	user_ind_columns
  5  where
  6  	table_name = 'T1'
  7  order by
  8  	index_name, column_position
  9  ;

INDEX_NAME           COLUMN_NAME
-------------------- --------------------
T1_SKEW              SKEW

T1_SKEW_MOD          SKEW
                     MOD_100

3 rows selected.

The switch took us from a single column index to a multi-column index with the same leading edge. It’s definitely not the right index for this query; apart from the fact that it has more columns and (all other things being equal) will clearly be physically larger, the additional columns will mean that the clustering_factor could easily be larger with the rows visited in a different order which would therefore increase the buffer cache latch activity.

When something like this happens the issue is almost inevitably going to be “statistics”. Sometimes it’s easy to see where the numbers have gone wrong, sometimes it’s difficult; sometimes it’s easy but counter-intuitive – which makes it harder than it really is.

In this case the issue is quite straight forward – I’ve added a lot of data to the table since I last collected stats – the definitions of the indexes, combined with the way the data was added. meant that the smaller index because less efficient than the larger index in terms of the average free space in each leaf block, and this was sufficient (in my hypothetical system) to trigger a rebuild of one index but not the other. But an index rebuild automatically collects stats, and the new stats tell the optimizer that the correct index has become much larger – so the optimizer has picked the smaller, incorrect index. If I now rebuild the second index you can see how things change – here are the plans for the two indexes when they have both been rebuilt and I hint each index in turn:

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |   126 | 58338 |   252   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   126 | 58338 |   252   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | T1_SKEW |   251 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SKEW"=100)

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   126 | 58338 |   253   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1          |   126 | 58338 |   253   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | T1_SKEW_MOD |   251 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SKEW"=100)

As you can see, when both indexes have been rebuilt Oracle thinks the second index is slightly more expensive than the first – so we would go back to using the correct index by default. Notice, however, that we have now exposed another problem – according to the index stats we would be fetching 251 rows from the table, but the row estimate when we look at the table operations shows 126 rows. When we decide to rebuild one index on a table, not only do we have to consider rebuilding EVERY index on that table, we may have to bring the table stats up to date at the same time to avoid the risk of destabilising the system.

This set of results just happened to drop out when I was trying to build a model for another problem – but it demonstrates an issue that I have come across from time to time. Luckily it’s not a problem that appears very frequently, there are a couple of things conspiring to cause the issue (and in my example the stats were really borderline anyway). The point of the article, though, is to remind you that this type of thing really can happen, and when you’re looking at tasks like defining new indexes, rebuilding old indexes, or even just being picky about which stats to collect – you need to think a little carefully about side effects.

Test Case

If you want to try building my sample data set, here’s the code. As I said, I was trying to address a completely different issue which is why the pattern generation is a little unusual.

begin
	dbms_stats.set_system_stats('MBRC',16);
	dbms_stats.set_system_stats('MREADTIM',10);
	dbms_stats.set_system_stats('SREADTIM',5);
	dbms_stats.set_system_stats('CPUSPEED',1000);
exception
	when others then null;
end;

create table t1 (
	skew, id, mod_100, padding
)
as
select
	r1,
	rownum			id,
	mod(rownum-1,100)	mod_100,
	rpad('x',450)
from
(
	select /*+ no_merge */
		rownum r1
	from all_objects
	where rownum <= 250
)	v1,
(
	select /*+ no_merge */
		rownum r2
	from all_objects
	where rownum <= 250
)	v2
where
	r2 <= r1
order by
	r2,r1
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);

end;
/

create index t1_skew on t1(skew);
create index t1_skew_mod on t1(skew, mod_100);

5 Comments »

  1. Most excellent, my friend. And your reminder one may need to update all the stats for all the indexes as well as the table is a key lesson. In fact, in the case where more tables and indexes are involved, it might cascade to join choices as well and hilarity may indeed ensue. Your example also kicked off a gedanken experiment: Now what if the mod100 column were column 255? (Does Oracle figure out on a select * it might be cheaper to use the bigger index to avoid the extra row piece? Does it depend on how many second row pieces are in different blocks? NOW see where you’ve sent me thinking! Fortunately the homegrown, home picked, home baked cherry pie is ready and that will distract me from this train of thought.)

    Comment by rsiz — July 5, 2013 @ 10:02 pm GMT Jul 5,2013 | Reply

    • Mark,
      Tempting questions – and I had to start checking one of them.

      The optimimzer adds a CPU cost for each column it has to skip over to select the columns it wants from a row. The cost may be platform dependent, but on the first instance I tested it seemed to be 20 CPU operations per column – even when the skip was from the first row piece to the second row piece (no extra CPU cost for locating a row piece).

      Even though multiple row pieces don’t seem to have a special costing effect, the fact that further down the row means higher CPU cost, and CPU cost equates to I/O cost, you could imagine an example of something like: select col1, col1000 from t1 where col1 = ‘X’ and col1000 = ‘Y’ where Oracle chooses an index on (col1000) over an index on (col1) because that path avoids the incremental cost of skipping 999 columns.

      Footnote: Gedanken Experiment (note – this link requires knowledge of some colloquial American)

      Comment by Jonathan Lewis — July 5, 2013 @ 10:36 pm GMT Jul 5,2013 | Reply

  2. I think some of the lines in the test case are inadvertently stripped out.

    create table t1 (
    skew, id, mod_100, padding
    )
    as
    select
    r1,
    rownum id,
    mod(rownum-1,100) mod_100,
    rpad(‘x’,450)
    from
    (
    select /*+ no_merge */
    rownum r1
    from all_objects
    where rownum <= 250
    ) v1,
    (
    select /*+ no_merge */
    rownum r2
    from all_objects
    where rownum <= 250
    ) v2
    where
    r2 user,
    tabname => ‘T1′,
    method_opt => ‘for all columns size 1′
    );
    end;
    /

    Comment by Vishal Gupta — July 5, 2013 @ 10:15 pm GMT Jul 5,2013 | Reply

  3. […] couple of days ago I wrote an article about Oracle picking the “wrong index” after an index rebuild, and I mentioned that the sample data I had generated looked a little odd […]

    Pingback by Wrong Index 2 | Oracle Scratchpad — July 12, 2013 @ 5:18 pm GMT Jul 12,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 4,524 other followers