Oracle Scratchpad

March 10, 2011

Index Rebuilds

Filed under: Index Rebuilds — Jonathan Lewis @ 6:15 pm BST Mar 10,2011

I wrote a short note last week that linked to a thread on the Russian Oracle forum about indexing, and if you’ve followed the thread you will have seen a demonstration that seemed to be proving the point that there were cases where an index rebuild would be beneficial.

Of course it’s not difficult to come up with cases where index rebuilds should make a difference – but it’s harder to come up with demonstrations that look realistic, so I thought I’d review the example to explain why it doesn’t really work as a good example of why you might need to think about rebuilding some production index.

The code is simple – create a table with an index, insert a batch of rows, check the index usage, repeat a few times, then rebuild the index and show that the space usage drops dramatically. Here’s the starting code (with a couple of minor changes):


create table testin (s varchar2(1000));

create index testin on testin (s) 
tablespace test_16k
;

execute dbms_random.seed(0)

insert into testin 
select 
	dbms_random.string('P', 1000) s 
from	dual 
connect by 
	level <= 1000
;

commit;

validate index testin;

select 
	height, blocks, lf_blks, lf_rows, lf_rows_len, pct_used
from 
	index_stats
;

You’ll note that the test index is using a blocksize of 16KB. Since the key value is 1,000 characters it does make sense to use one of the larger block size for the index although, to highlight the more common variation on that particular theme, it’s a good idea to think about using larger block sizes for index organized tables because a single index entry is often quite large and this can have an unpleasant effect on leaf block splits.

The call to dbms_random.string(‘P’,1000) generates a string of 1,000 characters selected from the full character set – which maximises the possible variation in values.

After the first validation, the code did the following:


insert into testin select ...
commit;
validate index testin;
select ... from  index_stats;

insert into testin select ...
commit;
validate index testin;
select ... from  index_stats;


alter index testin rebuild;
validate index testin;
select ... from  index_stats;

This is what the four sets of results from index_stats looked like when I ran the test:


    HEIGHT     BLOCKS    LF_BLKS    LF_ROWS LF_ROWS_LEN   PCT_USED
---------- ---------- ---------- ---------- ----------- ----------
         2        128         99       1000     1013000         63
         2        256        217       2000     2026000         58
         2        384        320       3000     3039000         59
         2        256        215       3000     3039000         87	-- rebuilt

Quite clearly the rebuild has had a significant effect on the index – the size has dropped by nearly 30 percent. But apart from the fact that it tells us that a rebuild can reduce the size of an index, does it tell us about anything that’s likely to happen in a real system ?

The answer is yes. It has given us a good indication that indexes with very long keys are likely to operate with a larger amount of free space per leaf block than indexes with short keys. The pct_used of around 60% is significantly lower than the value of around 70% that you might expect for the more general case b-tree index with random data arrival. (In fact when I ran the test using an 8KB block size the usage was closer to 50% – but then you can only get seven of these index entries into an 8KB block – giving a guaranteed wastage of 1KB).

Has it told us, though, that this type of index merits a rebuild ? Possibly not until you answer a few questions, of which the most important is probably: “What’s going to happen next on your production system ?”

We’ve started with 1,000 rows inserted at a time – is this going to go on for the lifetime of the index and, if so, how long is that lifetime ? Is our test unrepresentative because in “real life” we would be inserting 1,000 rows into a table holding 10,000,000 rows.

Perhaps, following an initial bit of batch loading, we’re going to be dribbling rows into the table in ones and twos – which, again, would make the test unrepresentative of the full lifetime of the index. The impact of an occasional big batch operation (often a delete rather than an insert) on an index is often an indicator that special measures should be taken – to the extent that sometimes we drop an index before a batch job and rebuild it afterwards.

Maybe we’re going to be deleting 1,000 rows rows at random from now on just before, or just after, we do the inserts – possibly with, or without, a commit in between. How would that affect any ideas we have about rebuilds ?

The fact that a rebuild of this index at this point in time gives a significant space saving doesn’t necessarily mean much in terms of the continued use and maintenance of the index. And just to show how potentially misleading this demonstration could be, let’s just repeat the load one more time, and see what the index looks like.

Here are two sets of results – one showing what happens on the next insert if you have rebuilt the index, the other showing what happens if you simply extend the test by one more cycle without rebuilding the index:


    HEIGHT     BLOCKS    LF_BLKS    LF_ROWS LF_ROWS_LEN   PCT_USED
---------- ---------- ---------- ---------- ----------- ----------
         2        256        215       3000     3039000         87	-- rebuilt
         2        512        432       4000     4052000         58	-- after next insert


    HEIGHT     BLOCKS    LF_BLKS    LF_ROWS LF_ROWS_LEN   PCT_USED
---------- ---------- ---------- ---------- ----------- ----------
         2        384        320       3000     3039000         59	-- not rebuilt
         2        512        413       4000     4052000         61	-- after next insert

Not only is the “final” index larger because you rebuilt it part way through the test – to get to that larger state you have done 217 index leaf blocks splits as you inserted the data, compared to 93 index leaf block splits in the case where you didn’t rebuild the index. As a rough guideline, that’s about 4.5MB of extra redo log generated (trivial in absolute terms, but a significant fraction of the total redo generated).

To me, the importance of the original demonstration is not that it shows us an index that gets smaller when we rebuild it, its importance lies in the fact that it reminds us that we have to think carefully about how well our tests represent real scenarios.

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,507 other followers