I wrote a short note last week linking to a thread on the Russian Oracle forum about indexing; if you’ve followed the thread you will have seen a demonstration that seemed to prove that there are cases where an index rebuild would be beneficial.
Of course it’s not difficult to come up with cases where index rebuilds make a difference – but it’s much harder to come up with demonstrations that look realistic, so I thought I’d review the example to explain why it doesn’t really work as a justification for thinking about rebuilding a production index.
The code is simple – create a table with an index, insert a batch of rows, check the index usage, repeat a few times, then rebuild the index and show that the space usage drops dramatically. Here’s the starting code (with a couple of minor changes):
create table testin (s varchar2(1000));

create index testin on testin (s) tablespace test_16k;

execute dbms_random.seed(0)

insert into testin
select dbms_random.string('P', 1000) s
from   dual
connect by level <= 1000
;

commit;

validate index testin;

select height, blocks, lf_blks, lf_rows, lf_rows_len, pct_used
from   index_stats
;
You’ll note that the test index is using a 16KB block size. Since the key value is 1,000 characters it does make sense to use one of the larger block sizes for the index. The more common variation on that particular theme, though, is that it’s often worth considering larger block sizes for index organized tables, because a single index entry there can be quite large, and this can have an unpleasant effect on leaf block splits.
The call to dbms_random.string('P', 1000) generates a string of 1,000 characters selected from the full set of printable characters – which maximises the possible variation in values.
After the first validation, the code did the following:
insert into testin select ...
commit;
validate index testin;
select ... from index_stats;

insert into testin select ...
commit;
validate index testin;
select ... from index_stats;

alter index testin rebuild;
validate index testin;
select ... from index_stats;
This is what the four sets of results from index_stats looked like when I ran the test:
    HEIGHT     BLOCKS    LF_BLKS    LF_ROWS LF_ROWS_LEN   PCT_USED
---------- ---------- ---------- ---------- ----------- ----------
         2        128         99       1000     1013000         63
         2        256        217       2000     2026000         58
         2        384        320       3000     3039000         59
         2        256        215       3000     3039000         87    -- rebuilt
Quite clearly the rebuild has had a significant effect on the index – the size has dropped by nearly 30 percent. But apart from the fact that it tells us that a rebuild can reduce the size of an index, does it tell us about anything that’s likely to happen in a real system ?
The answer is yes. It has given us a good indication that indexes with very long keys are likely to operate with a larger amount of free space per leaf block than indexes with short keys. The pct_used of around 60% is significantly lower than the value of around 70% that you might expect for the more general case of a b-tree index with random data arrival. (In fact when I ran the test using an 8KB block size the usage was closer to 50% – but then you can only fit seven of these index entries into an 8KB block – giving a guaranteed wastage of about 1KB per block.)
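To put rough numbers on that claim, here’s a back-of-envelope sketch. The 1,013 bytes per entry comes from lf_rows_len / lf_rows in the results above; the 200-byte per-block overhead is my assumption for illustration, not a measured figure:

```python
# Back-of-envelope: how many 1,013-byte index entries fit in one leaf block?
ENTRY_LEN = 1013        # bytes per entry: lf_rows_len / lf_rows from index_stats
BLOCK_OVERHEAD = 200    # assumed fixed overhead (block header, ITL slots, etc.)

for block_size in (8192, 16384):
    usable = block_size - BLOCK_OVERHEAD
    entries = usable // ENTRY_LEN            # whole entries per full leaf block
    wasted = usable - entries * ENTRY_LEN    # space unusable even when "full"
    print(f"{block_size:6d} byte block: {entries:2d} entries, ~{wasted} bytes unusable")
```

With the assumed overhead this gives seven entries and roughly 0.9KB of dead space per 8KB block – consistent with the figures quoted above – and fifteen entries per 16KB block.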
Has it told us, though, that this type of index merits a rebuild ? Possibly not until you answer a few questions, of which the most important is probably: “What’s going to happen next on your production system ?”
We’ve started with 1,000 rows inserted at a time – is this going to go on for the lifetime of the index and, if so, how long is that lifetime ? Is our test unrepresentative because in “real life” we would be inserting 1,000 rows into a table holding 10,000,000 rows ?
Perhaps, following an initial bit of batch loading, we’re going to be dribbling rows into the table in ones and twos – which, again, would make the test unrepresentative of the full lifetime of the index. The impact of an occasional big batch operation (often a delete rather than an insert) on an index is often an indicator that special measures should be taken – to the extent that sometimes we drop an index before a batch job and rebuild it afterwards.
Maybe we’re going to be deleting 1,000 rows at random from now on just before, or just after, we do the inserts – possibly with, or without, a commit in between. How would that affect any ideas we have about rebuilds ?
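As a sketch of what that variation might look like (this is my own illustration, not code from the original thread – in particular the random-delete mechanism is just one possible choice):

```sql
-- Hypothetical variation: delete 1,000 rows at random before each insert batch
delete from testin
where  rowid in (
        select rid
        from   (select rowid rid from testin order by dbms_random.value)
        where  rownum <= 1000
);

-- the presence or absence of a commit at this point affects how much
-- of the freed index space the following insert is able to reuse

insert into testin
select dbms_random.string('P', 1000) s
from   dual
connect by level <= 1000
;
commit;

validate index testin;
select height, blocks, lf_blks, lf_rows, lf_rows_len, pct_used
from   index_stats
;
```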
The fact that a rebuild of this index at this point in time gives a significant space saving doesn’t necessarily mean much in terms of the continued use and maintenance of the index. And just to show how potentially misleading this demonstration could be, let’s just repeat the load one more time, and see what the index looks like.
Here are two sets of results – one showing what happens on the next insert if you have rebuilt the index, the other showing what happens if you simply extend the test by one more cycle without rebuilding the index:
    HEIGHT     BLOCKS    LF_BLKS    LF_ROWS LF_ROWS_LEN   PCT_USED
---------- ---------- ---------- ---------- ----------- ----------
         2        256        215       3000     3039000         87    -- rebuilt
         2        512        432       4000     4052000         58    -- after next insert

    HEIGHT     BLOCKS    LF_BLKS    LF_ROWS LF_ROWS_LEN   PCT_USED
---------- ---------- ---------- ---------- ----------- ----------
         2        384        320       3000     3039000         59    -- not rebuilt
         2        512        413       4000     4052000         61    -- after next insert
Not only is the “final” index larger because you rebuilt it part way through the test – to get to that larger state you have done 217 index leaf block splits as you inserted the data, compared to 93 leaf block splits in the case where you didn’t rebuild the index. As a rough guideline, that’s about 4.5MB of extra redo log generated (trivial in absolute terms, but a significant fraction of the total redo generated by the test).
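As a sanity check on that redo figure, here’s a rough model. The assumption (mine, not a measured value) is that a 50/50 leaf block split logs something in the region of two block images of redo:

```python
# Rough sanity check on the extra redo caused by rebuilding mid-test.
BLOCK_SIZE = 16 * 1024            # 16KB index blocks
SPLITS_REBUILT = 217              # leaf block splits after the rebuild
SPLITS_NOT_REBUILT = 93           # leaf block splits without the rebuild
REDO_PER_SPLIT = 2 * BLOCK_SIZE   # assumed: ~two block images per 50/50 split

extra_splits = SPLITS_REBUILT - SPLITS_NOT_REBUILT
extra_redo_mb = extra_splits * REDO_PER_SPLIT / (1024 * 1024)
print(f"{extra_splits} extra splits -> roughly {extra_redo_mb:.1f}MB extra redo")
```

The 124 extra splits come out at a little under 4MB under this assumption – the same order of magnitude as the ~4.5MB quoted, with the gap plausibly covered by undo and other per-split overheads the model ignores.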
To me, the importance of the original demonstration is not that it shows us an index that gets smaller when we rebuild it; its importance lies in the fact that it reminds us to think carefully about how well our tests represent real scenarios.