This note is part four of a four-part series, and covers Index fragmentation. The whole series is as follows
- Introduction – with links to parts 2 – 4
- Disk and Tablespace Fragmentation
- Table Fragmentation
- Index Fragmentation – this bit
4. Index “fragmentation”.
The multiple extent and ASSM “fragmentation” that I described in the previous article about table fragmentation applies equally well to indexes, of course, and matters in just the same way – i.e. hardly ever.
When people speak of “index fragmentation” they are usually thinking of the problem of “sparsely populated blocks” - which is also a phenomenon I described in the notes about table fragmentation although there are differences between tables and indexes that we shall consider shortly. There is another interpretation of “index fragmentation”, though, which can be worth considering – the side effects of leaf block splitting which results in “logically adjacent” leaf blocks being widely physically scattered; and we shall also be considering that in the following notes.
We’ll start with bulk deletion and review the same representative cases that we considered with tables (i.e. all the rows deleted from 20% of the block and 20% of the rows deleted from all the blocks) – and when we do this we have to remember that there is a difference between index deletions and table deletions that makes the subsequent behaviour different. When a transaction deletes a row from a table it can reduce the row to a stub of just a few bytes before the commit takes place and immediately make use of the space this reduction frees up in the table block; when a transaction deletes a row from an index it has to leave the entire index entry in place and flag it as deleted – it cannot immediately reuse the space, it has to wait until after the commit. (I posted an example demonstrating this difference a few months ago.)
Another critical difference between tables and indexes is that an index entry has to go in the right place; so when an index block has some free space but is not completely empty the only rows that can go into that block are rows that belong in exactly that part of the index. Moreover, when the leaf block becomes empty it stays linked into the same position in the index even though it has also been linked into the freelist. (My guess is that this probably makes it easier to deal with problems of read-consistency, but it may also relate to problems of rolling back and the cost of changing the three links in the index structure.) So when we look at an index which has had a very large delete at the leading edge, a query for “lowest value” may have to do a range scan through a very large number of empty leaf-blocks before it finds the leaf block that contains any current data – which is why we should always consider the option for a regular coalesce on an index where we regularly delete a lot of data from the low end or, in the more generic though less common case, where we delete a large number of consecutive values anywhere in the index.
In the more general case of a bulk delete we could leave quite a lot empty space in every leaf block which, unlike table free space, we cannot make Oracle reuse by picking a cunning value for pctused because that storage parameter isn’t relevant to indexes. So the particular question we have to consider for indexes is this: how much impact might this empty space have on the application.
The usual considerations of “more space to back up” and “more blocks to keep in the buffer” apply, of course, but we ought to ask if there could be a more significant direct impact on performance from a large number of sparsely populated leaf blocks. The answer is dependent on the application, of course, but we typically use indexes to collate key values and pack them into a small space – and with this thought in mind we can note that the bulk of the work done by many queries comes from visiting the table after collecting a number of key values from an index; consequently the extra work introduced by indexes with a significant amount of empty space per leaf block may be a small fraction of the total work of a query, and we may therefore decide that we don’t want to invest resources in rebuilding an index unless it has become very sparsely populated. (A typical “random arrival” B-tree index will run at an average of about 70% utilisation, 30% free space, per leaf block; I don’t get particularly worried about index related performance until an index gets down to about 50% utilisation unless I have direct evidence that the index is contributing a significant amount of time to a critical set of queries.)
There are two further issues of “fragmentation” in indexes, though, that do not apply to tables. The first is that you do not update index entries – you delete an entry for the old value and insert an entry for the new value. If the changes are effectively random then this shouldn’t cause any of the problems that you associate with bulk deletions; but if you can see some sort of “time-based” pattern to the update – for example if you have an index on a last_updated column then you could end up suffering the worst possible effects of a sparsely populated index. In this type of case you would be (slowly) deleting entries from blocks towards the low end of the index and inserting them in the block at the high end of the index; and the empty space produced by the deletion would not be subject to reuse because rows could not be updated back into the past; moreover if you keep moving rows from the past to the future you keep visiting leaf blocks which are sparsely populated, and if this is an OLTP system where users update one or two rows at a time the seek and update to the index blocks can become a significant fraction of the work done on each update statement. At the very least you need to be aware of this pattern of activity so that you can decide how to measure the impact on performance and create a strategy for dealing with the threat.
The other fragmentation issue with indexes is one where the word “fragmentation” really feels more appropriate. It relates to leaf block splits. If you want to insert an entry into an index leaf block that’s full Oracle has to find a new empty block somewhere, copy roughly half the data from the current leaf block into it and link the block into the existing index structure in the right place. As a consequence, blocks which are “logically adjacent” in the index are not necessarily “physically adjacent”, which means that when you do a large index range scan (or index full scan) you could actually end up doing lots of single block random reads.
This is where SQL Server (and probably Sybase and possibly DB2) become significant. The way that SQL Server deals with free space management for heap tables is not very good, so it is almost an article of faith (dogma even) in SQL Server that all tables should be built using “clustered indexes” – which (in Oracle terms) means that all tables are effectively index organized tables (IOTs). If you’ve tried to cluster your data, and thought about it carefully and deliberately, then leaf block splits are destroying your efforts to keep related data items together. So it’s not really surprising that DBAs with a background in SQL Server (and Sybase and DB2) are so keen on the idea of rebuilding indexes all the time; if you’re rebuilding a clustered index you’re putting the row (i.e. table) data back where you wanted it. (Conveniently this doesn’t require all the other indexes on the table to rebuilt because, like secondary indexes on Oracle’s IOTs, the other indexes in SQL Server (et. al.) will be using the unique (or “uniquified”) key as the row identifier. )
From an Oracle perspective this type of fragmentation generally doesn’t matter – provided you’re thinking about standard B-tree indexes – because, as noted above, most queries will do most of their work visiting the table. But the SQL Server paradigm gives you a clue about when you ought to consider the effects of “fragmentation” and the requirement for index rebuilds more carefully. If, as an Oracle DBA, you have created a table as an index organized table (IOT) then you probably had a good reason for making that choice, and that reason was probably to ensure that data arriving in one order was stored in a different order so that related data items were stored together.
If you’ve created an IOT to keep the data together, then leaf block splits will cause to the data to become a little scattered. Before you worry too much about this, think carefully about the importance you are going to attach to this scattering, and the (possibly marginal) benefit of doing something about it. For purposes of illustration, imagine that a typical query against an important IOT collects 200 rows of 200 bytes. As a heap table this might have required you to read 200 randomly scattered separate blocks, so you decided to implement the table as an IOT. Assuming the worst level of leaf block splits (50/50 with no back-fill appearing) then the 200 rows will go into the IOT at about 20 rows per block for a total of 10 leaf blocks. Due to the timing of leaf block splits it is safe to assume that these 10 blocks will end up scattered fairly randomly throughout the index segment. If you rebuild the index you will be able to pack the data into just five blocks, and those five blocks will (usually) be adjacent blocks in the segment rather than being scattered – and this adjacency probably means you get a little extra performance benefit if the index range scan has to go to disc. (Note – SQL Server operates with extent sizes of 8 blocks of 8KB and the database software is able to co-operate with the operating system to negotiate a readahead of an entire extent in situations like this: this combination of details makes index rebuilds in these circumstance rather more beneficial to SQL Server than it would to Oracle.)
Once you’ve spent a little time thinking about scenarios like this it is much easier to understand how to address the question of how much benefit you might get from rebuilding an IOT. What is the reduction in the total number of block visits you will have to make; how many of those block visits would have been separate disc reads; and how much benefit might you get from Oracle or the operating system, or the hardware drivers, being able to implement some form of readahead to reduce the time it takes to fetch those blocks. (Bear in mind, by the way, an important comment I made in the article on disk fragmentation – just because two blocks appear to be adjacent blocks as far as the Oracle data file is concerned it’s possible that attempts to introduce striping and load balancing at multiple layers will result in the two blocks being on different discs).
It is difficult to create algorithms to deal with free space management in all combinations of circumstances, especially when you are trying to support complex code for handling read-consistency at the same time. Consequently there are cases where unexpected, and expensive, side effects appear in the Oracle code path. See this posting for one such case (or two if you follow the link in the article).
I have written several notes about a problem with leaf block splits that used to appear occasionally in systems that used sequence-generated synthetic keys in an environment with a large number of concurrent processes. This problem could be bypassed very easily in earlier versions of Oracle, but in 10g it seems to appear more frequently (perhaps because of the increase in the number of very busy systems, possibly because of other efficiencies in Oracle’s code path) and the historic correction to the problem is no longer viable. The problem results in sequence-based indexes running at roughly 25% space efficiency – and until Oracle reinstates the maxtrans parameter for indexes, there isn’t much you can do about it that will be approved by Oracle support.