Oracle Scratchpad

August 3, 2017

Rebuilding Indexes

Filed under: Indexing,Infrastructure,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 1:00 pm BST Aug 3,2017

One of the special events that can make it necessary to rebuild an index is the case of the “massive DML”, typically a bulk delete that purges old data from a table. You may even find cases where it’s a good idea to mark a couple of your indexes as unusable before doing a massive delete and then rebuild them after the delete.

Despite the fact that a massive delete is an obvious special case it’s still not necessary in many cases to worry about a rebuild afterwards because the space made free by the delete will be smoothly reused over time with very little variation in performance. There is, however, one particular feature that increases the probability of a rebuild becoming necessary – global (or globally partitioned) indexes on partitioned tables. The problem (and the absence of problem in non-partitioned tables) is in the nature of the rowid.

For non-partitioned tables, and partitioned tables with local indexes, the rowid stored in an index is (assuming we’re thinking only of heap tables) stored as a sequence of 6 bytes consisting, in order, of: (tablespace relative file number, block number within file, row number within block). If the table is non-partitioned, or if this is an index segment from a locally partitioned index, all the index entries will be pointing to the same table segment and Oracle knows which segment that is from the data dictionary information – so Oracle can derive the data_object_id of the table segment and convert the tablespace relative file number into the absolute file number to navigate to the right row in the table.

When the index is global or globally partitioned any index entry may point to any of the table’s segments, so the rowid that is stored in the index is expanded by a further 4 bytes to hold the data_object_id of the table segment it is pointing to – and the data_object_id is the leading component: (data_object_id, tablespace relative file number, block number within file, row number within block). Think about what this means when you start to drop “old” partitions and add new partitions. Compare this with what happens when you simply delete a large volume of old data from a table and starting inserting new data. There’s an important difference to the way in which indexes will evolve.

Purging data

When you delete a large volume of data from a (simple, heap) table you will create a lot of empty space in a lot of existing table blocks. If that delete is supposed to get rid of “old” data (and to keep the description simple we’ll assume it’s the first time you’ve done this) it’s very likely that the delete will result in lots of empty blocks near the start of the table – rows that were inserted at the same time will tend to be physically close to each other in the table. This means that future inserts will soon start to reuse those table blocks. Think about what this means for index entries – especially for non-unique keys.

Assume you have 100 rows with value ‘XXX’ for an indexed column. Breaking the rowid into its component parts the index entries will be (‘XXX’,{file_id, block_id, row_number}).  Now assume you delete the oldest 10 rows then, over time, insert 10 more rows with the same key value. You’ve deleted the 10 index entries with the lowest values for (file_id, block_id) but the space that’s become available in the table will be in and around exactly that range of blocks – so the new index entries will probably end up looking very similar to the deleted index entries and inserted in and around the existing index entries for value ‘XXX’, so over time the index is unlikely to allocate much new space.

Now think about what happens when your table it partitioned but the index is global; your index entries are (‘XXX’,{data_object_id, file_id, block_id, row_number}). When you drop the oldest partition you will probably[1] delete all the index entries with the lowest data_object_id. When you start inserting new rows for ‘XXX’ the new table partition will have a data_object_id that is going to be higher than any previous data_object_id – which means you’re going to be inserting rows into the right-hand (high-value) edge of this section of the index. In some cases – typically those where you have a couple of leaf blocks per key value – the index may end up growing significantly because the insertion point for rows in the new partition isn’t in the leaf block with the available space, and it won’t be until you’ve done a few more bulk deletes and the leaf blocks relating to the oldest table partitions become completely empty that the space can be reused.

An example of this type of behaviour probably appeared on the OTN database forum quite recently.  Of course, there are various reasons why indexes can become inefficient, and the degree of inefficiency may only become noticeable over a fairly long period of time; moreover there are various reasons why global indexes are a little problematic, and various reasons why a bulk delete (which is what executing “alter table drop partition” does to a global index) has unpleasant side effects dependent somewhat on the number (and size) of the partitions and on how many you try to drop in one go.

There’s not  a lot you can do about this quirk of global indexes, but it’s always worth taking extra care with partitioned tables and focusing even more carefully on a strategic review of indexes:

  • Does this index really need to exist at all
  • Could this index be replaced by a selective function-based index
  • Does this index really need to be global / globally partitioned
  • How big is this index compared to the size it ought to be
  • Should this index be (basic) compressed
  • Is this index likely to be disrupted by a historic purge – or is there another reason for its undesirable behaviour

 

[1] probably delete entries with the lowest data_object_id” – I have to say this because if you’ve executed a “move partition” at any time a new data_object_id will have been generated for the partition, so the oldest partition could, in principal, have the highest data_object_id. The issue of changing data_object_ids brings a whole new level of complexity to global indexes – but only in a few special cases, fortunately.

 

 

5 Comments »

  1. Dear Jonathan,

    I’m working at a software company that sells a MDM solution relying on Oracle for the data persistance.
    In the early stages of the typical project, we sometimes have to purge the data for a set of entities to perform a full reload using new matching clauses (two customer records could be duplicates if they have the same phonetized name and at least a phone number and an email address in common …) or new business rules for standardisation and / or validation.

    The typical purge is done with truncate table statements.
    I’ve noticed that the truncate table DDL does not trigger an index rebuild, and forgetting to do a manual rebuild can result in a very noticeable performance decrease for the subsequent loads, regardless of whether the table is a partitioned one or not.
    I got used to it and always do a full index rebuild for the truncated tables, but I was kindof expecting a truncate to reposition the highwatermark for the indexes as well (this might be a misunderstanding of what the truncate does behind the scenes, though).
    Would the drop / reuse storage clause make a difference ? Is there a smart way to do it ?

    Any comments welcome

    Comment by Hilaire GODINOT — August 3, 2017 @ 1:28 pm BST Aug 3,2017 | Reply

    • Hilaire,

      You don’t say which version of Oracle you are using, but a truncate of a table will truncate the index and release all but the minimum allocation of extents. Here’s a cut-n-paste from 11.2.0.4:

      
      SQL> select segment_name, extents from user_segments;
      
      SEGMENT_NAME                 EXTENTS
      ------------------------- ----------
      T1                                15
      T1_I1                             17
      
      2 rows selected.
      
      SQL> truncate table t1;
      
      Table truncated.
      
      SQL> select segment_name, extents from user_segments;
      
      SEGMENT_NAME                 EXTENTS
      ------------------------- ----------
      T1                                 1
      T1_I1                              1
      
      

      I wonder if the problem you’re describing with the reload is something different. When you mark an index as unusable and then truncate the table the index automatically becomes usable again. If you’re going through this sequence of steps then your reload may be slow because the index IS usable.

      Another guess: you’re doing a truncate on a partition of a partitioned table without using the “update indexes” (or “update global indexes” if on an older version of Oracle). If you fail to include the “update indexes” then a global (or globally partitioned) index will become invalid. Even if you do specify the “update indexes” the update is simply a bulk delete from the index that (being generic) won’t shrink the segment and drop the high water mark.

      Comment by Jonathan Lewis — August 3, 2017 @ 3:45 pm BST Aug 3,2017 | Reply

  2. Hi Jonathan, in the case of global indexes.

    Wouldn’t dropping a partition delete an entire continuous range of entries in the global index, resulting in one very large “hole” with all empty blocks. I would imagine such an operation would result in only 2 partially complete blocks – one at the “start” and one at the “end”, thus a lot of those would be reusable. Speaking strictly about leaf blocks here.

    Given your post, there must be something else happening that I am missing. What is it?

    Comment by Christo Kutrovsky — August 3, 2017 @ 3:23 pm BST Aug 3,2017 | Reply

    • Christo,

      Think about an orders table partitioned by order date, with a global index on client_code.
      Dropping the oldest orders partition would delete the first few index entries from every client across the entire width of the index.

      As with all cases of indexing – the columns you’re indexing and the arrival pattern of the distinct values makes a big difference to the effects you see.

      Comment by Jonathan Lewis — August 3, 2017 @ 3:33 pm BST Aug 3,2017 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Powered by WordPress.com.