Oracle Scratchpad

August 3, 2017

Rebuilding Indexes

Filed under: Indexing,Infrastructure,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 1:00 pm GMT Aug 3,2017

One of the special events that can make it necessary to rebuild an index is the case of the “massive DML”, typically a bulk delete that purges old data from a table. You may even find cases where it’s a good idea to mark a couple of your indexes as unusable before doing a massive delete and then rebuild them after the delete.

Despite the fact that a massive delete is an obvious special case it’s still not necessary in many cases to worry about a rebuild afterwards because the space made free by the delete will be smoothly reused over time with very little variation in performance. There is, however, one particular feature that increases the probability of a rebuild becoming necessary – global (or globally partitioned) indexes on partitioned tables. The problem (and the absence of problem in non-partitioned tables) is in the nature of the rowid.

For non-partitioned tables, and partitioned tables with local indexes, the rowid stored in an index is (assuming we’re thinking only of heap tables) stored as a sequence of 6 bytes consisting, in order, of: (tablespace relative file number, block number within file, row number within block). If the table is non-partitioned, or if this is an index segment from a locally partitioned index, all the index entries will be pointing to the same table segment and Oracle knows which segment that is from the data dictionary information – so Oracle can derive the data_object_id of the table segment and convert the tablespace relative file number into the absolute file number to navigate to the right row in the table.

When the index is global or globally partitioned any index entry may point to any of the table’s segments, so the rowid that is stored in the index is expanded by a further 4 bytes to hold the data_object_id of the table segment it is pointing to – and the data_object_id is the leading component: (data_object_id, tablespace relative file number, block number within file, row number within block). Think about what this means when you start to drop “old” partitions and add new partitions. Compare this with what happens when you simply delete a large volume of old data from a table and starting inserting new data. There’s an important difference to the way in which indexes will evolve.

Purging data

When you delete a large volume of data from a (simple, heap) table you will create a lot of empty space in a lot of existing table blocks. If that delete is supposed to get rid of “old” data (and to keep the description simple we’ll assume it’s the first time you’ve done this) it’s very likely that the delete will result in lots of empty blocks near the start of the table – rows that were inserted at the same time will tend to be physically close to each other in the table. This means that future inserts will soon start to reuse those table blocks. Think about what this means for index entries – especially for non-unique keys.

Assume you have 100 rows with value ‘XXX’ for an indexed column. Breaking the rowid into its component parts the index entries will be (‘XXX’,{file_id, block_id, row_number}).  Now assume you delete the oldest 10 rows then, over time, insert 10 more rows with the same key value. You’ve deleted the 10 index entries with the lowest values for (file_id, block_id) but the space that’s become available in the table will be in and around exactly that range of blocks – so the new index entries will probably end up looking very similar to the deleted index entries and inserted in and around the existing index entries for value ‘XXX’, so over time the index is unlikely to allocate much new space.

Now think about what happens when your table it partitioned but the index is global; your index entries are (‘XXX’,{data_object_id, file_id, block_id, row_number}). When you drop the oldest partition you will probably[1] delete all the index entries with the lowest data_object_id. When you start inserting new rows for ‘XXX’ the new table partition will have a data_object_id that is going to be higher than any previous data_object_id – which means you’re going to be inserting rows into the right-hand (high-value) edge of this section of the index. In some cases – typically those where you have a couple of leaf blocks per key value – the index may end up growing significantly because the insertion point for rows in the new partition isn’t in the leaf block with the available space, and it won’t be until you’ve done a few more bulk deletes and the leaf blocks relating to the oldest table partitions become completely empty that the space can be reused.

An example of this type of behaviour probably appeared on the OTN database forum quite recently.  Of course, there are various reasons why indexes can become inefficient, and the degree of inefficiency may only become noticeable over a fairly long period of time; moreover there are various reasons why global indexes are a little problematic, and various reasons why a bulk delete (which is what executing “alter table drop partition” does to a global index) has unpleasant side effects dependent somewhat on the number (and size) of the partitions and on how many you try to drop in one go.

There’s not  a lot you can do about this quirk of global indexes, but it’s always worth taking extra care with partitioned tables and focusing even more carefully on a strategic review of indexes:

  • Does this index really need to exist at all
  • Could this index be replaced by a selective function-based index
  • Does this index really need to be global / globally partitioned
  • How big is this index compared to the size it ought to be
  • Should this index be (basic) compressed
  • Is this index likely to be disrupted by a historic purge – or is there another reason for its undesirable behaviour

 

[1] probably delete entries with the lowest data_object_id” – I have to say this because if you’ve executed a “move partition” at any time a new data_object_id will have been generated for the partition, so the oldest partition could, in principal, have the highest data_object_id. The issue of changing data_object_ids brings a whole new level of complexity to global indexes – but only in a few special cases, fortunately.

 

 

9 Comments »

  1. Dear Jonathan,

    I’m working at a software company that sells a MDM solution relying on Oracle for the data persistance.
    In the early stages of the typical project, we sometimes have to purge the data for a set of entities to perform a full reload using new matching clauses (two customer records could be duplicates if they have the same phonetized name and at least a phone number and an email address in common …) or new business rules for standardisation and / or validation.

    The typical purge is done with truncate table statements.
    I’ve noticed that the truncate table DDL does not trigger an index rebuild, and forgetting to do a manual rebuild can result in a very noticeable performance decrease for the subsequent loads, regardless of whether the table is a partitioned one or not.
    I got used to it and always do a full index rebuild for the truncated tables, but I was kindof expecting a truncate to reposition the highwatermark for the indexes as well (this might be a misunderstanding of what the truncate does behind the scenes, though).
    Would the drop / reuse storage clause make a difference ? Is there a smart way to do it ?

    Any comments welcome

    Comment by Hilaire GODINOT — August 3, 2017 @ 1:28 pm GMT Aug 3,2017 | Reply

    • Hilaire,

      You don’t say which version of Oracle you are using, but a truncate of a table will truncate the index and release all but the minimum allocation of extents. Here’s a cut-n-paste from 11.2.0.4:

      
      SQL> select segment_name, extents from user_segments;
      
      SEGMENT_NAME                 EXTENTS
      ------------------------- ----------
      T1                                15
      T1_I1                             17
      
      2 rows selected.
      
      SQL> truncate table t1;
      
      Table truncated.
      
      SQL> select segment_name, extents from user_segments;
      
      SEGMENT_NAME                 EXTENTS
      ------------------------- ----------
      T1                                 1
      T1_I1                              1
      
      

      I wonder if the problem you’re describing with the reload is something different. When you mark an index as unusable and then truncate the table the index automatically becomes usable again. If you’re going through this sequence of steps then your reload may be slow because the index IS usable.

      Another guess: you’re doing a truncate on a partition of a partitioned table without using the “update indexes” (or “update global indexes” if on an older version of Oracle). If you fail to include the “update indexes” then a global (or globally partitioned) index will become invalid. Even if you do specify the “update indexes” the update is simply a bulk delete from the index that (being generic) won’t shrink the segment and drop the high water mark.

      Comment by Jonathan Lewis — August 3, 2017 @ 3:45 pm GMT Aug 3,2017 | Reply

  2. Hi Jonathan, in the case of global indexes.

    Wouldn’t dropping a partition delete an entire continuous range of entries in the global index, resulting in one very large “hole” with all empty blocks. I would imagine such an operation would result in only 2 partially complete blocks – one at the “start” and one at the “end”, thus a lot of those would be reusable. Speaking strictly about leaf blocks here.

    Given your post, there must be something else happening that I am missing. What is it?

    Comment by Christo Kutrovsky — August 3, 2017 @ 3:23 pm GMT Aug 3,2017 | Reply

    • Christo,

      Think about an orders table partitioned by order date, with a global index on client_code.
      Dropping the oldest orders partition would delete the first few index entries from every client across the entire width of the index.

      As with all cases of indexing – the columns you’re indexing and the arrival pattern of the distinct values makes a big difference to the effects you see.

      Comment by Jonathan Lewis — August 3, 2017 @ 3:33 pm GMT Aug 3,2017 | Reply

  3. I can give you an example where index rebuilding certainly provides benefits.
    I manage a large OLTP database where many tables are populated with millions of rows per day, but older records in the tables are deleted at the rate of millions of rows per day, keeping the tables at a more or less constant size.
    The primary keys on these tables are populated from sequences and therefore always increase in value. Also, many of these tables are referenced by multiple other tables which have indexed foreign key columns pointing to these primary keys.

    For example we have table A with primary key A_ID. 10 million inserts per day go into this table.
    Once per day all records over 40 days old are deleted in a batch job so each time this job runs the 10 million rows with the lowest A_IDs will be deleted

    Eight other tables have indexed foreign key columns which reference A_ID, and older records in these tables are also deleted once per day before table A has its records deleted.
    Over time, the job which deletes records from table A becomes slower and slower and at the end of the month is typically five times slower than at the start of the month, even though the table size and the number of rows being deleted each day, is more or less constant.

    Rebuilding the indexes on the eight indexed foreign key columns instantly speeds up the deletes on table A again.

    Although the rebuilding of the indexes makes no noticeable difference to SELECT statements which use the index, it does make a massive difference to DML operations such as the one described above.

    Comment by Peter — August 18, 2017 @ 4:16 pm GMT Aug 18,2017 | Reply

    • Peter,

      Thanks for the observation.

      It has to be said that if something odd is going to happen to an index it’s likely to be an index where the data arrives in monotonic order – i.e. like a timestamp of sequence number. It’s easy to forget, though, that if you have a foreign key to such a primary key it’s fairly likely that the foreign key index is likely to be building in a very similar fashion.

      As far as odd side effects from a massive delete at the bottom end – it’s possible that that’s related to the way in which the delete is carried out, the timing of commits, the size of the SGA, the version of Oracle, and whether the tablespace uses ASSM or freelist management. There have been a number of problems with ASSM and large deletes – in particular relating to the problem of WHEN to update the bitmap blocks to flag the leaf blocks as empty. (The same problems have appeared for tables, but then the flagging is about the 25/50/75% states, rather than “available for reuse elsewhere”.

      The problem is that if you update the bitmaps as the delete progresses you end up with lots of leaf blocks apparently free for other processes to re-use when actually they’re still in use and could suffer a big rollback; on the other hand you can’t mark them on the commit because you don’t want to spend the time on a slow commit (so you could do a delayed logging block cleanout on some of them); which means you do something on the next delayed block cleanout – which never happens in a sequential index because you never update the low values, you only read through them (on the next big delete).

      If you’re on a very recent version of Oracle I think it would be possible to recode your big delete so that you didn’t have to do a rebuild; unfortunately I’ve never been asked to address that particular problem so I don’t know what the best steps would be. If you ever get into a position where you need to do something about the time it takes to do the delete and rebuild drop me a line, I’d be quite interested to see what’s happening – I’d even consider a no-fix, no-fee deal if I could write up the problem afterwards.

      Couple of thoughts –
      Do you commit after each delete of the child tables, or do just one commit at the end.
      Do you loop for each child table deleting and commiting every N rows (same question regarding the parent).
      Do you delete in index ascending or descending order, or through a tablescan
      Does the process slow down fairly uniformly over the month, or is there a sudden step change
      Do the indexes grow in size as the month progresses – uniformly or a couple of jumps

      No need to answer, but these are all details that might be relevant, or give some clues, about what’s going on.

      Comment by Jonathan Lewis — August 19, 2017 @ 9:46 am GMT Aug 19,2017 | Reply

      • Jonathan, thanks for your comments.

        In answer to some of your questions:

        This is Oracle EE 11.2.0.4 running on a Solaris machine.

        The child tables are deleted one at a time and for each table the rows are deleted in one go through a tablescan (around 1 or 2 million rows are deleted for each of the child tables and these deletes take a couple of minutes each).

        For various reasons the parent table is dealt with slightly differently. A cursor is declared which will select all 10 million rows and then the deletes are done one at a time by primary key in a loop, committing after each 50,000 deletes. The time for each batch of 50,000 deletes is around 1 minute at the start of the month but then there is a sudden step change after a few weeks and this increases to 5 or more minutes and remains that way on subsequent days, or gets slightly worse.
        The child table indexes themselves grow in size steadily as the month progresses though.

        However, as soon as the indexes on the child tables are rebuilt, the parent table deletes revert to 1 minute per 50,000 deletes. This behaviour has been observed pretty consistently for over a year.

        It’s not actually a problem – the indexes are rebuilt online, so there is no downtime, and the deletes tick away in the background each day without affecting the system as a whole (the only downside is the volume of redo generated, which in turn affects the size of the backups of course).

        I was really just posting this to highlight one example where rebuilding indexes can make a big difference – the do we/don’t we discussions on index rebuilds will rumble on for years I guess.

        In my opinion it depends – horses for courses…

        Comment by Peter — August 19, 2017 @ 2:44 pm GMT Aug 19,2017 | Reply

        • Peter,

          Thanks for the description – the sudden step change is an interesting detail.

          If it’s safe to assume that all the child rows for a given primary key are inserted at the same time as the parent row then the continuous growth of the foreign key indexes suggests that part of the problem is Oracle’s failure mark the space management bitmaps in a timely fashion; the step change then has the flavour of a switch during the PK delete from “delayed logging block cleanout” of the foreign key index leaf blocks to “delayed block cleanout” – which could be confirmed by comparing the session stats from a slow 50,000 with a fast 50,000.

          You may find, by the way, that you can reduce the redo generated in the child table deletes by driving the delete through the index: https://jonathanlewis.wordpress.com/2006/11/22/tuning-updates/

          I am a little surprised by the choice to delete primary key rows one at a time: it makes sense as a defensive measure against an accidental appearance of child row that shouldn’t exist, but I’d have considered a “forall” delete with “bulk exceptions” a couple of hundred items at a time (driven in primary key order) – something like this insert: https://jonathanlewis.wordpress.com/2016/06/24/never/

          I suspect that with a little re-engineering you wouldn’t need to rebuild the indexes; but if you’ve got a process that works well enough with no immediate threat or likely future threat then there’s no point in spending effort trying to work out if you could do it better – “if it ain’t broke don’t fix it” is a particularly relevant mantra for IT (though the longer version would include: “if it ain’t going to break any time soon, don’t try to fix it yet”, of course).

          Nice to have a concrete example, by the way – too many of the “arguments” in favour of index rebuilds used to come out as “well I rebuilt all my indexes and everything went faster” with no clues about which rebuilds might have been the interesting ones.

          Comment by Jonathan Lewis — August 19, 2017 @ 4:56 pm GMT Aug 19,2017


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Powered by WordPress.com.