Oracle Scratchpad

May 17, 2007

Data Cluster

Filed under: Indexing,Infrastructure,Statistics,Troubleshooting — Jonathan Lewis @ 9:53 pm BST May 17,2007

I received an email today containing the following question:

For packaged applications, like the Oracle EBS, some indexes tend to develop a high Clustering Factor over time, like the one shown below:

 
            BTREE    LEAF    DISTINCT  CLUSTERING       INDEX     TABLE       TABLE 
INDEX NAME  LEVEL  BLOCKS        KEYS      FACTOR    NUM ROWS  NUM BLKS    NUM ROWS 
----------  -----  ------  ----------  ----------  ----------  --------  ---------- 
XXX_PK          3  778150  77,842,100  17,163,350  77,842,100   865,805  77,043,200           

We have some queries that perform range scans on this index and run slower because they have to visit too many blocks from the table. How should one deal with this type of index?

This question raises a couple of important points that are worth reviewing.

As a starting detail: did you notice that the number of “rows” in the index is greater than the number of rows in the table? This can happen when you gather stats with a sample size of less than 100% and use the “cascade” option. Oracle often takes a larger sample on the index than the one specified for the table, with the kind of result that you see above.

This isn’t a big deal – unless the clustering_factor also exceeds the number of rows in the table, at which point you may find that queries that should “obviously” be using the primary key index start using a slightly less desirable non-unique index. This is a side-effect of the cost calculation, which uses the expression selectivity[tb_sel] * clustering_factor when assessing the suitability of the index.

If your primary key clustering_factor is larger than the number of rows in the table then, for a single column index, it will also be greater than the number of distinct values for the column, and the expression selectivity[tb_sel] * clustering_factor will be greater than one for the predicate primary_key_col = constant. If this happens then another index may appear to have a lower cost than the primary key index.
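
A rough sketch of that arithmetic may help. The function below models only the table-visit term of the cost, not the full formula (which also involves blevel and leaf blocks), and the inflated clustering_factor is a hypothetical value, not the one from the question:

```python
# Sketch of the table-access term of the index cost calculation, using the
# table statistics from the question. The clustering_factor here is a
# hypothetical value that has drifted above the table row count; the real
# cost formula also includes blevel and leaf-block terms.

def table_access_cost(selectivity, clustering_factor):
    # Expected number of table-block visits for an index range scan.
    return selectivity * clustering_factor

table_rows = 77_043_200
pk_selectivity = 1 / table_rows          # predicate: primary_key_col = constant
pk_clustering_factor = 87_000_000        # hypothetical: CF > table rows

cost = table_access_cost(pk_selectivity, pk_clustering_factor)
print(round(cost, 3))  # greater than 1, so another index can look cheaper
```

With the clustering_factor actually reported in the question (17,163,350) the same expression comes out well below one, which is why the problem only appears once the statistic overtakes the row count.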

Coming back to the original question though – what do you do when you have an index where the clustering_factor keeps growing? This question prompts [previously "begs" - see comment 6 below] an even more important question: is the clustering_factor telling the truth? There is the pattern of data scatter that really exists, and there is the pattern that the optimizer thinks exists (and then there may be a pattern of data scatter that you would like to fool the optimizer into believing).

If your data is well clustered then the clustering_factor should be “small” and the optimizer has a higher probability of using the index. If the data is well clustered but, for some reason, the clustering_factor is “large” then the optimizer may well ignore the index when you know it would be sensible to use it.

If your data is very poorly clustered then the clustering_factor should be “large” and the optimizer is likely to use the index only in special cases (such as unique scan, index-only, or bitmap conversion paths). If your data is very poorly clustered but the clustering_factor is “small” – possibly because you’ve hacked it – then the optimizer may well use the index when it shouldn’t and perform an excessive amount of (logical) I/O.

In this example, the clustering_factor is quite large, and the user is complaining that queries using index range scans are running slowly because Oracle is visiting too many blocks from the table.

This means there probably isn’t a problem with the clustering_factor – if you have to go to the table for the data, you have to go to the table for the data. Although the clustering factor may, in principle, be too small, the fact that we work harder and harder as time passes (i.e. as the data sizes grow) means the index has not been defined appropriately for the queries where it is being used.

As a simple example to demonstrate this point, consider a query involving customer data. We always ask for data for a customer for the last 4 weeks.

 
    select  hst.* 
    from    customer_history hst 
    where   hst.customer_id = 12345 
    and     hst.tx_date > trunc(sysdate) - 28 
    order by 
            hst.tx_date desc 
    ; 

If we have an index just on the (customer_id) column, then as time passes Oracle has to visit more and more table blocks, discarding data that is too old only after it has visited the table.

If we have an index on (customer_id, tx_date), then the work done visiting the table will stay constant as time passes, because the volume of data we examine is independent of the size of the history we have generated.
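
A toy model (not Oracle, just the counting argument) makes the difference concrete – the customer and dates below are invented for illustration:

```python
import datetime

# Toy model of the counting argument above (not Oracle): one customer
# accumulates one history row per day. With an index on (customer_id)
# alone, a range scan identifies every row for the customer and only
# discards the old ones after visiting the table; with an index on
# (customer_id, tx_date) the scan stops at the 28-day boundary.

def rows_visited_in_table(history_dates, today, composite_index):
    cutoff = today - datetime.timedelta(days=28)
    if composite_index:
        return sum(1 for d in history_dates if d > cutoff)
    return len(history_dates)  # every row visited, old ones discarded later

today = datetime.date(2007, 5, 17)
history = [today - datetime.timedelta(days=n) for n in range(365)]

print(rows_visited_in_table(history, today, composite_index=False))  # 365
print(rows_visited_in_table(history, today, composite_index=True))   # 28
```

Run it again with two years of history and the first figure doubles while the second stays at 28 – which is the whole point of putting tx_date into the index.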

Of course, if we can recreate the table in our example as an index-cluster on (customer_id), we can minimise the number of excess blocks we visit (although we will still be visiting excessive numbers of rows) and this may be of some benefit.

Alternatively, if we can partition the table by (tx_date) our query will be able to use partition elimination to minimise the number of redundant table blocks we visit – provided the index on (customer_id) is a local index (although this may increase the number of index block visits to an undesirable level if we use a large number of partitions).

In conclusion: the clustering_factor can easily be too large and not reflect the true pattern of data scatter – this can make the optimizer ignore the index when it should be using it. But if the clustering_factor really does reflect the data scatter, and your queries get slower because they are visiting increasing numbers of blocks in the table, then you probably need extra columns in the index to allow you to eliminate some of the visits to the table.

13 Comments »

  1. @Jonathan

    can you explain a bit “…if we can recreate the table in our example as an index-cluster on (customer)…”

    another point: can we say that the index should be rebuilt because index num rows > table num rows?
    That is, that the index still holds entries for rows deleted from the table.

    Thank you

    Comment by Antonio — May 18, 2007 @ 8:45 am BST May 18,2007 | Reply

  2. Antonio,
    (a) read about clusters as a data structure – chapter 18 of the DBA Admin Guide for 10gR2 is a good starting point.
    (b) No, you can’t say that the index should be rebuilt. If a row has been deleted from the table, its index entry has been deleted from the index.
    Pick 100 blocks at random from a table that has been subject to typical activity, and count the number of rows – use that number to estimate the number of rows in the whole table. Pick 1,000 blocks at random from the primary key index and count the entries in those blocks, use that number to estimate the number of entries in the index. The two answers probably won’t be exactly the same – even though it’s the primary key index. That’s just a reasonable side-effect of random sampling.
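
    As a sketch of that scaling arithmetic (the block counts below are invented, and this is just the estimation logic, not the dbms_stats code):

```python
import random

# Sketch of the random-sampling effect described above: the table estimate
# and the index estimate are each scaled up from a different random sample
# of blocks, so they rarely agree exactly. Block counts are invented.

random.seed(1)
n_blocks = 10_000
rows_per_block = [random.randint(50, 130) for _ in range(n_blocks)]
actual_rows = sum(rows_per_block)

def estimated_rows(sample_blocks):
    # Scale the rows counted in the sample up to the whole segment.
    sample = random.sample(rows_per_block, sample_blocks)
    return sum(sample) * n_blocks // sample_blocks

table_estimate = estimated_rows(100)    # small sample on the "table"
index_estimate = estimated_rows(1000)   # larger sample on the "index"

print(actual_rows, table_estimate, index_estimate)
# both estimates land close to the truth, but generally not equal to it
# or to each other
```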

    Comment by Jonathan Lewis — May 18, 2007 @ 9:41 am BST May 18,2007 | Reply

  3. @Jonathan

    a) reading!

    b) right! now that I read your answer I realize how “silly” my question was! I was thinking that you delete “physically” from the table but NOT “physically” from the index. Don’t know why I thought that…

    Comment by Antonio — May 18, 2007 @ 1:57 pm BST May 18,2007 | Reply

  4. Hi Jonathan,
    I did some analysis of the raw trace file to find out how many blocks, out of the total blocks visited by the statement, belonged to the table and how many belonged to the index (the index and table reside in their own tablespaces). The following is the statement, its explain plan and the wait statistics (I have taken out the actual values in the predicate section):
    SELECT TO_NUMBER(ITEM_KEY) ITEM_KEY, ACTIVITY_STATUS
    FROM WF_ITEM_ACTIVITY_STATUSES
    WHERE ITEM_TYPE = AND
    PROCESS_ACTIVITY = AND
    ITEM_KEY IS NOT NULL

    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.01       0.01          0          0          0           0
    Execute      1      0.00       0.00          0          0          0           0
    Fetch     2006    110.94     655.55     232869     239492          0       30073
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total     2008    110.95     655.56     232869     239492          0       30073
    Rows     Row Source Operation
    -------  ---------------------------------------------------
      30073  TABLE ACCESS BY INDEX ROWID WF_ITEM_ACTIVITY_STATUSES 
      30073   INDEX RANGE SCAN WF_ITEM_ACTIVITY_STATUSES_PK (object id 33898)

    Rows     Execution Plan
    -------  ---------------------------------------------------
          0  SELECT STATEMENT   GOAL: CHOOSE
      30073   TABLE ACCESS   GOAL: ANALYZED (BY INDEX ROWID) OF 
                  'WF_ITEM_ACTIVITY_STATUSES'
      30073    INDEX   GOAL: ANALYZED (RANGE SCAN) OF 
                   'WF_ITEM_ACTIVITY_STATUSES_PK' (UNIQUE)

    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                    2006        0.00          0.00
      db file sequential read                    232869        0.34        599.23
      latch free                                      8        0.02          0.09
      SQL*Net message from client                  2006        0.03          6.37
    ********************************************************************************

    As can be seen, there were 232,869 single-block reads done during the course of the query. Per the raw trace file, 227,861 of those reads were done against the index whereas 5,008 were against the table, which adds up to 232,869. This translates to a ratio of almost 45:1 of index blocks read to table blocks read. I was expecting a larger number of table block reads than index block reads because of the high CF.

    Comment by Amir Hameed — May 22, 2007 @ 1:37 am BST May 22,2007 | Reply

  5. Amir, the result is counter-intuitive, although it can be explained after a little thought.

    The first thing to remember, though, is that the clustering_factor is Oracle’s count of the number of table block jumps you take as you walk the index in order [See Chapter 5 of the book for a complete description]. But if you have (for example) multiple freelists, or have used ASSM (automatic segment space management) in the tablespace, then the number of jumps could be very large compared to the number of blocks visited, as you could keep revisiting the same blocks.
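
    The counting rule can be sketched in a few lines (a simplified model of how the statistic behaves, not the real code path):

```python
# Simplified model of the clustering_factor calculation: walk the index
# entries in key order and count every change of table block. Revisiting
# a block you left a moment ago still counts as a new jump - which is
# how ASSM or multiple freelists can inflate the figure.

def clustering_factor(table_blocks_in_index_order):
    jumps = 0
    previous = None
    for block in table_blocks_in_index_order:
        if block != previous:
            jumps += 1
        previous = block
    return jumps

# Three table blocks, entries perfectly clustered: CF = number of blocks.
print(clustering_factor([1, 1, 1, 2, 2, 2, 3, 3, 3]))  # 3

# The same three blocks with interleaved inserts: CF = number of rows.
print(clustering_factor([1, 2, 3, 1, 2, 3, 1, 2, 3]))  # 9
```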

    The second thing to consider is that the number of rowids you use from the index can be much less than the number of index row entries that you examine – and I suspect that this has happened here.

    I would guess that the activity is similar to the following. Imagine an ‘order lines’ table with an index (status, product_id, order_date). Imagine a query ‘select … where status = ‘A’ and order_date = trunc(sysdate)’.

    All the rows for today will be packed into just a few blocks in the table, but if you use this index to drive the query, you could have thousands of index entries to scan for (say) ProductX where the status is ‘A’ to find the few rows for today. Do the same for ProductY and so on, and you find that you scan a HUGE swathe of the index, reading hundreds of blocks, but just load and re-read a few dozen table blocks.

    This type of accident can happen quite easily if you set the optimizer_index_cost_adj to a low value (anything less than about 20 starts to look a little dangerous) and the optimizer_index_caching to a high number (anything above about 80 starts to look a little dangerous).
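
    Roughly, the two parameters skew the cost like this (a simplified, hedged model – the exact formula varies by version, and the function name is mine):

```python
# Simplified model of how the two parameters skew an index access cost.
# optimizer_index_caching discounts only the index-block component (the
# table-visit component is unaffected), then optimizer_index_cost_adj
# scales the whole figure. The exact formula varies by Oracle version.

def skewed_cost(blevel, leaf_block_cost, table_block_cost,
                cost_adj=100, index_caching=0):
    index_part = (blevel + leaf_block_cost) * (1 - index_caching / 100)
    return (index_part + table_block_cost) * cost_adj / 100

baseline = skewed_cost(3, 100, 5000)                              # defaults
skewed = skewed_cost(3, 100, 5000, cost_adj=10, index_caching=90)

print(round(baseline))  # 5103
print(round(skewed))    # 501 - a tenfold drop that can favour a bad index
```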

    I see that your input has been messed up by the way that white space is eliminated. Check the notes about this on the opening page of the blog. If you want to do a global edit on the tkprof file to change all spaces to the “nbsp” tag and re-post just that piece as a new comment, I’ll paste it back into the original post.

    Comment by Jonathan Lewis — May 22, 2007 @ 8:54 pm BST May 22,2007 | Reply

  6. Interesting usage of the word ‘swathe’. I haven’t seen it used in IT literature before, but it does seem to fit the bill nicely in the given context.

    Comment by Jeroen Wilms — May 23, 2007 @ 8:15 am BST May 23,2007 | Reply

  7. Jeroen, following your comment I did a quick dictionary check in case I had picked one of those English/American anomalies – and discovered that the more common spelling of the word appears to be “swath” – which does have a very different meaning from the other meaning of “swathe”.

    For those without immediate dictionary access, my original image was of a (traditional) farmer sweeping across a field with a scythe.

    Comment by Jonathan Lewis — May 23, 2007 @ 3:26 pm BST May 23,2007 | Reply

  8. Jonathan,
    – I just re-read chapter 5 of your book which makes perfect sense but the difference is that in our case, the optimizer is using the index in spite of reporting a high CF on it.
    – We are using multiple freelists and there are 16 defined for this index. This is a high-concurrency workflow table.
    – optimizer_index_cost_adj and optimizer_index_caching parameters are set to 100 and 0 respectively. This is Oracle’s recommended setting for EBS and if I try to change it, Ahmed Alomari will personally come knocking at my door!
    – This is a primary key index and therefore the number of rowids should be the same as the number of entries in the index.
    I am not a big advocate of rebuilding indexes, for the same reasons that you have explained on page 71 of your book, unless something has drastically changed that requires an index rebuild – like massive deletes because the table retention policy has changed from, say, 10 million to 1 million rows. But in this case it seems that this may be the only option available to optimize the statement: the query is using the index, it is just that it is reading too many index blocks. I am currently running “validate structure” on the index to see the space-related statistics. I will also try to hack the CF to a lower value, but I suspect that it will not help.
    (Statistics sent by email, and added by JPL).

    Pre-rebuild:
    ============
                                                DELETED
     IDX                LEAF ROWS    DELETED   LEAF ROWS     DIST
    HEIGHT  LEAF ROWS   LENGTH(k)   LEAF ROWS  LENGTH(k)     KEYS
    ------ ----------- ----------- ----------- ---------- -----------
         4  90,590,414   3,019,115  13,015,804    368,400  90,590,414

                                         OPTIMAL  COMPR
        BTREE           USED      %SPACE  COMPR   SPACE
       SPACE(k)       SPACE(k)     USED   COUNT   SAVE
    -------------- -------------- ------ ------- -------
         5,990,670      3,038,769     51       2       0

    Post-rebuild:
    =============
                                                DELETED
     IDX                LEAF ROWS    DELETED   LEAF ROWS     DIST
    HEIGHT  LEAF ROWS   LENGTH(k)   LEAF ROWS  LENGTH(k)     KEYS
    ------ ----------- ----------- ----------- ---------- -----------
         4  77,574,670   2,650,717           5          0  77,574,670

                                         OPTIMAL  COMPR
        BTREE           USED      %SPACE  COMPR   SPACE
       SPACE(k)       SPACE(k)     USED   COUNT   SAVE
    -------------- -------------- ------ ------- -------
         2,979,069      2,662,420     90       2       0

    Comment by Amir Hameed — May 23, 2007 @ 5:15 pm BST May 23,2007 | Reply

  9. Hi Jonathan,

    Swathe is indeed the correct spelling based on the etymological lineage given in the 2nd edition of ODE, which ties this word back to the Old High German word schwabe – which is what you did in mid-sixteenth-century Germany when you had to clear a field of grass or wheat, as there weren’t a lot of database indices to clear in those days…

    Cheers,
    Jeroen

    Comment by Jeroen Wilms — May 24, 2007 @ 1:04 pm BST May 24,2007 | Reply

  10. Amir, I’ve finally had time to review your comment and copy in the stats you sent me.
    A lot of this may depend on the fact that you’re talking about workflow. There is a lot of empty space in this index before rebuilding – but where is it, and how is it used?

    Workflow tends to add rows to the right hand end of a “queueing” index, and delete (most) rows from the left hand end. Such indexes are perfect candidates for something I call the FIFO problem (and Tom Kyte refers to as ‘sweepers’).

    The index may be nearly empty over a great long section of the left hand side, and well packed at the right-hand end. So your 50% space used may really be half the index with 0%, and half the index with 100% packing.

    If you have to walk a long way through the empty part of the index before getting any useful table data, you are wasting resources in the index. As an added penalty, if you have to walk through a lot of the index and then go to table rows that you can’t process (which is probably why they’re still there in an ‘old’ part of the index) then you’re wasting even more resources.

    This looks like an example of an index where you need to do a detailed analysis (there are a couple of scripts on my website that may help: Index Efficiency, and Index Efficiency pt. 2). It is possible, of course, that if you just think about the way the data is used and the impact this has on the index then you won’t even need to investigate the index.

    Once you understand the usage, you can decide whether you need to do a rebuild from time to time, or whether you can simply get by with a regular coalesce.

    Comment by Jonathan Lewis — May 28, 2007 @ 11:17 am BST May 28,2007 | Reply

  11. Wow, simply perfect. I’ve run into this problem as stated in the post and have been looking into it for a while. Jonathan, your post #14 is very good, but I still don’t understand why you would have to walk through the empty side of the index – wouldn’t the index just “move right” immediately when searching?

    Sorry, I don’t understand why the scan would even bother walking the blocks where the table rows don’t exist (the “As an added penalty…” point).

    Comment by RobH — May 29, 2007 @ 3:44 pm BST May 29,2007 | Reply

  12. RobH, assume you have 1,000,000 rows indexed using a sequence number. Now set the value to null in every row with a value less than 999,000, except for the multiples of 1,000.

    You now have an index with lots of empty space. It’s probably about 2,000 blocks long, every other block is completely empty but still in the structure, every other block has one entry in it. (This is an approximation, I haven’t created the data to check accuracy but the image is about right).

    If you run the query:

    select * from table where id_col is not null and some_flag = ‘X’

    Oracle will (probably) use the index, and has to start at the left hand edge of the index (where the first available entry has id_col = 1000) and use the forward pointers to move from block to block in the index.

    This is a fairly common coding method for queue-driven tasks. If the task can’t process that small number of rows (those old multiples of 1,000) you see a lot of wasted work happening before the task gets to the usable data.

    If your code allows you to ignore the “bad” table rows before you go to look at them, you reduce the waste. If you coalesce the index (possibly on a regular basis) you reduce the waste even more.

    Important point: An index block does not go on to the free list until it is completely empty. Then, when it is on the free list, it is still linked into the index structure in its original position until it finally gets re-used. In my example I might have 1,000 empty blocks on the free lists, but I’m not going to take any of them out of the structure until I insert more rows into the table and need more space at the top, or right hand edge, of the index. The coalesce command cleans things up, and removes empty blocks from the structure.
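
    The arithmetic of that example can be sketched like this (a toy model of the leaf-block chain, not real index internals):

```python
# Toy model of the FIFO index from the example above: about 2,000 leaf
# blocks, alternating between completely empty blocks (entries deleted,
# block still linked into the structure) and blocks holding the one
# surviving multiple of 1,000. A range scan from the left edge follows
# the forward pointer through every linked block; a coalesce unlinks
# the empty ones.

leaf_blocks = []
for i in range(1, 1001):
    leaf_blocks.append([])           # emptied by the deletes, still linked
    leaf_blocks.append([i * 1000])   # the surviving multiple of 1,000

blocks_walked = len(leaf_blocks)     # the scan visits every linked block
print(blocks_walked)                 # 2000

coalesced = [block for block in leaf_blocks if block]
print(len(coalesced))                # 1000 blocks left after a coalesce
```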

    Comment by Jonathan Lewis — June 1, 2007 @ 10:47 am BST Jun 1,2007 | Reply

  13. [...] in Oracle and why it is beneficial from a performance perspective (for example Jonathan Lewis: http://jonathanlewis.wordpress.com/2007/05/17/data-cluster/). It touches things like the clustering factor for Indexes and the db_file_sequential_read [...]

    Pingback by DBMS_REDEFINITION, clustering and how an outline helps to make it completely ONLINE « Toine van Beckhoven’s Oracle blog — September 24, 2009 @ 10:22 am BST Sep 24,2009 | Reply

