Oracle Scratchpad

July 20, 2009

Index Quiz 1

Filed under: Index Explosion,Indexing,Infrastructure — Jonathan Lewis @ 7:41 am BST Jul 20,2009

I’m encroaching on Richard Foote’s territory here – with plans to write a few details about some of the implementation details of Oracle’s B-tree indexes. My strategy, though, is to entertain by asking a few questions that might prompt a little speculation before giving some answers. So …

After running validate index against a particular index on my system, I select the following columns from view index_stats: (using Tom Kyte’s invaluable “one column per line” routine).

HEIGHT                        : 3
BLOCKS                        : 640
LF_ROWS                       : 100008
LF_BLKS                       : 601
LF_ROWS_LEN                   : 1900152
LF_BLK_LEN                    : 3992
DEL_LF_ROWS                   : 0
DEL_LF_ROWS_LEN               : 0
DISTINCT_KEYS                 : 100008
MOST_REPEATED_KEY             : 1
BTREE_SPACE                   : 2423288
USED_SPACE                    : 1908481
PCT_USED                      : 79

Looking at these results, what would your opinion be about rebuilding this index ?

  1. Definitely not
  2. Probably not
  3. Small bias against
  4. Insufficient information
  5. Small bias for
  6. Probably would
  7. Definitely would

Don’t forget to justify your opinion.

As a bonus question – what do you think my blocksize was ?

Footnote (or possibly Foote note): Richard Foote is not allowed to answer the question – as he probably knows where I am going with this set of posts.

Update: My final answer is in comment 9.

[Further reading on Index ITL Explosion]

33 Comments »

  1. 1. Definitely not. No reason to. Assuming 640 blocks of 4KB each, I get a 2.5MB index with 100,008 entries (approx 19 bytes each). This has 0 deleted entries so a rebuild will not “reclaim” space.
    However, the fact that HEIGHT appears to be 3 means that there is a curious pattern to the inserts. It is a multicolumn index (or an index on a large VARCHAR2 column) and, I think, it isn’t defined as UNIQUE index although entries are unique. A pattern of repeated inserts and deletes without intervening commits would cause a non-unique index to keep growing.

    Comment by Hemant K Chitale — July 20, 2009 @ 8:59 am BST Jul 20,2009 | Reply

    • Your first comment about a row size of 19 bytes is correct – it’s non-unique on a date column.

      The height of three is not a big surprise given the other figures I’ve reported, though, altough it is just a bit “unlucky”. We had 601 leaf blocks, so the number of branch rows is very similar – and our virtually unique date column gives us 14 bytes per branch row. Doing the arithmetic, we can work out that we did a split somewhere in the last few leaf blocks that propagated up the index to leave us with two level 1 branches below the root.

      (If the volume of data is stable, the figures suggest*** that we might actually have an index that should be rebuilt or coalesced regularly to stop it from expanding to height three – the cost of doing so won’t be great, and the benefit might be worth the trouble and risk).

      *** Until you notice the anomaly of the lf_blk_len.

      Comment by Jonathan Lewis — July 20, 2009 @ 7:34 pm BST Jul 20,2009 | Reply

    • >However, the fact that HEIGHT appears to be 3 means that there is a curious pattern to the inserts.
      Hemant,
      not sure how you came to that conclusion, but here’s an example of similar index with height of 3 – and without any DML.

      drop table t cascade constraints purge;

      create table t(x) as
      select lpad(rownum, 19, '0') from dual
      connect by level <= 70000;

      create index t_indx on t(x);

      analyze index t_indx validate structure;
      select height, blocks, lf_blks, br_rows, br_blks, used_space, btree_space
      from index_stats;

          HEIGHT     BLOCKS    LF_BLKS    BR_ROWS    BR_BLKS USED_SPACE BTREE_SPACE
      ---------- ---------- ---------- ---------- ---------- ---------- -----------
               3        384        304        303          3    2178148     2456096
      

      Comment by Timur Akhmadeev — July 21, 2009 @ 7:22 am BST Jul 21,2009 | Reply

      • Timur,
        Your example used lpad(,19) – and I wondered if that was because I’d said the rowsize in the index was 19.

        If so, I’d like to point out that my 19 included all row overheads, and the 19 bytes came from:

        Date                7 bytes
        Length byte         1 byte
        Rowid               6 bytes
        Length byte         1 byte  (for non-unique index)
        Lock byte           1 byte
        Flag byte           1 byte
        Row directory entry 2 bytes

        I also mentioned the size of the branch entries – whose size is affected by the nature of my data set – as 14 bytes:


        Date (truncated)    6 bytes
        Length byte         1 byte
        Terminator          1 byte
        Block pointer       4 bytes
        Row directory entry 2 bytes

        The significance of the ‘truncated’ and ‘terminator’ is that a branch entry takes an index key and reduces it to the shortest possible piece that allows Oracle to see the difference it and the next branch entry. With my data set Oracle typically needed only the first six bytes of the date to see the difference.

        Comment by Jonathan Lewis — July 21, 2009 @ 8:01 am BST Jul 21,2009 | Reply

  2. Here are my two cents:

    1) Block Size: BTREE_SPACE/BLOCKS = 2423288/640 = 3786.3875 ~= 4K

    2) “Probably not”

    Assuming PCTFREE=10%, the index block space is not fully utilized (100-10-79=11). So, 11% space is under utilized which comes to 260 KB. It is evident from INDEX_STATS that all the rows are unique. I think that the index is a right-hand index (growing in ascending order). So, 11% space could be due to 50-50 block splits. Few rows were inserted (committed or uncommitted) to the table which fall between the minimum and maximum values. Hence resulting in 50-50 block splits which in turn is causing space wastage.

    By rebuilding this index we would gain 260 KB of space or be able to shrink the allocated block count by 65 (206/4) blocks.

    I would constantly monitor this index and consider rebuilding may be when space waste reaches 20-25% or more.

    Comment by Asif Momen — July 20, 2009 @ 10:51 am BST Jul 20,2009 | Reply

    • Your guess about it being a “right-hand index” is correct. Your method for estimating the space saving of a rebuild is perfectly reasonable – assuming pctfree = 10, and assuming your estimate of the block size is correct.

      An assumption that you’ve made implicitly, I think, is that I haven’t shown you the index in a carefully chosen moment in it’s life – that’s a point I’ll have to make explicit in future examples.

      Comment by Jonathan Lewis — July 20, 2009 @ 7:40 pm BST Jul 20,2009 | Reply

  3. Definitely not

    My opinion — 99.9% of all reorgs, rebuilds, etc are a total and utter waste of time and energy. We spend way way way too much time losing sleep over this non-event.
    (Tom Kyte)

    :lol:

    A question: when we tall of rebuild, we always think of “rebuild for space” but we can also “rebuild for time”.
    As an example we can rebuild the index in reverse order just to optimize response time, can’t we?

    Yep, in some special cases, but in this case I don’t know if this can help… :(

    Comment by lascoltodelvenerdi — July 20, 2009 @ 11:28 am BST Jul 20,2009 | Reply

  4. Interesting observations so far, but I won’t comment until the Americans and Australians have had more time to contribute.

    I will just point out to lascoltodelvenerdi, though, that I did say: “looking at the results”, not “appealing to authority” – and if Tom says 99.9% then that leaves 0.1% so even appearling to authority your answer should have been “2. probably not” rather than “1. definitely not”.
    ;)

    Comment by Jonathan Lewis — July 20, 2009 @ 12:54 pm BST Jul 20,2009 | Reply

    • D’oh!
      Stupid 0.1%!

      (Homer Simpson)

      :lol:

      Seriously.
      I’m learning a lot from this quiz.

      One question: here we got an index with 640 blocks and 601 of them are leaf, this don’t implies that the index structure is “good” because we must consider block split, am I right?

      P.S.
      “set of posts” so more are coming!

      Comment by lascoltodelvenerdi — July 20, 2009 @ 1:37 pm BST Jul 20,2009 | Reply

      • The 640 blocks is the space allocated to the segment – so the difference between that and “leaf blocks plus branch blocks” are the blocks above the (high) highwater mark. But block splits are important in this example. (I didn’t report the information about branch blocks – and it contains an important clue).

        Comment by Jonathan Lewis — July 20, 2009 @ 7:10 pm BST Jul 20,2009 | Reply

  5. Ive been reading all the Richard notes and J. Lewis books specifically on indexes in the last week. And I thought I would be able to slam into this with inciteful and complete knowledge and …. I dont know for sure ;o( I thought I was getting there and now I realise that what I thought I knew I did not – back to the testing board as my observational abilities just cannot cope with this one – sounds like my whole life as a DBA. BTW – Jonathan – Cost Based essentials book – brilliant.. struggled a bit with the hash join area when trying to solve some swapping in a three views joined with qb names apparently not forthcoming form the trace/autotrace – but it definitely helped me sort my problem in the end – thanks.

    Comment by kent — July 20, 2009 @ 1:13 pm BST Jul 20,2009 | Reply

    • Kent,

      Thanks for the comments – glad the information is useful.

      Don’t despair about how much there is to know. Because of our expertise, Richard and I both get called to solve the more unusual problems – which is why we can always come up with behaviour that other people may only see rarely. I hope, though, that by showing some of the oddities, and getting people to discuss them, it will make it easier for people to understand some of the more common behaviour.

      Comment by Jonathan Lewis — July 20, 2009 @ 7:08 pm BST Jul 20,2009 | Reply

  6. 4 – insufficent information, assuming the defaults have been used there is some reclaimable space in the index this doesn’t appear to be a pressing reason to rebuild. Without knowing if there is a performance problem or what processes are using the index and how the data is being populated or updated rebuilding is just a way look busy.

    I’d guess its an 8k block size, LF_BLK_LEN is the usable space per leaf block so 3992 would be too much free space in a 4k block as the fixed header is 113 bytes 4096-113=3983 my guess is there is a really big value of inittrans decreasing the usable space of course Icould be wrong.

    Comment by Chris_c — July 20, 2009 @ 2:03 pm BST Jul 20,2009 | Reply

    • Chris,
      Well done, I wasn’t sure that anyone would pick up that subtle clue for the lf_blk_len. You’re right that this is an 8KB block size – there’s a little extra space lost on index leaf blocks which means that the largest value that you can get for lf_blk_len is actually 3,904 bytes – so the block has to be bigger than 4KB.

      Your guess about initrans is also very good – the index has been hammered by concurrent activity and although I didn’t set a massive initrans, there were ITLs (interested transaction lists) that grew to 169 entries, and this has had massive impact on the index.

      Comment by Jonathan Lewis — July 20, 2009 @ 7:27 pm BST Jul 20,2009 | Reply

  7. Jonathan,

    I’d like to qualify my first response. If, as I suspect, this index got this way because of a pattern of DELETEs and INSERTs without intervening COMMITs (and the Index not being defined as a UNIQUE index), and this pattern is likely to continue every day, I would schedule a periodic rebuild of the index.

    Comment by Hemant K Chitale — July 20, 2009 @ 3:03 pm BST Jul 20,2009 | Reply

  8. PCT_USED of 79% looks like this is a “normal” B-Tree index with no significant space benefit in case of rebuild. It might be beneficial (in terms of used space only) to rebuild it though, if there would be additional information about the data and type of data change activities table is usually undergoing.

    Comment by Timur Akhmadeev — July 20, 2009 @ 5:32 pm BST Jul 20,2009 | Reply

    • Timur,

      Absolutely – in fact 79% is better than average for a “normal B-tree”. Unless you spot the little clue that Chris noticed (or unless you know something about the processing that goes on for the underlying table) there’s no reason to think that this index is in any particularly special state.

      Comment by Jonathan Lewis — July 20, 2009 @ 7:42 pm BST Jul 20,2009 | Reply

  9. As Chris said, the correct answer is “4. Insufficient information”. Obviously I could claim (as Chris pointed out) that if you don’t know how the index is used and why it exists you can’t know whether there is any benefit in rebuilding it; but on top of that I didn’t show you some of the numerical data from index_stats that you really needed to know about:


    BR_ROWS                       : 600
    BR_BLKS                       : 3
    BR_ROWS_LEN                   : 8329
    BR_BLK_LEN                    : 8032

    If you don’t have this information about the branch blocks it looks as if the index block size (as noted by Hemant and Asif) is 4KB and the utilisation is 80% which (as noted by Timur) is about average for a “random arrival B-tree”. That 80% figure pushes the answer towards “2. Probably not”.

    But how can you have a branch block length of 8032 (which means a block size of 8KB) when the leaf block length seems to suggest that the block size is 4KB – unless, like Chris, you’re very familiar with some critical numbers.

    The block length figures allow for the space in the block after all overheads have been catered for; and those overheads include the Interested Transaction List (ITL) which, in this case, happens to have reached 169 – the maximum possible for an 8KB block – in a lot of blocks. The index has apparently been subject to a load of massively concurrent DML from the moment the table was created. Based on the worst case ITL size, the validate command has worked out that available space for a leaf block is (8048 – 169 * 24) = 3,992 bytes. So the index entries seem to be using the available space quite efficiently (80%) – but there is a very unusual overhead in the number of ITL entries per block.

    Note – the DML doesn’t have to occur at a high rate (i.e. number of transactions per second) for a long ITLs to appear, it just needs a number of transactions to stay uncommitted for long enough that their activity overlaps. For example, if you execute just 180 transactions per minute but commit a transaction only once every five minutes there’s a lot of time for the number of active transactions, hence ITL entries, on a single block to grow to 169. (As far as heap tables and IOTs are concerned that’s just one argument for being a little cautious about how you use code that does “select for update, fiddle around for a while, update, commit”.

    If you coalesce (or rebuild) this index, the excess ITL entries disappear and the ITL in each leaf block drops to the size defined by initrans.

    But should you do this coalesce ?

    If the level of concurrency appears only when you are inserting the data then it may be a good idea – and this would only be relevant if the index was the “right-hand index” suggested by Asif.

    If the level of concurrency also appears when the data is subsequently subject to modification and deletion then coalescing the index is probably a bad idea as you may find that the moment the coalesce completes the index explodes catastrophically to make space for the necessary ITL entries. (Oracle will even do a leaf block split on a simple delete if it needs an extra ITL entry in an index block – ITL management on indexes is very different from ITL management on heap tables.)

    So the correct answer in the light of the extra figures (or if you’d known the tablespace block size) is definitely “4. insufficient information” as we can see that something rather unusual is happening in that index and we need to investigate further – and we’ll start to do that in Index Quiz 2.

    Comment by Jonathan Lewis — July 20, 2009 @ 7:45 pm BST Jul 20,2009 | Reply

  10. Damn…
    Just when I was feeling a bit confident (and motivated) about my knowledge of Oracle, you came up with this post and (as Kent said) I am back to square one.
    Just kidding… :)
    Sincere thanks to you and all contributors. Perfect example to show that every opinion helps (and correctness is not that important).

    Comment by Narendra — July 21, 2009 @ 8:25 am BST Jul 21,2009 | Reply

  11. >Your example used lpad(,19) – and I wondered if that was because I’d said the rowsize in the index was 19.
    Jonathan, no.
    I just picked up Hemant’s phrase “approx 19 bytes each”, built an example and forgot about space overheads (not for the first time). BTW, branch blocks of index in my test contains “truncated” values with length of 19, as expected.

    Comment by Timur Akhmadeev — July 21, 2009 @ 9:04 am BST Jul 21,2009 | Reply

  12. [...] started a series of postings relating to Oracle B-Tree indexes that are going to be well worth a read. He’s going to use a [...]

    Pingback by Index Quizzes With Jonathan Lewis (Talk Show Host) « Richard Foote’s Oracle Blog — July 21, 2009 @ 10:49 am BST Jul 21,2009 | Reply

  13. Why is Itl limited to 169 when MAX_TRANS can be set to 255 or should we read Itl is limited to 169 for 8K blocks. But then Oracle would set ini_trans irrelevant to the block size?

    Comment by B. Polarski — July 22, 2009 @ 1:28 pm BST Jul 22,2009 | Reply

    • The size of the ITL is limited to roughly half the size of a block (it’s 169 for 8KB block sizes, 83 for 4KB block sizes, and can reach the full 255 for 16KB block sizes).

      At 24 bytes per ITL entry:
      169 * 24 = 4056,
      83 * 24 = 1992
      – so just a little short in both cases.

      Comment by Jonathan Lewis — July 22, 2009 @ 5:50 pm BST Jul 22,2009 | Reply

  14. [...] Infrastructure — Jonathan Lewis @ 7:54 am UTC Jul 23,2009 This quiz is easier than Index Quiz 1 because you can just run the code, dump blocks, and find the answer. But see if you can work out [...]

    Pingback by Index Quiz 2 « Oracle Scratchpad — July 23, 2009 @ 7:55 am BST Jul 23,2009 | Reply

  15. [...] poised? Jonathan Lewis gives his Index Quiz 1: “I’m encroaching on Richard Foote’s territory here – with plans to write a few details [...]

    Pingback by Log Buffer #155: a Carnival of the Vanities for DBAs | Pythian Group Blog — July 24, 2009 @ 4:40 pm BST Jul 24,2009 | Reply

  16. Hi Jonathan,

    when yout say “the validate command has worked out that available space for a leaf block is (8048 – 169 * 24) = 3,992 bytes”,

    where does the 8048 figure come from?.

    Greetings.

    Comment by Richard — July 30, 2009 @ 6:07 pm BST Jul 30,2009 | Reply

  17. @Richard

    8048 is the size of the block.

    ^_^

    Comment by lascoltodelvenerdi — July 31, 2009 @ 8:22 am BST Jul 31,2009 | Reply

  18. Block size? I thought it should be 8192 bytes (1024*8) !!

    Comment by Richard — July 31, 2009 @ 9:10 am BST Jul 31,2009 | Reply

    • Richard – quite right.

      The 8,048 is the total amount of free space after elimination of the index leaf block overheads (excluding the default 2 ITL entries). It’s just a number I happen to know.

      Comment by Jonathan Lewis — August 6, 2009 @ 3:55 pm BST Aug 6,2009 | Reply

  19. Hi Jonathan,

    I have read several times this post but I can´t still understand what do you mean when you say this:

    “If the level of concurrency appears only when you are inserting the data then it may be a good idea – and this would only be relevant if the index was the “right-hand index” suggested by Asif.”

    Please, could you clarify this point a bit more?, I assume the coalesce is after the insert, right? . Why are inserts especially different?

    In Index Quiz II, you wrote that initrans in index blocks are ignored because it is always possible to get an ITL to do the split. Does this behaviour remain intact in “all” kind on index blocks (branch blocks, leaf blocks …) ?

    Thank you Jonathan for this blog.

    Comment by Richard — August 27, 2009 @ 12:21 pm BST Aug 27,2009 | Reply

    • Richard,

      Sorry for the delay – I missed this question first time around.

      The comment is just trying to pin down a case where the coalesce approach could be most effective.

      If you start with a “clean” time-based or sequence-based index – the “right-hand index” – and then have an accident that results in excess ITLs from there on, a call to coalesce will:

      a) walk the index up to the damage point and won’t have to fix anything before that point
      b) fix the damage at the right hand edge of the index and stop the damage moving forward.

      It’s a case where you’d hope to get maximum benefit for minimum resource consumption.

      In other cases, the coalesce might have to do lots of (essentially pointless) work all over the index just to fix one or two little pockets of damage.

      Comment by Jonathan Lewis — October 4, 2009 @ 10:00 am BST Oct 4,2009 | Reply

  20. Regarding the https://forums.oracle.com/forums/message.jspa?messageID=10301373#10301373 question you answered which i posted .
    Does wait event “enq: TX – allocate ITL entry” suggests a wait, where in multiple session trying to insert same block and waiting for allocate the ITL entry. what could be reason that wait event is not documented ?

    Comment by KG — July 22, 2012 @ 11:04 pm BST Jul 22,2012 | Reply

    • KG,

      I think the only possible answer to your question is that there is simply too much detail to document. Possibly the problem is made worse by the fact that the official documentation is not written by the people who write the code, and the people who write the code probably don’t make a note of every detail of their implementation so we end up with lots of cases where little bits of information are “lost” until someone (usually a DBA or support analyst) re-discovers them as a side effect of researching a performance issue.

      Comment by Jonathan Lewis — July 23, 2012 @ 5:13 pm BST Jul 23,2012 | Reply


RSS feed for comments on this post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 4,003 other followers