Oracle Scratchpad

August 17, 2009

treedump

Filed under: Indexing,Infrastructure,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 5:31 pm GMT Aug 17,2009

Here’s a mechanism for examining indexes in some detail if you think that something odd may be going on inside them. It’s a feature that I first decribed in Practical Oracle 8i, although the book doesn’t mention a problem with it that I subsequently discovered that means I always test it carefully on a small index before I use it on a large one.  Here’s a demonstration – cut from an SQL*Plus session on 10.2.0.3:

SQL> select object_id
  2  from user_objects
  3  where object_name = 'ORD_PLACED'
  4  ;
 OBJECT_ID
----------
    100466
1 row selected.
SQL> alter session set events 'immediate trace name treedump level 100466';
Session altered.

The treedump dumps a trace file with a summary of the index (or index partition) whose object_id you have specified in the level. In general you get one line in the trace file for each block in the index – but there are some versions of Oracle where you get a full symbolic dump for every leaf block, hence the need to check with a small index before trying it on a large one.

The dump will look something like the following (which is the start of a dump of the index sys.i_source1):

----- begin tree dump
branch: 0x40035a 4195162 (0: nrow: 3, level: 2)
   branch: 0x40e2c0 4252352 (-1: nrow: 312, level: 1)
      leaf: 0x40035b 4195163 (-1: nrow: 240 rrow: 240)
      leaf: 0x40215f 4202847 (0: nrow: 218 rrow: 218)
      leaf: 0x40035c 4195164 (1: nrow: 444 rrow: 444)
      leaf: 0x40035d 4195165 (2: nrow: 444 rrow: 444)
       ... etc ...

To generate the dump, Oracle walks the tree structure in order one block at a time. The hexadecimal and decimal numbers starting each line are the block address expressed in two different ways.

The first line of the dump shows the root block. In this case the index has blevel=2 (which means a height of 3), so a walk down the index will visit the root, one level 1 branch, and then hit a leaf block. The root block currently has three pointers (nrow = 3) to branch blocks at level 1.

Note 1 – The root block is normally no different in format from a branch block, but when an index consists of nothing but the root block the root block is formatted as a leaf block not a branch block.
Note 2 – whatever block splits happen, the root block is always kept at the same location – the block after the segment header block and any freelist group blocks.

Indented one step (three characters) line 2 shows you the first branch block of the level below – the first of the level 1 branch blocks in this case. In brackets you see that this is entry -1, has 312  pointers to blocks at the next level and is a level 1 branch block. The significance of level 1 is that the entries will be pointing to leaf blocks. 

It may seem a little odd that Oracle starts at the “-1th” branch block: computer programs usually start counting at zero or one, but Oracle has a rationale for starting at -1 in branch blocks. Most of the entries in a branch block are pairs of the form (partial key, block address) – and these are listed in the block’s row directory – but there is one special entry (called the “leftmost child – kdxbrlmc”) which omits the partial key and gets stored in a separate place outside the row directory. Oracle omits the key to save a little space and infers the value from the block’s parent (in the branch level above). Since the row directory starts counting at row 0, the leftmost child has to be labelled “-1″.  

I haven’t shown it, but 312 lines further down the trace file, you’d find branch block zero, and the other two branch blocks would be a similar sort of distance further down again.

The third line of the trace file is the first leaf block (and again we start counting from -1). Note that there is no “level” in the leaf block. Instead you get two entries for rows: the number of current rows (rrow) in the block and the current size (nrow) of the block’srow directory (see glossary). The row directory may hold references to rows that have been deleted, committed, but not yet cleaned out of the structure, which is why the dump shows two pieces of information.

The part of the index shown in the example is completely clean and tidy (rrow = nrow in every block) but something a little odd has happened at the start of the index as we can see evidence of a 50/50 leaf block split at the low end due to inserts after the index was intially filled – note how the leaf block addresses are nearly in sequence (035b, 215f, 035c, 035d) but a completely different block number appears between the first and third blocks to hold the overflow as the first block split.

For comparison purposes, here’s a section of the index produced during one run of my “index explosion” code:

branch: 0x140118a 20976010 (0: nrow: 551, level: 1)
   leaf: 0x140118b 20976011 (-1: nrow: 409 rrow: 409)
   leaf: 0x140118c 20976012 (0: nrow: 214 rrow: 214)
   leaf: 0x140118d 20976013 (1: nrow: 214 rrow: 214)
   leaf: 0x140118e 20976014 (2: nrow: 213 rrow: 213)
   leaf: 0x1401190 20976016 (3: nrow: 206 rrow: 206)
   leaf: 0x140118f 20976015 (4: nrow: 327 rrow: 327)
   leaf: 0x1401191 20976017 (5: nrow: 169 rrow: 169)
   leaf: 0x1401192 20976018 (6: nrow: 169 rrow: 169)
   leaf: 0x1401194 20976020 (7: nrow: 159 rrow: 159)
   leaf: 0x1401193 20976019 (8: nrow: 169 rrow: 169)
   leaf: 0x1401196 20976022 (9: nrow: 160 rrow: 160)
   leaf: 0x1401195 20976021 (10: nrow: 327 rrow: 327)
    ...
   leaf: 0x140152d 20976941 (544: nrow: 141 rrow: 141)
   leaf: 0x140152c 20976940 (545: nrow: 146 rrow: 146)
   leaf: 0x140152f 20976943 (546: nrow: 140 rrow: 140)
   leaf: 0x140152e 20976942 (547: nrow: 282 rrow: 282)
   leaf: 0x1401530 20976944 (548: nrow: 282 rrow: 282)
   leaf: 0x1401531 20976945 (549: nrow: 138 rrow: 138)

We have an index with blevel = 1, and 551 leaf blocks (numbered -1 to 549).

The first leaf block is 100% packed and underwent a “90/10″ split.

The next 4 leaf blocks show clear evidence of 50/50 splits (each holds roughly half of the maximum 409 rows in the first block).  The next leaf block suggests that a bit of a catastrophe hit the block and its ITL grew sharply – this block is 100% “full” (a fact that I ascertained by doing a block dump), but can only fit 327 entries because its ITL has grown to about 70 entries.

The size of the ITL doesn’t change over the next few blocks, but we can see a few 50/50 splits followed by a 90/10 split – and so it goes on. By the time we get to the end of the list, we’ve had another glitch, and a “full” block now holds only 282 rows, and blocks showing 50/50 splits hold about 140 rows. In this test run, the ITLs grew to a worst case of about 112 entries – not a complete disaster but certainly bad news.

The treedump isn’t detailed enough to tell you everything that’s happened to your index – but it is relatively short (one line per block) and gives you some good clues about what’s been going on. And the presence of the decimal form of the block address makes it easy to dump individual blocks if you want to investigate further, for example:

SQL> select
  2     dbms_utility.data_block_address_file(20976011) file#,
  3     dbms_utility.data_block_address_block(20976011) block#
  4  from
  5     dual
  6  /

     FILE#     BLOCK#
---------- ----------
         5       4491

1 row selected.

SQL> alter system dump datafile 5 block 4491;

System altered.

Footnote:

In some versions of Oracle you get a full block dump of every leaf block – which makes it very expensive to do a treedump.
In versions of Oracle prior to 9.2.0.6 there is a bug which means your session will crash with a “numeric overflow” error if you try to dump an index that has been created automatically to support a unique or primary key constraint. This makes it hard to dump index organized tables(IOTs). This is because bit 12 (4096) been set in ind$.property and the code isn’t expecting such a large number. (Technically it is possible to update the property column directly; it works, but I wouldn’t do it on the production system – but maybe on a backup you’re about to discard.)

Update Dec 2010: (See this comment on a related post): Each line of a treedump takes about 75 bytes, so you need to be able to generate a tracefile of around (75 * leaf_blocks / 1048576)MB. Make sure you check parameter max_dump_file_size before you start.

16 Comments »

  1. Hi Jonathan:
    How did you caluclate the ITL number in these cases?
    -this block is 100% “full”, but can only fit 327 entries because its ITL has grown to about 70 entries.

    – In this test run, the ITLs grew to a worst case of about 112 entries

    Thank you
    Kumar

    Comment by Kumar — August 18, 2009 @ 4:21 am GMT Aug 18,2009 | Reply

  2. Two options –

    Spotting the patterns (some blocks with 327 rows, and a few in between of roughly half that – looks like 90/10 and 50/50 splits) and knowing the typical row size, and making a few guesses.

    Dumping a couple of blocks with “suspicious” row counts.

    Comment by Jonathan Lewis — August 18, 2009 @ 4:58 pm GMT Aug 18,2009 | Reply

  3. Could you give a little more detail on the leftmost child branch block. From another JL article,

    If there is only room for a single entry in each leaf block, you may wonder how Oracle can manage to produce a binary tree at all — surely a binary tree needs two pointers in each branch, and the blocks are too small to hold two keys. But a branch block uses several tricks to maximise efficiency; and one of these tricks is that the first pointer in the block (known as the “leftmost child pointer” — field kdxbrlmc in the block dump) doesn’t have a key value associated with it; its key value is implicitly “something less than the key of the next pointer”.

    So this is a special case when “there is only room for a single entry in each leaf block?”

    If that’s the only time, then this is more of handling a boundary condition than an optimization, no?

    Comment by Mark Brady — August 18, 2009 @ 5:57 pm GMT Aug 18,2009 | Reply

    • Mark,
      I’ve edited your comment to clarify that quote from my the article website (a 45KB word document).

      It’s certainly a nice idea to consider, but history suggests otherwise. I can’t remember exact versions, but there was a time when the longest key was limited to about 40% of the block size (as mentioned here), and this particular trick was in place even in those days.

      Comment by Jonathan Lewis — August 18, 2009 @ 8:01 pm GMT Aug 18,2009 | Reply

  4. [...] 3-How to read treedump for what happened in indexes ? Jonathan Lewis – Treedump [...]

    Pingback by Blogroll Report 14/08/2009 – 21/08/2009 « Coskan’s Approach to Oracle — September 8, 2009 @ 12:07 pm GMT Sep 8,2009 | Reply

  5. [...] (are you likely to have hot spots in the index, or a few very popular values) . Dump a few blocks (see the end of this note) from one of the higher numbered extents in the index to see if they have a lot of ITL entries, or [...]

    Pingback by Index Efficiency 3 « Oracle Scratchpad — March 3, 2010 @ 9:21 pm GMT Mar 3,2010 | Reply

  6. [...] start with an article on the treedump command, which I published a little while ago. This dumps a tracefile describing your index, one [...]

    Pingback by Treedump – 2 « Oracle Scratchpad — March 7, 2010 @ 6:34 pm GMT Mar 7,2010 | Reply

  7. Hi Jonathan,

    I have a remark about dbms_utility.data_block_address_file(20976011) file#. If if I’m not mistaken, the treedump gives the relative DBA (data block address) of the leaves. But block dump takes the absolute file#.
    So in your example dbms_utility.data_block_address_file(20976011) gives the rfile# that we need to convert into file# (from v$datafile – knowing the index segment’s tablespace) before using it for block dump. They were probably equals in your example.

    Plase correct me if I’m wrong.

    Regards,
    Franck.

    Comment by Franck Pachot — April 29, 2010 @ 7:19 am GMT Apr 29,2010 | Reply

    • Franck,

      Thanks for that, it’s a very good point, and one that I had completely overlooked. I think you’re probably correct (it certainly makes sense), but I haven’t actually checked but I will remember to verify it one day.

      I think the difference won’t be visible to many DBA’s, though; my memory of the change in file number formats is that Oracle rigged their implementation so that the relative file number and absolute file number were the same until you had about 1,024 files (possibly 1,022 if you want to be precise) – and there is an event you can set to break this connection so that you can test without having to create a thousand files.

      Comment by Jonathan Lewis — May 2, 2010 @ 11:37 am GMT May 2,2010 | Reply

  8. Hi, In fact I have tested it as I had to treedump (using our example) on a database that has file# != rfile#, and that’s how I discovered it.

    It is not only the number of datafiles that makes them different: when you use transportable tablespaces, the relative file number do not change (or that would involve visiting all data blocks in the tablespace, and all rowids that reference a block within that tablespace…)

    Comment by franckpachot — May 6, 2010 @ 12:21 pm GMT May 6,2010 | Reply

  9. Franck,

    Thanks for that comment – excellent point about the transportable tablespace as well, that’s another of those things that’s easy to forget.

    I put in a “!=” on your comment, I’m guessing that the HTML parser took out a “less than, greater than” that you’d had there.

    Comment by Jonathan Lewis — May 6, 2010 @ 6:20 pm GMT May 6,2010 | Reply

  10. [...] treedump : A useful (though sometimes expensive) option for looking into index structures [...]

    Pingback by Index ITLs « Oracle Scratchpad — June 5, 2010 @ 8:58 am GMT Jun 5,2010 | Reply

  11. Jonathan,

    Few questions to understand the treedump…

    1.I checked first 30K rows and last 1k rows(rows for leaf blocks) in my treedump and i found that ‘nrow’ is same as ‘rrow’ everywhere. How should i interpret it ?
    2.snap of my treedump

    ----- begin tree dump
    branch: 0x40b700d 67858445 (0: nrow: 14, level: 4)
       branch: 0x161bada0 370912672 (-1: nrow: 118, level: 3)
          branch: 0x14036237 335766071 (-1: nrow: 112, level: 2)
             branch: 0x16002253 369107539 (-1: nrow: 187, level: 1)
                leaf: 0x40b700e 67858446 (-1: nrow: 205 rrow: 205)
    

    Am i right in interpretation that,
    there are 14 branch blocks at level 4 (just below root block),118 branch blocks at level-3,112 branch blocks are level-2 and 187 branch blocks at level-1.
    First branch block(i.e. ‘-1′) at level-1 points to 205 leaf blocks…and so on…???

    Comment by Bhavik Desai — December 6, 2010 @ 3:05 pm GMT Dec 6,2010 | Reply

    • Some clarification.
      The “root” block is just another type of branch block – so the first line is the root block which holds 14 entries which point to 14 blocks below it. The root block is at level 4 (blevel = 4, height = 5 for your index).

      The first block pointed to by the root block is at level three, it holds 118 entries; there are 13 other branch blocks at this level

      The first block pointed to by the first level 3 branch block is at level 2, it holds 112 entries; there are 117 branch blocks at this level pointed to by the same parent level 3. We could “guesstimate” that there are roughly 13 * 118 other blocks at level 2 (assuming each of the 14 level 3 branch blocks all hold the same number of pointers.

      etc. etc. etc.

      Comment by Jonathan Lewis — December 7, 2010 @ 7:22 am GMT Dec 7,2010 | Reply

  12. [...] snapshots of v$mystat, calls to “alter system flush buffer_cache”, and event 10200; the treedump can also be very helpful for identifying block [...]

    Pingback by Quiz Night « Oracle Scratchpad — January 23, 2011 @ 6:00 pm GMT Jan 23,2011 | Reply

  13. Oracle Index Rebuild – A Corner Case…

    Regular index rebuilds in Oracle are a vital necessity – this old myth has been proven wrong long ago. But here is the story about an index where at first we did not understand why a rebuild reduced the size so heavily.After a blooper involving a DROP …

    Trackback by BlogNotes to Myself — September 6, 2012 @ 8:20 pm GMT Sep 6,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,267 other followers