How smashed (in the non-alcoholic sense) can an index be?
One of the components in the cost calculation for an indexed access path is the “blevel” (branch-level) or, indirectly, the “height” of the index. Both count the steps from the root block down to a leaf block (and all leaf blocks are at the same distance from the root – that’s the meaning of “balanced” in the expression “balanced B-tree”) but the height includes the leaf level in the count while the blevel excludes it and counts down only to the lowest level of branch blocks (so height = blevel + 1).
In many cases you will find that even with a few million entries in a single index segment the height may still be only 3 (blevel = 2), and it may take a few tens of millions of rows before an index needs to grow to height = 4 (blevel = 3).
It’s often the case that the number of index entries per leaf block and block pointers per branch block is around 200 to 400, so the rate at which the height/blevel grows is tiny compared to the rate at which the number of rows in the index increases. But algorithms often have weak points, and some time around the year 2000 I started demonstrating an edge case where I could crash a session in less than 3 seconds (and most of that time was spent on Oracle creating the crash dump) by inserting just 25 (carefully designed) rows into a table.
I published an article about this in 2005 (see footnote), but since then the algorithm has changed. My demo worked in versions up to 9.2.0.4; but in later versions Oracle Corp. modified the way that index blocks (possibly just the branch blocks) split at the low end of the index making the harder to achieve a crash. If you have a MOS account you can check Doc ID 1748260.8: OERI:6051 possible during index manipulation.
The change wasn’t a response to my demo, of course; it was in response to a problem that could occur in production systems running some of the big “database agnostic” accounting or HR or CRM systems that created huges indexes on multiple columns. Even when the crash never occured the nature of the application and its indexing strategy could result in some indexes growing to a ridiculous height that made a dramatic difference to the cost calculations (hence the desirability of the “best” index).
It’s harder, and less likely to happen in the wild, but it’s still possible to make the same crash occur even in the newest versions of Oracle. It will (probably) take roughly 8 million (power(2,23) + 1) rows and 32GB of space to crash (or 128GB if you want to play nicely and use an 8KB block size – and tweak my code a little further).
Richard Foote spotted a slide with a surprising blevel in a short presentation about CBO arithmetic by Maria Colgan a couple of days ago, so I thought it would be entertaining to tweak the old code to see if it could still cause the crash. So here it is:
rem
rem Script: silly_index_3a.sql
rem Author: Jonathan Lewis
rem Dated: Mar 2004 / Jan 2022
rem Purpose: Build an index with a large blevel
rem
rem Notes:
rem Uses 2K block size for tablespace holding the index
rem Estimated run-time (for me) with m_blevel = 23 - ca. 1 hour
rem
define m_blevel = 5
define m_rows = power(2,&m_blevel)
drop table t1 purge;
create table t1 (v1 varchar2(8));
create index t1_i1 on t1(substrb(lpad(v1,1469,'0'),1,1469))
tablespace test_2k
;
-- execute snap_my_stats.start_snap
prompt ===================================================
prompt Inserting &m_rows (short) rows in reverse order
prompt ===================================================
begin
for i in reverse 1..&m_rows loop
insert into t1 values (i);
-- commit;
end loop;
end;
/
-- execute snap_my_stats.end_snap
prompt ================
prompt Validating index
prompt ================
validate index t1_i1;
select
lf_rows, height, height-1 blevel, lf_blks, br_blks
from
index_stats
;
column object_id new_value m_object_id
select object_id
from user_objects
where object_name = 'T1_I1'
/
alter session set events 'immediate trace name treedump level &m_object_id';
insert into t1(v1) values('0');
I’ve precreated a tablespace called test_2k with a block size of 2KB for this demo; you’ll need a couple of percent over 32GB for this tablespace.
This script then creates a table in my default tablespace to hold a small character column, and a function-based index on that column that produces a character result of 1469 bytes (which gives me the largest possible index entry that’s allowed in a 2KB block size). The older version of the code used a simple lpad() to do this, but the newer versions decided that that would produce up to 2*1,469 bytes thanks to my default character set – hence the substrb(), note, especially the b for byte.
With the structure in place I’ve then inserted numeric values in descending order into the table so that the index is constantly doing leaf block splits at the left hand (low) end.
Once I’ve populated the table I use a call to validate index so that I can report the number of rows, leaf blocks, branch blocks and the height (and blevel) of the index; then I find it’s object_id so that I can do a treedump of it.
For m_blevel = 5, here are the results of the query against index_stats after the call to validate the index:
LF_ROWS HEIGHT BLEVEL LF_BLKS BR_BLKS
---------- ---------- ---------- ---------- ----------
32 6 5 32 31
As you can see, setting m_blevel = 5 I get an index with blevel = 5, and 2^5 leaf blocks each holding one row. If you set m_blevel to 23 you’ll end up (after about 1 hour, probably) with a blevel of 23 and 8,388,608 rows and leaf blocks (and branch blocks = leaf blocks – 1: hence the 32GB+ requirement for the tablespace … 16M blocks at 2KB per block, plus ASSM overheads).
To show you what’s happening inside the index here’s the treedump (from 19.11.0.0) with m_blevel = 5
branch: 0x4c00204 79692292 (0: nrow: 2, level: 5)
branch: 0x4c0022a 79692330 (-1: nrow: 2, level: 4)
branch: 0x4c00212 79692306 (-1: nrow: 2, level: 3)
branch: 0x4c00206 79692294 (-1: nrow: 2, level: 2)
branch: 0x4c0020c 79692300 (-1: nrow: 2, level: 1)
leaf: 0x4c00209 79692297 (-1: row:1.1 avs:370)
leaf: 0x4c00249 79692361 (0: row:1.1 avs:370)
branch: 0x4c00248 79692360 (0: nrow: 2, level: 1)
leaf: 0x4c00247 79692359 (-1: row:1.1 avs:370)
leaf: 0x4c00246 79692358 (0: row:1.1 avs:370)
branch: 0x4c00244 79692356 (0: nrow: 2, level: 2)
branch: 0x4c00243 79692355 (-1: nrow: 2, level: 1)
leaf: 0x4c00242 79692354 (-1: row:1.1 avs:370)
leaf: 0x4c0024f 79692367 (0: row:1.1 avs:370)
branch: 0x4c0024e 79692366 (0: nrow: 2, level: 1)
leaf: 0x4c00235 79692341 (-1: row:1.1 avs:370)
leaf: 0x4c00239 79692345 (0: row:1.1 avs:370)
branch: 0x4c00238 79692344 (0: nrow: 2, level: 3)
branch: 0x4c00237 79692343 (-1: nrow: 2, level: 2)
branch: 0x4c00236 79692342 (-1: nrow: 2, level: 1)
leaf: 0x4c00234 79692340 (-1: row:1.1 avs:370)
leaf: 0x4c00233 79692339 (0: row:1.1 avs:370)
branch: 0x4c00232 79692338 (0: nrow: 2, level: 1)
leaf: 0x4c00231 79692337 (-1: row:1.1 avs:370)
leaf: 0x4c00230 79692336 (0: row:1.1 avs:370)
branch: 0x4c0023f 79692351 (0: nrow: 2, level: 2)
branch: 0x4c0023e 79692350 (-1: nrow: 2, level: 1)
leaf: 0x4c0023d 79692349 (-1: row:1.1 avs:370)
leaf: 0x4c0023c 79692348 (0: row:1.1 avs:370)
branch: 0x4c00225 79692325 (0: nrow: 2, level: 1)
leaf: 0x4c0022d 79692333 (-1: row:1.1 avs:370)
leaf: 0x4c0022c 79692332 (0: row:1.1 avs:370)
branch: 0x4c0022b 79692331 (0: nrow: 2, level: 4)
branch: 0x4c00229 79692329 (-1: nrow: 2, level: 3)
branch: 0x4c00228 79692328 (-1: nrow: 2, level: 2)
branch: 0x4c00227 79692327 (-1: nrow: 2, level: 1)
leaf: 0x4c00226 79692326 (-1: row:1.1 avs:370)
leaf: 0x4c00224 79692324 (0: row:1.1 avs:370)
branch: 0x4c00223 79692323 (0: nrow: 2, level: 1)
leaf: 0x4c00222 79692322 (-1: row:1.1 avs:370)
leaf: 0x4c0022f 79692335 (0: row:1.1 avs:370)
branch: 0x4c0022e 79692334 (0: nrow: 2, level: 2)
branch: 0x4c00215 79692309 (-1: nrow: 2, level: 1)
leaf: 0x4c00219 79692313 (-1: row:1.1 avs:370)
leaf: 0x4c00218 79692312 (0: row:1.1 avs:370)
branch: 0x4c00217 79692311 (0: nrow: 2, level: 1)
leaf: 0x4c00216 79692310 (-1: row:1.1 avs:370)
leaf: 0x4c00214 79692308 (0: row:1.1 avs:370)
branch: 0x4c00213 79692307 (0: nrow: 2, level: 3)
branch: 0x4c00211 79692305 (-1: nrow: 2, level: 2)
branch: 0x4c00210 79692304 (-1: nrow: 2, level: 1)
leaf: 0x4c0021f 79692319 (-1: row:1.1 avs:370)
leaf: 0x4c0021e 79692318 (0: row:1.1 avs:370)
branch: 0x4c0021d 79692317 (0: nrow: 2, level: 1)
leaf: 0x4c0021c 79692316 (-1: row:1.1 avs:370)
leaf: 0x4c00208 79692296 (0: row:1.1 avs:370)
branch: 0x4c00207 79692295 (0: nrow: 2, level: 2)
branch: 0x4c00205 79692293 (-1: nrow: 2, level: 1)
leaf: 0x4c0020f 79692303 (-1: row:1.1 avs:370)
leaf: 0x4c0020e 79692302 (0: row:1.1 avs:370)
branch: 0x4c0020d 79692301 (0: nrow: 2, level: 1)
leaf: 0x4c0020b 79692299 (-1: row:1.1 avs:370)
leaf: 0x4c0020a 79692298 (0: row:1.1 avs:370)
----- end tree dump
As you can see, every branch block (which includes the root block) holds exactly 2 entries, and every leaf block holds just one row.
Once you’ve tested the code with a couple of small starting values you might want to skip the validate index and treedump steps – they might take quite a long time (especially since the treedump will write a trace file of 16M+ lines). The other thing to watch out for is that the script will generate something like 200GB of redo and 72GB of undo – so you might want to remove the comment marker from the commit in my PL/SQL loop and check your auto undo settings and auto extend settings on the undo files.
I should also point out that I’ve used 1469 for the substrb(lpad()) because that’s the largest string I can use for the index definition – but the same index pattern of use (i.e. one row per leaf block, two children per branch block) will appear if you reduce this (for a 2KB block size, using ASSM) to 915 bytes. (And this is instructive because if you use the smaller size and then eliminate the “reverse” in the loop the index branch blocks pack more efficiently and the blevel is smaller (even though the index leaf-block count is unchanged.)
In the good old days (9.2.0.4 and earlier) the maximum allowed height for a B-tree index was 24 (blevel = 23). I haven’t got a spare 32GB in any of my virtual machines at present so I haven’t checked to see if this is still true; but if you do run a test with m_blevel – 23, the final line of the script (inserting a zero row) should result in an ORA-00600 error with first parameter 6051 if the limit hasn’t changed.
Footnote
Here’s a link to the original document (how_high.doc) that I wrote and published in dbazine in 2005, now available (very slowly) on the Wayback Machine, and made redundant by a change in the branch block split algorithm in 10g.
Footnote 2
At some point I did discover a bug note on MOS (Metalink, in those days) that reported a performance problem due to an index with a very high blevel (I have a vague memory of it reaching double digits, but not very getting close to the limit – possibly something like 15). So there is a serious point to this post – bad column definitions with bad index definitions (and a silly block size) and a bit of bad luck with the pattern of data insertion can lead to an unexpected problems.