Oracle Scratchpad

July 28, 2009

Index Explosion

Filed under: Index Explosion,Indexing,Infrastructure,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 7:28 pm BST Jul 28,2009

In Index Quiz 1 and Index Quiz 2 I demonstrated a couple of details of how the ITL (interested transaction list) behaves in indexes. In this note I’m going to explain how these details can result in a nasty waste of space in indexes.

The two points I made in the previous posts were:

  1. at high levels of concurrency you can “lose” a lot of space in an index leaf block to the ITL
  2. when a leaf block splits the ITL of the newer block is a copy of the ITL from the older block

One consequence of point 2 is that you need only have one brief burst of activity that makes one or two ITLs grow to an uncharacteristic size, after which every leaf block that splits off in that portion of the index from then onwards will have a large ITL even if it doesn’t need it.

Combine this with something that I think is probably a bug – but which may actually have been deliberately designed in as a feature (with an unexpected side effect) for concurrency reasons – and surprises appear.

In Index Quiz 1 I showed you an index with the following statistics, and pointed out that it had lost about 50% of it’s available space because its ITLs had grown to the maximum (for an 8KB block) of 169 entries – and pointed out that this can be a side effect of very high concurrency. In fact, when I ran the test case, I was running just eight concurrent processes on a machine with two CPUs.

Here’s some code if you want to try the test on your own systems:


rem
rem     Script:         order_create.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2009
rem

create sequence s1   cache 250000;

create table orders (
        date_placed	date	not null,
        order_id	number(8),
        time_stamp	timestamp,
        id_owner	number(2),
        padding		varchar2(168)
)
tablespace test_8k_assm
initrans 8
;

create index ord_placed on orders(date_placed)
tablespace test_8k
initrans 9
;

The tablespace test_8k_assm is using 8KB blocks, 1MB uniform extent sizes, and automatic segment space management (ASSM). The tablespace test_8k is similar but uses freelist (manual) segment space management. I’ve used initrans 8 on the table and initrans 9 on the index because I tested eight concurrent processes (the setting on the table was probably redundant given I was using ASSM and used less than 16 concurrent sessions in the test).

After creating the sequence, table, and index , you can run the following PL/SQL from as many different sessions as you like (adjusting the limit on the loop if necessary):


rem
rem     Script:         order_entry.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2009
rem

declare
        m_ord	number(8)	:= 0;
begin
        while m_ord <= 100000 loop

                insert into orders (
                        date_placed, order_id, padding
                )
                values (
                        sysdate - 1000 + s1.nextval/1000,
                        s1.nextval,
                        rpad('x',168)
                )
                returning order_id into m_ord;

--              dbms_lock.sleep(0.01);

                commit write immediate wait;  -- 10g feature
        end loop;
end;
/

The code is designed to model an order entry system with 1,000 orders per day over the last 1,000 days. But the rate of data entry is, of course, accelerated to extremes.

If you run just one copy of the PL/SQL and validate the index afterwards you should get results like this (running 10.2.0.3):

HEIGHT                        : 2
BLOCKS                        : 256
LF_ROWS                       : 100001
LF_BLKS                       : 238
LF_ROWS_LEN                   : 1900019
LF_BLK_LEN                    : 8000
BR_ROWS                       : 237
BR_BLKS                       : 1
BR_ROWS_LEN                   : 3312
BR_BLK_LEN                    : 8032
BTREE_SPACE                   : 1912032
USED_SPACE                    : 1903331
PCT_USED                      : 100

The index space is perfectly (100%) used – it’s the natural consequence of the sequential nature of the data inserts; and although we specified (and Oracle “ignored” initrans 9) we can see that the lf_blk_len is 8,000 which means every leaf block still has the minimum two entries in its ITL.

If you run two copies of the script (inserting 50,000 rows each), you may see results more like this:

HEIGHT                        : 2
BLOCKS                        : 384
LF_ROWS                       : 100002
LF_BLKS                       : 255
LF_ROWS_LEN                   : 1900038
LF_BLK_LEN                    : 7976
BR_ROWS                       : 254
BR_BLKS                       : 1
BR_ROWS_LEN                   : 3544
BR_BLK_LEN                    : 8032
BTREE_SPACE                   : 2041912
USED_SPACE                    : 1903582
PCT_USED                      : 94

There’s just a few percent “lost” space, and the lf_blk_len is showing that some blocks have acquired a third entry in the ITL – hence the 24 byte drop from 8,000 to 7,976.

If you run something like my “index efficiency” code to check how well each block in the index is used, you may find something like this:

ROWS_PER_BLOCK     BLOCKS TOTAL_ROWS RUNNING_TOTAL
-------------- ---------- ---------- -------------
           206         12       2472            12
           208          1        208            13
           209          1        209            14
           211          2        422            16
           212          1        212            17
           214         15       3210            32
           260          1        260            33
           410          1        410            34
           419        221      92599           255
               ---------- ----------
sum                   255     100002

Most blocks are full (ca. 420 entries), after a “90/10 leaf node split”, but a few have done a 50/50 split (ca. 210 entries). The 50/50 splits are what you get on a multi-user system with multiple CPUs. Occasionally a session will get its sequence number then get pre-empted by the operating system, allowing another session to get and insert a higher value: with a little bad luck this will happen just as a leaf block fills.

Important Note: The laptop that I used to generate these results has 2 CPUs – if you try running this test on a machine with a single CPU then the concurrent test may give dramatically different results.

Now repeat with more copies of the PL/SQL (scaling down the number of rows per session as you increase the number of sessions). As the number of concurrent sessions grows the space requirement will climb, and the lf_blk_len will eventually drop to 3,992 and the number of leaf blocks (lf_blks) will probably be two or three times as large as you got in serial execution (In the worst case it could be four times as large, but this would only happen if every block had lost half its space to ITLs and did a 50/50 split).

If you have N CPUs (N > 1) then I would expect to see the problem starting to appear somewhere between N+1 and 2N concurrent sessions – but if you want to short-cut the testing just go for 4N sessions and see what happens.  If the CPU is already heavily loaded before you start then the problem will appear with fewer concurrent sessions (I got some dramatic results by running with just a couple of sessions whilst doing a full machine virus scan at maximum speed).

The test isn’t deterministic – the results will depend on things like the version of Oracle, size of the redo log buffer, the size of the redo log files, the speed of the I/O subsystem, the number of CPUs, the operating system, the process ids of the sessions you happen to connect to (that’s a side effect of ASSM) , and any workload that happens to be going on at the time. But the bottom line is this – if you’ve got hot spots in indexes that are subject to a lot of concurrent DML then you can find yourself wasting space unnecessarily in that area of the index.

In the next installment of the series I’ll make some comments about what I think is happening, and discuss how to address the issue. But before I finish, here’s how odd you can make index_stats look if you engineer a bizarre accident – it must be a really efficient index, I’ve used 174% of the available space !

HEIGHT                        : 2
BLOCKS                        : 384
NAME                          : ORD_PLACED
LF_ROWS                       : 110008
LF_BLKS                       : 301
LF_ROWS_LEN                   : 2090152
LF_BLK_LEN                    : 3992
BR_ROWS                       : 300
BR_BLKS                       : 1
BR_ROWS_LEN                   : 4191
BR_BLK_LEN                    : 8032
DISTINCT_KEYS                 : 110008
MOST_REPEATED_KEY             : 1
BTREE_SPACE                   : 1209624
USED_SPACE                    : 2094343
PCT_USED                      : 174
ROWS_PER_KEY                  : 1
BLKS_GETS_PER_ACCESS          : 3

Update Sept 2018

I’ve just re-run the test code on 18.3.0.0 on a (virtual) machine with 4 CPUs, running 4 concurrent insert statements.

By the time I had inserted 6,000 orders the ITL count (itc value) on the last few index leaf blocks had jumped to 39, so there still seems to be some room for improvement in this area. I’ve tried increasing the degree of concurrency and number duration of the test – but so far I’ve not gone above 39 ITL entries (a couple of other people have run this test on 12c over the years and reported 41 ITL entries) – so perhaps there’s a fairly hard limit that can be reached quite easily (using about 1KB from the block). Somewhere I’m sure I had a note (or found a note on the Internet) about how the fix probably worked, but I can’t find it at present. There is a parameter “_index_split_chk_cancel” (introduced in 12.2, default value 5) which looks as if it might be relevant, but it doesn’t seem to have an effect on the problem and might be something to do with how hard Oracle tries to re-use empty index blocks before allocating a newly formatted one.

Update Oct 2021

The effect is still present up to 21.3.0.0 – but I did extend the testing by one extra detail. I varied the setting of initrans on the index and found that the limit on the number of ITL slots allocated seemed to be related to the setting of initrans – the maximum number of ITL slots reached kept reaching something close to 30 + initrans.

[My catalogue page for the “Index Explosion” series ]

13 Comments »

  1. If you want a consistent example of the aforementioned worst case scenario: “In the worst case it could be four times as large, but this would only happen if every block had lost half its space to ITLs and did a 50/50 split”; all you need to do is to use RAC and run two sessions on each node (with at least one CPU per node) and be sure to specify the ‘order’ qualifier in the create sequence statement (otherwise some of the session will short-circuit their execution).

    Comment by Jeroen — July 29, 2009 @ 7:43 am BST Jul 29,2009 | Reply

    • Jeroen,

      That’s an interesting thought – have you actually tried it, or seen it ?

      I have to say that if you have set the ORDER option on a sequence it’s also NOCACHE – which means you would see lots of waits for the “SQ” enqueue, and lots of waits for “gc current block 2-way” or “gc current block busy” – not to mention “log file sync” as the block from seq$ bounced back and forth.

      Having said that; a sequence based index with a large sequence cache size on RAC will always be doing 50/50 splits on all nodes except one – so going highly concurrent in that case could give you 25% effective utilisation over most of the index.

      Update: The comment about ORDER implying NOCACHE is wrong. The impact of order is that there is a single cached sequence value shared by all instances, and it gets passed from instance to instance through a global SV enqueue (resulting in waits for DFS Lock Handle). The sequence is still CACHE (unless explicitly set to NOCACHE). The higher incidence of “gc current block busy” etc. is, at most, a side effect of the rate of sequence use if the CACHE size has not been set large enough (and it’s common to see the CACHE size left to its default of 20, even for very high use sequences). The SQ enqueue is taken each time the seq$ entry is updated and the block from the seq$ table is, of course, subject to cache fusion etc.

      Comment by Jonathan Lewis — August 6, 2009 @ 3:50 pm BST Aug 6,2009 | Reply

      • My observations are based on an experiment with Oracle 11.1.0.7 running on a Linux operating system. I don’t recall seeing a lot of SQ-type waits in the AWR reports I generated, but do recall seeing many waits for the “DFS lock handle” event. I did some google-ing on this and located an article on the Pythian web site that explains in detail the synchronisation of sequences between nodes in an RAC system. The URL is http://www.pythian.com/news/383/sequences-in-oracle-10g-rac.

        Comment by Jeroen — August 12, 2009 @ 12:57 am BST Aug 12,2009 | Reply

  2. PCT_USED : 174
    But before I finish, here’s how odd you can make index_stats look if you engineer a bizarre accident – it must be a really efficient index, I’ve used 174% of the available space !

    Is that a bug to show 174%?

    Comment by Daniel — August 3, 2009 @ 12:43 pm BST Aug 3,2009 | Reply

    • Daniel,
      “Is that a bug to show 174%”.

      Since it’s not possible to use 174% of the space available in an index, I think the only way to describe it is as a bug.
      If you want to be charitable, you could call it an approximation that loses accuracy as the variation in ITL usage increases.

      Comment by Jonathan Lewis — August 6, 2009 @ 4:21 pm BST Aug 6,2009 | Reply

  3. A little update on this topic. I’ve just been sent an email pointing me to Metalink Bug 8767925.

    If you do a search on the bug number you will be able to find the abstract for it that says: “ADD MORE LOGIC FOR RESERVING ITL SPACE FOR INDEX LEAF BLOCKS.”

    The interesting thing about the note is that the “Modified date” (when I first saw it) was 05-AUG-2009 – just one week after I posted the test case above.

    I can’t help wondering whether this is a coincidence or whether someone at Oracle Support (or one of their customers) has been reading my blog and just discovered the cause of a problem that’s been bugging them.

    Comment by Jonathan Lewis — August 13, 2009 @ 11:40 am BST Aug 13,2009 | Reply

  4. […] comparison purposes, here’s a section of the index produced during one run of my “index explosion” […]

    Pingback by treedump « Oracle Scratchpad — August 17, 2009 @ 5:32 pm BST Aug 17,2009 | Reply

  5. A little update:

    This problem is now visible as a bug on Metalink (MOS). Bug number 9865890, raised by a client of mine because they didn’t like having indexes with 2.3 million blocks when they should have been closer to 1.2 million blocks.

    The test case in the bug is from the code above, but run on an AIX box with 16 CPUs.

    Comment by Jonathan Lewis — August 23, 2010 @ 5:03 pm BST Aug 23,2010 | Reply

    • Latest update:

      This bug has been linked to base bug 8767925 (see comment #4 above) which is now reported as “fixed in version 12.1”; so the client that raised bug 9865890 has asked for a backport to 11.1.0.7.

      Comment by Jonathan Lewis — November 22, 2010 @ 12:13 pm GMT Nov 22,2010 | Reply

      • Jonathan, As per Oracle support note,”Bug 8767925 – ITL wasting a lot of space in indexes with high concurrency (Doc ID 8767925.8)”, The fix for 8767925 is first included in 11.2.0.4(server patch set) and in 12.1.0.1(base release). I have tested in Oracle RDBMS version 11.2.0.4.1 (Linux x86_64 bit,64 CPUs) with 5 concurrent users. The result shows that they didn’t fix it.

        HEIGHT                        : 2
        BLOCKS                        : 640
        NAME                          : ORD_PLACED
        PARTITION_NAME                :
        LF_ROWS                       : 100005
        LF_BLKS                       : 574
        LF_ROWS_LEN                   : 1900095
        LF_BLK_LEN                    : 3988
        BR_ROWS                       : 573
        BR_BLKS                       : 1
        BR_ROWS_LEN                   : 7974
        BR_BLK_LEN                    : 8028
        DEL_LF_ROWS                   : 0
        DEL_LF_ROWS_LEN               : 0
        DISTINCT_KEYS                 : 100005
        MOST_REPEATED_KEY             : 1
        BTREE_SPACE                   : 2297140
        USED_SPACE                    : 1908069
        PCT_USED                      : 84
        ROWS_PER_KEY                  : 1
        BLKS_GETS_PER_ACCESS          : 3
        PRE_ROWS                      : 0
        PRE_ROWS_LEN                  : 0
        OPT_CMPR_COUNT                : 0
        OPT_CMPR_PCTSAVE              : 0
        -----------------
        
        
        
        ROWS_PER_BLOCK     BLOCKS     ROW_CT CUMULATIVE_BLOCKS
        -------------- ---------- ---------- -----------------
                   101          5        505                 5
                   102          4        408                 9
                   103          3        309                12
                   104          3        312                15
                   107          1        107                16
                   109         18       1962                34
                   112         31       3472                65
                   113         31       3503                96
                   114         21       2394               117
                   115         19       2185               136
                   116          6        696               142
                   117          5        585               147
                   118          6        708               153
                   119         11       1309               164
                   120          4        480               168
                   121        115      13915               283
                   163          1        163               284
                   207          1        207               285
                   208          1        208               286
                   209         49      10241               335
                   211          1        211               336
                   214          1        214               337
                   231          1        231               338
                   232        231      53592               569
                   417          2        834               571
                   418          3       1254               574
                       ---------- ----------
        sum                   574     100005
        
        
        Block header dump:  0x01000143
         Object id on Block? Y
         seg/obj: 0x16a01  csc: 0x00.666c43f9  itc: 151  flg: E  typ: 2 - INDEX
             brn: 0  bdba: 0x1000140 ver: 0x01 opc: 0
             inc: 0  exflg: 0
        
        

        Number of ITL entries in this leaf block is 151.

        Comment by dbabibleantony — June 27, 2014 @ 7:08 pm BST Jun 27,2014 | Reply

        • Thanks for the details. Prompted by your report I’ve just run the test on 12c (12.1.0.1) with 2 CPUs and 4 processes and got the following results from index_stats:

          
          HEIGHT                        : 2
          BLOCKS                        : 640
          NAME                          : ORD_PLACED
          LF_ROWS                       : 100009
          LF_BLKS                       : 517
          LF_ROWS_LEN                   : 1900171
          LF_BLK_LEN                    : 7060
          BR_ROWS                       : 516
          BR_BLKS                       : 1
          BR_ROWS_LEN                   : 6894
          BR_BLK_LEN                    : 8028
          DISTINCT_KEYS                 : 100009
          MOST_REPEATED_KEY             : 1
          BTREE_SPACE                   : 3658048
          USED_SPACE                    : 1907065
          PCT_USED                      : 53
          ROWS_PER_KEY                  : 1
          BLKS_GETS_PER_ACCESS          : 3
          
          

          The 7060 for lf_blk_len is enough to show that there’s still a problem – the value shouldn’t be less than approximately 7,900 unless we have blocks with far more than 4 or 5 ITL slots allocated.

          And here’s a fragment of a blockdump to show the inappropriately large interested transaction count.

          ]
          
           seg/obj: 0x1845d  csc: 0x00.69af4f  itc: 41  flg: -  typ: 2 - INDEX
               fsl: 0  fnx: 0x0 ver: 0x01
          
          

          Unfortunately you can’t give feedback on bug notes otherwise I’d let Oracle support know the bug is still present.

          On the other hand, I’ve just noticed that bug 8767925 is showing an update at 4:00 am today – so maybe you’ve passed on your finding, or someone at Oracle Support picked up your comment here.

          Comment by Jonathan Lewis — June 28, 2014 @ 5:45 pm BST Jun 28,2014

  6. For those who can read Russian – here’s an example of what an index leaf block looks like when the going gets really tough: http://sql.ru/forum/actualthread.aspx?tid=754211&hl=itl

    Comment by Jonathan Lewis — December 6, 2010 @ 10:03 pm GMT Dec 6,2010 | Reply


RSS feed for comments on this post.

Leave a reply to Daniel Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.