Oracle Scratchpad

September 29, 2016

IOT limitation

Filed under: Execution plans,Infrastructure,IOT,Oracle — Jonathan Lewis @ 10:17 am BST Sep 29,2016

In the right circumstances Index Organized Tables (IOTs) give us tremendous benefits – provided you use them in the ideal fashion. Like so many features in Oracle, though, you often have to compromise between the benefit you really need and the cost of the side effect that a feature produces.

The fundamental design targets for an IOT are that you have short rows and only want to access them through index range scans of the primary key. The basic price you pay for optimised access is the extra work you have to do as you insert the data. Anything you do outside the two specific targets is likely to lead to increased costs of using the IOT – and there’s one particular threat that I’ve mentioned twice in the past (here and here). I want to mention it one more time with a focus on client code and reporting.

create table iot1 (
        id1     number(7.0),
        id2     number(7.0),
        v1      varchar2(10),
        v2      varchar2(10),
        padding varchar2(500),
        constraint iot1_pk primary key(id1, id2)
organization index
including id2

insert into iot1
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
        mod(rownum,311)                 id1,
        mod(rownum,337)                 id2,
        to_char(mod(rownum,20))         v1,
        to_char(trunc(rownum/100))      v2,
        rpad('x',500,'x')               padding
        generator       v1,
        generator       v2
        rownum <= 1e5 --> comment to bypass wordpress format issue


        dbms_stats.gather_table_stats( ownname => user,
                tabname          => 'IOT1'
                method_opt       => 'for all columns size 1'

alter system flush buffer_cache;

select table_name, blocks from user_tables where table_name = 'IOT1' or table_name like 'SYS_IOT_OVER%';
select index_name, leaf_blocks from user_indexes where table_name = 'IOT1';

set autotrace traceonly
select max(v2) from iot1;
set autotrace off

I’ve created an index organized table with an overflow. The table definition places all columns after the id2 column into the overflow segment. After collecting stats I’ve then queried the table with a query that, for a heap table, would produce a tablescan as the execution plan. But there is no “table”, there is only an index for an IOT. Here’s the output I get (results from 11g and 12c are very similar):

TABLE_NAME               BLOCKS
-------------------- ----------
SYS_IOT_OVER_151543        8074

-------------------- -----------
IOT1_PK                      504

| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT      |         |     1 |     4 | 99793   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE       |         |     1 |     4 |            |          |
|   2 |   INDEX FAST FULL SCAN| IOT1_PK |   100K|   390K| 99793   (1)| 00:00:04 |

     100376  consistent gets
       8052  physical reads

The index segment has 504 leaf blocks, the overflow segment has 8,074 used blocks below the high water mark. The plan claims an index fast full scan of the index segment – but the physical reads statistic looks more like a “table” scan of the overflow segment. What’s actually happening ?

The 100,000+ consistent reads should tell you what’s happening – we really are doing an index fast full scan on the index segment, and for each index entry we go to the overflow segment to find the v2 value. Oracle doesn’t have a mechanism for doing a “tablescan” of just the overflow segment – even though the definition of the IOT ought (apparently) to be telling Oracle exactly which columns are in the overflow.

In my particular test Oracle reported a significant number of “db file scattered read” waits against the overflow segment, but these were for “prefetch warmup”; in a normal system with a buffer cache full of other data this wouldn’t have happened. The other interesting statistic that showed up was “table fetch continued row” – which was (close to) 100,000, again highlighting that we weren’t doing a normal full tablescan.

In terms of normal query processing this anomaly of attempted “tablescans” being index driven probably isn’t an issue but, as I pointed out in one of my earlier posts on the topic, when Oracle gathers stats on the “table” it will do a “full tablescan”. If you have a very large table with an overflow segment it could be a very slow process – especially if you’ve engineered the IOT for the right reason, viz: the data arrives in the wrong order relative to the order you want to query it, and you’ve kept the rows in the IOT_TOP short by dumping the rarely used data in the overflow. With this in mind you might want to make sure that you write a bit of special code that gathers stats only on the columns you know to be in the IOT_TOP, and creates representative numbers for the other columns, then locks the stats until the next time you want to refresh them.



  1. The idea of a “tablescan” of the overflow segment might work in special cases, but in the general case the column where the row is split between index and overflow row pieces can vary from row-to-row, and hence Oracle cannot rely on the including clause to know which columns reside in the overflow segment. Here are some examples:

    1. Including column can be changed via alter table, and this will affect rows subsequently inserted into the IOT, but does not affect existing rows in the table. e.g. if we initially create the table as:

      create table iot(i int primary key, j int, a varchar2(4000), b varchar2(4000))
        organization index
        overflow including a;
      insert into iot values(1, 2, 'a', 'b');

    we would have column B in the overflow segment. Now if we alter the including column and insert a row:

      alter table iot including j; -- metadata-only operation, existing rows are not reformatted
      insert into iot values(3, 4, 'x', 'y');

    then the new row would have columns A and B in the overflow. Since the column where we split between index and overflow varies from row to row, Oracle would not know how to find the right column in the overflow row piece if it did a “tablescan” of the overflow segment.

    2. Similarly, the size of the row piece in the index is limited based on data block size (similar to an ordinary B-Tree index in Oracle), and this can cause the column where we split between index and overflow to vary from row to row even if including is specified and static. e.g.

      create table iot(i int primary key, a varchar2(4000), b varchar2(4000), c varchar2(4000))
        organization index
        overflow including b;
      insert into iot values(1, 'a', 'b', 'c');
      insert into iot values(2, RPAD('x', 3000), RPAD('y', 3000), RPAD('z', 3000));

    For i = 1 it splits at column B per the including specification, so column C will be the first column in the overflow row piece.
    For i = 2 a row piece containing is too big for the index, so it splits at column A and thus columns B and C will be stored in the overflow row piece.

    Comment by George Eadon — October 18, 2016 @ 10:22 pm BST Oct 18,2016 | Reply

    • George,

      Thanks for doing such a nice presentation of the issues that make a strategy that looks obvious turn out to be much more complex than first thoughts suggest. (I have to say, I had forgotten that you could redefine what should be included after data had been added)

      Comment by Jonathan Lewis — October 21, 2016 @ 8:06 am BST Oct 21,2016 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by