Oracle Scratchpad

July 16, 2018

Direct IOT

Filed under: 12c,Infrastructure,IOT,Oracle — Jonathan Lewis @ 1:02 pm BST Jul 16,2018

A recent (automatic?) tweet from Connor McDonald highlighted an article he’d written a couple of years ago about an enhancement introduced in 12c that allowed for direct path loads to index organized tables (IOTs). The article included a demonstration that seemed to suggest that direct path loads to IOTs were of no benefit and ended with the comment (which could be applied to any Oracle feature):

“Direct mode insert is a very cool facility but it doesn’t mean that it’s going to be the best option in every situation.”

Clearly it’s necessary to pose the question – “so when would direct mode insert be a good option for IOTs?” – because if it’s never a good option you have to wonder why it has been implemented. This naturally leads on to thinking about which tests have not yet been done – what aspects of IOTs did Connor not get round to examining in his article. (That’s a standard principle of trouble-shooting or testing or investigation: when someone shows you a test case (or when you think you’ve finished testing) one thing you should do before taking the results as gospel is to ask yourself what possible scenarios have not been covered by the test.)

So if you think “IOT” what are the obvious tests once you’ve got past the initial step of loading the IOT and seeing what happens?

  • First, I think, would be “What if the IOT weren’t empty before the test started”
  • Second would be “IOTs can have overflow segments, what impact might one have?”
  • Third would be “Do secondary indexes introduce any side effects?”
  • Finally “What happens with bitmap indexes and the requirement for a mapping table?”

(Then, of course, you can worry about mixing all the different possibilities together – but for the purposes of this note I’m just going to play with two simple examples: non-empty starting tables, and overflow segments.)

Here’s some code to define a suitable table:

rem
rem     Script:         122_direct_iot.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem
rem     Last tested 
rem             12.2.0.1

create table t2 
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        3 * rownum                      id,
        lpad(rownum,10,'0')             v1,
        lpad('x',50,'x')                padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
order by
        dbms_random.value
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1'
        );
end;
/

create table t1(
        id,
        v1,
        padding,
        constraint t1_pk primary key(id)
)
organization index
-- including v1
-- overflow
nologging
as
select * from t2
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

I’ve created a heap table t2 that holds 100,000 rows with an id column that arrives randomly ordered; then I’ve used this table as a source to create an IOT (called t1), with the option to have an overflow segment that contains just the 50 character padding column.

I’ve used 3 * rownum to define the id column for t2 so that when I insert another copy of t2 into t1 I can add 1 (or 2) to the t2.id and interleave the new data with the old data. (That’s another thought about IOT testing – are you loading your data in a pre-existing order that suits the special nature of IOTs or is it arriving in a way that’s badly out of order with respect to the natural ordering of the IOT; and does your data go in above the current high value, or is it spread across the whole range, or do you have a partial overlap with the top end of the range and then run on above it.)

Having created the starting data set, here’s the test:


execute snap_my_stats.start_snap
execute snap_events.start_snap

insert 
        /*  append */
into t1
select
        id + 1, v1, padding
from
        t2
;


execute snap_events.end_snap
execute snap_my_stats.end_snap

All I’m doing is using a couple of my snapshot packages to check the work done and time spent while inserting 100,000 interleaved rows – which are supplied out of order – into the existing table. In the text above the “append” is a comment, not a hint, so I’ll be running the test case a total of 4 times:

  • no overflow, with and without the hint
  • with the overflow, with and without the hint

(Then, of course, I could run the test without the overflow but an index (i.e. testing the effect of secondary indexes) on v1).

Here are some summary figures from the tests – first from the test without an overflow segment:

                                      Unhinted       With Append
                                  ============      ============
CPU used when call started                 153               102
CPU used by this session                   153               102
DB time                                    166               139

redo entries                           130,603            42,209
redo size                           78,315,064        65,055,376

sorts (rows)                                30           100,031

You’ll notice that with the /*+ append */ hint in place there’s a noticeable reduction in redo entries and CPU time, but this has been achieved at a cost of sorting the incoming data into order. The reduction in redo (entries and size) is due to an “array insert” effect that Oracle can take advantage of with the delayed index maintenance that takes place when the append hint is legal (See the section labelled Option 4 in this note). So even with an IOT with no overflow there’s a potential benefit to gain from direct path loading that depends on how much the new data overlaps the old data, and there’s a penalty that depends on the amount of sorting you’d have to do.

What happens in my case when I move the big padding column out to an overflow segment – here are the equivalent results:


Headline figures                      Unhinted       With Append
================                  ============      ============
CPU used when call started                 158                52
CPU used by this session                   158                52
DB time                                    163                94
redo entries                           116,669            16,690
redo size                           51,392,748        26,741,868
sorts (memory)                               4                 5
sorts (rows)                                33           100,032

Interestingly, comparing the new unhinted results with the previous unhinted results, there’s little difference in the CPU usage between having the padding column in the “TOP” section of the IOT compared to having it in the overflow segment, though there is a significant reduction in redo (the index entries are still going all over the place one by one and causing leaf block splits, but the overflow blocks are being pinned and packed much more efficiently). The difference between having the append hint or not, though, is damatic. We drop to one third of the CPU time (despite still having 100,000 rows to sort) and half the redo. One of the side effects of the overflow, of course, is that the things being sorted are much shorter (only the id and v1 columns that go into the TOP section, and not the whole IOT row).

So, if you already have an overflow segment that caters for a significant percentage of the row it looks as if the benefit you could get from using the /*+ append */ hint could far outweigh the penalty you have to pay in sorting. Of course, an IOT with a large overflow doesn’t look much different from a heap table with index – so perhaps that result isn’t very surprising.

I’ll close by re-iterating Connor’s closing comment:

Direct mode insert is a very cool facility, but it doesn’t mean that it’s going to be the best option in every situation.

Before you dive in and embrace it, or ruthlessly push it to one side, make sure you do some testing that reflects the situations you have to handle.

 

1 Comment »

  1. […] IOTs and direct path loads – /*+ append */ (July 2018) […]

    Pingback by IOTs | Oracle Scratchpad — February 11, 2021 @ 9:00 pm GMT Feb 11,2021 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.