A recent (automatic?) tweet from Connor McDonald highlighted an article he’d written a couple of years ago about an enhancement introduced in 12c that allowed for direct path loads to index organized tables (IOTs). The article included a demonstration that seemed to suggest that direct path loads to IOTs were of no benefit and ended with the comment (which could be applied to any Oracle feature):
“Direct mode insert is a very cool facility but it doesn’t mean that it’s going to be the best option in every situation.”
Clearly it’s necessary to pose the question – “so when would direct mode insert be a good option for IOTs?” – because if it’s never a good option you have to wonder why it has been implemented. This naturally leads on to thinking about which tests have not yet been done – what aspects of IOTs did Connor not get round to examining in his article. (That’s a standard principle of trouble-shooting or testing or investigation: when someone shows you a test case (or when you think you’ve finished testing) one thing you should do before taking the results as gospel is to ask yourself what possible scenarios have not been covered by the test.)
So if you think “IOT” what are the obvious tests once you’ve got past the initial step of loading the IOT and seeing what happens?
- First, I think, would be “What if the IOT weren’t empty before the test started”
- Second would be “IOTs can have overflow segments, what impact might one have?”
- Third would be “Do secondary indexes introduce any side effects?”
- Finally “What happens with bitmap indexes and the requirement for a mapping table?”
(Then, of course, you can worry about mixing all the different possibilities together – but for the purposes of this note I’m just going to play with two simple examples: non-empty starting tables, and overflow segments.)
Here’s some code to define a suitable table:
rem rem Script: 122_direct_iot.sql rem Author: Jonathan Lewis rem Dated: Jun 2018 rem rem Last tested rem 12.2.0.1 create table t2 as with generator as ( select rownum id from dual connect by level <= 1e4 -- > comment to avoid WordPress format issue ) select 3 * rownum id, lpad(rownum,10,'0') v1, lpad('x',50,'x') padding from generator v1, generator v2 where rownum <= 1e5 -- > comment to avoid WordPress format issue order by dbms_random.value ; begin dbms_stats.gather_table_stats( ownname => null, tabname => 'T2', method_opt => 'for all columns size 1' ); end; / create table t1( id, v1, padding, constraint t1_pk primary key(id) ) organization index -- including v1 -- overflow nologging as select * from t2 ; begin dbms_stats.gather_table_stats( ownname => null, tabname => 'T1', method_opt => 'for all columns size 1' ); end; /
I’ve created a heap table t2 that holds 100,000 rows with an id column that arrives randomly ordered; then I’ve used this table as a source to create an IOT (called t1), with the option to have an overflow segment that contains just the 50 character padding column.
I’ve used 3 * rownum to define the id column for t2 so that when I insert another copy of t2 into t1 I can add 1 (or 2) to the t2.id and interleave the new data with the old data. (That’s another thought about IOT testing – are you loading your data in a pre-existing order that suits the special nature of IOTs or is it arriving in a way that’s badly out of order with respect to the natural ordering of the IOT; and does your data go in above the current high value, or is it spread across the whole range, or do you have a partial overlap with the top end of the range and then run on above it.)
Having created the starting data set, here’s the test:
execute snap_my_stats.start_snap execute snap_events.start_snap insert /* append */ into t1 select id + 1, v1, padding from t2 ; execute snap_events.end_snap execute snap_my_stats.end_snap
All I’m doing is using a couple of my snapshot packages to check the work done and time spent while inserting 100,000 interleaved rows – which are supplied out of order – into the existing table. In the text above the “append” is a comment, not a hint, so I’ll be running the test case a total of 4 times:
- no overflow, with and without the hint
- with the overflow, with and without the hint
(Then, of course, I could run the test without the overflow but an index (i.e. testing the effect of secondary indexes) on v1).
Here are some summary figures from the tests – first from the test without an overflow segment:
Unhinted With Append ============ ============ CPU used when call started 153 102 CPU used by this session 153 102 DB time 166 139 redo entries 130,603 42,209 redo size 78,315,064 65,055,376 sorts (rows) 30 100,031
You’ll notice that with the /*+ append */ hint in place there’s a noticeable reduction in redo entries and CPU time, but this has been achieved at a cost of sorting the incoming data into order. The reduction in redo (entries and size) is due to an “array insert” effect that Oracle can take advantage of with the delayed index maintenance that takes place when the append hint is legal (See the section labelled Option 4 in this note). So even with an IOT with no overflow there’s a potential benefit to gain from direct path loading that depends on how much the new data overlaps the old data, and there’s a penalty that depends on the amount of sorting you’d have to do.
What happens in my case when I move the big padding column out to an overflow segment – here are the equivalent results:
Headline figures Unhinted With Append ================ ============ ============ CPU used when call started 158 52 CPU used by this session 158 52 DB time 163 94 redo entries 116,669 16,690 redo size 51,392,748 26,741,868 sorts (memory) 4 5 sorts (rows) 33 100,032
Interestingly, comparing the new unhinted results with the previous unhinted results, there’s little difference in the CPU usage between having the padding column in the “TOP” section of the IOT compared to having it in the overflow segment, though there is a significant reduction in redo (the index entries are still going all over the place one by one and causing leaf block splits, but the overflow blocks are being pinned and packed much more efficiently). The difference between having the append hint or not, though, is damatic. We drop to one third of the CPU time (despite still having 100,000 rows to sort) and half the redo. One of the side effects of the overflow, of course, is that the things being sorted are much shorter (only the id and v1 columns that go into the TOP section, and not the whole IOT row).
So, if you already have an overflow segment that caters for a significant percentage of the row it looks as if the benefit you could get from using the /*+ append */ hint could far outweigh the penalty you have to pay in sorting. Of course, an IOT with a large overflow doesn’t look much different from a heap table with index – so perhaps that result isn’t very surprising.
I’ll close by re-iterating Connor’s closing comment:
Direct mode insert is a very cool facility, but it doesn’t mean that it’s going to be the best option in every situation.
Before you dive in and embrace it, or ruthlessly push it to one side, make sure you do some testing that reflects the situations you have to handle.
[…] IOTs and direct path loads – /*+ append */ (July 2018) […]
Pingback by IOTs | Oracle Scratchpad — February 11, 2021 @ 9:00 pm GMT Feb 11,2021 |