Oracle Scratchpad

November 27, 2012

IOT Load

Filed under: Infrastructure,IOT,Oracle — Jonathan Lewis @ 5:15 pm GMT Nov 27,2012

When I introduced Connor McDonald’s blog a few days ago, it was because we had exchanged a couple of email messages (through the Oak Table Network) about how to minimise the resource consumption when copying a load of data from one IOT to another of the same structure. His problem was the way in which the obvious way of copying the data resulted in a massive sort even though, in principle, it should not have been necessary to sort anything since the data could have been extracted in order by walking the existing IOT.

As a suggestion I referenced a comment I had made in the Addenda to Practical Oracle 8i about 12 years ago when I had first solved the problem of loading an IOT with minimal logging and no sorting. At the time I had been loading data from a sorted file into an empty table that was then going to be exchnaged into a partitioned IOT – but it crossed my mind that loading from a flat file and loading from a UNIX pipe were pretty much the same thing, so perhaps Connor could workaround his problem by making one session spool to a pipe while another session was reading it. In the end, he simply created a massive temporary tablespace, but I thought I’d modify a test script I wrote a few years ago to demonstrate my idea – and here it is:

rem
rem	create an empty IOT
rem	Note the columns used for the PK
rem

create table iot_load(
	id	number,
	name	varchar2(32),
	padding	varchar2(100),
	constraint iot_pk primary key(name, id)
)
organization index
;

rem
rem	Create a load file
rem	But since we are running UNIX use the mknod command to
rem	create a pipe before spooling to that pipe.
rem

set pagesize 0
set feedback off
set newpage 0
set trimspool on
set linesize 180
set timing off
set termout off

column object_name format a38

host mknod /tmp/iot_load.dat p

spool /tmp/iot_load.dat

select r, object_name, padding
from
(
	select
		rownum	r,
		object_name,
		rpad('x',100,'x') padding
	from
		all_Objects
	where
		rownum <= 10000
)
order by
	object_name, r
;

spool off

This script creates a target IOT, and uses a simple select statement (you get a clue about how old this code is by the fact that I’ve referenced all_objects, and not a “connect by” with dual) to generate a set of date that is sorted in an order that is a suitable match for the definition of the IOT. In Connor’s case he could have used a simple ‘select from source IOT order by primary key’. I’ve set up a few SQL*Plus environment detals to make the output completely flat and undecorated. A key point, though, is that I’m spooling to a file called /tmp/iot_load.dat and that file has been pre-created as a UNIX pipe. When you run this script the select statement will hang almost immediately because the pipe will become full, at which point you need another process to start emptying it. So from another UNIX session run the following (changing the userid and password as necessary):

sqlldr userid=test_user/test control=iot_load data=/tmp/iot_load.dat

Here’s the content of the control file iot_load.ctl

OPTIONS (direct = true)
UNRECOVERABLE
LOAD DATA
TRUNCATE
INTO table iot_load
sorted indexes (iot_pk)
(
	id		position(01:10)	char,
	name		position(12:49) char,
	padding		position(51:150) char
)

We use direct path load (which means virtually no undo) in unrecoverable mode (which means virtually no redo). But the unrecoverable bit only works because we start by truncating the IOT, and promising that the incoming data is sorted in order of the IOT’s primary key index. And since the data is appearing in order, Oracle doesn’t have to sort it before inserting it. Net result: no undo, no redo (apart from metadata) and no sorting – i.e. minimum overhead for loading the IOT – we simply walk the source in order and copy it into the target.

As this session reads from the pipe, the first session can resume writing into the pipe, so the data flows from source to target through a small piece of shared memory (techically a bit of the file-system buffer, I suppose) until the source session closes the pipe with its “spool off”.

11 Comments »

  1. Hello Jonathan,

    After a quick search of the documentation, I don’t see any equivalent to the “sorted indexes” clause when loading from an external table. Is there such a thing, or is this one advantage of SQL*Loader over external tables?

    Comment by Stew Ashton — November 27, 2012 @ 8:14 pm GMT Nov 27,2012 | Reply

    • Stew,

      This option exists simply to tell Oracle that it can physically build an index in order as the data arrives, but the raw data from external tables never ends up in the database, so never needs to be indexed, so the feature doesn’t apply.

      There is an option with sqlldr, though to turn a control file into an external table definition (sqlldr … external_table = generate_only) so I ran through a little test with my control file to see what would happen:

      [jonathan@linux01 working]$  sqlldr userid=test_user/test control=iot_load data=/tmp/iot_load.dat external_table=generate_only
      

      Gradually hacking my way through the control file, I got the following errors:

      SQL*Loader-144: Conflicting load methods: direct=true/external_table=generate_only specified.
      SQL*Loader-268: UNRECOVERABLE keyword may be used only in direct path.
      SQL*Loader-256: SORTED INDEXES option allowed only for direct path
      
      

      Comment by Jonathan Lewis — December 13, 2012 @ 6:54 pm GMT Dec 13,2012 | Reply

  2. Reblogged this on lava kafle kathmandu nepal.

    Comment by lkafle — November 28, 2012 @ 3:22 am GMT Nov 28,2012 | Reply

  3. Nice post and a good example of re-using old code. Do you happen to know if there is an equivalent windows utility to the unix mknod program for the pipe function?

    Incidentally, and off topic, I note your ‘Curent Moon’ status is behind. If you pop your head out of the window you’ll see it’s a full moon today!

    Comment by Tony Sleight — November 28, 2012 @ 7:13 am GMT Nov 28,2012 | Reply

  4. As an off-topic, may i know whats the purpose of rpad(‘x’,100,’x’) padding in your example?

    Thanks,

    Comment by pandeesh — November 28, 2012 @ 7:28 am GMT Nov 28,2012 | Reply

    • Pandeesh,

      It’s a general purpose bit of padding that I use to create rows of a reasonable size. It’s just an attempt to avoid extreme boundary conditions of very short rows in some of my tests.

      Comment by Jonathan Lewis — December 13, 2012 @ 6:58 pm GMT Dec 13,2012 | Reply

  5. When Oracle is able to write the ‘select’ content into pipe(spool file), why Oracle is not able to write the 10046 tracing content to pipe(10046 trace file) ?

    -Yasser

    Comment by Yasser Khan — December 5, 2012 @ 10:13 am GMT Dec 5,2012 | Reply

    • Yasser,

      Since there’s no mechanical reason why it shouldn’t be possible, I think the answer has to be that the code has been fixed to avoid that possibility. (I tried to create a pipe with the name of a trace file that had not yet been opened, but when I enabled tracing (dbms_monitor call) the session didn’t try to write to the pipe. I suppose I could try using truss (or tusc, or some such) to see exactly what the session did – but I’d rather just suggest that Oracle doesn’t allow you to trace to a file because the session would have to stop and wait for another process to read from the pipe all the time, and this might cause problems.

      I think Tanel Poder once demonstrated a cunning trick where he managed to get the trace file written to the screen as he ran some SQL – but I don’t remember how he did it, or even what platform he was running on at the time. (You could try searching his blog.)

      Comment by Jonathan Lewis — December 13, 2012 @ 9:48 pm GMT Dec 13,2012 | Reply

      • Thanks for your time in researching into this issue. At last i was able to simulate it in RHEL but not in SUSE and SOLARIS.

        By the way its Tanel Poder’s trick which i was trying to simulate it.

        Comment by Yasser Khan — December 17, 2012 @ 10:59 am GMT Dec 17,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,267 other followers