Oracle Scratchpad

November 27, 2011

IOT Answer

Filed under: Infrastructure,IOT,Oracle — Jonathan Lewis @ 10:03 pm BST Nov 27,2011

It was good to see the answers to the last Quiz Night accumulating. The problem posed was simply this: I have two IOTs and I’ve inserted the same data into them with the same “insert as select” statement. Can you explain the cost of a particular query (and it’s the same for both tables) and extreme differences in work actually done. Here’s the query, the critical stats on the primary key indexes, the shared plan, and the critical execution statistic for running the plan.


SQL> select max(v1) from tX;

INDEX_NAME  BLEVEL LEAF_BLOCKS    AVG_LPK    AVG_DPK    CLUF
---------- ------- ----------- ---------- ---------- -------
T3_PK            1           8          1          1    2000
T4_PK            1          12          1          1    2000

---------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost  |
---------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |     3 |  2003 |
|   1 |  SORT AGGREGATE       |       |     1 |     3 |       |
|   2 |   INDEX FAST FULL SCAN| Tx_PK |  2000 |  6000 |  2003 |
---------------------------------------------------------------

Statistics (for tX = table t3)
----------------------------------------------------------
       2012  consistent gets

Statistics (for tX = table t4)
----------------------------------------------------------
         17  consistent gets

The answer was in the definition of the two tables – which is why I didn’t supply the DDL to create the structures. In fact I had made it virtually impossible for anyone to get to a totally accurate result because two of the columns were defined as varchar2(40) but held values which were no more than 4 characters long; nevertheless several people deduced that the answer related to overflow segments, and different declarations for the overflow for the two tables.

Here are the original declarations of the two tables (the t1 table used the same select statement, selecting 2,000 rows, to create a simple heap table):

create table t3 (
	id1, id2, v1, v2, padding,
	constraint t3_pk primary key(id1, id2)
)
organization index
including id2                      --  Note the included column
overflow
as
select
	mod(rownum,20)			id1,
	trunc(rownum,100)		id2,
	to_char(mod(rownum,20))		v1,
	to_char(trunc(rownum,100))	v2,
	rpad('x',500)			padding
from
	t1
where
	rownum = 0
;

create table t4 (
	id1, id2, v1, v2, padding,
	constraint t4_pk primary key(id1, id2)
)
organization index
including v2                      --  Note the included column
overflow
as
select
	mod(rownum,20)			id1,
	trunc(rownum,100)		id2,
	to_char(mod(rownum,20))		v1,
	to_char(trunc(rownum,100))	v2,
	rpad('x',500)			padding
from
	t1
where
	rownum = 0
;

Note especially the included clause and the overflow keyword. A better choice of syntax for the included clause to be redefined as the exclude everything after clause because that is (essentially) the effect it has. All columns after the included column are stored in a separate segment and do not appear in the index segment.

With this in mind, we can understand that the v1 column will be in the overflow segment for t3, but in the index segment for t4. The existence of an overflow segment accounts for the cost reported, and the difference in the content of the overflow accounts for the difference in the work done.

Overflow costing

When we have an overflow segment, Oracle can produce a clustering_factor for the index segment. The optimizer then assumes that ANY column – other than primary key columns – referenced in the select list must be in the overflow, whether or not that is in accordance with the object definition. This behaviour accounts for the large cost reported for both queries: the data scatter in the overflow segment is very bad because of the way I generated the data, and the optimizer assumes that the v1 value will require a visit to the overflow in both queries.

Run-time resources

When I run the queries, Oracle finds the v1 column in the index segment for t4 as it does the index fast full scan, so the work done is basically the buffer visits for each block in the index segment (plus a couple extra for the segment header visits); however, for each index entry visited in the t3 fast full scan Oracle finds that it has to jump to a different overflow block from the previous one it was looking at, so the number of block visits is essentially the same as the number of rows in the table. (autotrace and tkprof didn’t quite agree on the number of blocks visited – which is why the autotrace figure is a little lower than you might expect.)

Footnote:

The included clause requires a little more subtlety than just checking “columns after the included column”. The internal (data dictionary) column order is not necessarily the same as the visible order when you create or describe the object. One of the critical differences is that Oracle moves the primary key columns to be the first columns in the internal order; it will also move a long column (if you have one) to be the last column.

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,453 other followers