Oracle Scratchpad

January 3, 2013

Skip Scan 2

Filed under: CBO,Indexing,Oracle — Jonathan Lewis @ 6:42 pm UTC Jan 3,2013

Here’s a question that is NOT a trick question, it’s demonstrating an example of optimizer behaviour that might come as a surprise.
I have an index (addr_id0050, effective_date), the first column is numeric, the second is a date. Here’s a query with an execution plan that uses that index:

define m_date='30-Jan-2013'

select
	small_vc
from	t1
where
	addr_id0050 between 24 and 26
and	effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    18 |   396 |    23 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |    18 |   396 |    23 |
|*  2 |   INDEX SKIP SCAN           | T1_I0050 |    18 |       |     5 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID050">=24 AND "EFFECTIVE_DATE"=TO_DATE(' 2013-01-30'
              00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID050"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))

So here’s the question – given that my where clause includes a predicate on the first column of the index that would allow an index range scan to take place, wouldn’t you expect Oracle to do a range scan, and how does a skip scan work in this case ?

To push the point a little further I have another column, also numeric, in the same table which appears in a similar index (addr_id2500, effective_date). Here’s the equivalent query with its execution plan.

select
	small_vc
from	t1
where
	addr_id2500 between 24 and 26
and	effective_date = to_date('&m_date', 'dd-mon-yyyy')
;

------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |     1 |    23 |     5 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1       |     1 |    23 |     5 |
|*  2 |   INDEX RANGE SCAN          | T1_I2500 |     1 |       |     4 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ADDR_ID2500">=24 AND "EFFECTIVE_DATE"=TO_DATE('
              2013-01-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "ADDR_ID2500"<=26)
       filter("EFFECTIVE_DATE"=TO_DATE(' 2013-01-30 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

In this case, with the same starting predicate (but referencing the alternative column), the optimizer produces the index range that we might expect.

The index skip scan isn’t just for cases where the first column of an index is missing from the list of predicates. The basic principle is that the optimizer has the option to do a small number of tiny range scans on an index when the alternative is to do a tablescan or a very large index range scan.

As you might guess from my column names, addr_id0050 holds 50 distinct values, so the range 24 to 26 actually accounts for 6% of the total volume of the index. On the other hand, there are over 800 distinct values for effective_date. There is even an index on (effective_date) but that’s going to pick up lots of rows that are widely scattered throughout the table then throw away 94% of them so Oracle has decided it’s too expensive to use.

So the optimizer has worked out that it can probe for (24, 30th Jan), (25, 30th Jan), and (26, 30th Jan) as the most efficient access path. Of course, it doesn’t know that it’s probing for exactly those values but statistically it assumes that there are only a few possible values that will show up if it probes the index on the first column – in effect using an “inlist iterator” on the first column without knowing in advance what it’s going to find in the list.

When the optimizer sees the query using addr_id2500 (which holds 2,500 distinct values) the arithmetic faviours the option we are familiar with. A simple range scan based on values between 24 and 26 is going to range through roughly l/800th of the index – which is a tiny number of leaf blocks (just one or two, in my case) so the optimizer decides to do that and check every index entry on the way to see if the effective_date holds a suitable value.

If you want to repeat the experiment, I was using 11.2.0.3 with 1MB uniform extents and freelist management. The SQL to create the table and indexes is as follows:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,2500)				addr_id2500,
	mod(rownum,50)					addr_id0050,
	trunc(sysdate) + trunc(mod(rownum,2501)/3)	effective_date,
	lpad(rownum,10,'0')				small_vc,
	rpad('x',100)					padding
from
	generator	v1,
	generator	v2
where
	rownum <= 250000
;

create index t1_i1 on t1(effective_date);
create index t1_i2500 on t1(addr_id2500, effective_date);
create index t1_i0050 on t1(addr_id0050, effective_date);

-- collect stats, no histograms, 11g auto sample size

I had disabled CPU costing to get repeatable results – depending on parameter settings whether you have system stats enabled or not then you may need to tweak the code to change the relative numbers of distinct values in the numeric columns before you see the switch between range scan and skip scan.

January 1, 2013

Stale Stats

Filed under: Oracle,Statistics — Jonathan Lewis @ 6:02 pm UTC Jan 1,2013

The question of how to list objects in need of stats collection appeared on the OTN database forum today from a DBA who had a system collecting stats at the weekend, but wanted to check whether there were any objects with stale stats during the week. It’s actually very easy to do this check because the schema and database calls to gather stats have an option to “list stale”; they also allow you to “list empty”, and “list auto” – the latter being the objects that would be subject to collection if you change the option to (the default) “gather”. “List auto” seems to be the concatenation of “list stale” and “list empty”, by the way.

So, after reading up the manual pages on dbms_stats ($ORACLE_HOME/rdbms/admin/dbmsstat.sql, or the PL/SQL Packages reference) all it takes is a little wrapping to do the job. The code below is a little overkill because it covers 6 different options in one go:

declare
	m_objects	dbms_stats.ObjectTab;
begin
--	dbms_stats.gather_schema_stats(
--		ownname	=> user,
	dbms_stats.gather_database_stats(
		options	=> 'LIST AUTO',
		objlist	=> m_objects
	);

	dbms_output.new_line;
	dbms_output.put_line('Auto: ' || m_objects.count);
	dbms_output.put_line('--------------------');
	for i in 1..m_objects.count loop
		dbms_output.put(rpad(m_objects(i).ownname,10));
		dbms_output.put(rpad(m_objects(i).objname,32));
		dbms_output.put(rpad(nvl(m_objects(i).partname,' '),32));
		dbms_output.put(rpad(nvl(m_objects(i).subpartname,' '),32));
		dbms_output.put(rpad(m_objects(i).objtype,6));
		dbms_output.new_line;
	end loop;

--	dbms_stats.gather_schema_stats(
--		ownname	=> user,
	dbms_stats.gather_database_stats(
		options	=> 'LIST STALE',
		objlist	=> m_objects
	);

	dbms_output.new_line;
	dbms_output.put_line('Stale: ' || m_objects.count);
	dbms_output.put_line('--------------------');
	for i in 1..m_objects.count loop
		dbms_output.put(rpad(m_objects(i).ownname,10));
		dbms_output.put(rpad(m_objects(i).objname,32));
		dbms_output.put(rpad(nvl(m_objects(i).partname,' '),32));
		dbms_output.put(rpad(nvl(m_objects(i).subpartname,' '),32));
		dbms_output.put(rpad(m_objects(i).objtype,6));
		dbms_output.new_line;
	end loop;

--	dbms_stats.gather_schema_stats(
--		ownname	=> user,
	dbms_stats.gather_database_stats(
		options	=> 'LIST EMPTY',
		objlist	=> m_objects
	);

	dbms_output.new_line;
	dbms_output.put_line('Empty: ' || m_objects.count);
	dbms_output.put_line('--------------------');
	for i in 1..m_objects.count loop
		dbms_output.put(rpad(m_objects(i).ownname,10));
		dbms_output.put(rpad(m_objects(i).objname,32));
		dbms_output.put(rpad(nvl(m_objects(i).partname,' '),32));
		dbms_output.put(rpad(nvl(m_objects(i).subpartname,' '),32));
		dbms_output.put(rpad(m_objects(i).objtype,6));
		dbms_output.new_line;
	end loop;

end;
/

The code inludes the three list options, and both database and schema calls – choose which one to comment out, the call to the schema gather needs the ownname parameter included.

There are a couple of irritating little details with the feature – best described through an example of the output, which I’ve taken from the call to “list empty” for the sys schema from a copy of 11.2.0.3 that I happened to have around.

SYS       WRI$_OPTSTAT_SYNOPSIS$          P_74334                                                      TABLE
SYS       WRI$_OPTSTAT_SYNOPSIS$          P_74333                                                      TABLE
SYS       SYS_IOT_TOP_66252                                                                            INDEX
SYS       SYS_IOT_TOP_66254                                                                            INDEX
SYS       SYS_IOT_TOP_66257                                                                            INDEX
SYS       SYS_IOT_TOP_66260                                                                            INDEX
SYS       SYS_IOT_TOP_66302                                                                            INDEX

I didn’t code in any headings, but you can see that the first two lines are reporting partitions that have no stats – but the object type (limited by the definition built into the dbms_stats package) is simply “TABLE” rather than “TABLE PARTITION”. The last five lines report objects that are the physical instantiation of index organized tables so, although they are indexes and we can call gather_index_stats(), we would probably prefer to know the table name that they came from. Neither problem is a big issue, of course; it wouldn’t be very difficult to modify the PL/SQL to return the values we would prefer to see.

For the enterprising and enthusiastic – tonight’s assignment.  Wrap code like this into a pipelined function so that you can issue a call like:

select * from table(stats_required('list stale'[,{schema}));

Update (3rd Jan)

Thanks to Guest at comment 4 for his offering which captures a critical point (the autonomous transaction problem). Here’s my quick solution:

create or replace function stats_required(
	i_status	in	varchar2	default 'LIST AUTO',
	i_schema	in	varchar2	default user
)
return dbms_stats.ObjectTab pipelined
as
	pragma autonomous_transaction;
	m_objects	dbms_stats.ObjectTab;
begin
	if upper(i_status) not in (
		'LIST AUTO', 'LIST STALE', 'LIST EMPTY'
	) then
		return;
	end if;

	if i_schema is null then
		dbms_stats.gather_database_stats(
			options	=> i_status,
			objlist	=> m_objects
		);
	else
		dbms_stats.gather_schema_stats(
			ownname	=> i_schema,
			options	=> i_status,
			objlist	=> m_objects
		);
	end if;
	commit;

	for i in 1..m_objects.count loop
		pipe row(m_objects(i));
	end loop;

	return;

end;
/

December 28, 2012

Quiz Night

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 11:30 am UTC Dec 28,2012

Here’s a little puzzle if you have nothing better to do between now and the new year. The following extract came from 11.2.0.3, but could have come from 10.2.0.5 or 9.2.0.8 (and many others). I’ve got a simple heap table where the last thing I (or anyone) did was “alter table t1 move” before dumping the first (data) block of the table. Looking at the resulting trace file, I see the following:

fsbo=0x56e
fseo=0xf4d
avsp=0x5f
tosp=0x5f

If you need to have the acronyms decoded they are (according to my best guess):
fsbo – free space, beginning offset
fseo – free space, end offset
avsp – available space
tosp – total space

Doing the arithmetic, the free space starts at offset 0x56e and ends at 0xf4d, which means the free space gap is 2,527 bytes; but the total space available for use is only 0x5f bytes, i.e. 95 bytes. So what has happened to the other 2,432 ?

Remember – I dumped the block immediately after issuing “alter table t1 move”, so there are no issues of delayed block cleanout, uncommitted transactions etc. to worry about.

Footnote: the reason why you have “available space” and “total space” is to keep track of the space made available by deleted rows. The “avsp” (usually) reports the size of the gap between the row directory and the row heap; the “tosp” includes the space in the holes left in the row heap after rows have been deleted (or updated in a way that moves them up to the top of the heap, leaving a gap behind them, or updated in situ in a way that reduces the row length leaving a little hole).

December 20, 2012

Evaluations

Filed under: Uncategorized — Jonathan Lewis @ 8:19 pm UTC Dec 20,2012

The evaluations for UKOUG 2012 are in – though perhaps not all of them have been loaded into the “Speaker Lounge” yet. I’m fairly pleased with my scores.  On a range of 1 – 6:

Presentation skills: 5.8, 5.79, and 5.72

Of course I have to say that great presentation skills don’t necessarily mean that the presentation was good. I’ve seen a couple of presentations in the past where the presenter was brilliant at communicating his topic – unfortunately at least half of what he  said was wrong. Conversely I’ve sat through a few presentations where the material was brilliant, but the presenter just needed lots more practice.

Fortunately I think the following scores show that it wasn’t all show and no substance:

Content Evaluation: 5.56, 5.36, 5.28

My favourite comment: on my B-trees masterclass that lasted 1 hour 50 minutes: “It should have lasted 2 hr 30″. I gave the audience a break at half time – it’s very gratifying when you do that and most of the audience come back when there are competing presentations that they could go to.

 

December 11, 2012

Catch-up

Filed under: Uncategorized — Jonathan Lewis @ 5:05 pm UTC Dec 11,2012

Looking back at the comments audit trail on wordpress I see that I’ve got nearly two months of comments waiting for answers – and some of those answers need serious thought that I haven’t had time to supply yet. But I’ve got a (fairly) free day on Thursday so I’ll see what I can do to bring the comments up to date.

I’ve just had a great day at the Trivadis CBO days – Maria Colgan (the optimizer lady) on the 20th anniversary of the CBO, then Joze Senegacnik on transformations, me on Strategies for Statistics in 11g, ending with Randolf Geist on Parallel Execution (including a reference to one of my older blog items which I now think could well be wrong – so I may have to spend Thursday reviewing it and looking at his analysis instead of working through the comments).

Update 13th Dec

Well, it’s taken me all day – but I seem to have caught up with the comment trail. I’ve probably missed some from the preceding weeks, though, so I’ll be trawling backwards for a little while yet.

I have had time, though, to do a little more work on the parallel query and hash join buffered blog – and it’s very interesting, and I’m definitely wrong, so I’m going to have to prioritise the job of correcting it.

December 10, 2012

Viewing Figures

Filed under: Site History — Jonathan Lewis @ 10:51 am UTC Dec 10,2012

It’s time for another landmark. The blog hit 3,000,000 views on 30th November  - but I’ve been too busy to do my usual analysis of posts. The latest half million took 42 weeks (five weeks less than the previous half million increment) and here are the latest stats on popular views:

First the top five across the lifetime of the blog (excluding the home page and opening menu) and the old faviourites haven’t changed:

AWR / Statspack menu 57,477 Updated from time to time
NOT IN 40.952 Feb 2007
Cartesian Merge Join 33,367 Dec 2006
dbms_xplan in 10g 33,279 Dec 2006
SQL Profiles (10g) 25,219 Feb 2007

If we look at just the last year, the top five are:

AWR / Statspack menu 19.307 Updated from time to time
Oracle Core 9.996 Dec 2011
Shrink Tablespace 8,788 Feb 2010
AWR Reports 8,548 Feb 2011
NOT IN 8,026 Feb 2007

A couple of old favourites (dbms_xplan in 10g and SQL Profiles (10g) have finally fallen off the top slots, and Cartesian Merge Join didn’t make it back after slipping last year – but it’s interesting (and not entirely surprising) to see how NOT IN is still coming up after so many years.

The other statistic of note is the number of followers the blog has, currently 1,197 (although it was 1,185 on 30th Nov – I probably picked up a few more immediately after the UKOUG annual conference).

 

December 7, 2012

Update Error

Filed under: Execution plans,Oracle,subqueries,Troubleshooting — Jonathan Lewis @ 5:10 pm UTC Dec 7,2012

When doing updates with statements that use correlated subqueries, it’s important to make sure that your brackets (parentheses) are in the right place. Here are two statements that look fairly similar but have remarkably different results – and sometimes people don’t realise how different the statements are:

update t1
set
        padding = (
                select
                        t2.padding
                from    t2
                where   t2.id = t1.id
        )
where   t1.small_vc &lt;= lpad(20,10,'0')
;

update t1
set
        padding = (
                select
                        t2.padding
                from    t2
                where   t2.id = t1.id
                and     t1.small_vc &lt;= lpad(20,10,'0')
        )
;

The first statement will update rows in t1 where t1.small_vc <= lpad(20,10,’0′), copying the value of padding from t2 where (if) the id columns match, but setting t1.padding to null if there is no match.

The second statement will update every single row in t1 – some of the rows will probably be updated as expected (i.e. to get the same result as the first statement) but every row where t1.small_vc is greater than lpad(20,10,’0′) will have the padding column set to null.

Originally I created this little demo to respond to a question on OTN – but when I ran it with rowsource execution statistics enabled, I discovered that it also happened to demonstrate an odd bug in that feature – even in 11.2.0.3. Here are the two plans corresponding to the above two statements. The update is supposed to update 20 rows in t1, the incorrect form of the code will update all 10,000 rows in the table:


20 rows updated.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  73tz2pqv4z8rc, child number 0
-------------------------------------
update t1 set  padding = (   select    t2.padding   from t2   where
t2.id = t1.id  ) where t1.small_vc &lt;= lpad(20,10,'0')

Plan hash value: 417405447

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |       |      1 |        |      0 |00:00:00.01 |     219 |
|   1 |  UPDATE                      | T1    |      1 |        |      0 |00:00:00.01 |     219 |
|*  2 |   TABLE ACCESS FULL          | T1    |      1 |     20 |     20 |00:00:00.01 |     173 |
|   3 |   TABLE ACCESS BY INDEX ROWID| T2    |     20 |      1 |     20 |00:00:00.01 |      45 |
|*  4 |    INDEX UNIQUE SCAN         | T2_PK |     20 |      1 |     20 |00:00:00.01 |      25 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(&quot;T1&quot;.&quot;SMALL_VC&quot;&lt;='0000000020')
   4 - access(&quot;T2&quot;.&quot;ID&quot;=:B1)

10000 rows updated.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  ax22zpj8xdwpn, child number 0
-------------------------------------
update t1 set  padding = (   select    t2.padding   from t2   where
t2.id = t1.id   and t1.small_vc &lt;= lpad(20,10,'0')  )

Plan hash value: 2442374960

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |       |      3 |        |      0 |00:00:00.27 |   21576 |
|   1 |  UPDATE                       | T1    |      3 |        |      0 |00:00:00.27 |   21576 |
|   2 |   TABLE ACCESS FULL           | T1    |      3 |  10000 |  29692 |00:00:00.03 |     514 |
|*  3 |   FILTER                      |       |  19692 |        |     40 |00:00:00.01 |      85 |
|   4 |    TABLE ACCESS BY INDEX ROWID| T2    |     40 |      1 |     40 |00:00:00.01 |      85 |
|*  5 |     INDEX UNIQUE SCAN         | T2_PK |     40 |      1 |     40 |00:00:00.01 |      45 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(:B1&lt;='0000000020')
   5 - access(&quot;T2&quot;.&quot;ID&quot;=:B1)

Note, in both cases, how the number of rows reported in the update operation (line 1) is zero although we might expect it to match the A-rows value reported in the table access full operation (line 2) which tells us how many rows were identified for update. But something strange has happened in the line reporting the table access full in the second case – clearly there’s a bug somewhere because there were only 10,000 rows in the table, yet the tablescan has (apparently) identified 29,692 rows for update.

Not only is the tablescan line reporting silly numbers, the filter line is as well (because that should only start 10,000 times – once per row in the table) moreover the subquery lines (4 and 5) should only start 20 times each because the filter should eliminate the need to run the subquery for all rows except the 20 where “T1″.”SMALL_VC”<=’0000000020. Even stranger is the fact that when I ran this test several times in a row (recreating the tables each time), the starts and A-rows values for the filter and table access full lines changed (decreasing) every time!

It took me a little while to figure out what was going on – but the doubling of the starts for the subquery gave me a hint, confirmed when I took a snapshot of how the session stats changed when I ran the second update. It’s just another variant of the odd “write consistency” behaviour that can appear when running tablescan through a large volume of data and updating it (although I have to say that I hadn’t seen it happening for a mere 10,000 rows before).  Oracle starts the tablescan and update, and after a while decides that it has to rollback and restart the update – the 20 rows that were subject to the subquery were the first 20 rows of the table, which is why they were almost sure to trigger the subquery twice each, had they appeared half way through the table the effect might not have been so consistent).

Footnote:

If you want to repeat my test for yourselves, here’s the code to generate the data sets:

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level &lt;= 1e4
)
select
        rownum                  id,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum &lt;= 1e4
;

create table t2
as
select * from t1
;

alter table t1 add constraint t1_pk primary key(id);
alter table t2 add constraint t2_pk primary key(id);

November 30, 2012

v$sqlstats

Filed under: AWR,Bugs,Oracle,Troubleshooting — Jonathan Lewis @ 5:45 pm UTC Nov 30,2012

Sometimes you find bugs on MOS (Metalink, OCIS, whatever) that make you feel positively ill. I’ve just been on a customer site where (in passing) they mentioned that one of their historic queries avainst v$sqlstats now tool just over one second (CPU) in 11.2.0.3 when it had previously taken about 200ms on 10.2.0.4***. After a little checking it seemed likely that the change was possibly related to the fact that they had increased the size of the SGA significantly, allowing for a much larger shared pool and library cache; however there have been numerous code changes in the shared pool area on the route from 10g to 11g, so I decided to check MOS to see if anyone else had seen a similar problem. I found this:

Bug 13550185 : QUERIES ON V$SQLSTATS MAY CAUSE CRASH OR HANG

Version 11.2.0.3

Fixed in 12.1

Last updated: 29th Nov 2012

There is a documented workaround in the base bug - stop collecting SQL execution statistics, but this rather defeats the point of having AWR snapshot capturing all that “SQL ordered by … “ stuff. On the other hand, if some queries against v$sqlstats can cause sessions to crash or the database to hang then every AWR snapshot might be the one that hangs your database !

The bug in question references Solaris 10 on SPARC, but a quick search for the bug number finds 66 other entries in the bug database and the first one I looked at was on Redhat Linux 5; and the bug also describes itself as “platform generic”. On a positive note, the status is currently “80 – Development to QA/Fix Delivered Internal”, and when I raised the reported the threat to the client, they contacted their Oracle Support contact and discovered that there was a patch available already (number 15033625), and that they had already installed it as a side effect of installing a merge patch to fix a problem with corruption in the shared pool. If you search the patch lists on MOS you’ll find that there are patches for many different versions in the 11.2.0.2 and 11.2.0.3 range, for several different platforms.

*** An increase from 200 ms to 1 second may not seem something worth worrying about, but the client ran the query every 10 seconds because (as it said on the packet in 10g) v$sqlstats is latch free and very cheap to run and (b) they have SLAs which mean they basically need to know if any of their critical, high-frequency, queries are going bad within one minute of the problem starting to occur.

November 27, 2012

IOT Load

Filed under: Indexing,Infrastructure,Oracle — Jonathan Lewis @ 5:15 pm UTC Nov 27,2012

When I introduced Connor McDonald’s blog a few days ago, it was because we had exchanged a couple of email messages (through the Oak Table Network) about how to minimise the resource consumption when copying a load of data from one IOT to another of the same structure. His problem was the way in which the obvious way of copying the data resulted in a massive sort even though, in principle, it should not have been necessary to sort anything since the data could have been extracted in order by walking the existing IOT.

As a suggestion I referenced a comment I had made in the Addenda to Practical Oracle 8i about 12 years ago when I had first solved the problem of loading an IOT with minimal logging and no sorting. At the time I had been loading data from a sorted file into an empty table that was then going to be exchnaged into a partitioned IOT – but it crossed my mind that load from a flat file and loading from a UNIX pipe were pretty much the same thing, so perhaps Connor could workaround his problem by making one session spool to a pipe while another session was reading it. In the end, he simply created a massive temporary tablespace, but I thought I’d modify a test script I wrote a few years ago to demonstrate my idea – and here it is:

rem
rem	create an empty IOT
rem	Note the columns used for the PK
rem

create table iot_load(
	id	number,
	name	varchar2(32),
	padding	varchar2(100),
	constraint iot_pk primary key(name, id)
)
organization index
;

rem
rem	Create a load file
rem	But since we are running UNIX use the mknod command to
rem	create a pipe before spooling to that pipe.
rem

set pagesize 0
set feedback off
set newpage 0
set trimspool on
set linesize 180
set timing off
set termout off

column object_name format a38

host mknod /tmp/iot_load.dat p

spool /tmp/iot_load.dat

select r, object_name, padding
from
(
	select
		rownum	r,
		object_name,
		rpad('x',100,'x') padding
	from
		all_Objects
	where
		rownum &lt;= 10000
)
order by
	object_name, r
;

spool off

This script creates a target IOT, and uses a simple select statement (you get a clue about how old this code is by the fact that I’ve referenced all_objects, and not a “connect by” with dual) to generate a set of date that is sorted in an order that is a suitable match for the definition of the IOT. In Connor’s case he could have used a simple ‘select from source IOT order by primary key’. I’ve set up a few SQL*Plus environment detals to make the output completely flat and undecorated. A key point, though, is that I’m spooling to a file called /tmp/iot_load.dat and that file has been pre-created as a UNIX pipe. When you run this script the select statement will hang almost immediately because the pipe will become full, at which point you need another process to start emptying it. So from another UNIX session run the following (changing the userid and password as necessary):

sqlldr userid=test_user/test control=iot_load data=/tmp/iot_load.dat

Here’s the content of the control file iot_load.ctl

OPTIONS (direct = true)
UNRECOVERABLE
LOAD DATA
TRUNCATE
INTO table iot_load
sorted indexes (iot_pk)
(
	id		position(01:10)	char,
	name		position(12:49) char,
	padding		position(51:150) char
)

We use direct path load (which means virtually no undo) in unrecoverable mode (which means virtually no redo). But the unrecoverable bit only works because we start by truncating the IOT, and promising that the incoming data is sorted in order of the IOT’s primary key index. And since the data is appearing in order, Oracle doesn’t have to sort it before inserting it. Net result: no undo, no redo (apart from metadata) and no sorting – i.e. minimum overhead for loading the IOT – we simply walk the source in order and copy it into the target.

As this session reads from the pipe, the first session can resume writing into the pipe, so the data flows from source to target through a small piece of shared memory (techically a bit of the file-system buffer, I suppose) until the source session closes the pipe with its “spool off”.

November 23, 2012

Consistency

Filed under: Indexing,Oracle — Jonathan Lewis @ 5:18 pm UTC Nov 23,2012

Here’s a funny little glitch – typical of the sort of oddity that creeps into the data dictionary from time to time – cut-n-pasted from 11.1.0.7:

SQL&gt; select count(*) from user_indexes;

  COUNT(*)
----------
      1074

1 row selected.

SQL&gt; select count(*) from user_objects where object_type = 'INDEX';

  COUNT(*)
----------
       917

1 row selected.

We seem to have 157 indexes that aren’t indexes – what could they be. Perhaps a small enhancement to our query on user_indexes (the one that has the larger result) will help:

SQL&gt; select index_type, count(*) from user_indexes group by index_type order by count(*) desc;

INDEX_TYPE                    COUNT(*)
--------------------------- ----------
NORMAL                             854
LOB                                157
IOT - TOP                           40
FUNCTION-BASED NORMAL               13
CLUSTER                             10

5 rows selected.

Spot the missing (or excess, depending on your viewpoint) 157 indexes. LOB indexes (the system generated index that maps a LOB segment) aren’t reported as being of type index, and the following predicate in the definition of the view user_objects tells us why not:

  and (o.type# not in (1  /* INDEX - handled below */,
                      10 /* NON-EXISTENT */)
       or
       (o.type# = 1 and 1 = (select 1
                              from sys.ind$ i
                             where i.obj# = o.obj#
                               and i.type# in (1, 2, 3, 4, 6, 7, 9))))

Compare this with the definition of view user_indexes, and we can see that user_objects is explicitly excluding LOB and “IOT – NESTED” indexes.

        decode(i.type#, 1, 'NORMAL'||
                          decode(bitand(i.property, 4), 0, '', 4, '/REV'),
                      2, 'BITMAP', 3, 'CLUSTER', 4, 'IOT - TOP',
                      5, 'IOT - NESTED', 6, 'SECONDARY', 7, 'ANSI', 8, 'LOB',
                      9, 'DOMAIN'),

This changes in 11.2.0.3, where type 8 (LOB) is explicitly included in the user_objects view.

Does anyone have an example of an ANSI index, or an “IOT – NESTED” index ?

November 21, 2012

Plan Order

Filed under: Execution plans,Oracle,subqueries — Jonathan Lewis @ 6:53 pm UTC Nov 21,2012

The previous post reminded me of another (fairly special) case where the order of operations in an execution plan seems to be wrong according to the “traditional” strategy for reading execution plans. Here’s a simple select statement with its execution plan to demonstrate the point:

select
	small_vc
from
	t1
where
	exists (
		select	null
		from	f1
		where	f1.id       = t1.id
		and	f1.small_vc = t1.small_vc
	)
and
	exists (
		select	null
		from	f2
		where	f2.id = 21
	)
;

------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    29 |    51   (2)| 00:00:01 |
|*  1 |  FILTER               |       |       |       |            |          |
|*  2 |   HASH JOIN RIGHT SEMI|       |     1 |    29 |    51   (2)| 00:00:01 |
|   3 |    TABLE ACCESS FULL  | F1    |    20 |   280 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T1    | 10000 |   146K|    48   (0)| 00:00:01 |
|*  5 |   INDEX UNIQUE SCAN   | F2_PK |     1 |    13 |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "F2" "F2" WHERE "F2"."ID"=21))
   2 - access("F1"."ID"="T1"."ID" AND "F1"."SMALL_VC"="T1"."SMALL_VC")
   5 - access("F2"."ID"=21)

The traditional strategy for reading this plan (recursively operate the child rows of each parent row in the order that they appear) would say: we scan table F1 and buildilng a hash table in memory, then scan table T1 probing the hash table to perform the hash semi join of line 2. For each row that survives the hash join, the filter operation at line 1 tells us to run the subquery against F2 to see if the row should be passed forward as an output of the select statement.

You might like to pause briefly at this point to convince yourself that this is the way we usually interpret the indentation of an execution plan.

It’s not what happens in this special case. Notice that the second subquery isn’t correlated – it need only run once for Oracle to decide whether or not it will return any data. As a side effect of this special case, the plan operates “upside down”. Here’s the same execution plan pulled from memory after enabling rowsource execution statistics. It’s implrtant to be aware that in my test case the data in F2 doesn’t have a row where id = 21.

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  5bd1m72cz4awy, child number 0
-------------------------------------
select  small_vc from  t1 where  exists (   select null   from f1   where f1.id       = t1.id   and
f1.small_vc = t1.small_vc  ) and  exists (   select null   from f2   where f2.id = 21  )

Plan hash value: 1423735592

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
|*  1 |  FILTER               |       |      1 |        |      0 |00:00:00.01 |       1 |       |       |          |
|*  2 |   HASH JOIN RIGHT SEMI|       |      0 |      1 |      0 |00:00:00.01 |       0 |   825K|   825K|          |
|   3 |    TABLE ACCESS FULL  | F1    |      0 |     20 |      0 |00:00:00.01 |       0 |       |       |          |
|   4 |    TABLE ACCESS FULL  | T1    |      0 |  10000 |      0 |00:00:00.01 |       0 |       |       |          |
|*  5 |   INDEX UNIQUE SCAN   | F2_PK |      1 |      1 |      0 |00:00:00.01 |       1 |       |       |          |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   2 - access("F1"."ID"="T1"."ID" AND "F1"."SMALL_VC"="T1"."SMALL_VC")
   5 - access("F2"."ID"=21)

Look carefully at the starts column. Line 1 (filter) started once; line 5 (the second child to the filter) started once and returned no rows; line 2 (the first child to the filter) never started – it didn’t have to because by this point Oracle had already determined that any data it generated would be eliminated by the non-existence of a match from line 5.

Summary

There are a few special case plans where the normal “first child first” rule (“… more what you’d call guidelines than actual rules.” — Captain Barbossa) for reading execution plans doesn’t apply. The “constant subquery” introduces one of them.

Footnote:

If you want to repeat the experiment on different versions of Oracle, here’s the code to generate my test data:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create table f1 as select * from t1 where id <= 20;
create table f2 as select * from t1 where id <= 20;

alter table f1 add constraint f1_pk primary key(id);
alter table f2 add constraint f2_pk primary key(id);

-- collect stats (compute, no histograms)

November 19, 2012

Plan timing

Filed under: Execution plans,Oracle,subqueries — Jonathan Lewis @ 9:59 pm UTC Nov 19,2012

I’ve copied the following question up from a recent comment because it’s a useful generic type of question, and I think I can answer it in the few minutes I have to spare.

Hi Jonathan. I have a query plan where I cannot explain how time adds up. I did the ALTER SESSION trick but it changed nothing. I ran this, and got the following plan. I have two questions (I appoligize for not being able to format this code and plan but I saw not formatting buttons on the insert box).

1) @ step #8 in the plan, the query jumps to 3 and 1/2 minutes. This step says VIEW but gives no indication of what it did that actually took 3 and 1/2 minutes. Can you explain or give me some idea how to find out what is being done on this line that takes that long. Especially with so few rows.

2) the total elapsed time of the query was 3 minutes 45 seconds. This jives with step #1 that says 3:44.54. But if A-TIME is supposed to be among other things, the sum of child steps, then how to we explain step #2. Step #2 says 3 minutes 3 minutes 44 seconds. But the sum of its child steps (#3 and #8) is7 minutes and 25 seconds. Did this query do these steps “in parallel” so to speak? Or is there something really off in the numbers and if so how to I prove it?

This is not something that is super pressing but I sure would like to be able to explain the differences to people. It is making it somewhat harder to get people to believe I know what I am doing when I cannot such obvious issues as these two things.

Thanks, Kevin Meade

Fortunately Kevin included enough information to allow me to understand what was going on. In this case that means Here’s the query and the execution plan pulled from memory – showing the execution statistics. As follows:

ALTER SESSION SET STATISTICS_LEVEL=ALL
/

drop table kevtemp1
/

create table kevtemp1
nologging
as
SELECT SRCE_EFF_START_TMSP,
       AGGREGATECLAIMUID
FROM (
     SELECT  A.AGGREGATECLAIMUID,
             A.SNAPSHOT_DT,
             MAX(A.SRCE_EFF_START_TMSP) SRCE_EFF_START_TMSP
     FROM (SELECT PI.AGGREGATECLAIMUID,
                  PI.SRCE_EFF_START_TMSP,
                  (SELECT DISTINCT BW.SNAPSHOT_DT
                   FROM RRS_SHR.RRS_ETL_BATCH_WINDOW BW, RRS_SHR.RRS_LOAD_STATUS RLS
                   WHERE RLS.ASTG_LOAD_STATUS = 'STARTED'
                   AND RLS.SNAPSHOT_DT = BW.SNAPSHOT_DT
                   AND BW.RPTG_WINDOW_TYPE = 'D'
                   AND BW.OBJECT_NM = 'R_AGGREGATE_CLAIM_SEED'
                   AND PI.SRCE_EFF_START_TMSP
                       BETWEEN
                       BW.BEGIN_TMSP
                       AND
                       BW.END_TMSP
                  ) AS SNAPSHOT_DT
           FROM RRS_PSTG.P_PCD_AGGREGATECLAIM PI
           WHERE PI.BATCH_ID IN
                 (SELECT src_batch_id
                  FROM batch_processing_queue
                  WHERE dm_batch_id =
                        batch_id_pkg.get_current_batch_id('PCD','ATM')
                 )
          ) A
          GROUP BY A.AGGREGATECLAIMUID, A.SNAPSHOT_DT
     )
/

Table created.

Elapsed: 00:03:45.14

Plan hash value: 3515505567

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT          |                             |      1 |        |      0 |00:03:44.54 |     246K|   8389 |    128 |       |       |          |
|   1 |  LOAD AS SELECT                 |                             |      1 |        |      0 |00:03:44.54 |     246K|   8389 |    128 |   530K|   530K|  530K (0)|
|   2 |   HASH GROUP BY                 |                             |      1 |  84487 |  48482 |00:03:44.41 |     245K|   8386 |      0 |  3128K|  1284K| 3538K (0)|
|   3 |    HASH UNIQUE                  |                             |  33524 |      1 |  33524 |00:03:40.98 |     236K|     12 |      0 |  1270K|  1270K|  523K (0)|
|   4 |     NESTED LOOPS                |                             |  33524 |      1 |  33524 |00:00:08.35 |     236K|     12 |      0 |       |       |          |
|*  5 |      TABLE ACCESS BY INDEX ROWID| RRS_ETL_BATCH_WINDOW        |  33524 |     11 |  33524 |00:00:07.84 |     203K|      9 |      0 |       |       |          |
|*  6 |       INDEX RANGE SCAN          | IDX1_RRS_ETL_BATCH_WINDOW   |  33524 |     41 |   5414K|00:00:04.09 |   86482 |      5 |      0 |       |       |          |
|*  7 |      INDEX UNIQUE SCAN          | IUK2_RRS_LOAD_STATUS        |  33524 |      1 |  33524 |00:00:00.31 |   33535 |      3 |      0 |       |       |          |
|   8 |    VIEW                         |                             |      1 |  84487 |  48484 |00:03:43.99 |     245K|   8386 |      0 |       |       |          |
|*  9 |     HASH JOIN                   |                             |      1 |  84487 |  48484 |00:00:02.63 |    9186 |   8374 |      0 |  1023K|  1023K| 1243K (0)|
|* 10 |      INDEX RANGE SCAN           | IAK1_BATCH_PROCESSING_QUEUE |      1 |      5 |    102 |00:00:00.22 |     855 |     44 |      0 |       |       |          |
|  11 |      TABLE ACCESS FULL          | P_PCD_AGGREGATECLAIM        |      1 |   1536K|   1536K|00:00:01.47 |    8331 |   8330 |      0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter(("BW"."BEGIN_TMSP"=:B1)
   7 - access("RLS"."SNAPSHOT_DT"="BW"."SNAPSHOT_DT" AND "RLS"."ASTG_LOAD_STATUS"='STARTED')
   9 - access("PI"."BATCH_ID"="SRC_BATCH_ID")
  10 - access("DM_BATCH_ID"="BATCH_ID_PKG"."GET_CURRENT_BATCH_ID"('PCD','ATM'))

46 rows selected.

Elapsed: 00:00:01.75

Working through the layers of the query, from the outside towards the middle, we have a simple select from an inline view. The inline view, however, is an aggregate view that selects from another inline view. One of the “columns” of the innermost inline view is an scalar subquery that joins two tables, and the view also uses an IN subquery. So we need to identify all these pieces in the execution plan.

Lines 9 – 11 of the plan show us that the IN subquery of the innermost view has been unnested, producing a hash join. The VIEW operator tells us that this join has been optimized as a “leaf” query block. Lines 3 – 7 give us the separately optimized execution plan for the “select distinct” inline scalar subquery; I am slightly surprised that this part of the plan doesn’t appear after the VIEW operator at line 8, indented at the same level as line 9, since it is a plan that generates a value for each row returned in the view.

Note: although the plan in lines 3 – 7 could be executed once for each row returned by the plan in lines 9 – 11, the number of executions could (and in this case is) fewer than the worst case thanks to scalar subquery caching.

The location of the scalar subquery in the overall plan does make it harder to understand the various time and resource usage figures – but once you understand what the query has to do, it is easier to work out why the numbers have a dislocated look. The time in line 8 is 3:43.99 because it includes the time in line 9 (2.63 seconds) that generates 48,484 rows, plus the time in line 3 (3:40.98) that comes from running the scalar subquery 33,524 times.

Summary

Subqueries can appear in various places in execution plans – when they are scalar subqueries representing columns in the select list, they appear ABOVE the query block that calls them. This can be very confusing when you’re trying to follow the trail of where the time and resources are first used when you’ve collected rowsource execution statistics.

November 15, 2012

New Blog

Filed under: Oracle — Jonathan Lewis @ 9:19 am UTC Nov 15,2012

I’ve just discovered that Connor McDonald has had a blog for ages, and I hadn’t noticed because it’s not getting accumulated on the Oak Table site. So, for a brief period only, here’s the URL.

November 13, 2012

Busy – busy

Filed under: Uncategorized — Jonathan Lewis @ 8:57 am UTC Nov 13,2012

You may have noticed that I haven’t added any new material for some time – sometimes life just gets too busy with other things.

As a stop gap for those who like to browse interesting Oracle information, I’ve added a few more links to my “Bits and Pieces” page – the newly added items are marked with ***

 

November 6, 2012

Learning

Filed under: Uncategorized — Jonathan Lewis @ 5:47 pm UTC Nov 6,2012

Possibly an item to file under “philosophy”, but a question came up in the seminar I was presenting today that prompted me to say blog (very briefly) about why I manage to be so good at avoiding errors and inventing workarounds to problems. You probably know that you may see an execution plan change when you add a foreign key constraint to a table – but today someone in the class asked me if this would still work if the constraint were to disabled. The immediate response that sprang to my mind was “surely not” – but the second thought was that I didn’t really know the answer and would have to check; and the third thought was that maybe it wouldn’t if disabled, but what about novalidate; and the fourth thought was whether the setting for query_rewrite_integrity would make a difference; and the fifth thought was to wonder if there were other conditions that mattered.

So hey-ho for the weekend, when I have to set up a test case for a query that changes plan when I add a foreign key constraint – and then try (at least) eight different combinations of circumstances to check what it really takes to make the change happen.

 

« Previous PageNext Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,398 other followers