Oracle Scratchpad

April 4, 2014

NVL() change

Filed under: CBO,Oracle,Troubleshooting,Upgrades — Jonathan Lewis @ 6:10 pm BST Apr 4,2014

One of the problems of functions is that the optimizer generally doesn’t have any idea of how a predicate based on function(col) might affect the cardinality. However, the optimizer group are constantly refining the algorithms to cover an increasing number of special cases more accurately. This is a good thing, of course – but it does mean that you might be unlucky on an upgrade where a better cardinality estimate leads to a less efficient execution plan. Consider for example the simple query (where d1 is a column of type date):

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate

Now, there are many cases in many versions of Oracle, where the optimizer will appear to calculate the cardinality of

nvl(columnX,{constant}) operator {constant}

as if it were:

columnX is null or columnX operator {constant}

Unfortunately this doesn’t seem to be one of them – until you get to 11.2.something. Here’s a little demonstration code:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	case
		when rownum > 100 then null else sysdate - rownum
	end	d1
from
	generator	v1,
	generator	v2
where
	rownum <= 50000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

set autotrace traceonly explain

prompt	query with NVL

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate
;

prompt	query with OR clause

select	*
from	t1
where	d1 is null or d1 < sysdate
;

If you run this code in 11.1.0.7 you get the following – with numeric variations for cost (which I’m not interested in at the moment):


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2500 |  2500 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  2500 |  2500 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

Take note particularly of the difference in the estimated cardinality for the tablescans.

When you upgrade to 11.2.0.4 (possibly earlier – though there are some nvl() related patches that appeared only in 11.2.0.4), you get this:


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 49900 | 49900 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 49900 | 49900 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

As you can see the estimate for the “NVL()” example is now correct – which means it is vastly different from the estimate in 11.1.0.7, which seemed to be using the standard “5% for a range-based predicate on function(col)”.

It’s interesting to note that a (relatively) small error has crept into the “OR” example – the size of the error is exactly the number of rows where d1 is not null (which looks like enough of a coincidence to be a bug – but maybe there’s a good rationale for it).

Conclusion

Yet again, a simple upgrade has the capacity to make a dramatic change to a cardinality estimate – which could mean a significant change to an execution plan and major change in performance. If you’ve read this note, though, you may recognise the pattern that is the driving feature behind the problem.

Footnote:

If you have access to versions 11.2.0.1 through 11.2.0.3 and find that the test data produces different cardinalities please publish the results in the comments – it would be nice to know exactly when this change appears.  (You need only show the body of the execution plans  with labels, not the entire output of the script).
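
If you don’t want to set up autotrace, a minimal sketch along the following lines should be enough to capture the version banner and the plan body – dbms_xplan.display() simply reports the most recent explain plan from the plan table:

select banner from v$version where rownum = 1;

explain plan for
select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate;

select * from table(dbms_xplan.display);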

 

March 28, 2014

Juggernaut

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 8:12 am BST Mar 28,2014

One of the problems of “knowing” so much about Oracle is that the more you know the more you have to check on each new release of the software. An incoming ping on my posting “Lock Horror” reminded me that I was writing about 11.2.0.1, that the terminal release is now 11.2.0.4, and that the whole thing may have changed in 12.1.0.1 – so I ought to re-run some tests to make sure that the article is up to date if it’s likely to be read a few times in the next few days.

Unfortunately, although I often add a URL to scripts I’ve used to confirm results published in the blog, I don’t usually include a script name in my blog postings to remind me where to go if I want to re-run the tests. So how do I find the right script(s)? Typically I list all the likely scripts and compare dates with the date on the blog; so here’s what I got for “lock”.


SQL> host ls -ltr *lock*.sql | grep -v block
-rwxr-xr-x 1 jonathan dba 1569 Jun 28  2002 c_bitlock.sql
-rwxr-xr-x 1 jonathan dba 1303 Oct  5  2002 ddl_deadlock.sql
-rwxr-xr-x 1 jonathan dba 1875 Oct  7  2002 ddl_deadlock_2.sql
-rwxr-xr-x 1 jonathan dba 1654 Aug  6  2003 hw_lock.sql
-rwxr-xr-x 1 jonathan dba 2626 Sep 17  2004 lock_oddity.sql
-rwxr-xr-x 1 jonathan dba 1804 Sep 17  2004 lock_speed.sql
-rwxr-xr-x 1 jonathan dba 3194 May  8  2006 space_locks.sql
-rwxr-xr-x 1 jonathan dba 4337 Jan  3  2008 tm_deadlock.sql
-rwxr-xr-x 1 jonathan dba 1149 Jan  3  2008 show_lock.sql
-rwxr-xr-x 1 jonathan dba 2068 Apr 21  2008 hw_lock_2.sql
-rwxr-xr-x 1 jonathan dba 1482 Feb  5  2010 tt_lock.sql
-rwxr-xr-x 1 jonathan dba 1692 Feb 16  2010 to_lock.sql
-rwxr-xr-x 1 jonathan dba 3308 Jun  1  2010 skip_locked.sql
-rwxr-xr-x 1 jonathan dba 2203 Nov  2  2010 deadlock_statement.sql
-rwxr-xr-x 1 jonathan dba 2883 Nov  3  2010 merge_locking.sql
-rwxr-xr-x 1 jonathan dba 1785 Dec 14  2010 sync_lock.sql
-rwxr-xr-x 1 jonathan dba  984 Apr 23  2011 para_dml_deadlock.sql
-rwxr-xr-x 1 jonathan dba 4305 Jun  4  2011 locking_fifo.sql
-rwxr-xr-x 1 jonathan dba 5970 Jun  5  2011 locking_fifo_2.sql
-rwxr-xr-x 1 jonathan dba  917 Jun 30  2011 ul_deadlock.sql
-rwxr-xr-x 1 jonathan dba  936 Jul  8  2011 funny_deadlock.sql
-rwxr-xr-x 1 jonathan dba  741 Sep  8  2011 row_lock_wait_index.sql
-rwxr-xr-x 1 jonathan dba 2590 Nov 30  2012 fk_lock_stress.sql
-rwxr-xr-x 1 jonathan dba 4561 Feb  6  2013 dbms_lock.sql
-rwxr-xr-x 1 jonathan dba 1198 Apr  6  2013 libcache_locks.sql
-rwxr-xr-x 1 jonathan dba 5636 Nov 27 19:40 ash_deadlocks.sql
-rwxr-xr-x 1 jonathan dba  379 Mar 27 19:17 fk_constraint_locks.sql

Nothing leaps out as an obvious candidate, though “funny_deadlock.sql” catches my eye for future reference; maybe I should look for “foreign key”.

SQL> host ls -ltr *fk*.sql | grep -v fkr
-rwxr-xr-x 1 jonathan dba  2140 Jun 16  2005 fk_check.sql
-rwxr-xr-x 1 jonathan dba  2897 Jun 16  2005 fk_order.sql
-rwxr-xr-x 1 jonathan dba   650 Oct 26  2007 pk_fk_null.sql
-rwxr-xr-x 1 jonathan dba  5444 Nov  4  2007 c_fk.sql
-rwxr-xr-x 1 jonathan dba  1568 Dec  5  2008 null_fk.sql
-rwxr-xr-x 1 jonathan dba  2171 Mar  2  2009 fk_anomaly_2.sql
-rwxr-xr-x 1 jonathan dba  3922 Mar  2  2009 fk_anomaly.sql
-rwxr-xr-x 1 jonathan dba  5512 Oct 15  2009 fk_check_2.sql
-rwxr-xr-x 1 jonathan dba  1249 Feb 15  2010 c_pk_fk_2.sql
-rwxr-xr-x 1 jonathan dba  1638 Feb 16  2010 c_pk_fk_3.sql
-rwxr-xr-x 1 jonathan dba  5121 Jun  1  2012 c_pt_fk_2.sql
-rwxr-xr-x 1 jonathan dba  4030 Jun  5  2012 c_pt_fk_3.sql
-rwxr-xr-x 1 jonathan dba  2062 Jun  5  2012 c_pt_fk_3a.sql
-rwxr-xr-x 1 jonathan dba  2618 Sep 23  2012 c_pk_fk_02.sql
-rwxr-xr-x 1 jonathan dba  1196 Oct 19  2012 deferrable_fk.sql
-rwxr-xr-x 1 jonathan dba  2590 Nov 30  2012 fk_lock_stress.sql
-rwxr-xr-x 1 jonathan dba  4759 Sep  1  2013 fk_bitmap.sql
-rwxr-xr-x 1 jonathan dba  1730 Sep 30 07:51 virtual_fk.sql
-rwxr-xr-x 1 jonathan dba  3261 Dec 22 09:41 pk_fk_gets.sql
-rwxr-xr-x 1 jonathan dba  8896 Dec 31 13:19 fk_delete_gets.sql
-rwxr-xr-x 1 jonathan dba 10071 Dec 31 14:52 fk_delete_gets_2.sql
-rwxr-xr-x 1 jonathan dba  4225 Jan 14 11:15 c_pk_fk.sql
-rwxr-xr-x 1 jonathan dba  2674 Jan 14 13:42 append_fk.sql
-rwxr-xr-x 1 jonathan dba  1707 Feb 10 12:34 write_cons_fk.sql
-rwxr-xr-x 1 jonathan dba  9677 Feb 24 17:23 c_pt_fk.sql
-rwxr-xr-x 1 jonathan dba   379 Mar 27 19:17 fk_constraint_locks.sql

(The “grep -v fkr” is to eliminate scripts about “first K rows” optimisation.) With a little luck – the dates are about right – c_pk_fk_2.sql and c_pk_fk_3.sql will be relevant. So keep an eye on “Lock Horror” for an update in the next few days.

You’ll notice that some of the scripts have a very old datestamp on them - that’s an indication of how hard it is to keep up; when I re-run a script on a new version of Oracle I invariably add a “Last Tested:” version to the header, and a couple of notes about changes.  A couple of my scripts date back to June 2001 – but that is, at least, the right century, and some people are still using Oracle 7.

Footnote

It should be obvious that I can’t test everything on every new release – but it’s amazing how often on a client site I can recognize a symptom and pick a script that I’ve used in the past to construct the problem – and that’s when a quick bit of re-testing helps me find a solution or workaround (or Oracle bug note).

 

March 26, 2014

Diagnostics

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 2:13 pm BST Mar 26,2014

Here’s a little test you might want to try. Examine the following script, and decide what sort of symptoms you would see in the AWR report.


create global temporary table gtt1(n1 number);

execute dbms_workload_repository.create_snapshot;

insert into gtt1 values(1);
truncate table gtt1;

-- repeat insert/truncate for a total of 100 cycles
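
-- e.g. a sketch of one way to script the cycles (inside pl/sql the truncate needs execute immediate):
begin
	for i in 1..100 loop
		insert into gtt1 values(1);
		execute immediate 'truncate table gtt1';
	end loop;
end;
/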

execute dbms_workload_repository.create_snapshot;

-- generate an AWR report across the interval.

I don’t need anyone to tell me their results – but if your predictions and the actual results match then you can give yourself a pat on the head.
You might also like to enable SQL trace for all the insert/truncate cycles to see if that shows you anything interesting.

This is one of the simpler scripts of the 3,500 I have on my laptop that help me interpret the symptoms I see in client systems.

March 3, 2014

Flashback Fail ?

Filed under: Bugs,Oracle,Troubleshooting — Jonathan Lewis @ 4:19 pm BST Mar 3,2014

Sitting in an airport, waiting for a plane, I decided to read a note (pdf) about Flashback data archive written by Beat Ramseier from Trivadis.  I’d got about three quarters of the way through it when I paused for thought and figured out that on the typical database implementation something nasty is going to happen after approximately 3 years and 9 months.  Can you guess why ?

It’s all about smon_scn_time – which normally records one row every five minutes (created by smon) on a continuous cycle of about five days – typically giving you about 1,440 rows in the table. The table is in a cluster, and the cluster key is the instance (thread) number. Clearly this was originally a clever idea from someone who realised that a cluster key of thread number would be beneficial if you had a RAC system with multiple instances – each instance gets its own blocks and the data for any one instance is as well clustered as possible.

The trouble is, when you enable flashback data archive smon no longer sticks to that recycling pattern, it just keeps adding rows. Now on my 8KB block tablespace I see 6 rows per block in the table/cluster – which means I get through 48 blocks per day, 17,520 blocks per year, and in 3 years and 9 months I’ll get to roughly 65,700 blocks – and that’s the problem.  An index entry in a cluster index points to a chain of cluster blocks, and the last two bytes of the “rowid” in the index entry identify which block within the chain the cluster key scan should start at – and two bytes means you can only have 65,536 blocks for a single cluster key.
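
If you want to check how far your own system has got, here’s a quick sketch of the sort of query that would tell you (you need to be able to query sys.smon_scn_time, and on RAC it’s the block count per thread that matters):

select
	count(distinct dbms_rowid.rowid_block_number(rowid))	blocks_used,
	count(*)						rows_in_table
from
	sys.smon_scn_time
;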

I don’t know what’s going to happen when smon tries to insert a row into the 65,535th (-ish) block for the current thread – but it ought to raise an Oracle error, and then you’ll probably have to take emergency action to make sure that the flashback mechanisms carry on running.

Although oraus.msg indicates that it’s an error message about hash clusters, it’s possible that the first sign of trouble will be: “ORA-02475 maximum cluster chain block count of %s has been exceeded”. If you’re using a 16KB block size then you’ve got roughly 7.5 years, and 32KB block sizes give you about 15 years (not that that’s a good argument for selecting larger block sizes, of course).

Footnote:

Searching MoS for related topics (smon_scn_time flashback) I found doc ID: 1100993.1 from which we could possibly infer that the 1,440 rows was a fixed limit in 10g but that the number of rows allowed in smon_scn_time could increase in 11g if you enable automatic undo management. I also found a couple of bugs relating to index or cluster corruption – fixed by 11.2.0.4, though.

 

 

February 10, 2014

Row Migration

Filed under: Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 6:55 pm BST Feb 10,2014

At one of the presentations I attended at RMOUG this year the presenter claimed that if a row kept increasing in size and had to migrate from block to block as a consequence then each migration of that row would leave a pointer in the previous block so that an indexed access to the row would start at the original table block and have to follow an ever growing chain of pointers to reach the data.

This is not correct, and it’s worth making a little fuss about the error since it’s the sort of thing that can easily become an urban legend that results in people rebuilding tables “for performance” when they don’t need to.

Oracle behaves quite intelligently with migrated rows. First, the migrated row has a pointer back to the original location and if the row has to migrate a second time the first place that Oracle checks for space is the original block, so the row might “de-migrate” itself; however, even if it can’t migrate back to the original block, it will still revisit the original block to change the pointer in that block to refer to the block it has moved on to – so the row is never more than one step away from its original location. As a quick demonstration, here’s some code to generate and manipulate some data:

create table t1 (
id	number(6,0),
v1	varchar2(1200)
)
pctfree 0
;

prompt	==========================================
prompt	The following code fits 74 rows to a block
prompt	==========================================

insert into t1 select rownum - 1, rpad('x',100) from all_objects where rownum <= 75;
commit;

prompt	======================================
prompt	Make the first row migrate and dump it
prompt	======================================

update t1 set v1 = rpad('x',400) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',2)

prompt	===========================================================
prompt	Fill the block the long row is now in, force it to migrate,
prompt	then dump it again.
prompt	===========================================================

insert into t1 select rownum + 75, rpad('x',100) from all_objects where rownum <= 75;
commit;

update t1 set v1 = rpad('x',800) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)

prompt	========================================================
prompt	Fill the block the long row is now in and shrink the row
prompt	to see if it returns to its original block. (It won't.)
prompt	========================================================

insert into t1 select rownum + 150, rpad('x',100) from all_objects where rownum <= 75;
commit;

update t1 set v1 = rpad('x',50) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)

prompt	========================================================
prompt	Make a lot of space in the first block and force the row
prompt	to migrate again to see if it migrates back. (It does.)
prompt	========================================================

delete from t1 where id between 1 and 20;
commit;

update t1 set v1 = rpad('x',1200) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)

My test database was using 8KB blocks (hence the 74 rows per block), and 1MB uniform extents with freelist management. The procedure dump_seg() takes a segment name as its first parameter and a number of blocks as the second (then the segment type and starting block as the third and fourth) and dumps the first N data blocks of the segment. To demonstrate what goes on, I’ve extracted the content of the first row (id = 0) after each of the four dumps:

After the first update - the column count (cc) is zero and the "next rowid" (nrid) is row 1 of block 0x0140000b

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x0140000b.1

After the second update - the next rowid is row 7 of block 0x0140000c

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x1  cc: 0
nrid:  0x0140000c.7

After the third update (shrinking the row) the row hasn't moved from block 0x0140000c

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x0140000c.7

After the fourth update (making space, and growing the row too much) the row moves back home

tab 0, row 0, @0x4c1
tl: 1208 fb: --H-FL-- lb: 0x2  cc: 2
col  0: [ 1]  80
col  1: [1200]
78 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

My calls to dump blocks included the blocks where the row migrated to, so we’ll have a look at the target locations (as given by the original row location’s nrid) in those blocks over time. First we check block 0x0140000b, row 1 after the first two migrations:

tab 0, row 1, @0x1d7f
tl: 414 fb: ----FL-- lb: 0x2  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [400]

tab 0, row 1, @0x1d7f
tl: 2 fb: ---DFL-- lb: 0x1

After the first migration (the row arrives here) we have a “head rowid” (hrid) pointer telling us where the row came from. After the second migration, when the row has moved on, we simply have a typical “deleted stub” – two bytes reserving the row directory entry until the commit has been done and cleaned out.

Then we can examine the second target (0x0140000c, row 7) on the second, third, and fourth updates:


tab 0, row 7, @0x1966
tl: 814 fb: ----FL-- lb: 0x2  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [800]

tab 0, row 7, @0xb1
tl: 62 fb: ----FL-- lb: 0x1  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [50]

tab 0, row 7, @0xb1
tl: 2 fb: ---DFL-- lb: 0x2

As you can see, on arrival this location gets the original rowid as its “head rowid” (hrid), and it knows nothing about the intermediate block where the row was briefly in transit. I’ve copied the length byte (in square brackets) of column 1 in the dumps so that you can see that the row stayed put as it shrank. We can then see on the last update that we are left with a deleted stub in this block as the row migrates back to its original location when we try to extend it beyond the free space in this block.
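
As an aside, if you don’t have a dump_seg() procedure of your own, the standard symbolic block dump gives you the same information – a sketch, where the file and block numbers are just placeholders that you’d derive from the dbms_rowid calls, and the output appears in your session’s trace file:

select
	dbms_rowid.rowid_relative_fno(rowid)	file_no,
	dbms_rowid.rowid_block_number(rowid)	block_no
from	t1
where	id = 0;

alter system dump datafile 5 block min 10 block max 12;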

Migrated rows are only ever one step away from home. It’s not nice to have too many of them, but it’s not necessarily a disaster.
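
If you want a rough count of how many migrated (or chained) rows you’ve got in a table, the old mechanism still works – a sketch, assuming you’ve first created the chained_rows table by running $ORACLE_HOME/rdbms/admin/utlchain.sql:

analyze table t1 list chained rows into chained_rows;

select count(*) from chained_rows where table_name = 'T1';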

RAC Plans

Filed under: Execution plans,Hints,Oracle,RAC,Troubleshooting — Jonathan Lewis @ 1:12 pm BST Feb 10,2014

Recently appeared on MoS – “Bug 18219084 : DIFFERENT EXECUTION PLAN ACROSS RAC INSTANCES”

Now, I’m not going to claim that the following applies to this particular case – but it’s perfectly reasonable to expect to see different plans for the same query on RAC, and it’s perfectly possible for the two different plans to have amazingly different performance characteristics; and in this particular case I can see an obvious reason why the two nodes could have different plans.

Here’s the query reported in the bug:

SELECT /*+ INDEX(C IDX3_GOD_USE_LOT)*/
   PATTERN_ID, STB_TIME
    FROM mfgdev.MTR_AUTO_GOD_AGENT_BT C
   WHERE 1 = 1
     AND EXISTS (SELECT /*+ INDEX(B IDX_MTR_STB_LOT_EQP)*/ 1
            FROM MFGDEV.MTR_STB_BTH B
           WHERE B.PATTERN_ID = C.PATTERN_ID
             AND B.STB_TIME = C.STB_TIME
             AND B.ACTUAL_START_TIME < SYSDATE
             AND EXISTS (SELECT /*+ INDEX(D CW_LOTID)*/
                   1
                    FROM F14DM.DM_CW_WIP_BT D
                   WHERE D.LOT_ID = B.LOT_ID
                     AND D.SS = 'BNKI'));

See the reference to “sysdate”. I can show you a system where you had a 15 minute window each day (until the problem was addressed) to optimize a particular query if you wanted a good execution plan; if you optimized it any other time of day you got a bad plan – and the problem was sysdate: it acts like a peeked bind variable.

Maybe, on this system, if you optimize the query at 1:00 am you get one plan, and at 2:00 am you get another – and if those two optimizations occur on different nodes you’ve just met the conditions of this bug report.

Here’s another thought to go with this query: apparently it’s caused enough problems in the past that someone’s written a couple of hints into the code. With three tables and two correlated subqueries in the code a total of three index() hints is not enough. If you’re going to hard-code hints into a query then take a look at the outline it generates when it does the right thing, and that will tell you about the 15 or so hints you’ve missed out. (Better still, consider generating an SQL Baseline from the hinted code and attaching it to the unhinted code.)
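
For reference, here’s a sketch of how you might pull that outline from the library cache while the query is behaving itself – the sql_id is, obviously, a placeholder for the real one:

select * from table(dbms_xplan.display_cursor('&m_sql_id', null, 'outline'));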

February 5, 2014

Minus

Filed under: Execution plans,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 5:42 pm BST Feb 5,2014

Here’s a little script to demonstrate an interesting observation that appeared in my email this morning (that’s morning Denver time):

create table t1
as
select * from all_objects where rownum = 1;

delete from t1;
commit;

create table t2
as
select * from all_objects where rownum <= 100000;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);
end;
/

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';

prompt  ======================
prompt  And now the test query
prompt  ======================

select * from t1
minus
select * from t2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
alter session set events '10046 trace name context off';

Clearly the first query block in the test query will return no rows, and since the MINUS operator returns rows from the first result set that do not appear in the second result set there is no need for Oracle to run the second query block. Well, guess what …

The “create where rownum = 1” followed by “delete” is a lazy workaround to avoid side effects of deferred segment creation so that you can run the script on any (recent) version of Oracle. The flush, combined with 10046 trace, allowed me to see waits that showed which objects Oracle scanned and when, and the display_cursor() was just icing on the cake.

I’ve checked 11.2.0.4 and 12.1.0.1, and both of them scan t1 first and then scan t2 unnecessarily.

This surprised me slightly given how smart the optimizer can be, but I guess it’s one of those boundary cases where the optimizer has just one strategy for an entire class of queries. I couldn’t think of any “legal” way to control the effect, but here’s the first dirty trick that came to my mind. If you’re sure that the first subquery is going to be cheap and you’re worried that the second subquery is expensive, you could do the following:

select v2.*
from
	(select * from t1 where rownum = 1)	v1,
	(
		select * from t1
		minus
		select * from t2
	)	v2
;

Introduce a spurious query to return one row from the first subquery and join it to the MINUS query. If the inline view doesn’t return any rows Oracle short-circuits the join, as shown by the following execution path with stats:

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |      1 |        |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   1 |  MERGE JOIN CARTESIAN  |      |      1 |      1 |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   2 |   VIEW                 |      |      1 |      1 |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|*  3 |    COUNT STOPKEY       |      |      1 |        |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   4 |     TABLE ACCESS FULL  | T1   |      1 |      1 |      0 |00:00:00.01 |       2 |     32 |       |       |          |
|   5 |   BUFFER SORT          |      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 | 73728 | 73728 |          |
|   6 |    VIEW                |      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   7 |     MINUS              |      |      0 |        |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|   8 |      SORT UNIQUE       |      |      0 |      1 |      0 |00:00:00.01 |       0 |      0 | 73728 | 73728 |          |
|   9 |       TABLE ACCESS FULL| T1   |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
|  10 |      SORT UNIQUE       |      |      0 |  70096 |      0 |00:00:00.01 |       0 |      0 | 73728 | 73728 |          |
|  11 |       TABLE ACCESS FULL| T2   |      0 |  70096 |      0 |00:00:00.01 |       0 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(ROWNUM=1)

The only thing to watch out for is that the “rownum = 1” doesn’t make the optimizer switch to an unsuitable “first_rows(1)” execution plan.

February 2, 2014

Clustaghhh!

Filed under: clusters,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 10:27 pm BST Feb 2,2014

It doesn’t matter which bit of Oracle technology you want to use, eventually someone, somewhere, runs into the special case where something nasty happens. Here’s an edge case for people using (index) clusters – Oracle Bug  17866999 ora-1499 for cluster following rman convert

It comes from a conversation on Oracle-L where Jack van Zanen reported a problem of inconsistent results after migrating data between platforms using rman to convert some tablespaces containing index clusters. This is the starting post where he shows a query that is clearly getting the wrong answer (a select where channel_number = 503 obviously shouldn’t return data with channel_number 501).

Unfortunately the “audit-trail” on this problem is incomplete because I accidentally started sending email to Jack instead of the list (reply, rather than reply all) – but given that the problem column was the cluster key of an index cluster the obvious guess was that something had gone wrong as the data for this key value had grown and chained into a second (or third) block. To confirm this I asked Jack to select the rowids reported for the cluster and use the dbms_rowid package to convert them into distinct (file_number, block_number) values:

select
	dbms_rowid.rowid_relative_fno(rowid)	file_no,
	dbms_rowid.rowid_block_number(rowid)	block_no,
	count(*)				rows_in_block
from
	&m_table	t1
group by
	dbms_rowid.rowid_relative_fno(rowid),
	dbms_rowid.rowid_block_number(rowid)
order by
	dbms_rowid.rowid_relative_fno(rowid),
	dbms_rowid.rowid_block_number(rowid)
;

With a few file and block numbers in hand, we then needed to see the results from dumping the blocks. Jack sent me the trace files, and I found that my guess was correct – and also discovered that I had been on a private discussion, so I posted the results back to the list. As you can see from that posting, the cluster key (indicated by the initial K in the flag byte, and by its presence in “tab 0” in the block) is 4 columns long, and I’ve got entries from two blocks where the cluster keys are pointing to each other – but the cluster keys don’t hold the same value.

It’s an extraordinary corruption – the data structures appear to be totally self-consistent and perfectly clean, and yet the data content is wrong.  (Usually I find that a corruption means the data becomes unusable).

Oracle eventually tracked this down and reproduced it; it’s a problem that can arise if you are using index clusters and use rman to do a cross-platform conversion – and it’s not going to be easy to fix it. I don’t suppose many people are likely to see this bug (it’s been around for years, apparently, but there aren’t many people who use clusters, and of those perhaps very few have migrated between platforms using rman) – but at present the only workaround is to export and reimport the data; which is not nice when you’re talking about terabytes of clusters.

January 26, 2014

Pagination

Filed under: Bugs,Indexing,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 12:08 pm BST Jan 26,2014

I was involved in a thread on Oracle-L recently that started with the question: “How many LIOs is too many LIOs?”. Rather than rewrite the whole story, I’ve supplied a list of links to the contributions I made, in order – the final “answer” is actually the answer to a different question, but it travels an interesting path to get there.

I’ve got a script to emulate the requirement so that people can see for themselves the bug that I mention in post 15; I’ll try to add a couple of notes to it and publish it some time, but for the moment I’ll just remind myself that it’s called (slightly counter-intuitively) no_sort_problem.sql.

January 6, 2014

LOB changes

Filed under: Infrastructure,LOBs,Oracle,Troubleshooting — Jonathan Lewis @ 7:10 pm BST Jan 6,2014

It’s always useful to collect baseline information – especially when it helps you notice that the baseline has moved in a way that might explain the next performance problem you see. Here’s an example demonstrating the benefit.

I have a table with a LOB column that is stored out of line. Many years ago I decided that I wanted to compare how the redo generation varied as I changed the LOB from cached to nocache (with nologging). So here was one of my simplest test scripts (stripped to a minimum):


create table t1 (
	id		number,
	cached_lob	clob
)
lob (cached_lob) store as text_lob_cache(
	disable storage in row 
	cache
--	nocache logging	
)
;

insert into t1
select
	rownum,
	rpad('x',4000)
from
	all_objects
where
	rownum <= 1000
;

When I first ran this test (twice, one for each option) I noticed that cached LOBs recorded just the redo needed to describe the data that had been written into them (so at 1,000 rows of 4KB each that would be in the ballpark of 4MB with some overheads), but the nocache option created redo for the entire block (so nearer 8MB, plus overheads). There were, of course, all sorts of other side effects of the change – some of them very nasty – but this was one of the more important differences to note.
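
For reference, the redo figures come from the session statistics – here’s a minimal sketch of the sort of query you could run before and after the insert to capture them:

select
	sn.name, ms.value
from
	v$statname	sn,
	v$mystat	ms
where
	ms.statistic# = sn.statistic#
and	sn.name in ('redo entries', 'redo size', 'redo size for direct writes')
order by
	sn.name
;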

A little while ago I was prompted to repeat the test – so I ran the whole thing three times on 10.2.0.5, 11.2.0.4, and 12.1.0.1; it’s fascinating to see the way the key redo stats change:


10.2.0.5
========
CACHE
Name                                                                     Value
----                                                                     -----
redo entries                                                             5,588
redo size                                                            5,288,836

NOCACHE
redo entries                                                             5,578
redo size                                                            9,407,824



11.2.0.4
========
CACHE
redo entries                                                             5,223
redo size                                                            5,277,520

NOCACHE
redo entries                                                             5,220
redo size                                                            1,212,992
redo size for direct writes                                             52,000


12.1.0.1
========
CACHE
redo entries                                                             5,231
redo size                                                            9,343,444

NOCACHE
redo entries                                                             5,291
redo size                                                            1,301,700
redo size for direct writes                                             60,000

As you can see the 10.2.0.5 results mirror the first results I got – the redo size is consistent with the volume of data when the LOBs are cached, and matches whole block dumps when they’re not cached – but something very clever seems to appear in the 11.2.0.4 results: the redo for the lob component of nocache LOBs seems to have disappeared – there’s a drop of (roughly) 8,192,000 bytes from the 10g figures: is this a clever trick (filesystem-like logging?) or is it a reporting error? Then 12c is even stranger – the redo for the cached LOBs now matches block logging (8MB + overheads) instead of data logging – and the redo for the nocache LOBs is still “missing”.

To be investigated further – possibly by dumping redo log files, but initially simply by watching when log file switches take place – to see if the changes are reporting errors or genuine changes.
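
If it comes to dumping the redo, a sketch of the starting point – the file name is just a placeholder for one of the members reported by v$logfile, and the dump ends up in the session’s trace file:

select member from v$logfile;

-- the file name below is a placeholder
alter system dump logfile '/u01/oradata/redo01.log';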

December 31, 2013

Troubleshooting

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 5:30 pm BST Dec 31,2013

If it’s a long night, and you’ve got nothing better to do waiting for the new year, then here’s some reading to keep you entertained. A collation of Tanel Poder’s trouble-shooting blog postings, dating from June 2007 to September 2013:

  1. When the wait interface is not enough
  2. No magic is needed, systematic approach will do
  3. More adventures in the process stack
  4. Diagnosing a long parsing issue
  5. Sampling v$ stuff with waitprof really fast using SQL
  6. Understanding execution plans with OS explain
  7. Sampling latch holder statistics using latchprof
  8. Even more detailed latch troubleshooting using latchprofx
  9. Process stack profiling from SQL Plus using ostackprof
  10. Index unique scan doing multiblock reads
  11. Complex wait chain signature analysis with ash_wait_chains sql

December 23, 2013

Random slowdown

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 6:48 pm BST Dec 23,2013

One of the old, old questions came up on OTN a few days ago:  “What reasons are there for a piece of (batch) SQL to run very slowly occasionally when it usually runs very quickly ?” Then a similar one appeared a couple of days later. There are plenty of reasons, so I thought I’d start listing a few and see how many other (realistic) reasons other people wanted to add.

(more…)

December 11, 2013

Null Quiz

Filed under: Oracle,Performance,Troubleshooting,Tuning — Jonathan Lewis @ 6:42 pm BST Dec 11,2013

Here’s an example I saw a few months ago of the confusion caused by NULL. As the owner of the problem put it: the first query, run from SQL*Plus for testing purposes, takes no time to complete; but when “put into a pl/sql cursor” (as shown in the second query) it takes ages to complete.

(more…)

August 26, 2013

Index Sorting

Filed under: latches,Oracle,Troubleshooting — Jonathan Lewis @ 7:48 am BST Aug 26,2013

From time to time I’ve mentioned the fact that the optimizer will sort indexes alphabetically by name as the last tie-breaker when two plans have the same cost. Thanks to an email that arrived a couple of days ago I’ve been alerted to event 10089 (which has been around since at least 8.1.7.4) with the description: “CBO Disable index sorting”.

(more…)

July 12, 2013

Wrong Index 2

Filed under: CBO,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 5:17 pm BST Jul 12,2013

A couple of days ago I wrote an article about Oracle picking the “wrong index” after an index rebuild, and I mentioned that the sample data I had generated looked a little odd because it came from a script I had been using to investigate a completely different problem. This note describes that other problem, which appeared on the Oracle-L mailing list last month.

Stripped to a bare minimum, here’s the issue: we have a simple query against a single table with two indexes IDX2_AUFTRAG(arsAuftragsNr, dategAuftragsNr) and IDX7_AUFTRAG(arsAuftragsNr), and a predicate “arsAuftragsNr = {constant}”. Since the second column in the two-column index is irrelevant (we can’t use it to avoid visiting the table, and it’s not part of a group by or order by clause), and since adding a column to an index is likely to increase the clustering_factor and leaf block count of the index, we would probably expect to see Oracle pick the single column index as the path to the table – but it doesn’t, it picks the two-column index.

(more…)
