Oracle Scratchpad

May 15, 2014

Subquery with OR

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 6:23 pm BST May 15,2014

Prompted by a pingback on this post, followed in very short order by a related question (with a most gratifying result) on Oracle-L, I decided to write up a note about another little optimizer enhancement that appeared in 12c. Here’s a query that differs slightly from the query in the original article:


select
	id, modded, mod_15
from
	t1
where
	t1.mod_15 = 1                     -- originally t1.mod_15 > 0
and	(   t1.modded is null             -- originally t1.modded = 0
	 or exists (
		select	null
		from	t2
		where	t2.id = t1.modded
	    )
	)
;

As a general principle, the “OR EXISTS” stops the optimizer from unnesting the subquery, so my original article suggested a workaround that required you to rewrite the query with a UNION ALL, using the lnnvl() function (where possible) as the easy way to eliminate accidental duplication. Take a look at the plans for my new query, though – first in 11.2.0.4, then in 12.1.0.1:


Execution Plan for 11.2.0.4
----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    34 |   374 |    50   (0)| 00:00:01 |
|*  1 |  FILTER            |       |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| T1    |   667 |  7337 |    50   (0)| 00:00:01 |
|*  3 |   INDEX UNIQUE SCAN| T2_PK |     1 |     3 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T1"."MODDED" IS NULL OR  EXISTS (SELECT 0 FROM "T2" "T2"
              WHERE "T2"."ID"=:B1))
   2 - filter("T1"."MOD_15"=1)
   3 - access("T2"."ID"=:B1)

Execution Plan for 12.1.0.1
------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |    27 |   378 |    50   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI NA|       |    27 |   378 |    50   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T1    |   667 |  7337 |    50   (0)| 00:00:01 |
|*  3 |   INDEX UNIQUE SCAN  | T2_PK |     1 |     3 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."MOD_15"=1)
   3 - access("T2"."ID"="T1"."MODDED")

As expected, 11.2.0.4 has had to use a filter subquery approach – but 12.1.0.1 has found a different path. For this special “is null” case the optimizer has unnested the subquery and used a “null aware (NA) semi-join”. In this very small example there is no change in the reported cost, and the mechanics of the execution plan will be quite similar at run time – but in real systems there are bound to be cases where the new strategy is more efficient.

Unfortunately …

Bug 18650065 (fixed in 12.2) rears it’s ugly head: WRONG RESULTS ON QUERY WITH SUBQUERY USING OR EXISTS.
I can demonstrate this with the following code:


update t1 set modded = null
where id <= 30;
commit;

select
	id, modded, mod_15
from
	t1
where
	t1.id = 1                     -- previously mod_15 = 1
and	(   t1.modded is null
	 or exists (
		select	null
		from	t2
		where	t2.id = t1.modded
	    )
	)
;

alter table t1 add constraint t1_pk primary key(id);

select
	id, modded, mod_15
from
	t1
where
	t1.id = 1                     -- previously mod_15 = 1
and	(   t1.modded is null
	 or exists (
		select	null
		from	t2
		where	t2.id = t1.modded
	    )
	)
;

And here’s the output from the above script:


30 rows updated.

Commit complete.

        ID     MODDED     MOD_15
---------- ---------- ----------
         1                     1

1 row selected.

Table altered.

no rows selected

I’ve modified a few rows so that the “null-aware” bit of the new transformation matters, but I’ve now got a data set and transformation where I get the wrong results because I’ve defined a primary key (unique would have done) on a critical column in the query. If you check the execution plan you’ll find that the optimizer has switched from a null aware semi-join to a simple nested loop join.

There is a workaround for this problem – disable the relevant feature:

alter session set "_optimizer_null_accepting_semijoin"=false;

For Reference:

Here’s the SQL to generate the data for the above demonstration:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 5000
)
select
	rownum			id,
	mod(rownum,999)		modded,
	mod(rownum,15)		mod_15,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

update t1 set modded = null where modded = 26;

create index t1_i1 on t1(id);
create index t1_i2 on t1(modded);

create table t2
as
select
	2 * rownum		id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	all_Objects
where
	rownum <= 20
;	

alter table t2 add constraint t2_pk primary key(id);

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,175 other followers