Oracle Scratchpad

February 21, 2012

Not In – 2

Filed under: CBO,Execution plans,Oracle,Performance,subqueries,Tuning — Jonathan Lewis @ 9:24 pm BST Feb 21,2012

My note on “NOT IN” subqueries is one of the most popular on my blog, staying in the top 5 hits for the last five years – but it’s getting a bit old, so it’s about time I said something new about “NOT IN” – especially since the Null Aware Anti Join has been around such a long time. The example I want to talk about is, as so often, something that came up as a problem on a customer site. Here’s a bit of SQL to model the situation, which is currently running under Oracle 11.1.0.7:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 10000
)
select
	rownum			id1,
	trunc((rownum-1)/10000)	id2,
	trunc((rownum-1)/10000)	n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 100000;

-- collect stats, compute, no histograms

alter table t1 add constraint t1_pk primary key(id1, id2);
alter index t1_pk invisible;
create index t1_i2 on t1(id1, id2, n1);

create type jpl_scalar as object(x1 number);
/
create type jpl_table as table of jpl_scalar;
/

create type jpl_scalar3 as object(x1 number, x2 number, x3 number);
/
create type jpl_table3 as table of jpl_scalar3;
/

There are a couple of oddities in the model – one is that the second index starts with the same columns as the primary key index, and this is to emulate the functionality of the client system; the other is that I’ve made the primary key index invisible, the nature of the client data and code was such that the primary key index was not used in the query I’m about to emulate and I’ve made the index invisible so that I don’t have to mess around with statistics or complicated data routines to make the appropriate behaviour appear in the model.

Now that I’ve got the model in place, let’s take a look at the query:

delete
from	t1
where
	(id1) in (
		select x1 from table(cast(:b1 as jpl_table)) v1
	)
and	(id1, id2, n1) not in (
		select x1, x2, x3 from table(cast(:b2 as jpl_table3)) v2
	)
;

This query is actually trying to delete a load of data from the table. The surrounding PL/SQL package populates two collections with up to 1,000 items each. The first collection identifies rows that may need to be deleted, but the second collection then identifies rows from the first set that should not be deleted. You’ll notice that the first collection identifies rows only by the first column of the primary key, and the second collection uses both parts of the key and a non-key column to countermand the deletion. Typically both collections would hold close to the 1,000 item limit set by the developer, and typically only one or two rows would end up being deleted each time the statement ran (the id1 column tends to be “nearly unique” across the first collection).
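To make the intersection logic concrete, here’s a toy model in Python (not the client’s code – the data and collection contents are invented) of which rows the delete actually removes: a row goes only if its id1 is in the first collection and its full (id1, id2, n1) tuple is absent from the second.

```python
# Toy model of the delete predicate: collection 1 nominates candidates
# by id1 alone; collection 2 countermands deletion by the full tuple.

rows = [
    (1, 0, 0),   # (id1, id2, n1)
    (2, 0, 0),
    (3, 0, 0),
]

coll1 = {1, 2, 3}                 # candidate id1 values (up to 1,000 in practice)
coll2 = {(1, 0, 0), (2, 0, 0)}    # tuples that countermand the deletion

deleted = [r for r in rows
           if r[0] in coll1       # (id1) in (select x1 ...)
           and r not in coll2]    # (id1, id2, n1) not in (select x1, x2, x3 ...)

print(deleted)   # only (3, 0, 0) survives both tests and is deleted
```

As in the production case, nearly every candidate is countermanded and only the odd row falls through to the delete.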

With that overview in mind, thinking particularly of the number of rows initially identified and the number of rows that survive, look at the execution plan:

------------------------------------------------------
| Id  | Operation                            | Name  |
------------------------------------------------------
|   0 | DELETE STATEMENT                     |       |
|   1 |  DELETE                              | T1    |
|   2 |   NESTED LOOPS                       |       |
|   3 |    SORT UNIQUE                       |       |
|   4 |     COLLECTION ITERATOR PICKLER FETCH|       |
|*  5 |    INDEX RANGE SCAN                  | T1_I2 |
|*  6 |     COLLECTION ITERATOR PICKLER FETCH|       |
------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("ID1"=SYS_OP_ATG(VALUE(KOKBF$),1,2,2))
       filter( NOT EXISTS (SELECT 0 FROM TABLE() "KOKBF$" WHERE
              LNNVL(SYS_OP_ATG(VALUE(KOKBF$),1,2,2)<>:B1) AND
              LNNVL(SYS_OP_ATG(VALUE(KOKBF$),2,3,2)<>:B2) AND
              LNNVL(SYS_OP_ATG(VALUE(KOKBF$),3,4,2)<>:B3)))
   6 - filter(LNNVL(SYS_OP_ATG(VALUE(KOKBF$),1,2,2)<>:B1) AND
              LNNVL(SYS_OP_ATG(VALUE(KOKBF$),2,3,2)<>:B2) AND
              LNNVL(SYS_OP_ATG(VALUE(KOKBF$),3,4,2)<>:B3))

The optimizer has unnested the first collection (the IN list), sorted the set for uniqueness, and used it to drive a nested loop through the index we need (avoiding the table) to pick up the rowid and all the columns we need to check against the second collection. However, as it does the index range scan for each unique item in the first collection it runs the NOT IN subquery, checking whether the row it has acquired has a match in the second collection. This means that around 1,000 times we fetch a row and scan the second collection for a match. We almost always find a match so, on average, we will have to scan 500 items from the second collection to find that match. The statement was CPU intensive; on the production system it did about 3,000 buffer gets but took about 1 CPU second to find and delete (on average) one row.
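The arithmetic behind that CPU cost is worth spelling out. Assuming (as above) roughly 1,000 candidate rows and a second collection of roughly 1,000 items, a back-of-envelope model:

```python
# Rough cost model of the filter-subquery plan versus a hash anti join.
# Figures are the approximate ones quoted in the text, not measurements.

candidates = 1000        # rows fetched via the index range scan
coll2_size = 1000        # items in the second collection

# Filter subquery: each candidate scans the collection until it hits a
# match - on average about half way through.
filter_comparisons = candidates * coll2_size // 2
print(filter_comparisons)            # 500000 comparisons per execution

# Hash anti join: build the collection into memory once, then do a
# single probe per candidate row.
hash_work = coll2_size + candidates  # one build pass + one probe each
print(hash_work)                     # 2000 operations
```

Half a million comparisons versus a couple of thousand operations: that’s the gap the repeated scanning opens up.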

The problem is the repeated scanning on the second collection – and it’s a problem that shouldn’t exist. I want Oracle to unnest the second query and do a hash anti join with it. If we did that then we would only scan the second collection once, scatter it into memory, and only do one probe and comparison for each row brought back by the first collection. This is (a mockup of) the plan I want to see:

----------------------------------------------------------
| Id  | Operation                              | Name    |
----------------------------------------------------------
|   0 | DELETE STATEMENT                       |         |
|   1 |  DELETE                                | T1      |
|   2 |   HASH JOIN ANTI                       |         |
|   3 |    NESTED LOOPS                        |         |
|   4 |     SORT UNIQUE                        |         |
|   5 |      COLLECTION ITERATOR PICKLER FETCH |         |
|   6 |     INDEX RANGE SCAN                   | T1_I2   |
|   7 |    COLLECTION ITERATOR PICKLER FETCH   |         |
----------------------------------------------------------

With the SQL supplied, I couldn’t make this plan appear in 11.1. I had hoped to force the path I wanted and then create an SQL Baseline for it, but I actually had to rewrite the query, converting the “NOT IN” to “NOT EXISTS” – now this isn’t always legal, of course, but in my case I knew that all the relevant columns would be non-null (even though the n1 column in the table had not been declared as such) and the data accumulated in the collections would also be non-null so the transformation was safe. So here’s the rewrite and the new plan:

delete
from	t1
where
	exists (
		select
			null
		from	table(cast(:b1 as jpl_table)) v1
		where	v1.x1 = t1.id1
	)
and	not exists (
		select
			null
		from
			table(cast(:b2 as jpl_table3)) v2
		where	v2.x1 = t1.id1
		and	v2.x2 = t1.id2
		and	v2.x3 = t1.n1
	)
;

-------------------------------------------------------
| Id  | Operation                             | Name  |
-------------------------------------------------------
|   0 | DELETE STATEMENT                      |       |
|   1 |  DELETE                               | T1    |
|*  2 |   HASH JOIN ANTI                      |       |
|   3 |    NESTED LOOPS                       |       |
|   4 |     SORT UNIQUE                       |       |
|   5 |      COLLECTION ITERATOR PICKLER FETCH|       |
|*  6 |     INDEX RANGE SCAN                  | T1_I2 |
|   7 |    COLLECTION ITERATOR PICKLER FETCH  |       |
-------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."ID1"=SYS_OP_ATG(VALUE(KOKBF$),1,2,2) AND
              "T1"."ID2"=SYS_OP_ATG(VALUE(KOKBF$),2,3,2) AND
              "T1"."N1"=SYS_OP_ATG(VALUE(KOKBF$),3,4,2))
   6 - access("T1"."ID1"=SYS_OP_ATG(VALUE(KOKBF$),1,2,2))

It’s exactly what I wanted – but the code has to be modified, which means the full testing cycle and delay, and because of the complexities of the collection objects and the need to use realistic data it’s something that I can’t actually do on the sand-pit that the client lets me play with. So, as it stands, I think it ought to be quite a bit more efficient – but I’ll have to wait a couple of weeks to find out.

Here’s the irritating bit. The client will be upgrading to 11.2.0.3 in the not too distant future, and here’s the plan you get from the original query on that version of Oracle:

------------------------------------------------------------
| Id  | Operation                               | Name     |
------------------------------------------------------------
|   0 | DELETE STATEMENT                        |          |
|   1 |  DELETE                                 | T1       |
|   2 |   MERGE JOIN ANTI NA                    |          |
|   3 |    SORT JOIN                            |          |
|   4 |     NESTED LOOPS                        |          |
|   5 |      VIEW                               | VW_NSO_1 |
|   6 |       SORT UNIQUE                       |          |
|   7 |        COLLECTION ITERATOR PICKLER FETCH|          |
|*  8 |      INDEX RANGE SCAN                   | T1_I2    |
|*  9 |    SORT UNIQUE                          |          |
|  10 |     COLLECTION ITERATOR PICKLER FETCH   |          |
------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access("ID1"="X1")
   9 - access(INTERNAL_FUNCTION("ID1")=SYS_OP_ATG(VALUE(KOKBF$),1,2,2)
              AND INTERNAL_FUNCTION("ID2")=SYS_OP_ATG(VALUE(KOKBF$),2,3,2) AND
              INTERNAL_FUNCTION("N1")=SYS_OP_ATG(VALUE(KOKBF$),3,4,2))
       filter(INTERNAL_FUNCTION("N1")=SYS_OP_ATG(VALUE(KOKBF$),3,4,2)
              AND INTERNAL_FUNCTION("ID2")=SYS_OP_ATG(VALUE(KOKBF$),2,3,2) AND
              INTERNAL_FUNCTION("ID1")=SYS_OP_ATG(VALUE(KOKBF$),1,2,2))

With my data set it uses a merge join rather than a hash join, but it manages to unnest both collections – note, also, the appearance of the null-aware anti join (the ANTI NA operator), which 11.1 didn’t have in its plan; the non-mergeable view (vw_nso_1) is another interesting detail.

In principle 11.1 ought to be able to produce the same plan – all the units of functionality seem to be there, including the null-aware anti join – but I just can’t make the plan appear (although I did manage to get an ORA-00600 error with one of my more bizarre attempts at hinting).

Despite the automatic appearance of what seems to be a suitable (though slightly sub-optimal) path with the upgrade, I think we’ll still be doing the rewrite – interestingly 11.2 does produce a slightly different plan when you go for existence subqueries – it’s a (differently named) non-mergeable view again:

----------------------------------------------------------
| Id  | Operation                              | Name    |
----------------------------------------------------------
|   0 | DELETE STATEMENT                       |         |
|   1 |  DELETE                                | T1      |
|*  2 |   HASH JOIN ANTI                       |         |
|   3 |    NESTED LOOPS                        |         |
|   4 |     VIEW                               | VW_SQ_1 |
|   5 |      SORT UNIQUE                       |         |
|   6 |       COLLECTION ITERATOR PICKLER FETCH|         |
|*  7 |     INDEX RANGE SCAN                   | T1_I2   |
|   8 |    COLLECTION ITERATOR PICKLER FETCH   |         |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."ID1"=SYS_OP_ATG(VALUE(KOKBF$),1,2,2) AND
              "T1"."ID2"=SYS_OP_ATG(VALUE(KOKBF$),2,3,2) AND
              "T1"."N1"=SYS_OP_ATG(VALUE(KOKBF$),3,4,2))
   7 - access("ITEM_2"="T1"."ID1")

One last thought – it looks as if the optimizer has some new ways (including dynamic sampling) of handling collections and subquery manipulation of collections in 11.2: and this client loves doing cunning things with collections – so we’re probably going to get a number of better execution plans from the upgrade – but they’re going to have to check every single example they’ve got of code using collections, because you can bet that somewhere they’ll hit an edge case where the “new improved” mechanisms manage to be the wrong choice.

Footnote: I put a couple of /*+ cardinality (XXX 10) */ hints into my code while I was creating the examples above, but took them out to present the code and results. My data set was small compared to the client’s data set, so I needed the hints but didn’t want to give the impression that they were a necessary part of the solution.

5 Comments »

  1. it looks as if the optimizer has some new ways (including dynamic sampling) of handling collections and subquery manipulation of collections in 11.2

    True – it’s cardinality feedback. Another thing is Oracle uses collection size at run-time with the help of bind peeking.

    Comment by Timur Akhmadeev — February 22, 2012 @ 7:21 am BST Feb 22,2012 | Reply

  2. Hi Jonathan ,

    1000 rows getting selected for removal (first collection)
    – index range scan, 500 rows (second collection)
    As you said, 1000 fetches scanning 500 each (worst case 1000 * 500), or an average of (500 * 500) – as described above.

    Why can’t we probe the 1000 rows and go for a Hash Anti Join with the needed 500 rows? That would lead to at most (and at least) 500 verifications instead of the average.

    Something like this – dealing with the two collections separately:

    select x1, x2, x3 from table(cast(:b2 as jpl_table3)) v2
    select x1 from table(cast(:b1 as jpl_table)) v1

    with the result joined back to the source table t1:

    Hash Join (Anti) or Hash Join
     Hash Join
      Collection 2 (probing first collection) – that makes my CR less, and CPU in turn
      Collection 1
     T1

    I was trying for that.

    Comment by Pavan Kumar N — February 22, 2012 @ 2:57 pm BST Feb 22,2012 | Reply

  3. Jonathan,

    seems a few of the keywords got missed – the relational operator and value after level and rownum; not sure if you left them out intentionally.

    Pleasure reading your posts.

    Comment by vijay sehgal — February 24, 2012 @ 8:15 am BST Feb 24,2012 | Reply

  4. Hi Jonathan,

    I think you meant the changes introduced by bugfix 6708183 (disabled by default), right?
    Some details are in note 6708183.8 including the references to other enhancements in the same area

    Thanks,
    Mauro

    Comment by Mauro — February 24, 2012 @ 5:59 pm BST Feb 24,2012 | Reply

