Oracle Scratchpad

February 14, 2007

Conditional SQL (2)

Filed under: CBO,Conditional SQL,Execution plans,Indexing,Tuning — Jonathan Lewis @ 8:48 am GMT Feb 14,2007

Some time ago I wrote a note on conditional SQL, commenting on code with predicates like:

         where	t1.n1 = nvl(:n1,t1.n1)

In that article, I showed how Oracle can transform a query of this type using concatenation so that at run time one of two possible execution paths can be taken. But, as often happens,  a simple example leads to more complex questions.

What happens if you have several similar predicates in your query – does Oracle keep doubling the number of possible execution paths for the query: if so, how far does it go, if not are there any side effects to consider? What happens with more complex queries, including joins.

I may come back to joins at some other time (the mechanism does appear, but the ramifications are convoluted). Let’s just stick with the single table query for the moment.

In the following test case, I create a table indexed on a pair of columns. I’ve indexed the columns in both possible orders (n1, n2) and (n2, n1). After gathering statistics I run a query that uses the offending nvl() construct. The execution plan comes from 10gR2 autotrace with CPU costing disabled:

rem
rem     Script:         null_plan.sql
rem     Author:         Jonathan Lewis
rem     Dated:          February 2003
rem

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000
)
select
	mod(rownum,1000)	n1,
	mod(rownum,400)		n2,
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 20000
;      

create index t1_i1 on t1(n1, n2);
create index t1_i2 on t1(n2, n1);      

-- gather statistics before running the query below      

variable b1 number
variable b2 number      

exec :b1 := 25; :b2 := 25      

select
	small_vc
from
	t1
where
	n1 = nvl(:b1, n1)
and	n2 = nvl(:b2, n2)
;      

/*
Execution Plan
----------------------------------------------------------
Plan hash value: 3469439717     

-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |    51 |   969 |    59 |
|   1 |  CONCATENATION                |       |       |       |       |
|*  2 |   FILTER                      |       |       |       |       |
|*  3 |    TABLE ACCESS FULL          | T1    |    50 |   950 |    56 |
|*  4 |   FILTER                      |       |       |       |       |
|   5 |    TABLE ACCESS BY INDEX ROWID| T1    |     1 |    19 |     3 |
|*  6 |     INDEX RANGE SCAN          | T1_I1 |     1 |       |     2 |
-----------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(:B1 IS NULL)
   3 - filter("N2"=NVL(:B2,"N2") AND "N1" IS NOT NULL)
   4 - filter(:B1 IS NOT NULL)
   6 - access("N1"=:B1)
       filter("N2"=NVL(:B2,"N2"))
*/

As you can see, the split and concatenation occurs just once. If the optimizer tried to cover more options, it would double the number of sub-plans that had to be examined every time it introduced one more split – the potential optimisation cost could become very large if it tried to get really clever with these nvl() predicates.

In this example column n1 was used to dictate the split and, as far as I can tell, the rationale for choosing the n1 column was that it had a better selectivity than n2 (there are 1,000 distinct values for n1 but only 400 distinct values for n2). Given this choice of split, Oracle was able to use the index on (n1, n2) on one of the branches of the plan.

So what happens if you drop the index (n1, n2) ? The optimizer still splits the query on the n1 predicate, and does a full tablescan for both parts of the query. It doesn’t seem to consider the cost of operating the two halves of the query and use the n2 predicate as the basis for the split instead – despite the “obvious” benefit of doing so.

One of the often-quoted fairy-tales about indexes was the directive to “put the most selective column first”. It was never a sensible rule of thumb (except, possibly, prior to version 6.0). But if you have a lot of code that uses this nvl() construct then there may actually be a reasonable argument for adopting this approach.  (There’s a better argument for fixing the code, of course).

[Further reading on Conditional SQL]

9 Comments »

  1. So, how would you “fix the code” then.

    I’m asking because this approach is still taken where I work to allow parameters to be passed in as NULL.

    Comment by Paul James — February 14, 2007 @ 10:00 am GMT Feb 14,2007 | Reply

  2. Paul, as a general principle, code the calling program with something like:

    if (variable is null) then
            execute SQL-statement-1
    else
            execute SQL-statement-2
    end if;

    If this is going to give you too many options (2^N for N variables) then restrict the set of statements to those that make the efficient access paths possible and put up with the the fact that some of the filter predicates will still have to use the nvl() construct.

    Comment by Jonathan Lewis — February 14, 2007 @ 11:58 am GMT Feb 14,2007 | Reply

  3. Hi John,

    Tom Kyte keeps saying:

    “Never do procedurally that which can be done in a single sql statement”

    For example on this posting:

    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:6407993912330

    or in many of his presentations.

    I believe that is the reason why many people try to write 1 sql statement to return the data they need.

    Thanks

    Comment by Tobias — February 14, 2007 @ 2:03 pm GMT Feb 14,2007 | Reply

  4. Tobias, better yet is this example from http://tkyte.blogspot.com/2006/10/slow-by-slow.html which reads:

    My mantra, that I’ll be sticking with thank you very much, is:

        You should do it in a single SQL statement if at all possible.
        If you cannot do it in a single SQL Statement, then do it in PL/SQL.
        If you cannot do it in PL/SQL, try a Java Stored Procedure.
        If you cannot do it in Java, do it in a C external procedure.

    If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it…

    But I don’t think Tom is suggesting you could write the entire application with a single SQL statement. Your reference, and my quote, relate to items comparing set-wise processing and row-by-row processing. I doubt if Tom would have any worries about having a little procedural control over which set-wise operation to use.

    Of course, there are some people who go a bit over the top with the “one SQL statement” approach – but I don’t think that is the common cause for this particular error: it’s more likely to be code-generators, and programmers whose only concern is meeting the specification of content with no regard for performance.

    If you really want to stick to the single statement, then you could simply concatenate (union all)the various bits of SQL, remembering to include the relevant is null and is not null predicates to get complete non-overlapping coverage. But that’s not so easy as nulls are counter-intuitive.

    The name, by the way, is Jonathan, not John.

    Comment by Jonathan Lewis — February 14, 2007 @ 2:42 pm GMT Feb 14,2007 | Reply

  5. One reason people I work with give for doing this, is because then “you only use one cursor instead of two, and that way you benefit from [cursor] caching”.

    I doubt that any caching benefit would outweigh the inefficient execution plans this could give you, but I wanted to throw it out there.

    Comment by Anonymous — February 14, 2007 @ 4:14 pm GMT Feb 14,2007 | Reply

  6. Anonymous, certainly a point worth raising. There are always compromises that have to be made, especially in high-concurrency systems: the skill lies in getting the cost/benefit balanced properly.

    The “bind variables are good for OLTP” argument often gets a little over-used, or abused, in cases where the argument for “a few” specialised cursors has some merit.

    Comment by Jonathan Lewis — February 14, 2007 @ 4:46 pm GMT Feb 14,2007 | Reply

  7. Johnathan,

    Thanks for the answer. I’ve also seen…

    AND ( n1 = :b1
    OR :b1 IS NULL)

    Would this be any better than the original code?

    Comment by Paul James — February 15, 2007 @ 10:17 am GMT Feb 15,2007 | Reply

  8. Paul, a quick test on a single table (9.2.0.8 and 10.2.0.1) suggests that this optimisation is a very special case aimed very precisely at the nvl(:b1,colX) and equivalant decode() statemetns. The equivalent statement using your suggested approach results in a simple tablescan access path, and no concatenation – even when I added the USE_CONCAT hint.

    Comment by Jonathan Lewis — February 15, 2007 @ 3:39 pm GMT Feb 15,2007 | Reply

  9. Hi Jonathan,

    Based on the example provided by you, I do not see the difference, in terms of logical i/o’s, between the query with NVL and without NVL i.e.
    n1=nvl(:b1,n1) the I/O is 14 Gets and with
    n1=:b1, the I/O again is 14 Gets. Infact, when I changed the query to incorporate the changes suggested Paul James, i.e. (n1=:b1 or :b1 is null), the optimizer opted only for a Full Table Scan of T1 and the I/O’s were way high (364 I/O’s).

    Hence, to me it seems the original plan of Split and Concatenation is a good choice.

    What is your say on this ?

    Regards

    Comment by VSharma — March 25, 2008 @ 12:27 pm GMT Mar 25,2008 | Reply


RSS feed for comments on this post.

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: