Oracle Scratchpad

April 4, 2014

NVL() change

Filed under: CBO,Oracle,Troubleshooting,Upgrades — Jonathan Lewis @ 6:10 pm GMT Apr 4,2014

One of the problems of functions is that the optimizer generally doesn’t have any idea on how a predicate based on function(col) might affect the cardinality. However,  the optimizer group are constantly refining the algorithms to cover an increasing number of special cases more accurately. This is a good thing, of course – but it does mean that you might be unlucky on an upgrade where a better cardinality estimate leads to a less efficient execution plan. Consider for example the simple query (where d1 is column of type date):

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate

Now, there are many cases in many versions of Oracle, where the optimizer will appear to calculate the cardinality of

nvl(columnX,{constant}) operator {constant}

as if it were:

columnX is null or columnX operator {constant}

Unfortunately this doesn’t seem to be one of them – until you get to 11.2.something. Here’s a little demonstration code:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	case
		when rownum > 100 then null else sysdate - rownum
	end	d1
from
	generator	v1,
	generator	v2
where
	rownum <= 50000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

set autotrace traceonly explain

prompt	query with NVL

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate
;

prompt	query with OR clause

select	*
from	t1
where	d1 is null or d1 < sysdate
;

If you run this code in 11.1.0.7 you get the following – with numeric variations for cost (which I’m interested not in at the moment):


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2500 |  2500 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  2500 |  2500 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

Take note particularly of the difference in the estimated cardinality for the tablescans.

When you upgrade to 11.2.0.4 (possibly earlier – though there are some nvl() related patches that appeared only in 11.2.0.4), you get this:


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 49900 | 49900 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 49900 | 49900 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

As you can see the estimate for the “NVL()” example is now correct – which means vastly different from the estimate in 11.1.0.7 which seemed to be using the standard “5% for range-based predicate on function(col)”.

It’s interesting to note that a (relatively) small error has crept in to the “OR” example – interestingly the size of the error is exactly the number of rows where d1 is not null (which looks like enough of a coincidence to be a bug – but maybe there’s a good rationale for it)

Conclusion

Yet again, a simple upgrade has the capacity to make a dramatic change to a cardinality estimate – which could mean a significant change to an execution plan and major change in performance. If you’ve read this note, though, you may recognise the pattern that is the driving feature behind the problem.

Footnote:

If you have access to versions 11.2.0.1 through 11.2.0.3 and find that the test data produces different cardinalities please publish the results in the comments – it would be nice to know exactly when this change appears.  (You need only show the body of the execution plans  with labels, not the entire output of the script).

 

11 Comments »

  1. Hi Jonathan,

    I run the same tests under versions 11.2.0.1.0 and 11.2.0.2.0; below the results

    ------------------------------------------------------------------------------------------------------------------------------------
                                                                                                            11.2.0.1.0 
    ------------------------------------------------------------------------------------------------------------------------------------
    query with NVL
    SQL>
    SQL> select  *
      2  from    t1
      3  where   nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) &lt; sysdate
      4  ;
    
    Plan d'exÚcution
    ----------------------------------------------------------
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |  2500 |  2500 |    27   (8)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| T1   |  2500 |  2500 |    27   (8)| 00:00:01 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
                  hh24:mi:ss')) < SYSDATE@!)
    
    SQL> select  *
      2  from    t1
      3  where   d1 is null or d1 < sysdate
      4  ;
    
    Plan d'exÚcution
    ----------------------------------------------------------
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      | 50000 | 50000 |    25   (0)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    25   (0)| 00:00:01 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter("D1" IS NULL OR "D1" < SYSDATE@!) 
    
    Prompt query with NVL
    query with NVL
    SQL>
    SQL> select  *
      2  from    t1
      3  where   nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate
      4  ;
    
    Plan d'exÚcution
    ----------------------------------------------------------
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |   100 |   100 |    26   (4)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| T1   |   100 |   100 |    26   (4)| 00:00:01 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
                  hh24:mi:ss')) < SYSDATE@!)
    
    SQL> prompt  query with OR clause
    query with OR clause
    SQL>
    SQL> select  *
      2  from    t1
      3  where   d1 is null or d1 < sysdate
      4  ;
    
    Plan d'exÚcution
    ----------------------------------------------------------
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      | 50000 | 50000 |    25   (0)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    25   (0)| 00:00:01 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)
    

    for the version 11.2.0.2 it estimates 100 rows

    Thanks,
    Cherif.

    Comment by cherif — April 4, 2014 @ 6:34 pm GMT Apr 4,2014 | Reply

  2. Results in 11.2.0.3

    query with NVL
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      |   100 |   100 |   142   (0)| 02:02:56 |
    |*  1 |  TABLE ACCESS FULL| T1   |   100 |   100 |   142   (0)| 02:02:56 |
    --------------------------------------------------------------------------
    
    query with OR clause
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      | 49900 | 49900 |   142   (0)| 02:02:56 |
    |*  1 |  TABLE ACCESS FULL| T1   | 49900 | 49900 |   142   (0)| 02:02:56 |
    --------------------------------------------------------------------------
    
    

    Comment by Dave Costa — April 4, 2014 @ 8:24 pm GMT Apr 4,2014 | Reply

  3. Cherif, Dave,

    Thanks for doing the tests and reporting the results.
    The temporary appearance of the 100 estimate is a particularly useful result – it’s hard to imagine where is might have come from, unless it’s an error that’s simply based on the number of nulls and nothing else.

    Comment by Jonathan Lewis — April 4, 2014 @ 10:03 pm GMT Apr 4,2014 | Reply

  4. Hi Jonathan, this is the result on a 11.2.0.2

    SQL&gt; set autotrace traceonly expl
    SQL&gt; prompt     query with NVL
    query with NVL
    SQL&gt;
    SQL&gt; select     *
      2  from       t1
      3  where      nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) &lt; sysdate
      4  ;
    
    Piano di esecuzione
    ----------------------------------------------------------
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      | 48870 |   429K|    27   (8)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| T1   | 48870 |   429K|    27   (8)| 00:00:01 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter(NVL(&quot;D1&quot;,TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
                  hh24:mi:ss'))
    SQL&gt; prompt     query with OR clause
    query with OR clause
    SQL&gt;
    SQL&gt; select     *
      2  from       t1
      3  where      d1 is null or d1 &lt; sysdate
      4  ;
    
    Piano di esecuzione
    ----------------------------------------------------------
    Plan hash value: 3617692013
    
    --------------------------------------------------------------------------
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |      | 48870 |   429K|    26   (4)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| T1   | 48870 |   429K|    26   (4)| 00:00:01 |
    --------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter(&quot;D1&quot; IS NULL OR &quot;D1&quot;&lt;SYSDATE@!)
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    

    Mauri

    Comment by Maurizio — April 7, 2014 @ 8:57 am GMT Apr 7,2014 | Reply

    • Mauri,

      Thanks for supplying the example – which differs significantly from the 11.2.0.2 sample from Cherif further up the page. This puzzled me for a bit, but then I realised that you hadn’t collected stats on the data (dynamic sampling used for this statement …) and that probably explains the results.

      WordPress did it’s thing with “less than” symbols unfortunately, so I’ve lost a bit of your output.

      Comment by Jonathan Lewis — April 9, 2014 @ 10:09 am GMT Apr 9,2014 | Reply

      • Hi Jonathan, I’ve sent my results after playing your scripts as I’ve found on your post.
        Hope this can help you
        Mauri

        Comment by Maurizio — April 9, 2014 @ 12:50 pm GMT Apr 9,2014 | Reply

  5. Please excuse my dullness, but I fail to see a difference among the Cost Estimates of the two queries between the old and new Oracle versions. I see estimates of 18 and 13 for the two Predicate statements in both Oracle 11.1.0.7 and 11.2.0.4.

    I do see that the CPU estimates have changes, as have the estimated rows. Is that what I should be looking at when reading an explain plan?

    Comment by Stew — April 7, 2014 @ 5:57 pm GMT Apr 7,2014 | Reply

    • Stew,

      You were probably fooled by the omission of the word “not” in the parenthetic comment that originally said: “(which I’m interested in at the moment)” (now corrected). The posting is all about cardinality; and when the optimizer picks a plan that you think is obviously wrong it’s the cardinality estimates that are likely to be the most help.

      Comment by Jonathan Lewis — April 9, 2014 @ 10:12 am GMT Apr 9,2014 | Reply

  6. Hello Jonathan! Looks like my English worse than i’ve expected, or there is typo in SQL statements – ” interestingly the size of the error is exactly the number of rows where d1 is null” – in plan we can see 49900 instead of 50000, so size of error is 100, right? “100 estimate is a particularly useful … based on the number of nulls” – mean there are 100 nulls, right? But “when rownum > 100 then null” mean we have 49900 nulls..
    Another thing – because query return 100% rows, statement ““NVL()” example is now correct – which means vastly different from the estimate in 11.1.0.7 which seemed to be using the standard “5% for range-based predicate on function(col)” IMHO too bold – taking border numbers for tests may easily prove that new rule just sounds like “100% for range-based predicate on function(col)” :)

    Comment by Oleg — April 7, 2014 @ 10:04 pm GMT Apr 7,2014 | Reply

    • Oleg,

      Thanks for the comment.

      It’s not your English, I managed to omit two “not”s in one article (see reply to Stew Ashton above) – the description should have been: “where d1 is not null” (now corrected).

      Without wishing to split hairs, you might note that I said “the ESTIMATE for the nvl() sample is now correct” then emphasised the point that it had changed significantly. I made no comment about what calculation may have been applied to arrive at the estimate. In fact I didn’t even make a definitive statement about how 11.1.0.7 was calculating the cardinality – only a comment that is SEEMED to be using a well-known generic rule.

      Comment by Jonathan Lewis — April 9, 2014 @ 10:23 am GMT Apr 9,2014 | Reply

  7. Jonathan,

    Thank you for the clarification.

    > Stew,
    >
    > You were probably fooled by the omission of the word “not” in the parenthetic comment that originally said: “(which I’m interested in > at the moment)” (now corrected). The posting is all about cardinality; and when the optimizer picks a plan that you think is obviously > wrong it’s the cardinality estimates that are likely to be the most help.

    I missed that point, that you were focused on cardinality, not cost.

    Comment by Stew — April 9, 2014 @ 2:02 pm GMT Apr 9,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,305 other followers