Oracle Scratchpad

June 12, 2013

Not In Nasty

Filed under: Bugs,CBO,Oracle — Jonathan Lewis @ 5:31 pm BST Jun 12,2013

Actually it’s probably not the NOT IN that’s nasty, it’s the thing you get if you don’t use NOT IN that’s more likely to be nasty. Just another odd little quirk of the optimizer, which I’ll demonstrate with a simple example (running under 11.2.0.3 in this case):

create table t1
as
select	*
from	all_objects
where	rownum <= 10000
;

update t1 set
	object_type = null
where
	object_type = 'TABLE'
;

commit;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
;

select
	sample_size, num_nulls, num_distinct, histogram
from
	user_tab_columns
where
	table_name = 'T1'
and	column_name = 'OBJECT_TYPE'
;

This code is going to give you 10,000 rows but the number of distinct values for object_type and the number of nulls will depend on the version and options installed; however you should get about 15 distinct values for object_type and about 1,000 nulls in that column.

If you want to run a query to count the rows where the object_type is not one of: ‘INDEX’,’SYNONYM’, or ‘VIEW’, there are two obvious ways of writing it, so let’s put them into a single query with a UNION ALL, and see what Oracle predicts as the cardinality (cut-n-pasted from an SQL*Plus session):

SQL> set autotrace traceonly explain
SQL> select
  2     count(*)
  3  from       t1
  4  where      object_type != 'INDEX'
  5  and        object_type != 'SYNONYM'
  6  and        object_type != 'VIEW'
  7  union all
  8  select
  9     count(*)
 10  from       t1
 11  where      object_type not in ('INDEX', 'SYNONYM', 'VIEW')
 12  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 575959041

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     2 |    12 |    38  (53)| 00:00:01 |
|   1 |  UNION-ALL          |      |       |       |            |          |
|   2 |   SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  3 |    TABLE ACCESS FULL| T1   |  5903 | 35418 |    19   (6)| 00:00:01 |
|   4 |   SORT AGGREGATE    |      |     1 |     6 |            |          |
|*  5 |    TABLE ACCESS FULL| T1   |  7308 | 43848 |    19   (6)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND "OBJECT_TYPE"<>'VIEW')
5 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND "OBJECT_TYPE"<>'VIEW')

Look at operations 3 and 5: the way you write the query makes a significant difference to the cardinality predicted by the optimizer – despite the fact that the NOT IN predicate is transformed internally so that the predicate sections of the two parts of the UNION ALL are exactly the same.

If you’re wondering, the NOT IN option is the one that does the rational arithmetic – it has calculated the number of rows where object_type is not null, then allowed for filtering out 3/15ths of those rows because the query rejects 3 rows out of the 15 distinct values reported by the statistics. The list of inequalities has applied some arithmetic that tries to allow for the nulls three times – the more values you reject the worse the estimate gets.

It’s quite likely that you won’t notice the effect in many cases – but if you run queries against a column with a large percentage of nulls, then the differences can be very large and knock-on effects of this error could be dramatic.

7 Comments »

  1. So… what to use?

    Comment by eve — June 12, 2013 @ 6:25 pm BST Jun 12,2013 | Reply

    • Eve,
      This is a case where you have to consider both the pragmatic and the strategic positions.

      Pragmatic – if you’ve done an upgrade that’s currently misbehaving because it has changed the arithmetic and paths, then you need to revert the behaviour. (The discussion with Martin shows that this is a partial fix of an old error.)

      Strategic – you need to assume that at some stage the bit that is still wrong is going to be changed as well, so you need to anticipate that event and work out if there’s a strategy that does the right thing without introducing an unpleasant hack to your code. Ultimately this may mean creating an SQL Baseline for the plan you want to see and attaching it to the query – alternatively, stacking in a load of hints (the equivalent of the baseline) to the same effect.

      Comment by Jonathan Lewis — June 17, 2013 @ 8:15 am BST Jun 17,2013 | Reply

  2. Jonathan,

    the different costing for NOT IN seems to be something new in 11.2.0.3. In 11.2.0.1 (and 11.2.0.2) I get the same cardinality for both versions:

    BANNER
    ----------------------------------------------------------------------------
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
    PL/SQL Release 11.2.0.1.0 - Production
    CORE    11.2.0.1.0      Production
    TNS for 64-bit Windows: Version 11.2.0.1.0 - Production
    NLSRTL Version 11.2.0.1.0 - Production
    
    SAMPLE_SIZE  NUM_NULLS NUM_DISTINCT HISTOGRAM
    ----------- ---------- ------------ ---------
           9039        961           18 NONE
    
    Plan hash value: 575959041
    ----------------------------------------------------------------------------
    | Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ----------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |      |     2 |    12 |    81  (51)| 00:00:01 |
    |   1 |  UNION-ALL          |      |       |       |            |          |
    |   2 |   SORT AGGREGATE    |      |     1 |     6 |            |          |
    |*  3 |    TABLE ACCESS FULL| T1   |  6221 | 37326 |    40   (0)| 00:00:01 |
    |   4 |   SORT AGGREGATE    |      |     1 |     6 |            |          |
    |*  5 |    TABLE ACCESS FULL| T1   |  6221 | 37326 |    40   (0)| 00:00:01 |
    ----------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       3 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND
                  "OBJECT_TYPE"<>'VIEW')
       5 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND
                  "OBJECT_TYPE"<>'VIEW')
    

    Martin

    Comment by Martin Preiss — June 12, 2013 @ 9:44 pm BST Jun 12,2013 | Reply

    • explain plan for 
      select /*+ opt_param('_fix_control','9702850:off') */
             count(*)
      from       t1
      where      object_type != 'INDEX'
      and        object_type != 'SYNONYM'
      and        object_type != 'VIEW'
      union all
      select count(*)
      from       t1
      where      object_type not in ('INDEX', 'SYNONYM', 'VIEW');
      
      select * from table(dbms_xplan.display);
      
      Plan hash value: 575959041                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                                   
      ----------------------------------------------------------------------------                                                                                                                                                                                                                                 
      | Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                                                                                                                                                 
      ----------------------------------------------------------------------------                                                                                                                                                                                                                                 
      |   0 | SELECT STATEMENT    |      |     2 |    14 |    66  (50)| 00:00:01 |                                                                                                                                                                                                                                 
      |   1 |  UNION-ALL          |      |       |       |            |          |                                                                                                                                                                                                                                 
      |   2 |   SORT AGGREGATE    |      |     1 |     7 |            |          |                                                                                                                                                                                                                                 
      |*  3 |    TABLE ACCESS FULL| T1   |  6172 | 43204 |    33   (0)| 00:00:01 |                                                                                                                                                                                                                                 
      |   4 |   SORT AGGREGATE    |      |     1 |     7 |            |          |                                                                                                                                                                                                                                 
      |*  5 |    TABLE ACCESS FULL| T1   |  6172 | 43204 |    33   (0)| 00:00:01 |                                                                                                                                                                                                                                 
      ----------------------------------------------------------------------------                                                                                                                                                                                                                                 
                                                                                                                                                                                                                                                                                                                   
      Predicate Information (identified by operation id):                                                                                                                                                                                                                                                          
      ---------------------------------------------------                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                                                                   
         3 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND                                                                                                                                                                                                                                        
                    "OBJECT_TYPE"<>'VIEW')                                                                                                                                                                                                                                                                         
         5 - filter("OBJECT_TYPE"<>'INDEX' AND "OBJECT_TYPE"<>'SYNONYM' AND                                                                                                                                                                                                                                        
                    "OBJECT_TYPE"<>'VIEW')                                                                                                                                                                                                                                                                         
      
       20 rows selected 
      
      

      Comment by Dom Brooks — June 13, 2013 @ 3:21 pm BST Jun 13,2013 | Reply

    • Martin,

      Thanks for that – checking “live” versions for consistency is very helpful.

      In this case, with your stats, it looks like the error is present in both cases in the earlier versions, so 11.2.0.3 is displaying a correction for the NOT IN that has yet to be included in the explicit OR expansion. (Your result should be close to 9039 * 15/18 … ca, 7532).

      Comment by Jonathan Lewis — June 14, 2013 @ 6:15 am BST Jun 14,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,090 other followers