Oracle Scratchpad

December 10, 2013

Subquery

Filed under: Oracle,subqueries,Tuning — Jonathan Lewis @ 6:26 pm GMT Dec 10,2013

How not to write subqueries:

AND     sal.ticket_airline || sal.ticket_number NOT IN (
                SELECT sub.dsd_airline || sub.dsd_ticket_number
                FROM   ...
        )

If there had been any opportunity for the optimizer to do something clever with the NOT IN, you’ve just made sure it can’t happen. On top of that you may find that you don’t get the right results – consider the following cut-n-paste:

SQL> select user from dual where 1 || 23 = 12 || 3;

USER
------------------------------
TEST_USER

1 row selected.

Sometimes people simply forget that you can have multiple columns in subqueries (or in “IN Lists”) – so it’s perfectly valid to write the subquery as:

AND     (sal.ticket_airline, sal.ticket_number) NOT IN (
                SELECT sub.dsd_airline, sub.dsd_ticket_number
                FROM   ...
        )

It’s quite likely that Oracle will actually turn this into a NOT EXISTS, or ANTI-JOIN, of course. But if it doesn’t do something nice you could try doing a manual rewrite – provided it is actually logically equivalent:

AND     not exists (
                select  null
                from    ....
                where   sub.dsd_airline    = sal.ticket_airline
                and     sub.dsd_ticket_number = sal.ticket_number
        )

Remember: NOT IN may not translate to NOT EXISTS – see also this.

6 Comments »

  1. Hi Jonathan,
    good point. I think there is a little typo in the last where clause. It should be sal.ticket_airline.
    Cheers Mathias

    Comment by Mathias Zarick — December 10, 2013 @ 6:52 pm GMT Dec 10,2013 | Reply

  2. Dear Jonathan,

    Trust you are fine. I was going through one of the queries and below is another interesting case where I believe the optimizer is not able to use Anti Join.

    I have tried to make sure the post is with correct formatting, if it is not the case, please just post your comments and I will send you the test case via mail.

    This is on 11.2.0.2.0 – 64bit on AIX.

    
    create table teststatus(col1 number, status varchar2(30));
    
    create table teststatus1(col1 number, status varchar2(30));
    
    insert into teststatus select rownum rn, decode(mod(rownum,3),0,'complete',1,'scx',2,'exs')
    from dual connect by level <= 90000;
    
    insert into teststatus1 select rownum rn, decode(mod(rownum,2),0,'qualify',1,'accepted')
    from dual connect by level <= 30000;
    
    insert into teststatus1 select 30000+rownum,30000+rownum from dual connect by level <= 1000;
    
    commit;
    
    set line 200
    set pages 5000
    
    -- Query1
    
    SQL> select /*+ gather_plan_statistics */  count(*) from teststatus1 t1
      2   where not exists
      3   ( select null from teststatus t2
      4     where decode(t1.status,'accepted','complete') = t2.status
      5    or decode(t1.status,'qualify','complete') = t2.status
      6    or decode(t1.status,'qualify','scx') = t2.status
      7   or decode(t1.status,'qualify','exs') = t2.status
      8   );
    
      COUNT(*)
    ----------
          1000
    
    Elapsed: 00:02:24.95
    
    select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
    
    Plan hash value: 1341772356
    
    ---------------------------------------------------------------------------------------------
    | Id  | Operation           | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    ---------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |             |      1 |        |      1 |00:02:26.98 |     236K|
    |   1 |  SORT AGGREGATE     |             |      1 |      1 |      1 |00:02:26.98 |     236K|
    |*  2 |   FILTER            |             |      1 |        |   1000 |00:02:27.64 |     236K|
    |   3 |    TABLE ACCESS FULL| TESTSTATUS1 |      1 |  29662 |  31000 |00:00:00.01 |      84 |
    |*  4 |    TABLE ACCESS FULL| TESTSTATUS  |   1002 |    867 |      2 |00:02:26.95 |     236K|
    ---------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter( IS NULL)
       4 - filter(("T2"."STATUS"=DECODE(:B1,:SYS_B_0,:SYS_B_1) OR
                  "T2"."STATUS"=DECODE(:B2,:SYS_B_2,:SYS_B_3) OR
                  "T2"."STATUS"=DECODE(:B3,:SYS_B_4,:SYS_B_5) OR
                  "T2"."STATUS"=DECODE(:B4,:SYS_B_6,:SYS_B_7)))
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    
    -- Query2
    
    SQL>
    SQL>   select /*+ gather_plan_statistics */ count(*) from teststatus1 t1
      2    where not exists
      3    ( select null from teststatus t2
      4      where decode(t1.status,'accepted','complete','qualify','complete') = t2.status
      5    );
    
      COUNT(*)
    ----------
          1000
    
    Elapsed: 00:00:00.17
    
    
    select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
    
    
    Plan hash value: 2707617272
    
     --------------------------------------------------------------------------------------------
    | Id  | Operation           | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    ---------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |             |      1 |        |      1 |00:00:00.04 |     320 |
    |   1 |  SORT AGGREGATE     |             |      1 |      1 |      1 |00:00:00.04 |     320 |
    |*  2 |   HASH JOIN ANTI    |             |      1 |  19775 |   1000 |00:00:00.04 |     320 |
    |   3 |    TABLE ACCESS FULL| TESTSTATUS1 |      1 |  29662 |  31000 |00:00:00.01 |      84 |
    |   4 |    TABLE ACCESS FULL| TESTSTATUS  |      1 |  86655 |  90000 |00:00:00.02 |     236 |
    ---------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("T2"."STATUS"=DECODE("T1"."STATUS",:SYS_B_0,:SYS_B_1,:SYS_B_2,:SYS_B_3))
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    
    

    With respect to the above, I have below observations and questions. It would be really great if you could shed some light on the same.

    1) I believe in Query1 due to redundant decode() the optimizer was not able to use ANTI join, but in Query2 due to only 1 decode() (which is the way it should have been) the optimizer was able to use ANTI join.

    2) In execution plan for Query1, line 4. I am not able to understand why the optimizer shows 1002 in starts and 2 in A-rows.

    Thanking you.

    Warm Regards,
    Vijay Sehgal

    Comment by vijay sehgal — December 17, 2013 @ 12:04 pm GMT Dec 17,2013 | Reply

    • Your assumption is nearly correctly – you don’t get the anti-join because of the OR (rather than because of the decode() of itself). A check of the 10053 trace shows the following:

      SU: Considering subquery unnest on query block SEL$1 (#1).
      SU:   Checking validity of unnesting subquery SEL$2 (#2)
      SU:     SU bypassed: Invalid correlated predicates.
      SU:   Validity checks failed.
      

      The second question is a demonstration of subquery caching – you only have two different input values for the first 30,000 rows, and they each find one row result which goes into the subquery cache, then you have 1000 distinct value that return no rows.

      Comment by Jonathan Lewis — December 17, 2013 @ 12:53 pm GMT Dec 17,2013 | Reply

      • Jonathan,

        Thanks a lot for your help on the above, I have one more related question.

        Is it possible to determine the number of times a filter subquery is executed other than using gather_plan_statisics ?

        Warm Regards,
        Vijay Sehgal

        Comment by vijay sehgal — December 18, 2013 @ 1:32 pm GMT Dec 18,2013 | Reply

        • Not really – although you could estimate if you knew your data well and had some figures about the total activity of the query.
          (The 10046 trace could also help if it dumped STAT lines on every execution, but that’s not much different from using the hint and not so convenient.)

          Comment by Jonathan Lewis — December 18, 2013 @ 2:14 pm GMT Dec 18,2013


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: