Oracle Scratchpad

February 25, 2007

NOT IN

Filed under: CBO,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 8:46 pm BST Feb 25,2007

The “NOT IN” operator can cause confusion – especially when you have nulls lurking in critical columns, and most particularly where subqueries are involved

You can avoid most of the problems if you just remember that “NOT IN” is not the opposite of “IN”. Here’s an example to explain why.

        colX in (value1, value2, value3)

means


        colX = value1
  or    colX = value2
  or    colX = value3

When Oracle is evaluating this predicate, it only takes one of the three components to evaluate to true for the entire statement to evaluate to true.

        colX NOT in (value1, value2, value3)

means


        colX != value1
  AND   colX != value2
  AND   colX != value3

When Oracle is evaluating this predicate, every single one of the components must evaluate to true for the entire predicate to evaluate to true.

But if one of the values involved is a NULL, then that component evaluates to neither true nor false, it evaluates to null – so the whole expression evaluates to null.

So NULL values are allowed and don’t affect the outcome of IN predicates, but do affect the outcome of “NOT IN” predicates. You’ll also notice that in general IN predicates can short-circuit – the tests can stop on the first success (true), but the same is not true for NOT IN. [But see comment 2 and comment 4 below]

For a completely different version of the same argument, including symbolic logic, you might want to check a page on the Co-operative FAQ written in 2003.

13 Comments »

  1. …So…a DB I worked with was right!

    “Don’t use NULL! Write (the string) Null instead!”
    He said!
    :D

    Comment by Antonio — February 26, 2007 @ 8:27 am BST Feb 26,2007 | Reply

  2. Jonathan,

    Surely, NOT IN can short-circuit too…, the anti-join does this, doesn’t it?
    The test stops on the first failure (false): i.e. if it finds the value it is supposed to not find.

    Toon

    Comment by Toon — February 26, 2007 @ 8:38 am BST Feb 26,2007 | Reply

  3. Indeed Jonathan, I shudder to think of how many queries there are out there that are plain wrong because of a NOT IN with NULLs.

    NOT EXISTS will circumvent the semantic difficulties (with potentially a different execution plan), or NVL with the NOT IN (providing the NVLed value can never appear as a legitimate column value). Best of all would be do disallow NULLs on those columns of course.

    Comment by SeanMacGC — February 26, 2007 @ 9:36 am BST Feb 26,2007 | Reply

  4. Toon, you are, of course, correct.

    The IN test short-circuits its workload because it can answer the question “Should I keep the current row” with a YES on the first hit.

    The NOT IN test short-circuits its workload because it can answer the question “Should I keep the current row” with a NO on the first hit.

    I was too busy thinking about the cost of getting the YES, and forgetting that NO can be an equally useful answer.

    Which reflects the mantra – if you want to be efficient it’s best to understand your data.

    Comment by Jonathan Lewis — February 26, 2007 @ 11:46 am BST Feb 26,2007 | Reply

  5. Try these 2 SQL, you will see the result.


    select * from scott.emp
    where comm not in (300,500);

    select * from scott.emp
    where comm not in (Null,300,500);

    Comment by zhu1 (木匠) — February 26, 2007 @ 6:23 pm BST Feb 26,2007 | Reply

  6. Jonathan,

    I must say that from my experience NOT IN is very popular among the developers.
    For them maybe it is the easiest why to write a query that will select records from a table where some field does not belong in some set of values. Very often this set of values is query.
    Although some of them know the effect of having NULL values in the list, many of them do not know how NOT IN can impact the performances.
    A query with NOT IN, if it is not written properly, will perform very badly, even though it is sure that the sub-query cannot return nulls.
    And here the problem is. We know (about the nulls), but does the optimizer knows?
    Such a query will perform a lot of consistent gets, which can be very painful.
    As a workaround we have to make sure that no result column can return null. This can be done either with making the columns not null (which is not always good solution) or using NVL (I read about this some discussion on ASKTOM, but I don’t remember the link). With this workaround optimizer will consider use of ANTI JOIN that will reduce consistent gets dramatically.

    –Mihajlo

    Comment by Mihajlo Tekic — February 26, 2007 @ 6:35 pm BST Feb 26,2007 | Reply

  7. Mihajlo, If it’s legal to do so, then adding “is not null” at both ends of the comparison should allow the optimizer to convert the not in to a not exists, then on to an anti-join.
    In outline:

    where
        colX is not null
    and colX not in (
            select colY from ... where ... and colY is not null
        )

    Comment by Jonathan Lewis — February 26, 2007 @ 9:48 pm BST Feb 26,2007 | Reply

  8. Jonathan, That is just another option.

    Comment by Mihajlo Tekic — February 26, 2007 @ 10:11 pm BST Feb 26,2007 | Reply

  9. [...] Refences Used : “NOT IN in SQL” thread on AskTom Metalink Note:28934.1 “Use of indexes with NOT IN subquery” “Speeding Up Queries with Semi-Joins and Anti-Joins: How Oracle Evaluates EXISTS, NOT EXISTS, IN, and NOT IN,” by Roger Schrag (December 2004) Jonathan Lewis’s NOT IN Blog Entry [...]

    Pingback by Prerequisites for using Oracle’s anti-join access paths « H.Tonguç YILMAZ Oracle Blog — March 2, 2007 @ 6:31 am BST Mar 2,2007 | Reply

  10. You should use NVL, which replace null values by a given value to solve this issue.

    select * from scott.emp
    where NVL(comm,’ ‘) not in (Null,300,500);

    Comment by Purple — April 11, 2008 @ 10:46 am BST Apr 11,2008 | Reply

  11. A google search when trying to tune a NOT EXISTS query had led me to Rogher Schrag’s
    paper at http://www.dbspecialists.com/presentations/semijoins.html
    Worth a good (detailed) reading.

    Comment by Hemant K Chitale — April 17, 2008 @ 6:32 am BST Apr 17,2008 | Reply

  12. Thanks for this info! It helped troubleshoot my issue….
    Keep up the good work!

    Comment by things2note4 — January 7, 2010 @ 4:36 pm BST Jan 7,2010 | Reply

  13. I couldn’t get results from query with WHERE starement “and colx not in (select col from table)”. Thanks for info.

    Comment by peatoo — June 15, 2011 @ 7:55 am BST Jun 15,2011 | Reply


RSS feed for comments on this post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,508 other followers