Oracle Scratchpad

May 15, 2015

Quiz Night

Filed under: Oracle — Jonathan Lewis @ 8:34 pm BST May 15,2015

How many rows will you return from a single table query if you include the predicate


        rownum > 2

in the where clause.

Warning: this IS a catch question

To make it easier and avoid ambiguity, you may assume the table is the standard SCOTT.EMP table.

 

Part 2:

This posting was prompted by noticing a note that Dominic Brooks posted a few months ago.

Can you supply a workaround for the little oddity he’s described.

Answers

Part 1

I did say that the first part was a catch question. The semi-automatic answer that I would expect most people to give would be “none” – because putting a specific limit on a result is (probably) the reason that most people use rownum, and therefore it’s easy to forget exactly what it’s really doing. The best reply (because it’s also a nice demonstration of an important feature of the run-time engine came from comment 8 by Balazs Papp with the predicate “1 = 1 or rownum > 2”. A disjunct (OR) evaluates to TRUE if any of its components evaluates to TRUE; since 1 is always equal to 1 the check against rownum is irrelevant and the entire data set will be returned. (Conversely, thinking back to an old post of mine about NOT IN subqueries, a conjunct (AND) evaluates to TRUE if only if every individual component evalues to TRUE.)

It’s always instructive to look at execution plans (ideally pulled from memory, and including the rowsoruce execution stats) when you experiment. First the query with just the rownum predicate, then with the “1=1” included.


select * from emp where rownum > 2;

no rows returned

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      0 |00:00:00.01 |       2 |
|   1 |  COUNT              |      |      1 |        |      0 |00:00:00.01 |       2 |
|*  2 |   FILTER            |      |      1 |        |      0 |00:00:00.01 |       2 |
|   3 |    TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM>2)

=======================================================================================

select * from emp where rownum > 2 or 1=1;

14 rows selected.

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |     14 |00:00:00.01 |       3 |
|   1 |  COUNT             |      |      1 |        |     14 |00:00:00.01 |       3 |
|   2 |   TABLE ACCESS FULL| EMP  |      1 |     14 |     14 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------

{This space really was blank - no predicates reported]

Note how the first query had Oracle acquire every row (A-rows) and then discard it because the check “rownum > 2” would be false for each row in turn.
Note how the entire predicate section of the second query has disappeared because “1=1” is always true, so the filter is irrelevant. It is an interesting oddity, though, that the redundant COUNT operator still appears in the second plan – presumably an automatic response to the presence of the rownum predicate.

The query I had originally planned to use in my answer modelled a typing error I had seen in a production system that might cause a double take on first sight.  (This is using the standard SCOTT.EMP table again on 11.2.0.4):

select
        rowid, rownum, job
from
        emp 
where   job = 'CLERK' or rownum > 2
;

ROWID                  ROWNUM JOB
------------------ ---------- ---------
AAAvgVAAFAAAAiBAAA          1 CLERK
AAAvgVAAFAAAAiBAAK          2 CLERK
AAAvgVAAFAAAAiBAAL          3 CLERK
AAAvgVAAFAAAAiBAAM          4 ANALYST
AAAvgVAAFAAAAiBAAN          5 CLERK


select
        rowid, rownum, job
from
        emp
where   job = 'ANALYST' or rownum > 2
;

ROWID                  ROWNUM JOB
------------------ ---------- ---------
AAAvgVAAFAAAAiBAAH          1 ANALYST
AAAvgVAAFAAAAiBAAM          2 ANALYST
AAAvgVAAFAAAAiBAAN          3 CLERK

I’ve included the rowid for each row because (although it’s not an assumption you should generally make) it shows you the order the rows appeared in the block. As you can see, after two CLERKS (first query) anything is accepted; after two ANALYSTS (second query) anything is accepted.

Part 2

As Dominic points out in his comments below, the anomaly comes from a predicate “rownum <= nvl(:2, rownum)” which the optimizer applies to every row rather than recognising that when :2 is not null it could use its STOPKEY optimisation. This is a variant of the old problem of the optimizer finding cases where the only way it can get the correct answer in all circumstances is to do something inefficient – a problem I described some time ago in a note on “OR with subquery”. We recognise, of course, that in general we should either use the front end to choose one of two possible queries (depending on whether the bind variable is null or not) or we should rewrite the query as a UNION ALL, taking care that the code eliminates any rows from the later query blocks that have already appeared in the earlier query blocks, and being very careful about dealing with NULLs in the data; for example (setting the bind variable :n to the value 10, and with a table t1 that is a copy of the data from view all_objects, indexed on object_id):


select  *
from    t1
where   object_id <= 100
and     :n is not null
and     rownum <= :n
union all
select  *
from    t1
where   object_id <= 100
and     :n is null
;

--------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |      1 |        |     10 |00:00:00.01 |       5 |
|   1 |  UNION-ALL                     |       |      1 |        |     10 |00:00:00.01 |       5 |
|*  2 |   COUNT STOPKEY                |       |      1 |        |     10 |00:00:00.01 |       5 |
|*  3 |    FILTER                      |       |      1 |        |     10 |00:00:00.01 |       5 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1    |      1 |     37 |     10 |00:00:00.01 |       5 |
|*  5 |      INDEX RANGE SCAN          | T1_I1 |      1 |     37 |     10 |00:00:00.01 |       3 |
|*  6 |   FILTER                       |       |      1 |        |      0 |00:00:00.01 |       0 |
|   7 |    TABLE ACCESS BY INDEX ROWID | T1    |      0 |     37 |      0 |00:00:00.01 |       0 |
|*  8 |     INDEX RANGE SCAN           | T1_I1 |      0 |     37 |      0 |00:00:00.01 |       0 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(ROWNUM<=:N)
   3 - filter(:N IS NOT NULL)
   5 - access("OBJECT_ID"<=100)
   6 - filter(:N IS NULL)
   8 - access("OBJECT_ID"<=100)

When the bind variable :n is null the optimizer will execute only the second query block; but when :n is not null it will take the first query block, and take advantage of the COUNT STOPKEY optimisation. (I was a little surprised that the operations 2 and 3 were in the order show, I had expected them to be in the opposite order.)

In this example we can engineer the code so that precisely one of the two query blocks executes, so we don’t need to worry about the threat of an overlap that I raised in the note on “Subquery with OR” with its solution of a call to lnnvl().

 

15 Comments »

  1. None

    Comment by Shawn — May 15, 2015 @ 8:39 pm BST May 15,2015 | Reply

  2. I agree with Shawn, none.

    You’re thinking we’ll answer this question:

    select count(*) from
    (
    select rownum from scott.emp
    group by rownum
    having rownum > 2
    );

    which would lead to the answer (number of rows in the table – 2)

    instead of this question:

    select * from scott.emp
    where rownum > 2

    Comment by Lance — May 15, 2015 @ 8:55 pm BST May 15,2015 | Reply

    • Lance,

      Interesting example – I’m just faintly surprised that the optimizer doesn’t trap that “group by rownum” and raise a special case error for it.
      I wasn’t trying to be that cunning and devious, though.

      Comment by Jonathan Lewis — May 16, 2015 @ 12:23 pm BST May 16,2015 | Reply

  3. Such predicate evaluation returns “false” therefore result will be zero rows (no rows will be return)

    Comment by Chinar Aliyev — May 15, 2015 @ 8:56 pm BST May 15,2015 | Reply

  4. The actual sql statement which got my back up initially was a recursive one from stats (fired in that particular case from a weird code path … well, that was the theory anyway):

    delete /*+ dynamic_sampling(4) */
    from sys.wri$_optstat_histhead_history
    where savtime < :1
    and rownum <= NVL(:2, rownum);

    https://orastory.wordpress.com/2015/02/25/understanding-vs-resolution-statistics-history-cleanup/

    Comment by Dom Brooks — May 15, 2015 @ 9:19 pm BST May 15,2015 | Reply

  5. The answer is .. alarm bells ring when I see ROWNUM used as a predicate.

    For code I have written in the past (admittedly the ROWNUM is in a CONNECT BY not a WHERE) at Nominet UK I wrote something like:

    SELECT something
    FROM somewhere
    CONNECT BY ROWNUM <= x

    It produced two different result sets, one containing x+1 rows Oracle 10.1 and one containing x rows in Oracle 10.2, same data set.

    The issue was observed during a database/architecture upgrade, 10.1 SPARC Solaris to 10.2 x64 RHEL.

    I'm still burned by it, thus I'm not prepared to answer your question ;-)

    Comment by Michael D O'Shea — May 15, 2015 @ 9:20 pm BST May 15,2015 | Reply

    • Michael,

      That’s an interesting case – did you get a bug number ?
      Change of version clearly shouldn’t produce change in result – even if the syntax is a little unusual. (Admittedly I often use that rownum <= N rather than level <= N in my code to generate data volume, but I don't think it's "supposed" to work.)

      Comment by Jonathan Lewis — May 16, 2015 @ 12:21 pm BST May 16,2015 | Reply

  6. 0. The first row fetched is always 1 so no rows would ever be returned using a > condition.

    Comment by C. Sneed — May 15, 2015 @ 9:46 pm BST May 15,2015 | Reply

  7. No rows will be returned. The test fails at the first row.

    Comment by jkstill — May 15, 2015 @ 9:46 pm BST May 15,2015 | Reply

  8. Part 1:

    Depends (e.g 1 = 1 OR rownum > 2), or I am missing something.

    Part 2:

    select *
    from   t1
    where  rownum <= (select nvl(:rn, rownum) from dual)
    and    mod(object_id,5) = 0;
    
    Plan hash value: 2845220685
    
    -------------------------------------------------------------------------------------
    | Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
    -------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |      1 |        |     10 |00:00:00.01 |       5 |
    |*  1 |  COUNT STOPKEY     |      |      1 |        |     10 |00:00:00.01 |       5 |
    |*  2 |   TABLE ACCESS FULL| T1   |      1 |  12952 |     10 |00:00:00.01 |       5 |
    |   3 |   COUNT            |      |      1 |        |      1 |00:00:00.01 |       0 |
    |   4 |    FAST DUAL       |      |      1 |      1 |      1 |00:00:00.01 |       0 |
    -------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter(ROWNUM<=)
       2 - filter(MOD("OBJECT_ID",5)=0)
    

    Comment by Balazs Papp — May 15, 2015 @ 9:55 pm BST May 15,2015 | Reply

    • Balazs Papp,

      Best possible example for part 1, but part 2 gives the wrong result if :rn is null. The nvl(:rn, rownum) generates the rownum of the subquery, which is always 1, so the query will return at most one row if :rn is null.

      Comment by Jonathan Lewis — May 16, 2015 @ 12:17 pm BST May 16,2015 | Reply

  9. For me all expected answers, not surprised, but would wait for expert comments ;

    SQL> select object_name from cdb_objects where rownum > 2 ;
    no rows selected

    SQL> select object_name from cdb_objects where rownum select count(*) from (select object_name from cdb_objects where rownum > 2 ) ;
    COUNT(*)
    ———-
    0
    SQL> select count(*) from (select object_name from cdb_objects where rownum < 2 ) ;

    COUNT(*)
    ———-
    1

    Regards,
    Pradeep

    Comment by Pradeep — May 16, 2015 @ 8:22 am BST May 16,2015 | Reply

  10. It does seem like the example for part I ( -> where JOB = ‘something’ or rownum > 2) , the results depends on the order of the data and access path, and that makes it indeterministic. A bug or a feature? :)

    Comment by Andy — May 29, 2015 @ 8:57 am BST May 29,2015 | Reply

    • Andy,

      That’s potentially true for any use of rownum:
      e.g.

      select where colX = 1 and rownum <= 5

      depends on choice of execution plan when I have an index on (colX, colA).

      The tablescan and the index range scan could visit the colX = 1 rows in two different orders.

      Comment by Jonathan Lewis — June 1, 2015 @ 4:58 pm BST Jun 1,2015 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.