Oracle Scratchpad

October 28, 2014

First Rows

Filed under: CBO,Hints,Oracle — Jonathan Lewis @ 7:01 am GMT Oct 28,2014

Following on from the short note I published about the first_rows optimizer mode yesterday here’s a note that I wrote on the topic more than 2 years ago but somehow forgot to publish.

I can get quite gloomy when I read some of the material that gets published about Oracle; not so much because it’s misleading or wrong, but because it’s clearly been written without any real effort being made to check whether or not it’s true. For example, a couple of days ago [ed: actually some time around May 2012] I came across an article about optimisation in 11g that seemed to be claiming that first_rows optimisation somehow “defaulted” to first_rows(1) , or first_rows_1, optimisation if you didn’t supply a final integer value.

For at least 10 years the manuals have described first_rows (whether as a hint or as a parameter value) as being available for backwards compatibility; so if it’s really just a synonym for first_rows_1 (or first_rows(1)) you might think that the manuals would actually mention this. Even if the manuals didn’t mention it you might just consider a very simple little test before making such a contrary claim, and if you did make such a test and found that your claim was correct you might actually demonstrate (or describe) the test so that other people could check your results.

It’s rather important, of course, that people realise (should it ever happen) that first_rows has silently changed into first_rows_1 because any code that’s using it for backwards compatibility might suddenly change execution path when you did the critical upgrade where the optimizer changed from “backwards compatibility” mode to “completely different optimisation strategy” mode. So here’s a simple check (run from 11.2.0.4 – to make sure I haven’t missed the switch):

rem
rem     Script:         {xxxxxx}_first_rows.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2012
rem     Purpose:        
rem

begin
        dbms_stats.set_system_stats('MBRC',16);
        dbms_stats.set_system_stats('MREADTIM',10);
        dbms_stats.set_system_stats('SREADTIM',5);
        dbms_stats.set_system_stats('CPUSPEED',1000);
end;
/

create table t2 as
select
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    all_objects
where rownum <= 3000
;

begin
        dbms_stats.gather_table_stats(
                user,
                't2',
                method_opt => 'for all columns size 1'
 
        );
end;
/

create index t2_i1 on t2(n1);

SQL> select /*+ all_rows */ n2 from t2 where n1 = 15;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    15 |   120 |    12   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |    15 |   120 |    12   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=15)

You’ll notice that I’ve created my data in a way that means I’ll have 15 rows with the value 15, scattered evenly through the table. As a result of the scattering the clustering_factor on my index is going to be similar to the number of rows in the table, and the cost of fetching all the rows by index is going to be relatively high. Using all_rows optimization Oracle has chosen a tablescan.

So what happens if I use the first_rows(1) hint, and how does this compare with using the first_rows hint ?

SQL> select /*+ first_rows(1) */ n2 from t2 where n1 = 15;

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     2 |    16 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T2    |     2 |    16 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T2_I1 |       |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=15)

SQL> select /*+ first_rows */ n2 from t2 where n1 = 15;

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |    15 |   120 |    16   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T2    |    15 |   120 |    16   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=15)

You might not find it too surprising to see that Oracle used the indexed access path in both cases. Does this mean that first_rows really means (or defaults to) first_rows(1) ?

Of course it doesn’t – you need only look at the estimated cost and cardinality to see this. The two mechanisms are clearly implemented through different code paths. The first_rows method uses some heuristics to restrict the options it examines but still gives us the estimated cost and cardinality of fetching all the rows using the path it has chosen. The first_rows(1) method uses arithmetic to decide on the best path for getting the first row, and adjusts the cost accordingly to show how much work it thinks it will have to do to fetch just that one row.

Lagniappe

Of course, no matter how inane a comment may seem to be, there’s always a chance that it might be based on some (unstated) insight. Is there any way in which first_rows(n) and first_rows are related ? If so could you possibly manage to claim that this establishes a “default value” link?

Funnily enough there is a special case: if you try hinting with first_rows(0) – that’s the number zero – Oracle will use the old first_rows optimisation method – you can infer this from the cost and cardinality figures, or you can check the 10053 trace file, or use a call to dbms_xplan() to report the outline.  It’s an interesting exercise (left to the reader) to decide whether this is the lexical analyzer deciding to treat the “(0)” as a new – and meaningless – token following the token “first_rows”, or whether it is the optimizer recognising the lexical analyzer allowing “first_rows(0)” as a token which the optimizer is then required to treat as first_rows.

Mind you, if you only want the first zero rows of the result set there’s a much better trick you can use to optimise the query – don’t run the query.

Update  (April 2021)

A recent question on the Oracle-L listserver asking whether the optimizer_mode should be left at first_rows when upgrading to 19c on Exadata left me browsing through the many notes I’d written about first_rows and first_row(n); and when I got to this one I realised that 19c could answer the question:

“Is first_rows(0) deliberately the same of first_rows, or is it the lexer treating the (0) as a separate token.”

When I re-ran my script and included one version of the SQL with the hint /*+ first_rows(1.5) */ the plan showed the following hint report:

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (E - Syntax error (1))
---------------------------------------------------------------------------

   1 -  SEL$1
         E -  first_rows(

On the other hand, when my hint was /*+ first_rows(0) */ there was no Hint Report until I requested a “full” hint report, at which point I got the following:


Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1
---------------------------------------------------------------------------

   0 -  STATEMENT
           -  first_rows(0)

So first_rows(0) is not an error, and it produces the same plan as first_rows.

3 Comments »

  1. just out of curiosity: how did you get the idea to check “first_rows(0)”? Sounds to me like an oxymoron…

    I did a little check on 12.1.0.2 and got the following results:

    delete from plan_table;
    
    explain plan set statement_id '0' for select /*+ first_rows(0) */ n2 from t2 where n1 = 15;
    explain plan set statement_id '1' for select /*+ first_rows(1) */ n2 from t2 where n1 = 15;
    explain plan set statement_id '2' for select /*+ first_rows(2) */ n2 from t2 where n1 = 15;
    explain plan set statement_id '3' for select /*+ first_rows(3) */ n2 from t2 where n1 = 15;
    explain plan set statement_id '6' for select /*+ first_rows(6) */ n2 from t2 where n1 = 15;
    explain plan set statement_id '12' for select /*+ first_rows(12) */ n2 from t2 where n1 = 15;
    explain plan set statement_id '14' for select /*+ first_rows(14) */ n2 from t2 where n1 = 15;
    explain plan set statement_id '15' for select /*+ first_rows(15) */ n2 from t2 where n1 = 15;
    
    select statement_id
         , operation
    	 , options
    	 , cost
    	 , cardinality
      from plan_table
     where id = 1
     order by to_number(statement_id);
    
    STATEMENT_ID    OPERATION            OPTIONS                              COST CARDINALITY
    --------------- -------------------- ------------------------------ ---------- -----------
    0               TABLE ACCESS         BY INDEX ROWID BATCHED                 16          15
    1               TABLE ACCESS         BY INDEX ROWID BATCHED                  3           2
    2               TABLE ACCESS         BY INDEX ROWID BATCHED                  4           2
    3               TABLE ACCESS         BY INDEX ROWID BATCHED                  5           3
    6               TABLE ACCESS         BY INDEX ROWID BATCHED                  7           6
    12              TABLE ACCESS         BY INDEX ROWID BATCHED                 13          12
    14              TABLE ACCESS         BY INDEX ROWID BATCHED                 15          14
    15              TABLE ACCESS         FULL                                   13          15
    

    So it seems that all the strange values for first_rows_n are indeed valid parameters – though the documentation (http://docs.oracle.com/database/121/REFRN/refrn10145.htm#REFRN10145) states: “The optimizer uses a cost-based approach and optimizes with a goal of best response time to return the first n rows (where n = 1, 10, 100, 1000).”

    A look at the CBO traces for the different queries also shows that the different values are indeed regarded as valid parameters and that 0 is transformed into first_rows – as you already mentioned:

    — trace with event 10053
    select /*+ first_rows(0) */ n2 from t2 where n1 = 15;
    select /*+ first_rows(12) */ n2 from t2 where n1 = 15;
    select /*+ first_rows(18) */ n2 from t2 where n1 = 15;

    — grep atom_hint
    atom_hint=(@=0x75bf08b8 err=0 resol=0 used=1 token=453 org=1 lvl=1 txt=FIRST_ROWS )
    atom_hint=(@=0x799c99a8 err=0 resol=0 used=1 token=453 org=1 lvl=1 txt=FIRST_ROWS (12) )
    atom_hint=(@=0x6e1814f8 err=0 resol=0 used=1 token=453 org=1 lvl=1 txt=FIRST_ROWS (18) )

    Martin

    Comment by Martin Preiss — October 28, 2014 @ 9:43 am GMT Oct 28,2014 | Reply

    • Martin,

      The answer to the first question is that I just asked myself in what way the error might possibly be close to true.

      If you check the manuals for the hint (at least the 9.2 manual) you’ll find that N is allowed to be any value, it’s only the parameter that has a limited set of values.

      But is the translation to first_rows deliberate, or a side effect of ignoring an “invalid following token” ?

      Comment by Jonathan Lewis — October 28, 2014 @ 10:05 am GMT Oct 28,2014 | Reply

      • Jonathan,
        thank you for the explanantion. And the documentation in 12c still says that n can be any value in the hint: http://docs.oracle.com/database/121/SQLRF/sql_elements006.htm#BABGCCFG. Obviously I did not think too much about the difference between the hint and the parameter…

        Regarding the question if the translation is deliberate or not, I can add that:

        /*+ first_rows(0.0) */ is a valid hint and results in the same plan as first_rows

        No valid hints are the following variants:
        /*+ first_rows(0.1) */
        /*+ first_rows(-1) */
        /*+ first_rows(A) */
        /*+ first_rows(‘A’) */

        I think the invalid versions are not a surprise but 0.0 could be interpreted as a sign of a deliberate decision for the number zero case – in my understanding.

        Comment by Martin Preiss — October 28, 2014 @ 12:11 pm GMT Oct 28,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.