Oracle Scratchpad

November 11, 2008

first_rows_n

Filed under: CBO, Execution plans, Hints, Performance, Tuning — Jonathan Lewis @ 1:05 pm UTC Nov 11,2008

When it comes to setting the optimizer_mode parameter you often hear people say that first_rows_N (for one of the legal values of N) should be used for OLTP systems and all_rows should be used for decision support and data warehouse systems.

There is an element of truth in the statement – but it’s really a hangover from the early days of CBO, and remembrance of  the old first_rows optimizer mode (** See footnote).

Prior to the introduction of parameters like optimizer_index_cost_adj (Oracle 8), and system statistics (Oracle 9 – but optional until 10g), the cost based optimizer often seemed too keen to use execution plans that involved tablescans and hash joins – which could be most unsuitable in an OLTP system.

This problem appeared because the optimizer did not attempt to distinguish between the potential performance of a “db file sequential read” (single block read) compared to a “db file scattered read” (multiblock read).  Setting the optimizer_mode to first_rows introduced some simple rules that (to a large extent) stopped the optimizer from using tablescans unless there were no alternatives.

In 9i, however, first_rows_N optimisation is geared towards using arithmetic to make decisions about avoiding tablescans. You can see the results of this in some of the odd numbers that appear in execution plans as the optimizer works out whether, for example, it is better to use an index to pick up 10 rows from a table, or whether running 2.3% of a tablescan (which would probably get the first 10 rows) is the cheaper option.

Think about what this means – first_rows_N is trying to work out the cheapest way of collecting the first N rows of the result set, whereas all_rows is trying to work out the cheapest way of collecting all the rows of the result set.

But when you are running an OLTP system a huge fraction of typical end-user queries will be returning just a handful of rows – and if you have a query that’s only supposed to return 5 or 6 rows (say), then first_rows_10 actually means “all the rows I expect”, i.e. all_rows. If the optimizer works correctly, then both options should produce the same execution plan.

Of course, some end-user queries (even in an OLTP system) are reports, and for reports you don’t want to generate a plan that’s optimised to return the first few lines so, again, you’re really interested in all_rows optimisation.

Your conclusion should therefore be: for OLTP systems, first_rows_N ought to be irrelevant. You should be using all_rows – unless you can demonstrate that this introduces a large number of execution plans that have to be fixed individually because of some unexpected side effects (Again, though, see footnote). 

So why would we ever need first_rows_N ?

Think Google, or Amazon, or any of the “forum” software suppliers.  There are systems which are effectively ad hoc query systems where users can supply query predicates that could return a lot of data – but the code limits the amount of data sent back to the user to “page at a time”. Think “web-based reporting system”, and you’ve identified the basic rationale for first_rows_N.

Even for the web-base reporting system, though, you may still be able to run under all_rows from from 10g onwards. In 10g when the optimizer sees predicates like “rownum <= 35″ it will behave as if you had included the hint /*+ first_rows(35) */ in the query.

As a closing thought: even if you think that first_rows_N may be the correct logical and strategic choice for your system, remember that it hasn’t been “field-tested” to the extent that all_rows has been tested.

There are bugs (or side effects) in the optimizer that make their first appearance only in the strange circumstances of “real-world” applications – and it’s a good bet that such oddities are found (and probably fixed) faster in the all_rows environment than in the first_rows_N environment simply because far more systems use all_rows.

Even when first_rows_N seems to be the right choice, you could decide to stick with all_rows and adjust critical statements with a /*+ first_rows(N) */ hint.

Footnote: If you still have any systems running 9i or later and using first_rows, you should be planning to change the optimizer_mode when you next go through a full test cycle. As the manuals point out (9.2 Performance Tuning Guide and Reference p1-12): first_rows is available for backward compatibility and plan stability”.

Backward compatibility is important, and until you enable system statistics, clean up various other optimizer related parameters, sort out your statistics collection strategy, and clean out a load of redundant hints and SQL hacks, you may find that changing the optimizer_mode introduces as many problems as it solves – and that’s why this is a change that needs a lot of testing

8 Comments »

  1. Very interesting post Jonathan – really made me stop and think. The fact that ALL_ROWS is the default (since at least 10.2) lends credence to your argument. I think the comment on the amount “testing” that ALL_ROWS gets vs. any other option is also important. The middle of the pack is a much safer place to be than on the fringe.

    Comment by Kerry Osborne — November 12, 2008 @ 5:32 pm UTC Nov 12,2008 | Reply

  2. Jonathan, allow me to disagree. Most of the web-based reporting systems I run across don’t use a ‘where rownum <= n’ predicate at all. They perform a select, display the first n rows, and start ‘walking’ (i.e continue fetching) the result set when the user hits the next page button.

    Using a ‘rownum <= n’ predicate would force the application to re-execute the query, with rownum <= 2*n, when the next-page button is hit. It would fetch-and-throwaway the first n rows to get to the next page in the result set. The next ‘next-page’ hit would cause a rownum <= 3n predicate, throw away 2n rows, and so on.

    I would say, FIRST_ROWS_N is especially useful for web-based reporting systems. Not using a ‘rownum < n’ predicate avoids many unneeded re-executions of queries, combined with increasing amounts of throw-away fetching. I can be wrong of course, e.g. because I’m unaware of an efficient skip-fetch feature?

    Comment by Carel-Jan — November 14, 2008 @ 1:45 pm UTC Nov 14,2008 | Reply

  3. Carel-Jan,

    you might want to take a look at the pagination technique described by Tom Kyte: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:127412348064

    Comment by Martin Decker — November 16, 2008 @ 5:58 pm UTC Nov 16,2008 | Reply

  4. Carel-Jan,
    You’re allowed to disagree. There are at least three main versions for the web-based report:

    a) Fetch all the data into the client in a single call, but let the client display it a page at a time.

    b) Fetch all the data into the mid-tier in a single call, and let the client fetch the data from the mid-tier in pages

    c) Allow the client to fetch the data from the database in pages.

    The option I’ve seen most frequently is (c), but that doesn’t stop you seeing (a) or (b) more frequently.

    There is no generic “skip fetch” option – but Martin Decker has given you a link to Tom Kyte and I’ve published a couple of notes on mechanisms that can limit the damage.

    The strategy assumes that the user won’t want to page through all the data – so the trade-off is between: several small queries (with increasing throwaway) and stopping early, compared to acquiring and moving all the data to the client.

    Of course if you’re expecting the client to fetch the entire data set in a single call then it makes more sense to fetch the entire report in one go. But in that case I’d expect to run all_rows – because you want the database to get all the data with the least resource usage.

    Comment by Jonathan Lewis — November 17, 2008 @ 10:14 pm UTC Nov 17,2008 | Reply

  5. There is also a good description of possible pagination techniques on this page: http://www.inf.unideb.hu/~gabora/pagination/

    Comment by Herald — November 20, 2008 @ 8:48 pm UTC Nov 20,2008 | Reply

  6. Jonathan, you mentioned that

    “In 10g when the optimizer sees predicates like “rownum <= 35″ it will behave as if you had included the hint /*+ first_rows(35) */ in the query."

    Wouldn’t the number 35 in the predicate usually be a bind variable? In that case, how would Oracle know to use the hint /*+ first_rows(35) */?

    Comment by peter — January 29, 2009 @ 5:58 pm UTC Jan 29,2009 | Reply

    • Peter,

      Bind variable – highly likely, I should think – but with bind variable peeking this means the assumed hint would always be for the notional “page 1″ – biasing Oracle to the most efficient strategy for a relatively small number of rows.

      (There is a hidden parameter that seems to be relevant: _optimizer_rownum_bind_default which defaults to 10, but a quick test shows that when the optimizer can peek it uses the peeked value).

      Comment by Jonathan Lewis — January 30, 2009 @ 6:26 pm UTC Jan 30,2009 | Reply

  7. [...] Blog from Jonathan Lewis about First Rows vs. All Rows for the Oracle Optimizer [...]

    Pingback by Three Unrelated But Interesting Blogs « Seven Seconds — July 7, 2009 @ 3:55 am UTC Jul 7,2009 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a comment

Blog at WordPress.com.