When it comes to setting the optimizer_mode parameter you often hear people say that first_rows_N (for one of the legal values of N) should be used for OLTP systems and all_rows should be used for decision support and data warehouse systems.
There is an element of truth in the statement – but it’s really a hangover from the early days of the CBO, and a memory of the old first_rows optimizer mode (** See footnote).
Prior to the introduction of parameters like optimizer_index_cost_adj (Oracle 8), and system statistics (Oracle 9 – but optional until 10g), the cost based optimizer often seemed too keen to use execution plans that involved tablescans and hash joins – which could be most unsuitable in an OLTP system.
This problem appeared because the optimizer did not attempt to distinguish between the potential speed of a “db file sequential read” (single block read) compared to a “db file scattered read” (multiblock read). Setting the optimizer_mode to first_rows introduced some simple rules that (to a large extent) stopped the optimizer from using tablescans unless there were no alternatives.
In 9i, however, first_rows_N optimisation is geared towards using arithmetic to make decisions about avoiding tablescans. You can see the results of this in some of the odd numbers that appear in execution plans as the optimizer works out whether, for example, it is better to use an index to pick up 10 rows from a table, or whether running (say) 2.3% of a tablescan (which would probably get the first 10 rows) is the cheaper option.
Think about what this means – first_rows_N is trying to work out the cheapest way of collecting the first N rows of the result set, whereas all_rows is trying to work out the cheapest way of collecting all the rows of the result set.
But when you are running an OLTP system, a huge fraction of typical end-user queries will return just a handful of rows anyway – and if you have a query that’s only supposed to return 5 or 6 rows (say), then first_rows_10 actually means “more rows than I really expect”, i.e. all_rows. If the optimizer works correctly, both options should produce the same execution plan.
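As a sketch of that point – the table, index and bind variables here are purely hypothetical – consider a query that can only ever return a handful of rows:

```sql
-- Hypothetical schema: an orders table with at most a few rows
-- per (customer_id, order_date). If the optimizer is behaving
-- correctly, these two versions should produce the same execution
-- plan, because first_rows_10 is asking for at least as many rows
-- as the query can possibly return.

select /*+ first_rows(10) */
       order_id, status
from   orders
where  customer_id = :b1
and    order_date  = :b2;

select /*+ all_rows */
       order_id, status
from   orders
where  customer_id = :b1
and    order_date  = :b2;
```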
Of course some end-user queries are reports (even in an OLTP system), and for reports you don’t want to generate a plan that’s optimised to return the first few lines – so, again, you’re really interested in all_rows optimisation.
Your conclusion should therefore be: for OLTP systems, first_rows_N ought to be pretty irrelevant. You should be using all_rows – unless you can demonstrate that this introduces a large number of execution plans that have to be fixed individually because of some unexpected side effects (Again, though, see footnote).
So why would we ever need first_rows_N ?
Think Google, or Amazon, or any of the “forum” software suppliers. These are systems which are effectively ad hoc query systems where users can supply query predicates that could return a lot of data – but the code limits the amount of data sent back to the user to a “page at a time”. Think “web-based reporting system” and you’ve identified the basic rationale for first_rows_N.
Even for the web-based reporting system, though, you may still be able to run under all_rows from 10g onwards. In 10g, when the optimizer sees predicates like “rownum <= 35” it will behave as if you had included the hint /*+ first_rows(35) */ in the query.
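A minimal sketch of that “page at a time” pattern – again using a hypothetical orders table:

```sql
-- Hypothetical pagination query: return only the first "page" of
-- 35 rows from an ordered result set. From 10g onwards the rownum
-- predicate should make the optimizer behave as if the query
-- contained the hint /*+ first_rows(35) */.
select *
from   (
        select   order_id, customer_id, order_date
        from     orders
        order by order_date desc
       )
where  rownum <= 35;
```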
As a closing thought: even if you think that first_rows_N may be the correct logical and strategic choice for your system, remember that it hasn’t been “field-tested” to the extent that all_rows has.
There are bugs (or side effects) in the optimizer that make their first appearance only in the strange circumstances of “real-world” applications – and it’s a good bet that such oddities are found (and probably fixed) faster in the all_rows environment than in the first_rows_N environment simply because far more systems use all_rows.
Even when first_rows_N seems to be the right choice, you could decide to stick with all_rows and adjust critical statements with a /*+ first_rows(N) */ hint – where N can be any positive integer.
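With all_rows left as the session- or system-level mode, an individual critical statement could then be adjusted along these lines – the table and the value of N are, of course, just placeholders:

```sql
-- optimizer_mode stays at all_rows for the system as a whole;
-- only this critical statement is optimised to return its
-- first 20 rows as quickly as possible.
select /*+ first_rows(20) */
       order_id, status
from   orders
where  customer_id = :b1
order by order_date desc;
```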
Update (Sept 2009):
A couple of months after reporting the problem in 10.2 the original author re-ran his test case in 11.2 to discover that the problem had disappeared. The analysis by Charles Hooper suggests that a “rownum <= 1” predicate made the optimizer work out that it needed only 1/78th of the final data set – but the optimizer then erroneously pushed that factor of 1/78 into the cost calculations for an aggregation step that had to run to completion before any data could be reported.
If you still have any systems running 9i or later and using first_rows, you should be planning to change the optimizer_mode when you next go through a full test cycle. As the manuals point out (9.2 Performance Tuning Guide and Reference p1-12): “first_rows is available for backward compatibility and plan stability”.
Backward compatibility is important, and until you enable system statistics, clean up various other optimizer-related parameters, sort out your statistics collection strategy, and clean out a load of redundant hints and SQL hacks, you may find that changing the optimizer_mode introduces as many problems as it solves – and that’s why this is a change that needs a lot of testing.