Oracle Scratchpad

May 5, 2016

E-rows / A-rows

Filed under: Execution plans,Oracle — Jonathan Lewis @ 1:26 pm GMT May 5,2016

A recent post on the OTN database forum reminded me how easy it is to forget to keep repeating a piece of information after the first couple of hundred times you’ve explained it. No matter how “intuitively obvious” it is for one person, it’s new to someone else.

Here’s an execution plan that raised the question that prompted this note – it comes from calling dbms_xplan.display_cursor() with the ‘allstats last’ format option after enabling rowsource execution statistics (using the gather_plan_statistics hint, or setting the parameter statistics_level to all, or setting the hidden parameter “_rowsource_execution_statistics” to true):
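A minimal sketch of the kind of call involved – note that passing null for the sql_id and child number reports the last statement executed by the session:

```sql
-- Enable rowsource execution statistics for the session
-- (alternatively, add the /*+ gather_plan_statistics */ hint to the query)
alter session set statistics_level = all;

-- Run the query of interest, then report the last execution of its cursor.
-- (null, null) means "the last statement executed by this session";
-- a specific sql_id and child number could be supplied instead.
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```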

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                        |      1 |        |   1989 |00:00:04.96 |    9280 |    897 |
|   1 |  NESTED LOOPS OUTER                   |                        |      1 |   2125 |   1989 |00:00:04.96 |    9280 |    897 |
|   2 |   NESTED LOOPS OUTER                  |                        |      1 |   2125 |   1989 |00:00:04.93 |    9271 |    895 |
|   3 |    NESTED LOOPS OUTER                 |                        |      1 |   2125 |   1989 |00:00:00.03 |    5732 |      0 |
|   4 |     COLLECTION ITERATOR PICKLER FETCH |                        |      1 |   1989 |   1989 |00:00:00.01 |       0 |      0 |
|*  5 |     TABLE ACCESS BY INDEX ROWID       | TABLE1                 |   1989 |      1 |   1178 |00:00:00.03 |    5732 |      0 |
|*  6 |      INDEX RANGE SCAN                 | IDX_TABLE1             |   1989 |      2 |   2197 |00:00:00.02 |    3545 |      0 |
|   7 |    TABLE ACCESS BY INDEX ROWID        | TABLE2                 |   1989 |      1 |   1178 |00:00:03.26 |    3539 |    895 |
|*  8 |     INDEX UNIQUE SCAN                 | IDX_TABLE2_PK          |   1989 |      1 |   1178 |00:00:03.25 |    2359 |    895 |
|   9 |   TABLE ACCESS BY INDEX ROWID         | TABLE3                 |   1989 |      1 |      0 |00:00:00.03 |       9 |      2 |
|* 10 |    INDEX UNIQUE SCAN                  | IDX_TABLE3_PK          |   1989 |      1 |      0 |00:00:00.03 |       9 |      2 |
-----------------------------------------------------------------------------------------------------------------------------------

Many people have heard about “tuning by cardinality feedback” (a term first used, I think, by Wolfgang Breitling many years ago), and it’s fairly common knowledge that this requires you to compare the predicted (estimated) E-rows with the actual A-rows generated by a line of the plan. A critical detail that often fails to appear when this point is being explained is that there is a very important difference between the meaning of E-rows and A-rows. Although this is a point I make very clear in my presentations on how to read execution plans, it was only when I found myself explaining it to Tony Hasler a couple of years ago [ed: see comment 3 – at least 6 years ago!] that I realised that it’s not commonly known and that, in casual conversation, it’s easy to overlook the fact that it’s something that you might need to mention explicitly. So here’s the critical detail:

  • E-Rows is the number of rows the optimizer is expecting the operation to produce each time it is called
  • A-Rows is the cumulative number of rows summed across all calls to the operation

As a consequence of this difference any comparison you make should not be between A-Rows and E-Rows; it should be between A-Rows and E-Rows * Starts. So, for example, operation 6 in the plan above (the index range scan on idx_table1) shows a reasonable estimate because 1,989 starts at 2 rows per start is fairly close to the final 2,197 rows produced (especially when you consider that there’s a lot of integer rounding going on to display the values).
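This check can be scripted directly against the statistics views. Here’s a sketch, assuming access to v$sql_plan_statistics_all; the sql_id substitution variable is a placeholder, not taken from the original test:

```sql
-- Sketch: compare "E-Rows * Starts" with A-Rows for each line of the
-- last execution of a cursor. The sql_id below is a placeholder.
select
        id,
        operation,
        object_name,
        cardinality                     e_rows,         -- optimizer estimate per start
        last_starts                     starts,
        cardinality * last_starts       expected_total, -- what A-Rows "ought" to be
        last_output_rows                a_rows          -- actual cumulative rows
from
        v$sql_plan_statistics_all
where
        sql_id = '&m_sql_id'
order by
        id
;
```

Lines where a_rows differs from expected_total by a couple of orders of magnitude are the obvious starting points for “tuning by cardinality feedback”.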

 

6 Comments »

  1. Most excellent, JL! Line 8 is also an interesting line, showing a unique scan with starts greater than actual rows.

    Comment by rsiz — May 5, 2016 @ 7:01 pm GMT May 5,2016 | Reply

    • Mark,

      Thanks for the comment.
      It’s quite surprising how much one could say about even a very simple execution plan once you start to look closely.

      In this case “Rows < Starts” can be explained/justified by the outer join.

      More significant, perhaps, might be the comparison of operations 5 and 6. Because of the outer joins from the rowsource in line 4 (which produced 1,989 rows) both operations had to start 1,989 times – but the index range scan produced a total of 2,197 rowids, while the table access produced a total of only 1,178 rows. This tells us we HAD to visit the table 2,197 times but threw away 1,019 rows because of a filter predicate we applied after we reached the table – so we have a handle on how much benefit we might get from adding an extra column to the index.

      At the same time we note that that particular join seems to be to a well buffered table and index, which suggests that the performance threat of the throwaway might be ignorable – especially when compared to the work we do for operation 8 (which makes me wonder if there's something inappropriate about the index definition).

      Comment by Jonathan Lewis — May 6, 2016 @ 9:32 am GMT May 6,2016 | Reply

  2. It’s also important to remember that any operations executed by a parallel slave will not count toward the A-rows when using ‘allstats last’. ‘allstats all’ will give you the information from all executions of the cursor including parallel slaves (and any other executions which had row source level statistics enabled).

    Comment by exploc — May 5, 2016 @ 7:16 pm GMT May 5,2016 | Reply

    • exploc,

      Thanks for the comment.

      That, in itself, is another little demonstration of how easy it is to NOT say some of the little things that might be very important. In this case a failure to make any comment about the fact that the note was implicitly about a serial query could lead to someone being very confused by trying the same call on a parallel query and seeing lots of zeros.

      The ‘all’ option, by the way, isn’t necessary; ‘allstats’ on its own sums across all executions and that captures the parallel workload. Adding ‘all’ means you get the qb_name/alias information and the projection information as well, with the E-bytes and E-time columns added to the plan making it even wider; on the other hand this does lose the TQ, IN/OUT, and DISTRIBUTION columns from the parallel plan, so the nicest formatting option for most people is likely to be ‘allstats parallel’.
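      For reference, a sketch of the call with that format option – the sql_id and child number shown are placeholders:

      ```sql
      -- Report statistics summed across all executions of a given cursor,
      -- keeping the TQ, IN-OUT and PQ Distrib columns that matter for a
      -- parallel plan. The sql_id and child number here are placeholders.
      select * from table(dbms_xplan.display_cursor('&m_sql_id', 0, 'ALLSTATS PARALLEL'));
      ```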

      Your comment prompted me to check if I’d ever mentioned the significance of ‘allstats all’ and parallel plans before – which led me to a note I wrote nearly 10 years ago where I first explained the E-rows/A-rows difference.

      Comment by Jonathan Lewis — May 6, 2016 @ 9:41 am GMT May 6,2016 | Reply

  3. Hi Jonathan,

    You must have explained this to me more than a couple of years ago, as I blogged about it in January 2010!

    https://tonyhasler.wordpress.com/2010/01/17/cardinality-estimates-in-the-cbo/

    Are we both getting old :-)

    One other little point that may be worth mentioning (if it isn’t too much of a digression) is that STARTS or LAST_STARTS when applied to a BUFFER SORT or SORT JOIN operation reflects the number of times the sorted data is probed, not the number of times the data is sorted! Similarly, OUTPUT_ROWS/LAST_OUTPUT_ROWS/A-ROWS reflects the number of rows returned from the probes not the number of rows sorted.

    –Tony

    Comment by tonyhasler — May 6, 2016 @ 8:09 pm GMT May 6,2016 | Reply

    • Tony,

      Six years! Where did they go? I’ve just checked your book and even that was published two years ago (if anyone had asked I’d have said about one year).

      It would be interesting to see how many other “surprises” people can spot in comparing E-rows and A-rows.

      The SORT JOIN example introduces another “philosophical” point to the discussion, or maybe a reminder about how and when to pass on bits of information – if I explain point A, and then explain point B, how often do I need to include an explicit explanation of point C when point C is the “obvious” consequence of points A and B ? (The answer, I think, is probably “depends on context, but quite often”).

      If you know that the SORT JOIN is actually “probe an in-memory sorted data set, but acquire it and sort it if it’s not already in memory” ( http://allthingsoracle.com/execution-plans-part-4-precision-and-timing/ ), and know about the “A-Rows = starts * E-Rows” target then the conclusion is self-evident; but sometimes it still needs to be stated explicitly.

      Comment by Jonathan Lewis — May 7, 2016 @ 10:02 am GMT May 7,2016 | Reply

