Oracle Scratchpad

December 4, 2019

E-rows / A-rows

Filed under: Execution plans,Oracle — Jonathan Lewis @ 1:17 pm GMT Dec 4,2019

This note was prompted by an error I made at the UKOUG TechFest19 yesterday. It’s fairly well known that when you read an execution plan that includes the rowsource execution stats – so you get the E-rows (estimated) and A-rows (actual) reported – a sensible check of the quality of the optimizer’s calculations is to compare the estimates and actuals, allowing for the fact that E-rows is “per start” and A-rows is “cumulative”, so A-rows = E-rows * Starts.
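
As a quick aside – this is a minimal sketch rather than part of the original demonstration – you can get the E-rows and A-rows side by side for a single statement by adding the gather_plan_statistics hint and reporting the plan with the ‘allstats last’ format option (the scripts below use alter session set statistics_level = all instead):

set serveroutput off

select  /*+ gather_plan_statistics */
        count(*)
from    all_objects
where   rownum <= 100
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));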

The error I made yesterday was to forget that this relationship isn’t always true. In particular, partitioning and parallel query introduce the need for a little flexibility in reading the numbers – which I’ll demonstrate with a couple of simple examples running under 12.2.0.1.


rem
rem     Script:         estimate_actual.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem

create table pt_composite_1 (
        id,
        grp,
        small_vc,
        padding
)
nologging
partition by range(id) 
subpartition by hash (grp)
subpartitions 4
(
        partition p1 values less than (  4000),
        partition p2 values less than (  8000),
        partition p3 values less than ( 16000),
        partition p4 values less than ( 32000),
        partition p5 values less than ( 64000),
        partition p6 values less than (128000)
)
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                            id,
        trunc(rownum/100)                                 grp,
        cast(to_char(trunc(rownum/20)) as varchar2(10))   small_vc,
        cast(rpad('x',100) as varchar2(100))              padding
from
        generator       g1,
        generator       g2
where 
        rownum <= 1e5 -- > comment to avoid WordPress format issue
/

create table t3 
nologging pctfree 80
storage (initial 1M next 1M)
as
select * from pt_composite_1
/

All I’ve done is create a couple of tables with 100,000 rows each – and now I’m going to count the rows and see what I get from the execution plans with rowsource execution stats enabled:


set serveroutput off
alter session set statistics_level = all;

prompt  =================
prompt  Partition effects
prompt  =================

select count(id) from pt_composite_1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ================
prompt  Parallel effects
prompt  ================

select /*+ parallel (t3 4) */ count(id) from t3;
select * from table(dbms_xplan.display_cursor(null,null,'allstats'));

With a little cosmetic tidying, here are the two execution plans (note that I haven’t used the “last” format option when reporting the parallel plan):


=================
Partition effects
=================

select count(id) from pt_composite_1

-------------------------------------------------------------------------------------------------
| Id  | Operation            | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                |      1 |        |      1 |00:00:00.04 |    1866 |
|   1 |  SORT AGGREGATE      |                |      1 |      1 |      1 |00:00:00.04 |    1866 |
|   2 |   PARTITION RANGE ALL|                |      1 |    100K|    100K|00:00:00.04 |    1866 |
|   3 |    PARTITION HASH ALL|                |      6 |    100K|    100K|00:00:00.04 |    1866 |
|   4 |     TABLE ACCESS FULL| PT_COMPOSITE_1 |     24 |    100K|    100K|00:00:00.04 |    1866 |
-------------------------------------------------------------------------------------------------


================
Parallel effects
================

select /*+ parallel (t3 4) */ count(id) from t3

------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |      1 |        |      1 |00:00:00.04 |      20 |      0 |
|   1 |  SORT AGGREGATE        |          |      1 |      1 |      1 |00:00:00.04 |      20 |      0 |
|   2 |   PX COORDINATOR       |          |      1 |        |      4 |00:00:00.04 |      20 |      0 |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |      0 |      1 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE     |          |      4 |      1 |      4 |00:00:00.11 |    8424 |   7692 |
|   5 |      PX BLOCK ITERATOR |          |      4 |    100K|    100K|00:00:00.11 |    8424 |   7692 |
|*  6 |       TABLE ACCESS FULL| T3       |     61 |    100K|    100K|00:00:00.06 |    8424 |   7692 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)

As you can see, the lines specifying partition selection report E-Rows for the whole table, not for any partition-level approximation, so for operations 3 and 4 we shouldn’t multiply Starts by E-rows to compare with A-rows. (Starts = 6 for operation 3 because we have 6 partitions, and Starts = 24 for operation 4 because at the lowest level we have a total of 24 data segments.)
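
If you want to confirm where those Starts figures come from, a quick check against the data dictionary (my addition, not part of the original script) reports the 6 range partitions and the 6 x 4 = 24 hash subpartitions:

select  count(*)  partition_count
from    user_tab_partitions
where   table_name = 'PT_COMPOSITE_1'
;

select  count(*)  subpartition_count
from    user_tab_subpartitions
where   table_name = 'PT_COMPOSITE_1'
;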

For the parallel query we see the same pattern – every parallel slave reports the expected total number of rows, and every “block iterator” (rowid range) reports the expected total number of rows. Again we see that using multiplication to compare E-rows and A-rows would not be valid.
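
As a side note – a sketch of a standard technique rather than something from the original test – you can see how many rows each parallel execution process passed through each table queue (in this example, the single aggregated row that each slave sends to the query coordinator) by querying v$pq_tqstat from the same session immediately after running the parallel query:

select
        dfo_number, tq_id, server_type, process, num_rows
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type, process
;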

In fact it’s not just partitioning and parallelism that can cause confusion. Even something as simple as a serial nested loop join has a couple of surprises (largely thanks to the evolution of the mechanics – without a matching adjustment to the execution plans – over time). Here’s a script to generate a couple of tables, which we will then join – hinting various mechanisms for the nested loop.


create table t1
as
select
        rownum           id,
        mod(rownum,100)  n1,
        cast(lpad(rownum,20) as varchar2(20)) v1 
from
        dual
connect by
        level <= 1000 -- > comment to avoid WordPress format issue
;

create table t2
as
select  * from t1
union all
select  * from t1
union all
select  * from t1
;

create index t2_i1 on t2(id);

It’s not a subtle test – as you can see we have 3 rows in table t2 for every row in t1. So let’s pick some t1 rows and join to t2 on id. Again it’s 12.2.0.1:

set serveroutput off
alter session set statistics_level = all;

prompt  ==============================
prompt  Nested loop join (traditional)
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                opt_param('_nlj_batching_enabled', 0)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ==============================
prompt  Nested loop join with prefetch
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                nlj_prefetch(t2) 
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

prompt  ==============================
prompt  Nested loop join with batching
prompt  ==============================

select
        /*+ 
                leading(t1 t2) use_nl_with_index(t2 t2_i1) 
                nlj_batching(t2) 
        */
        t1.v1, t2.v1
from
        t1, t2
where
        t1.n1 = 4
and
        t2.id = t1.id
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

Here are the three plans, with a couple of comments after each – all three queries returned the same 30 (i.e. 10 * 3) rows.


==============================
Nested loop join (traditional)
==============================

--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |     30 |00:00:00.01 |      60 |
|   1 |  NESTED LOOPS                        |       |      1 |     30 |     30 |00:00:00.01 |      60 |
|*  2 |   TABLE ACCESS FULL                  | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|   3 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     10 |      3 |     30 |00:00:00.01 |      45 |
|*  4 |    INDEX RANGE SCAN                  | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")

This is the original nested loop structure (apart from the “batched” option that appeared in 12c) and follows the rule/guideline:

  • Operation 2 operates once and returns the 10 rows predicted.
  • Operation 3 is started 10 times by operation 1, with a prediction of 3 rows per start – and the actual comes out at 30 rows.
  • Operation 4 is started 10 times (once for each start of operation 3), with a prediction of 3 rowids per start – and the actual comes out at 30 rows.

==============================
Nested loop join with prefetch
==============================

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |      3 |     30 |00:00:00.01 |      60 |
|   2 |   NESTED LOOPS                      |       |      1 |     30 |     30 |00:00:00.01 |      30 |
|*  3 |    TABLE ACCESS FULL                | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|*  4 |    INDEX RANGE SCAN                 | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")
 

Again, in the order of rowsource generation:

  • Operation 3 starts once with a prediction of 10 rows and the rule works.
  • Operation 4 is started 10 times by operation 2, with a prediction of 3 rows (rowids) per start, and the rule works.
  • Operation 2 was started once by operation 1, with a prediction of 30 rows (rowids), and the rule works.
  • Operation 1 starts once, but the prediction is reported as the value you would have got from the original NLJ shape – and breaks the rule.

==============================
Nested loop join with batching
==============================

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |     30 |00:00:00.01 |      60 |
|   1 |  NESTED LOOPS                |       |      1 |     30 |     30 |00:00:00.01 |      60 |
|   2 |   NESTED LOOPS               |       |      1 |     30 |     30 |00:00:00.01 |      30 |
|*  3 |    TABLE ACCESS FULL         | T1    |      1 |     10 |     10 |00:00:00.01 |      15 |
|*  4 |    INDEX RANGE SCAN          | T2_I1 |     10 |      3 |     30 |00:00:00.01 |      15 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2    |     30 |      3 |     30 |00:00:00.01 |      30 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."N1"=4)
   4 - access("T2"."ID"="T1"."ID")


In the order in which rowsources are created:

  • Operation 3 starts once with a prediction of 10 rows – and the A-rows matches the rule.
  • Operation 4 is started 10 times by operation 2, with a prediction of 3 rows per start – and the A-rows matches the rule.
  • Operation 5 is started 30 times by operation 1, with a prediction of 3 rows per start – again reporting the value that you would have seen from the original representation of the NLJ. The prediction obviously should be 1, so the rule is broken again (see the arithmetic sketch below).
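
To put numbers on that last point (my arithmetic, not something reported in the plan output):

        Operation 5:    Starts * E-rows  =  30 * 3  =  90,  but A-rows = 30
        The per-start estimate that would make the rule work is A-rows / Starts = 30 / 30 = 1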

tl;dr

It is important to remember that the basic rule of “A-rows = E-rows * Starts” does not hold for the partition-related lines or the PX-related lines of partitioned and parallel execution plans.

You may also find a few other cases where you need to be a little cautious about trusting the rule without first thinking carefully about the mechanics of what the shape of the plan is telling you.

3 Comments »

  1. […] is not always true – partitioning and parallel query need extra […]

    Pingback by dbms_xplan | Oracle Scratchpad — December 28, 2019 @ 6:30 pm GMT Dec 28,2019 | Reply

  2. Can you explain how Oracle calculates Starts as 61 at line 6 of the parallel execution plan above?


    Comment by Mohan — December 31, 2019 @ 6:42 am GMT Dec 31,2019 | Reply

    • Mohan,

      Thanks for the question:

      Oracle tries to break an object down into a number of uniform-sized sections (called granules) of consecutive blocks. For reasons that are probably historic, there is a minimum of 13 granules per slave and (nominally) a maximum of 100 granules per slave. These values appear under the hidden parameters:

      _px_max_granules_per_slave      100         maximum number of rowid range granules to generate per slave
      _px_min_granules_per_slave       13         minimum number of rowid range granules to generate per slave
      

      This is why you quite often see that the number of Starts for “small” parallel queries is 13 * DOP.
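
      For reference – a generic sketch rather than something from the original reply – those values can be checked (as SYS, on a test system) with a query along these lines:

      select  i.ksppinm   name,
              v.ksppstvl  value,
              i.ksppdesc  description
      from    x$ksppi  i,
              x$ksppsv v
      where   i.indx = v.indx
      and     i.ksppinm like '%granules_per_slave%'
      ;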

      In your case I’m going to take a guess – based on arithmetic – that your table t3 is in a tablespace with uniform extents of 1MB each. (This is a bit of a stab in the dark, based on the fact that 7692 reads / 61 starts = 126.09, which hints at Oracle deciding that 1 granule = 1 extent of 128 blocks (less 2 blocks per extent bitmap) – but I’m only trying to show off and could easily be wrong.)

      UPDATE: I’ve just realised that you’re quoting my example – and it was in a tablespace with uniform 1MB extents. But an extra test on a system-allocated tablespace shows that there’s a little more sophistication than just checking what the extent sizes are.

      Comment by Jonathan Lewis — December 31, 2019 @ 1:18 pm GMT Dec 31,2019 | Reply

