Oracle Scratchpad

November 12, 2020

rowsets

Filed under: Execution plans, Infrastructure, Oracle, Performance — Jonathan Lewis @ 12:35 pm GMT Nov 12,2020

Here’s a little demonstration of the benefit of rowsets. It started with a very simple question that arrived in my in-box from Kaley Crum.

  • “Why does the query with the UNION ALL take so much longer than the query without the UNION ALL?”

Here are the two queries – surely they should take virtually the same amount of time.

select count(*)
from (select /*+ no_merge */
             object_id
      from drop_me);


select count(*)
from (select /*+ no_merge */
             object_id
      from drop_me
      --
      union all
      --
      select 1 from dual);

Look closely – the difference between the two queries is just the addition through a UNION ALL of “select 1 from dual”. The first query took about 3.5 seconds to complete (there were 70M rows in the table), and the second took nearly 69 seconds.

Here are the execution plans showing the rowsource execution statistics (and Query Block // Object Alias information) – first the baseline query:

SQL_ID  0ph1vfuuxkbqb, child number 0
-------------------------------------
select count(*) from (select /*+ no_merge */              object_id    
  from drop_me)
 
Plan hash value: 3609429292
 
--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |      1 |        | 26465 (100)|          |      1 |00:00:03.34 |   95896 |  95891 |
|   1 |  SORT AGGREGATE     |         |      1 |      1 |            |          |      1 |00:00:03.34 |   95896 |  95891 |
|   2 |   VIEW              |         |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:03.32 |   95896 |  95891 |
|   3 |    TABLE ACCESS FULL| DROP_ME |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:03.27 |   95896 |  95891 |
--------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SEL$2 / from$_subquery$_001@SEL$1
   3 - SEL$2 / DROP_ME@SEL$2

And then the UNION ALL query:

SQL_ID  0chdajr28y0ub, child number 0
-------------------------------------
select count(*) from (select /*+ no_merge */              object_id    
  from drop_me       --       union all       --       select 1 from
dual)
 
Plan hash value: 3408528233
 
---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Starts | E-Rows | Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |      1 |        | 26467 (100)|          |      1 |00:01:18.58 |   95896 |  95891 |
|   1 |  SORT AGGREGATE      |         |      1 |      1 |            |          |      1 |00:01:18.58 |   95896 |  95891 |
|   2 |   VIEW               |         |      1 |     70M| 26467   (2)| 00:00:02 |     70M|00:01:10.84 |   95896 |  95891 |
|   3 |    UNION-ALL         |         |      1 |        |            |          |     70M|00:00:53.13 |   95896 |  95891 |
|   4 |     TABLE ACCESS FULL| DROP_ME |      1 |     70M| 26465   (2)| 00:00:02 |     70M|00:00:19.28 |   95896 |  95891 |
|   5 |     FAST DUAL        |         |      1 |      1 |     2   (0)| 00:00:01 |      1 |00:00:00.01 |       0 |      0 |
---------------------------------------------------------------------------------------------------------------------------
 
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - SET$1 / from$_subquery$_001@SEL$1
   3 - SET$1
   4 - SEL$2 / DROP_ME@SEL$2
   5 - SEL$3 / DUAL@SEL$3

This is not a trick question – I’ve got the SQL to create the data set and run the test; and I’ve run the test through 19.3.0.0 and 12.2.0.1 with results very similar to the above. Here’s a slightly modified version of the script – you’ll notice the column name has changed because I’ve used my usual “large data set” generator rather than scaling up a clone of all_objects:

rem
rem     Script:         rowset_kaley.sql
rem     Author:         Kaley Crum / JP Lewis
rem     Dated:          Oct 2020
rem
rem     Last tested 
rem             19.3.0.0
rem

create table drop_me
pctfree 0
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4    -- > comment to avoid WordPress format issue
)
select
        rownum  n1
from
        generator
cross join
        generator
where rownum <= 7e7
;

alter session set statistics_level = all;

set timing on
set serveroutput off

prompt  ===========================================
prompt  Baseline: Simple query without a union all.
prompt  Runs in 3-4 seconds
prompt  ===========================================

select
         count(*)
from (select /*+ no_merge */
             n1
      from drop_me);

select *
from table(dbms_xplan.display_cursor(format => 'allstats last advanced'));

prompt  ==========================================
prompt  Add a UNION ALL for one record from dual.
prompt  Runs in over a minute
prompt  ==========================================

pause Press return

select
         count(*)
from (select /*+ no_merge */
             n1
      from drop_me
      --
      union all
      --
      select 1 from dual);

select *
from table(dbms_xplan.display_cursor(format => 'allstats last advanced'));

There are two factors involved in the massive variation in performance. The first factor is one that the test case will report – though I didn’t show it in the output above; the second is that the test case has enabled rowsource execution statistics.

Here’s the big clue – from the Column Projection Information, which is one of the things that appears with the “advanced” format option (or when you add the “projection” format option) in the call to dbms_xplan.display_xxx().
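If you don’t want all the extra sections that the “advanced” option produces you should be able to get just this section by switching the format string in the display_cursor() call to include the “+projection” modifier – something like:

select *
from table(dbms_xplan.display_cursor(format => 'allstats last +projection'));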

First for the fast query:

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]
   2 - (rowset=1019)
   3 - (rowset=1019)

The rowsource passes 1,019 rows at a time from the tablescan operation to the view operation and from the view operation to the sort operation, for a total of roughly 70,000 calls for each of those two steps of the plan.
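As a quick sanity check of the arithmetic – 70M rows arriving in batches of 1,019:

select ceil(7e7 / 1019) calls_per_operation from dual;

That’s a little under 69,000 calls per operation, which is the “roughly 70,000” quoted above.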

Compare this to the projection information for the slow UNION ALL query:

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=0) COUNT(*)[22]

I haven’t missed a bit in copying – this really is it. The rows move up the stack one at a time, not in a rowset array. That’s 70,000,000 subroutine calls for each of the two steps of the plan.

I’m sure most of us have heard the mantra “row by row is slow by slow” (or some variant on the theme). This is true all the way down to the internal levels of an execution plan.

Of course 70M calls vs. 70K calls shouldn’t really explain a difference of more than 60 seconds – but when the calls to the O/S for timing information for rowsource execution stats are repeated for each call down the stack (or even if it’s only a 1% sample of the calls), that’s where a lot of time can disappear.

On my laptop, running Oracle 19.3 in a VM, this is how my timing went:

  • Fast query: 1.94 seconds, dropping to 1.79 seconds when I disabled rowsource execution stats.
  • Slow query: 65.91 seconds, dropping to a far less astounding 3.26 seconds when I disabled rowsource execution stats.

So there really is a noticeable difference between row-by-row and array-processing but most of the difference in the original test came from using rowsource execution statistics to measure how much of a difference there would be.
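For anyone repeating the comparison, “disabling rowsource execution stats” here means nothing more subtle than setting the statistics_level back to its default before re-running the queries:

alter session set statistics_level = typical;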

It’s also worth mentioning that this is probably the most extreme case you could produce to show the difference – using the largest possible rowset size with the smallest possible rows when you want to maximise the gap between (internal) array processing and single row processing – and then processing a very large number of rows. [But isn’t processing ridiculously large numbers of rows what you do with Exadata?].

If you want further evidence that the difference is due to the rowset size you can always alter session set “_rowsets_enabled”=false; and watch the fast query slow down. It will take about half the time of the slow query as it only has to pass 70M rows up one step of the plan rather than the two steps that are in the UNION ALL plan.
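By way of a sketch – remembering that this is a hidden parameter, so not something to fiddle with outside a test system (and remember to set it back afterwards):

alter session set "_rowsets_enabled" = false;

select count(*)
from (select /*+ no_merge */ n1 from drop_me);

alter session set "_rowsets_enabled" = true;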

In theory it looks as if you could also restrict the size of the rowset by setting _rowsets_max_rows or _rowsets_target_maxsize, but since the default value for the former is 256, the results above suggest that the settings might be ignored, and when I tried adjusting them at the session level nothing changed.

Another test you could run is to adjust (alter session) the parameter “_rowsource_statistics_sampfreq”. When I set this to 100 the time for the slow query dropped to about 21 seconds (and the fast query – with rowsets disabled – dropped to about 11 seconds).
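Again, purely as a sketch for a test system, that’s just the following, followed by a re-run of the two queries:

alter session set "_rowsource_statistics_sampfreq" = 100;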

Footnote

It’s always easy to hit an accidental special case without realising it, so when you’re testing something it’s important to think about what’s special in the example. I came up with two possibilities in this case – dual is always going to be a very special case in any circumstances, and I’ve got a table in my query that is guaranteed to return exactly one row. So I repeated the test with:

  • a real table with one row and primary key in case the reference to dual was disabling rowsets
  • a real table with 100 rows in case this enabled a rowset size of 100

In neither case did Oracle start using array processing.
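As a sketch of the first of those variations (the table name is just something I’ve picked for illustration) – swap the reference to dual for a one-row table with a primary key and repeat the count:

create table one_row (id number primary key);
insert into one_row values (1);
commit;

select count(*)
from (select /*+ no_merge */
             n1
      from drop_me
      union all
      select id from one_row);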

Footnote 2

One of the patterns I see occasionally follows the form of the slow query above. Some front-end tool executes a select statement that starts with a select {list of desired column headings} from dual so that the front-end code doesn’t have to be programmed to create headings as it receives the data.
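The shape of the statement is something like this (column names and values purely illustrative) – note that the data columns have to be coerced to character to make the union all legal:

select 'OBJECT_ID' object_id, 'OBJECT_NAME' object_name from dual
union all
select to_char(object_id), object_name from user_objects;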

Although the impact this would have on rowsets looks like a large percentage of the work done in the database in this case, in more realistic cases it would probably be irrelevant, until someone starts testing with the statistics_level set to all, or _rowsource_execution_statistics explicitly enabled.

There is a related effect, though, from the SQL Monitor feature (which activates automatically for a statement if it’s executing in parallel, or if it’s predicted to take more than 5 seconds to complete). When I added the /*+ monitor */ hint to the two queries (and didn’t fiddle with the rowsets parameter) the fast query averaged 2.26 seconds instead of 1.79 seconds, and the slow query averaged 3.51 seconds instead of 3.26 seconds. I have to say, though, that the variation between executions of the same statement was quite significant compared with the variation produced by adding or eliminating the hint.
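If you want to repeat that check, the hint goes into the queries exactly as you’d expect, and (licence permitting) you can pull the report for the last monitored execution with dbms_sqltune – e.g.:

select /*+ monitor */
       count(*)
from (select /*+ no_merge */ n1 from drop_me);

select dbms_sqltune.report_sql_monitor(type => 'TEXT') from dual;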

7 Comments

  1. Thanks for sharing.

    Comment by emanueol — November 12, 2020 @ 2:50 pm GMT Nov 12,2020

  2. Interesting case for a double-compatibility test. Not directly relevant, but it seems a bit simpler (or more standard) to create the test case with postgres.
    OK for the old recursive syntax (introduced with Oracle 2?), but I don't completely understand the need for the "cross join" (is it faster?).

    create unlogged table drop_me as select generate_series(1,7e7) n1;
    SELECT 70000000
    Duration: 19011,450 ms (00:19,011) — 19 seconds

    OR

    create unlogged table drop_me as with recursive generator(id) as (values(1) union all select id + 1 from generator where id < 7e7) select id n1 from generator;
    SELECT 70000000
    Duration: 38789,545 ms (00:38,790) — 39 seconds — the standard syntax is twice as slow as the proprietary dedicated function and its plan with ProjectSet.

    select count(*) from (select n1 from drop_me) sub;
    — 1.2 s — 2 workers = parallel 3

    select count(*) from (select n1 from drop_me union all values(1)) sub;
    — 3.5 s — no parallelism

    create table dual as values(1); select count(*) from (select n1 from drop_me union all select 1 from dual ) dm;
    — 3.5 s — no parallelism

    set max_parallel_workers_per_gather=6;
    select count(*) from (select n1 from drop_me) sub;
    — 0.7 s — 6 workers = parallel 7

    with s1 as (select count(1) co from drop_me), s2 as (select count(1) co from dual) select sum(co) from (select co from s1 union all select co from s2) cot;
    — 0.7 s — 6 workers = parallel 7

    I can't find anything related to rowsets in explain analyze output or initialization parameters. Parallelism is not used with the union all subquery; that's the only (big) noticeable difference. As far as parallelism goes, it would be interesting to test this with the pgstrom extension, but my GPU is not compatible with version 3.

    Comment by Phil Florent — November 14, 2020 @ 10:07 am GMT Nov 14,2020

    • Phil,

      You can ignore the particular choice of mechanism for generating large numbers of rows – there are three or four popular ways of doing it in Oracle, with varying costs. I happen to use the cross-join method because it uses much less memory than the other methods, and I just clone template code to act as the generator. It’s usually not a significant part of the modelling process anyway.

      I hadn’t tried running the test using parallel execution since the point was to demonstrate the impact of the change in the internal processing mechanism and the way this was exaggerated by the rowsource stats collection. However I’ve just run that test, and the Oracle optimizer doesn’t have a problem figuring out a suitable parallel approach for the union all. (Though the rowsets issue is still present, as is the overhead of gathering rowsource stats).

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — November 17, 2020 @ 8:45 pm GMT Nov 17,2020

      • Greg Nancarrow explained to me the cause of the non-parallel plan: it is caused by an implicit conversion.
        “create unlogged table drop_me_n as select generate_series(1,7e7) n1;” generates numeric values
        “create unlogged table drop_me_i as select generate_series(1,70000000) n1;” generates integer values.

        The queries:
        select count(*) from (select n1 from drop_me_n union all values(1::numeric)) ua;
        select count(*) from (select n1 from drop_me_i union all values(1)) ua;
        give a parallel plan.

        Implicit conversions are always bad: they can prevent the use of an index with Oracle, and they’re also bad with postgres.

        Comment by Phil Florent — November 23, 2020 @ 12:29 pm GMT Nov 23,2020

        • Phil,

          That’s an interesting, and slightly surprising, detail.
          In very many versions of Oracle the presence of a function applied to a column (in the select list) could result in unexpected serialisation effects, but it would take something a little more serious than simple coercion.
          E.g. from 19.3

          select  /*+ parallel */ sum(oid) 
          from   ( 
                  select id_n oid from t2 
                  union all 
                  select to_number(id_c) from t1
          )
          /
          
          ------------------------------------------------------------------------------------------------------------------
          | Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
          ------------------------------------------------------------------------------------------------------------------
          |   0 | SELECT STATEMENT         |          |       |       |     5 (100)|          |        |      |            |
          |   1 |  SORT AGGREGATE          |          |     1 |    13 |            |          |        |      |            |
          |   2 |   PX COORDINATOR         |          |       |       |            |          |        |      |            |
          |   3 |    PX SEND QC (RANDOM)   | :TQ10000 |     1 |    13 |            |          |  Q1,00 | P->S | QC (RAND)  |
          |   4 |     SORT AGGREGATE       |          |     1 |    13 |            |          |  Q1,00 | PCWP |            |
          |   5 |      VIEW                |          | 20000 |   253K|     5  (20)| 00:00:01 |  Q1,00 | PCWP |            |
          |   6 |       UNION-ALL          |          |       |       |            |          |  Q1,00 | PCWP |            |
          |   7 |        PX BLOCK ITERATOR |          | 10000 | 50000 |     3  (34)| 00:00:01 |  Q1,00 | PCWC |            |
          |*  8 |         TABLE ACCESS FULL| T2       | 10000 | 50000 |     3  (34)| 00:00:01 |  Q1,00 | PCWP |            |
          |   9 |        PX BLOCK ITERATOR |          | 10000 | 60000 |     3  (34)| 00:00:01 |  Q1,00 | PCWC |            |
          |* 10 |         TABLE ACCESS FULL| T1       | 10000 | 60000 |     3  (34)| 00:00:01 |  Q1,00 | PCWP |            |
          ------------------------------------------------------------------------------------------------------------------
          
          

          Comment by Jonathan Lewis — December 4, 2020 @ 2:55 pm GMT Dec 4,2020

        • Jonathan,
          I reported the issue and they started a discussion about it, but I guess it won’t be a priority. We noticed PostgreSQL often simply refuses to do implicit conversions anyway. At first it’s a bit annoying for developers, but it can also be a good thing: it’s impossible to have a number variable in a program, a varchar column in the database, and an implicit conversion with a to_number() that prevents an index access, for example.
          Regards,
          Phil

          Comment by Phil Florent — December 5, 2020 @ 12:11 am GMT Dec 5,2020

  3. Hi Jonathan,
    Yes, I obtain an “ORA-30009: Not enough memory for CONNECT BY operation” error with the direct syntax (Oracle 19.9 SE2). I don’t obtain such an error with postgres. It could be interesting to see how much memory Oracle really needs for such a recursive (iterative) query. Perhaps it could be a concern if it were really needed for a real query.
    postgres works a bit like Oracle before 9i as far as the PGA is concerned, with a single per-process parameter (work_mem = 4M). Per process, not per session. Hence a parallel query could use n x work_mem, which has to be taken into account.

    "SELECT 1 FROM dual" vs "SELECT 1" vs "VALUES(1)"
    The DUAL table is important for double compatibility. I noticed it's not optimal to do a "SELECT something INTO variable FROM dual" in a program, especially in a massive loop; it's more efficient to do a direct variable := something. The last version I tested on this aspect was Oracle 11.2.0.4 SE, so perhaps it has changed. Since there is no pseudo-table with postgres the syntax can be "SELECT something" – it's not standard but it's OK. If a DUAL table were created to obtain double compatibility it could be a performance issue though. The "VALUES(1)" is standard, but it can only be used with INSERT by Oracle; VALUES cannot be used as a standalone statement.

    About parallel capabilities, I still have to understand why postgres cannot execute the 2nd query with UNION ALL in parallel. Parallel query is something new for an Oracle SE DBA; I read the documentation but it's not very clear. It seems a question on the mailing list is required.
    Thanks, regards,
    Phil

    Comment by Phil Florent — November 18, 2020 @ 9:46 am GMT Nov 18,2020



Comments and related questions are welcome.
