Oracle Scratchpad

March 4, 2012

Count(*)

Filed under: Execution plans,Oracle,Performance,trace files — Jonathan Lewis @ 5:41 pm BST Mar 4,2012

A recent posting on the comp.databases.oracle.server newsgroup pointed me to a rather elderly Ask Tom question (originally posed in July 2004, last updated June 2011) where Tom produced an extraordinary observation. The response times for the following two queries are completely different (on Oracle 9.2 on his data set):

set autotrace traceonly explain

select count(*)
from   ( select null from big_table
         union all
         select null from big_table
       )
/

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1)
   1    0   SORT (AGGREGATE)
   2    1     VIEW (Cost=8 Card=8000000)
   3    2       UNION-ALL
   4    3         INDEX (FAST FULL SCAN) OF 'FOO' (NON-UNIQUE) (Cost=4 Card=4000000)
   5    3         INDEX (FAST FULL SCAN) OF 'FOO' (NON-UNIQUE) (Cost=4 Card=4000000)

select sum(cnt )
from   ( select count(*) cnt from big_table
         union all
         select count(*) cnt from big_table
       )
/

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     VIEW (Cost=8 Card=2 Bytes=26)
   3    2       UNION-ALL
   4    3         SORT (AGGREGATE)
   5    4           INDEX (FAST FULL SCAN) OF 'FOO' (NON-UNIQUE) (Cost=4 Card=4000000)
   6    3         SORT (AGGREGATE)
   7    6           INDEX (FAST FULL SCAN) OF 'FOO' (NON-UNIQUE) (Cost=4 Card=4000000)

The second query took 14 seconds compared to 52 seconds for the first query.
At first my response was stark disbelief – but Tom had, of course, provided the evidence and after a couple of minutes thought I realised why the difference has appeared. Before explaining what’s happened, I’ll just reproduce the (slightly stripped) tkprof outputs that Tom showed us.

select count(*)
  from ( select null from big_table
         union all
         select null from big_table )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     46.32      52.72      44494      44630          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     46.32      52.72      44494      44630          0           1

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=44630 r=44494 w=0 time=52726468 us)
8000000   VIEW  (cr=44630 r=44494 w=0 time=46390678 us)
8000000    UNION-ALL  (cr=44630 r=44494 w=0 time=33556917 us)
4000000     INDEX FAST FULL SCAN FOO (cr=22315 r=22247 w=0 time=4464467 us)(object id 128577)
4000000     INDEX FAST FULL SCAN FOO (cr=22315 r=22247 w=0 time=3447898 us)(object id 128577)
********************************************************************************

select sum(cnt )
  from ( select count(*) cnt from big_table
         union all
         select count(*) cnt from big_table )

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2     12.31      14.17      44494      44630          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     12.31      14.17      44494      44630          0           1

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=44630 r=44494 w=0 time=14174206 us)
      2   VIEW  (cr=44630 r=44494 w=0 time=14174178 us)
      2    UNION-ALL  (cr=44630 r=44494 w=0 time=14174171 us)
      1     SORT AGGREGATE (cr=22315 r=22247 w=0 time=7594295 us)
4000000      INDEX FAST FULL SCAN OBJ#(128577) (cr=22315 r=22247 w=0 time=3969944 us)(object id 128577)
      1     SORT AGGREGATE (cr=22315 r=22247 w=0 time=6579839 us)
4000000      INDEX FAST FULL SCAN OBJ#(128577) (cr=22315 r=22247 w=0 time=3545406 us)(object id 128577)

Update – 6th March

The comments from Tanel cover most of the interesting information in this example – but I’m going to write up an answer anyway since┬áthere are a couple of related points worth mentioning. First, the sort in the sort aggregate lines is not actually doing any sorting, Oracle is capable of keeping a simple running count in this particular case. In fact, even in cases where there is a genuine aggregation (multi-row result) the volume of data that goes into the sort area is related to the number of different values aggregated rather than the number of rows input – so the amount of memory used for a sort aggregate operation can be very small compared to the volume of data processed. (See this note for an example of how Oracle can handle one type of special case involving sorting)

As far as timing is concerned – the sort aggregate operator has to count the input from its child operation – and in the first case the input for the count is the output of the view, so the view has to be accumulating 8 million rows of “something” from its child operation, which is the union all, and the union all is getting 4 million rows of “something” from each of its two children. The code spends a lot of time sending millions for rows through the “pipelines” that connect operators Doing a quick count – 8 million rows go into the union all pipeline, then 8 million go into the view pipeline, then 8 million go into the sort aggregate pipeline for a total of 24 million rows of “something” being passed.

In the second case the sort aggregate operations at lines 4 and 6 have to accumulate “something” from their respective child operations at lines 5 an 7 – but thatt’s the limit of data being passed through pipelines – i.e. a total of 8 million rows going into a pipeline. So it’s not too surprising that the amount of CPU used is roughly one third of the CPU used by the first case – we’re passing one third of the rows. (Apart from this simpluy difference in volume there may even be some special case code that Oracle uses in the second case when it knows that the operation is simply a variation on “count number of rows in table”.)

Footnote: I ran into Tom Kyte yesterday at Hotsos 2012, and as a consequence I would like to point out that it is the question that is rather elderly, not Tom. This should have been clear from the syntax which did cite the question as an “ask rather elderly Tom question”.

 

15 Comments »

  1. Wow. Wondering why does it take union-all so much time..

    Comment by Sagi Zvi — March 4, 2012 @ 6:13 pm BST Mar 4,2012 | Reply

  2. Because the Union-all in the first query unions a dataset of 4mln. rows with another of 4 mln rows to create an 8mln row one. While the fast query unions two single row datasets and sums the 2 resulting rows.

    It’s a bit counter intuitive at first, but the union of the large resultsets takes long. Maybe because part of the data needs to be written to temp. But twice counting a dataset of 4mln. and then unioning and summing the 2 resulting rows is faster than unioning 2x 4mln rows to an 8mln row dataset and counting it.

    I wonder if the same goes for partitioned tables. Or if the optimizer would be smart enough to recognize them as such.

    Comment by Arian — March 4, 2012 @ 7:58 pm BST Mar 4,2012 | Reply

    • in both cases oracle read all the entries. since we dont aspect unionall to sort … why does it need temp anyway…

      Comment by sagi — March 5, 2012 @ 5:12 am BST Mar 5,2012 | Reply

  3. Yeah, UNION ALL doesn’t have this problem what UNION potentially has. As the UNION ALL really doesn’t merge the two row-sets together in any way, it just fetches all the rows from the 1st rowsource under it and then proceeds to next etc.

    But this is a really good illustration of the data flow through the execution plan tree. The execution plan is just a bunch of C functions called in a specific hierarchy and order (as defined in the execution plan). Rows fetched by access paths are passed to direct parents by the child row source functions (and this passing means copying stuff around in memory, which in turn means CPU usage). So the more rows are passed “upwards” in the plan, the more CPU it takes (among other things like memory/IO resource for subsequent sorts/aggregations if any) to handle them.

    Really great example I’d say. And Jonathan, correct me if I’m wrong ;-)

    Comment by Tanel Poder — March 5, 2012 @ 4:05 pm BST Mar 5,2012 | Reply

    • 1. Good explanation Tanel. The corollary is that one should design queries so that Oracle can filter rows as early as possible. I’ve seen this go wrong when using subqueries and/or the with clause. The human reading the query knew that the filter could be applied to the subquery, but Oracle couldn’t “prove” that it was OK to do, so didn’t.

      2. How much more intelligence would Oracle need to be able to “push” the count(*) into the inside queries and then add them together “outside”? Us humans know that a “union all” is mathemtically equivalent to a “+”, can we teach Oracle that?

      Comment by Colin 't Hart — March 5, 2012 @ 4:22 pm BST Mar 5,2012 | Reply

      • Yeah – and this particular query is so good example, because it doesn’t have the obvious usual suspects in it, which would suffer from large rowcounts fed to them (like filter predicates, joins, sorts, aggregates), but it’s pure data transfer towards the root through the execution plan tree.

        Comment by Tanel Poder — March 5, 2012 @ 4:26 pm BST Mar 5,2012 | Reply

  4. not reproducible on 9.2.0.8 on AIX 64bit

    Comment by Pavol Babel — March 6, 2012 @ 12:30 pm BST Mar 6,2012 | Reply

    • Pavol,

      I find that a little surprising – it certainly reproduces on 9.2.0.8 on windows, and I wouldn’t have thought that a change of O/S would cause a change in plan.

      Comment by Jonathan Lewis — March 6, 2012 @ 5:06 pm BST Mar 6,2012 | Reply

      • Jonathan, there is no execution plan change on AIX. At least I didn’t notice. I will upload execution plans later.

        Regards

        Pavol Babel

        Comment by Pavol Babel — March 7, 2012 @ 11:02 am BST Mar 7,2012 | Reply

      • Jonathan,

        I also gave a try to your hinted source$ example (response to sid). There is not such a huge difference between queries. It is 3 vs 3.5 seconds (with 2.9 million rows in sys,source$) on my database (9.2.0.8 AIX 5L 5.3 POWER6 4208MHz, 8 CORES). It seems data pipelining between execution plan operators is much faster on AIX. Will uload tkprof output later (now I’m too tired), too.

        Regards
        Pavol Babel

        Comment by Pavol Babel — March 8, 2012 @ 12:10 am BST Mar 8,2012 | Reply

        • I think it’s worth to point out that enabling tracing also enables Rowsource statistics sampling – and it is this Rowsource statistics sampling I believe that is responsible for the different results on different platforms. It even seems to differ between versions on the same platform: For example I have a 11.1.0.7 64bit Windows instance that is significantly faster than 10.2.0.4 64bit or 11.2.0.1 64bit on the same platform when enabling Rowsource statistics sampling. The difference is particularly significant when increasing the sample frequency, for example by simply setting STATISTICS_LEVEL = ALL.

          It is a known issue that getting the time information from the O/S is the sensitive part of that operation, and probably the 9.2.0.8 port on AIX is much more efficient in that regard than other ports.

          When taking stack traces during such a long running execution, in particular when using high sample frequencies, it can be seen that almost all time is spent on the ftime function calls.

          Randolf

          Comment by Randolf Geist — March 8, 2012 @ 2:35 pm BST Mar 8,2012

        • Randolf, here is reduced tkprof output from AIX. Getting O/S level statistics is slow on some platforms. I noticed only on 10g Linux / Windows. I’m not able to reproduce issue even on 9.2.0.8 on Linux x86

          
          SELECT
                 COUNT (SOURCE)
            FROM (SELECT /*+ full(source$) */
                         SOURCE
                    FROM sys.source$
                  UNION ALL
                  SELECT /*+ full(source$) */
                         SOURCE
                    FROM sys.source$);
          
          call     count       cpu    elapsed       disk      query    current        rows
          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
          Parse        1      0.00       0.00          0          0          0           0
          Execute      1      0.00       0.00          0          0          0           0
          Fetch        1      1.77       5.04      40238      40264          0           1
          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
          total        3      1.77       5.04      40238      40264          0           1
          
          
          
          
          SELECT
                 sum(cnt)
            FROM (SELECT /*+ full(source$) */
                         count(SOURCE) cnt
                    FROM sys.source$
                  UNION ALL
                  SELECT /*+ full(source$) */
                         count(SOURCE)
                    FROM sys.source$)
          
          call     count       cpu    elapsed       disk      query    current        rows
          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
          Parse        1      0.00       0.00          0          0          0           0
          Execute      1      0.00       0.00          0          0          0           0
          Fetch        1      1.28       2.90      40198      40264          0           1
          ------- ------  -------- ---------- ---------- ---------- ----------  ----------
          total        3      1.28       2.90      40198      40264          0           1
          
          

          Comment by Pavol Babel — March 20, 2012 @ 9:00 am BST Mar 20,2012

  5. I was stuck around 10 mins last night, then i decided to reproduce the case. Because of the implication of count(*), i was expecting to see some clue in the “Column Projection Information”, but nothing. So i wonder if the null in the first query is replaced with anything else, there will be column information for the union all row source in the “Column Projection Information”. Anyway, will try it again later.

    Comment by Sid — March 6, 2012 @ 1:12 pm BST Mar 6,2012 | Reply

  6. In the first query, The null really hides the fact that the data from union all operator will be passed to the parent view. There is no column information for the union all operator. Replace the null with 1, the clue will reveal.

    select count(*)
    from (
    	select null from big_table
    	union all
    	select null from big_table
    );
    
    ------------------------------------------------------------------------
    | Id  | Operation		| Name | Rows  | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------
    |   0 | SELECT STATEMENT	|      |     1 |  8926	 (2)| 00:01:48 |
    |   1 |  SORT AGGREGATE 	|      |     1 |	    |	       |
    |   2 |   VIEW			|      |  7984K|  8926	 (2)| 00:01:48 |
    |   3 |    UNION-ALL		|      |       |	    |	       |
    |   4 |     INDEX FAST FULL SCAN| FOO  |  3992K|  4463	 (2)| 00:00:54 |
    |   5 |     INDEX FAST FULL SCAN| FOO  |  3992K|  4463	 (2)| 00:00:54 |
    ------------------------------------------------------------------------
    
    Column Projection Information (identified by operation id):
    -----------------------------------------------------------
    
       1 - (#keys=0) COUNT(*)[22]
    
    
    select count(*)
    from (
    	select 1 from big_table
    	union all
    	select 1 from big_table
    );
    
    ---------------------------------------------------------------------------------
    | Id  | Operation		 | Name | Rows	| Bytes | Cost (%CPU)| Time	|
    ---------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT	 |	|     1 |    13 |  8926   (2)| 00:01:48 |
    |   1 |  SORT AGGREGATE 	 |	|     1 |    13 |	     |		|
    |   2 |   VIEW			 |	|     2 |    26 |  8926   (2)| 00:01:48 |
    |   3 |    UNION-ALL		 |	|	|	|	     |		|
    |   4 |     SORT AGGREGATE	 |	|     1 |	|	     |		|
    |   5 |      INDEX FAST FULL SCAN| FOO	|  3992K|	|  4463   (2)| 00:00:54 |
    |   6 |     SORT AGGREGATE	 |	|     1 |	|	     |		|
    |   7 |      INDEX FAST FULL SCAN| FOO	|  3992K|	|  4463   (2)| 00:00:54 |
    ---------------------------------------------------------------------------------
    
    Column Projection Information (identified by operation id):
    -----------------------------------------------------------
    
       1 - (#keys=0) SUM("CNT")[22]
       2 - "CNT"[NUMBER,22]
       3 - STRDEF[22]
       4 - (#keys=0) COUNT(*)[22]
       6 - (#keys=0) COUNT(*)[22]
    

    Comment by Sid — March 6, 2012 @ 2:45 pm BST Mar 6,2012 | Reply

    • Sid,

      The idea was a good one, but Oracle does get a bit clever with constants and counts. If you want to use the projection information to demonstrate the size of the data flow you need to select and count a non-null column from the table and use that column name all the way through the query. Here’s a sample that uses source$ to make the point:

      PLAN_TABLE_OUTPUT
      -------------------------------------
      SQL_ID  3g2w174t501g6, child number 0
      -------------------------------------
      select         /*+ gather_plan_statistics */         count(source) from    (
         select /*+ full(source$) */ source from source$         union all
      select /*+ full(source$) */ source from source$         )
      
      Plan hash value: 3996963393
      
      ------------------------------------------------------------------------------------------
      | Id  | Operation            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
      ------------------------------------------------------------------------------------------
      |   1 |  SORT AGGREGATE      |         |      1 |      1 |      1 |00:00:00.55 |    9782 |
      |   2 |   VIEW               |         |      1 |    569K|    571K|00:00:10.86 |    9782 |
      |   3 |    UNION-ALL         |         |      1 |        |    571K|00:00:06.86 |    9782 |
      |   4 |     TABLE ACCESS FULL| SOURCE$ |      1 |    284K|    285K|00:00:01.43 |    4891 |
      |   5 |     TABLE ACCESS FULL| SOURCE$ |      1 |    284K|    285K|00:00:01.14 |    4891 |
      ------------------------------------------------------------------------------------------
      
      Column Projection Information (identified by operation id):
      -----------------------------------------------------------
         1 - (#keys=0) COUNT("SOURCE")[22]
         2 - "SOURCE"[VARCHAR2,4000]
         3 - STRDEF[4000]
         4 - "SOURCE"[VARCHAR2,4000]
         5 - "SOURCE"[VARCHAR2,4000]
      
      

      Comment by Jonathan Lewis — March 6, 2012 @ 5:04 pm BST Mar 6,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,268 other followers