Oracle Scratchpad

April 26, 2007

Heisenberg

Filed under: dbms_xplan,Execution plans,Troubleshooting — Jonathan Lewis @ 9:14 pm BST Apr 26,2007

[See also: gather plan statistics]

Okay, so the title is pretentious; but I thought it was a snappy summary of this item [until someone pointed out that Heisenberg's Uncertainty Principle is not about measurement error].

I’ve blogged before about the improved features in 10g of the dbms_xplan package, in particular the display_cursor() procedure, with its option for displaying rowsource execution statistics.

There is a bit of a problem with this feature, though, which I’ll demonstrate (at least on my laptop) with a query that I first introduced some time ago when discussing filter subqueries and scalar subquery caching. The query references a table called emp, which holds details of 20,000 employees spread over 6 departments. Here’s my test run, with the resulting execution plan:

set serveroutput off

select
	/*+ gather_plan_statistics */
	count(*)
from	(
	select	/*+ no_merge */
		outer.*
	from emp outer
	where outer.sal >
		(
			select /*+ no_unnest */ avg(inner.sal)
		 	from emp inner
			where inner.dept_no = outer.dept_no
		)
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

====================================================================

  COUNT(*)
----------
      9998

1 row selected.

Elapsed: 00:00:00.14

PLAN_TABLE_OUTPUT
---------------------------
SQL_ID gncq9drn4sdgw, child number 0
-------------------------------------
select  /*+ gather_plan_statistics */  count(*) from (  select /*+ no_merge */
outer.* from emp outer where outer.sal > ( select /*+ no_unnest */
avg(inner.sal) from emp inner where inner.dept_no = outer.dept_no ) )

Plan hash value: 322796046

----------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE       |      |      1 |      1 |      1 |00:00:00.12 |    1540 |
|   2 |   VIEW                |      |      1 |    167 |   9998 |00:00:00.23 |    1540 |
|*  3 |    FILTER             |      |      1 |        |   9998 |00:00:00.16 |    1540 |
|   4 |     TABLE ACCESS FULL |  EMP |      1 |  20000 |  20000 |00:00:00.10 |     220 |
|   5 |     SORT AGGREGATE    |      |      6 |      1 |      6 |00:00:00.06 |    1320 |
|*  6 |      TABLE ACCESS FULL|  EMP |      6 |   3333 |  20000 |00:00:00.14 |    1320 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("OUTER"."SAL">  )
   6 - filter("INNER"."DEPT_NO"=:B1)

25 rows selected.

(You’ll note in passing that the predicate information extracted from the v$sql_plan table manages to lose the subquery from line 3 – this is an irritating defect of in-memory execution plans).

But there’s a much more significant problem than that if you’re doing this to find out where the time went. Notice how the query elapsed time was 0.14 seconds, which is slightly larger than the actual time (A-time) reported in line one of the plan. Then look down the plan: the time recorded in a line should be the sum of the time spent in the operation itself plus the time spent in its direct descendants.

So how can line 2 report a larger time than line 1? How can line 6 report a larger time than line 5? Which lines are right, and which lines are wrong?

Someone sent me an email recently pointing out that this problem disappears if you set the parameter statistics_level to “all”. Sure enough, when I got rid of the hint and used an alter session command to set the statistics_level, the execution plan came out with a totally consistent set of timings. Here are the results after the change:

Note how the elapsed time is consistent with the A-time in the top line of the plan, and how the A-times down the whole plan are self-consistent.

But spot the problem … the elapsed time has just gone up from 0.14 seconds to 0.46 seconds. That’s an overhead of roughly 200%. So how do you work out where your time is really going when the overhead of measuring is much larger than the thing you are trying to measure?

I think the problem is one of granularity. The hidden parameter  “_rowsource_statistics_sampfreq” – which defaults to the value 128 – defines a sampling frequency for the collection of rowsource execution statistics. When you set statistics_level to “all” this parameter is (I suspect) invisibly set to 1, which makes the statistics more accurate but much more expensive to collect.
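If you want to see what the hidden parameter is currently set to, a query of the form commonly published for inspecting hidden parameters (run as SYS, since it reads the x$ structures) should do the job – treat the exact column names as an assumption that may vary by version:

```sql
-- Hypothetical check (as SYS): read the current value of the hidden
-- parameter from the x$ internal parameter structures.
select
	pi.ksppinm	name,
	cv.ksppstvl	value,
	cv.ksppstdf	is_default
from
	x$ksppi		pi,
	x$ksppcv	cv
where
	cv.indx = pi.indx
and	pi.ksppinm = '_rowsource_statistics_sampfreq'
;
```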

You can circumvent this overhead by setting “_rowsource_statistics_sampfreq” back to 128 after you’ve set the statistics_level to “all”. Of course, this will just take your execution plan timings back to their previous inconsistent state – so you’re wrong whichever choice you make.
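As a sketch, the session-level settings described above would look like this (setting hidden parameters is, as ever, something to do only on a test system):

```sql
-- Enable full rowsource statistics collection ...
alter session set statistics_level = all;

-- ... then push the sampling frequency back to its default,
-- trading timing consistency for reduced measurement overhead.
alter session set "_rowsource_statistics_sampfreq" = 128;
```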

So what can you do? I think you have to run the query three times and use a little judicious guesswork. Use the first run to get a baseline, do the second with the gather_plan_statistics hint, and do the third with statistics_level set to “all”.
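A minimal sketch of the three-run strategy, using SQL*Plus timing to capture the elapsed times (the query body is whatever statement you’re investigating):

```sql
set timing on
set serveroutput off

-- Run 1: baseline - no hint, default statistics_level (typical).
select count(*) from ( ... );	-- your test query, unhinted

-- Run 2: hint-based collection of rowsource statistics.
select /*+ gather_plan_statistics */ count(*) from ( ... );
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

-- Run 3: full collection at the session level.
alter session set statistics_level = all;
select count(*) from ( ... );
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
alter session set statistics_level = typical;
```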

Comparing the three sets of results tells you the true impact of the overheads – which in some cases may be insignificant, hence ignorable. Then, comparing the second and third sets, you may see a correlation which suggests fairly strongly where most of the overhead is appearing and where most of the useful work is actually being done.

Beyond that, of course, the starts and buffers (and disk reads and memory operations, if they occur) may give you a very good indicator anyway. To date, when it’s been important, I’ve found that I’ve been able to get enough information from the hint-based approach, and not needed to worry about completely self-consistent times.

Footnote 1: The comparative tests above have only been carried out on a system running 10.2.0.3 on Windows XP Pro.  Results may vary with version and operating system. In particular, the operating system may make a huge difference to the overheads. If you try this out on a different platform, and see different results, let me know. All we need to see are version, O/S details, and the three run times (not the plan outputs).

Footnote 2: You may recall that from 9.2.0.2 through to 9.2.0.5, enabling sql_trace gave you some row source execution statistics in the STAT# lines of trace files. This feature disappeared in 9.2.0.6 because of the overheads, re-appearing in 10g when the parameter to set a sampling frequency was introduced.


10 Comments »

  1. Maybe we can use a Heisenberg compensator from Star Trek the next generation…

    LOL! ^_^

    Comment by Antonio — May 2, 2007 @ 12:12 pm BST May 2,2007 | Reply

  2. The result of mytest
    ——————–

    ******************************************************
    ** TEST Environment: HP-UX 11.23 10.2.0.3
    ******************************************************
    
    ** CASE #1
    ** without GATHER_PLAN_STATISTICS hint
    ** STATISTICS_LEVEL=TYPICAL (default)
    SQL> @run_test_sql1
    SQL> @check_result
    -----------------------------------------------
    | Id  | Operation             | Name | E-Rows |
    -----------------------------------------------
    |   1 |  SORT AGGREGATE       |      |      1 |
    |   2 |   VIEW                |      |      1 |
    |*  3 |    FILTER             |      |        |
    |   4 |     TABLE ACCESS FULL | EMP  |     14 |
    |   5 |     SORT AGGREGATE    |      |      1 |
    |*  6 |      TABLE ACCESS FULL| EMP  |      5 |
    -----------------------------------------------
    
    ** CASE #2
    ** with GATHER_PLAN_STATISTICS Hint 
    ** STATISTICS_LEVEL=TYPICAL (default)
    
    SQL> @run_test_sql2
    SQL> @check_result
    -------------------------------------------------------------------------------------------------
    | Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
    -------------------------------------------------------------------------------------------------
    |   1 |  SORT AGGREGATE       |      |      1 |      1 |      1 |00:00:25.80 |   21672 |  21545 |
    |   2 |   VIEW                |      |      1 |      1 |    393K|00:00:16.14 |   21672 |  21545 |
    |*  3 |    FILTER             |      |      1 |        |    393K|00:00:14.96 |   21672 |  21545 |
    |   4 |     TABLE ACCESS FULL | EMP  |      1 |     14 |    917K|00:00:01.85 |    5418 |   5381 |
    |   5 |     SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:14.29 |   16254 |  16164 |
    |*  6 |      TABLE ACCESS FULL| EMP  |      3 |      5 |    917K|00:00:13.82 |   16254 |  16164 |
    -------------------------------------------------------------------------------------------------
    
    
    ** CASE #3
    ** without GATHER_PLAN_STATISTICS Hint 
    ** SQL> alter session set STATISTICS_LEVEL=ALL
    SQL> @run_test_sql1
    SQL> @check_result
    -------------------------------------------------------------------------------------------------
    | Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
    -------------------------------------------------------------------------------------------------
    |   1 |  SORT AGGREGATE       |      |      1 |      1 |      1 |00:00:24.24 |   21672 |  21609 |
    |   2 |   VIEW                |      |      1 |      1 |    393K|00:00:23.27 |   21672 |  21609 |
    |*  3 |    FILTER             |      |      1 |        |    393K|00:00:21.70 |   21672 |  21609 |
    |   4 |     TABLE ACCESS FULL | EMP  |      1 |     14 |    917K|00:00:04.60 |    5418 |   5398 |
    |   5 |     SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:13.06 |   16254 |  16211 |
    |*  6 |      TABLE ACCESS FULL| EMP  |      3 |      5 |    917K|00:00:10.09 |   16254 |  16211 |
    -------------------------------------------------------------------------------------------------
    
    ** CASE #4
    ** without GATHER_PLAN_STATISTICS Hint 
    ** SQL> alter session set STATISTICS_LEVEL=ALL
    ** SQL> alter session set "_rowsource_statistics_sampfreq"=128
    SQL> @run_test_sql1
    SQL> @check_result
    
    -------------------------------------------------------------------------------------------------
    | Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
    -------------------------------------------------------------------------------------------------
    |   1 |  SORT AGGREGATE       |      |      1 |      1 |      1 |00:00:26.07 |   21672 |  21523 |
    |   2 |   VIEW                |      |      1 |      1 |    393K|00:00:21.49 |   21672 |  21523 |
    |*  3 |    FILTER             |      |      1 |        |    393K|00:00:20.31 |   21672 |  21523 |
    |   4 |     TABLE ACCESS FULL | EMP  |      1 |     14 |    917K|00:00:02.78 |    5418 |   5410 |
    |   5 |     SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:14.93 |   16254 |  16113 |
    |*  6 |      TABLE ACCESS FULL| EMP  |      3 |      5 |    917K|00:00:06.42 |   16254 |  16113 |
    -------------------------------------------------------------------------------------------------
    
    

    Comment by steve.kim — November 21, 2007 @ 2:07 am BST Nov 21,2007 | Reply

  3. Steve,

    Thanks for sending in some results.

    Unfortunately you’ve grown the data set to a size that’s resulted in the tablescans going to disc (see the reads column). As a consequence, the variation in CPU usage due to the execution stats collection has been swamped by the differences in CPU spent on things like LRU latch activity and I/O response time.

    Comment by Jonathan Lewis — November 21, 2007 @ 7:48 am BST Nov 21,2007 | Reply

  4. [...] Performance, Statspack, Troubleshooting — Jonathan Lewis @ 7:27 pm UTC Nov 25,2007 I wrote an article some time ago about how 10g gave you a very convenient way to capture run-time information about the work done [...]

    Pingback by gather_plan_statistics « Oracle Scratchpad — November 25, 2007 @ 7:27 pm BST Nov 25,2007 | Reply

  5. [...] the query text, check the index definitions, run the query on a recent backup of production with plan statistics enabled.  It turns out that a very large fraction of the work goes into one of the three parts of the [...]

    Pingback by Scalability Conflict « Oracle Scratchpad — September 6, 2010 @ 1:06 pm BST Sep 6,2010 | Reply

  6. [...] parameters STATISTICS_LEVEL and _ROW_SOURCE_SAMPFREQ to generate more accurate timing values (see Jonathan Lewis’ “Heisenberg” post for a discussion of the [...]

    Pingback by Execution plans in Mumbai | Marcus Mönnig's Oracle and Mumbai Blog — December 30, 2011 @ 4:49 pm BST Dec 30,2011 | Reply

  7. [...] instead of the hint or using _rowsource_statistics_sampfreq hidden parameter (see Jonathan Lewis’ post for [...]

    Pingback by DBMS_XPLAN.DISPLAY_CURSOR « Oracle Diagnostician — May 14, 2012 @ 5:16 am BST May 14,2012 | Reply

  8. Hi Jonathan. I have a query plan where I cannot explain how the time adds up. I did the ALTER SESSION trick but it changed nothing. I ran this, and got the following plan. I have two questions (I apologize for not being able to format this code and plan, but I saw no formatting buttons on the insert box).

    1) At step #8 in the plan, the query jumps to 3 and 1/2 minutes. This step says VIEW but gives no indication of what it did that actually took 3 and 1/2 minutes. Can you explain, or give me some idea how to find out, what is being done on this line that takes that long? Especially with so few rows.

    2) The total elapsed time of the query was 3 minutes 45 seconds. This jives with step #1, which says 3:44.54. But if A-TIME is supposed to be, among other things, the sum of the child steps, then how do we explain step #2? Step #2 says 3 minutes 44 seconds, but the sum of its child steps (#3 and #8) is 7 minutes and 25 seconds. Did this query do these steps “in parallel”, so to speak? Or is there something really off in the numbers, and if so how do I prove it?

    This is not something that is super pressing, but I sure would like to be able to explain the differences to people. It is making it somewhat harder to get people to believe I know what I am doing when I cannot explain such obvious issues as these two things.

    Thanks, Kevin Meade

    ALTER SESSION SET STATISTICS_LEVEL=ALL
    /
    
    drop table kevtemp1
    /
    
    create table kevtemp1
    nologging
    as
    SELECT SRCE_EFF_START_TMSP, 
           AGGREGATECLAIMUID 
    FROM (
         SELECT  A.AGGREGATECLAIMUID, 
                 A.SNAPSHOT_DT, 
                 MAX(A.SRCE_EFF_START_TMSP) SRCE_EFF_START_TMSP
         FROM (SELECT PI.AGGREGATECLAIMUID, 
                      PI.SRCE_EFF_START_TMSP, 
                      (SELECT DISTINCT BW.SNAPSHOT_DT
                       FROM RRS_SHR.RRS_ETL_BATCH_WINDOW BW, RRS_SHR.RRS_LOAD_STATUS RLS
                       WHERE RLS.ASTG_LOAD_STATUS = 'STARTED'   
                       AND RLS.SNAPSHOT_DT = BW.SNAPSHOT_DT
                       AND BW.RPTG_WINDOW_TYPE = 'D'    
                       AND BW.OBJECT_NM = 'R_AGGREGATE_CLAIM_SEED'    
                       AND PI.SRCE_EFF_START_TMSP 
                           BETWEEN 
                           BW.BEGIN_TMSP 
                           AND 
                           BW.END_TMSP
                      ) AS SNAPSHOT_DT
               FROM RRS_PSTG.P_PCD_AGGREGATECLAIM PI
               WHERE PI.BATCH_ID IN 
                     (SELECT src_batch_id
                      FROM batch_processing_queue
                      WHERE dm_batch_id =
                            batch_id_pkg.get_current_batch_id('PCD','ATM')
                     )
              ) A
              GROUP BY A.AGGREGATECLAIMUID, A.SNAPSHOT_DT
         )
    /
    
    Table created.
    
    Elapsed: 00:03:45.14
    
    Plan hash value: 3515505567
    
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                       | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    |   0 | CREATE TABLE STATEMENT          |                             |      1 |        |      0 |00:03:44.54 |     246K|   8389 |    128 |       |       |          |
    |   1 |  LOAD AS SELECT                 |                             |      1 |        |      0 |00:03:44.54 |     246K|   8389 |    128 |   530K|   530K|  530K (0)|
    |   2 |   HASH GROUP BY                 |                             |      1 |  84487 |  48482 |00:03:44.41 |     245K|   8386 |      0 |  3128K|  1284K| 3538K (0)|
    |   3 |    HASH UNIQUE                  |                             |  33524 |      1 |  33524 |00:03:40.98 |     236K|     12 |      0 |  1270K|  1270K|  523K (0)|
    |   4 |     NESTED LOOPS                |                             |  33524 |      1 |  33524 |00:00:08.35 |     236K|     12 |      0 |       |       |          |
    |*  5 |      TABLE ACCESS BY INDEX ROWID| RRS_ETL_BATCH_WINDOW        |  33524 |     11 |  33524 |00:00:07.84 |     203K|      9 |      0 |       |       |          |
    |*  6 |       INDEX RANGE SCAN          | IDX1_RRS_ETL_BATCH_WINDOW   |  33524 |     41 |   5414K|00:00:04.09 |   86482 |      5 |      0 |       |       |          |
    |*  7 |      INDEX UNIQUE SCAN          | IUK2_RRS_LOAD_STATUS        |  33524 |      1 |  33524 |00:00:00.31 |   33535 |      3 |      0 |       |       |          |
    |   8 |    VIEW                         |                             |      1 |  84487 |  48484 |00:03:43.99 |     245K|   8386 |      0 |       |       |          |
    |*  9 |     HASH JOIN                   |                             |      1 |  84487 |  48484 |00:00:02.63 |    9186 |   8374 |      0 |  1023K|  1023K| 1243K (0)|
    |* 10 |      INDEX RANGE SCAN           | IAK1_BATCH_PROCESSING_QUEUE |      1 |      5 |    102 |00:00:00.22 |     855 |     44 |      0 |       |       |          |
    |  11 |      TABLE ACCESS FULL          | P_PCD_AGGREGATECLAIM        |      1 |   1536K|   1536K|00:00:01.47 |    8331 |   8330 |      0 |       |       |          |
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       5 - filter(("BW"."BEGIN_TMSP"=:B1)
       7 - access("RLS"."SNAPSHOT_DT"="BW"."SNAPSHOT_DT" AND "RLS"."ASTG_LOAD_STATUS"='STARTED')
       9 - access("PI"."BATCH_ID"="SRC_BATCH_ID")
      10 - access("DM_BATCH_ID"="BATCH_ID_PKG"."GET_CURRENT_BATCH_ID"('PCD','ATM'))
    
    
    46 rows selected.
    
    Elapsed: 00:00:01.75
    

    Comment by Kevin Meade — November 19, 2012 @ 5:29 pm BST Nov 19,2012 | Reply

  9. [...] copied the following question up from a recent comment because it’s a useful generic type of question, and I think I can answer it in the few [...]

    Pingback by Plan timing « Oracle Scratchpad — November 19, 2012 @ 10:02 pm BST Nov 19,2012 | Reply

