Oracle Scratchpad

April 26, 2007

Heisenberg

Filed under: dbms_xplan,Execution plans,Troubleshooting — Jonathan Lewis @ 9:14 pm BST Apr 26,2007

[See also: gather plan statistics]

Okay, so the title is pretentious; but I thought it was a snappy summary of this item [until someone pointed out that Heisenberg's Uncertainty Principle is not about measurement error].

I’ve blogged before about the improved features in 10g of the dbms_xplan package, in particular the display_cursor() procedure, with its option for displaying rowsource execution statistics.

There is a bit of a problem with this feature, though, which I’ll demonstrate (at least on my laptop) with a query that I first introduced some time ago when discussing filter subqueries and scalar subquery caching. The query references a table called emp, which holds details of 20,000 employees spread over 6 departments. Here’s my test run, with the resulting execution plan:

set serveroutput off

select
	/*+ gather_plan_statistics */
	count(*)
from	(
	select	/*+ no_merge */
		outer.*
	from emp outer
	where outer.sal >
		(
			select /*+ no_unnest */ avg(inner.sal)
		 	from emp inner
			where inner.dept_no = outer.dept_no
		)
	)
;

select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

====================================================================

  COUNT(*)
----------
      9998

1 row selected.

Elapsed: 00:00:00.14

PLAN_TABLE_OUTPUT
---------------------------
SQL_ID gncq9drn4sdgw, child number 0
-------------------------------------
select  /*+ gather_plan_statistics */  count(*) from (  select /*+ no_merge */
outer.* from emp outer where outer.sal > ( select /*+ no_unnest */
avg(inner.sal) from emp inner where inner.dept_no = outer.dept_no ) )

Plan hash value: 322796046

----------------------------------------------------------------------------------------
| Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE       |      |      1 |      1 |      1 |00:00:00.12 |    1540 |
|   2 |   VIEW                |      |      1 |    167 |   9998 |00:00:00.23 |    1540 |
|*  3 |    FILTER             |      |      1 |        |   9998 |00:00:00.16 |    1540 |
|   4 |     TABLE ACCESS FULL |  EMP |      1 |  20000 |  20000 |00:00:00.10 |     220 |
|   5 |     SORT AGGREGATE    |      |      6 |      1 |      6 |00:00:00.06 |    1320 |
|*  6 |      TABLE ACCESS FULL|  EMP |      6 |   3333 |  20000 |00:00:00.14 |    1320 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("OUTER"."SAL">  )
   6 - filter("INNER"."DEPT_NO"=:B1)

25 rows selected.

(You’ll note in passing that the predicate information extracted from the v$sql_plan table manages to lose the subquery from line 3 – this is an irritating defect of in-memory execution plans).

But there’s a much more significant problem than that if you’re doing this to find out where the time went. Notice how the query elapsed time was 0.14 seconds, which is slightly larger than the actual time (A-time) reported in line one of the plan. Then look down the plan: the time recorded in a line should be the sum of the time spent in the operation itself plus the time spent in its direct descendants.

So how can line 2 report a larger time than line 1? How can line 6 report a larger time than line 5? Which lines are right, and which lines are wrong?

Someone sent me an email recently pointing out that this problem disappears if you set the parameter statistics_level to “all”. Sure enough, when I got rid of the hint and used an alter session command to set the statistics_level, the execution plan came out with a totally consistent set of timings. Here are the results after the change:

Note how the elapsed time is consistent with the A-time in the top line of the plan, and how the A-times down the whole plan are self-consistent.

But spot the problem … the elapsed time has just gone up from 0.14 seconds to 0.46 seconds. That’s an overhead of roughly 200%. So how do you work out where your time is really going when the overhead of measuring is much larger than the thing you are trying to measure?

I think the problem is one of granularity. The hidden parameter  “_rowsource_statistics_sampfreq” – which defaults to the value 128 – defines a sampling frequency for the collection of rowsource execution statistics. When you set statistics_level to “all” this parameter is (I suspect) invisibly set to 1, which makes the statistics more accurate but much more expensive to collect.
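If you want to see what the hidden parameter is currently set to, a query of the form commonly published for inspecting hidden parameters (run as SYS, since it reads the x$ structures) should do the job – treat the exact column names as an assumption that may vary by version:

```sql
-- Hypothetical check (as SYS): read the current value of the hidden
-- parameter from the x$ internal parameter structures.
select
	pi.ksppinm	name,
	cv.ksppstvl	value,
	cv.ksppstdf	is_default
from
	x$ksppi		pi,
	x$ksppcv	cv
where
	cv.indx = pi.indx
and	pi.ksppinm = '_rowsource_statistics_sampfreq'
;
```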

You can circumvent this overhead by setting “_rowsource_statistics_sampfreq” back to 128 after you’ve set the statistics_level to “all”. Of course, this will just take your execution plan timings back to their previous inconsistent state – so you’re wrong whichever choice you make.
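As a sketch, the session-level settings described above would look like this (setting hidden parameters is, as ever, something to do only on a test system):

```sql
-- Enable full rowsource statistics collection ...
alter session set statistics_level = all;

-- ... then push the sampling frequency back to its default,
-- trading timing consistency for reduced measurement overhead.
alter session set "_rowsource_statistics_sampfreq" = 128;
```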

So what can you do? I think you have to run the query three times and use a little judicious guesswork. Use the first run to get a baseline, do the second with the gather_plan_statistics hint, and do the third with statistics_level set to “all”.
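A minimal sketch of the three-run strategy, using SQL*Plus timing to capture the elapsed times (the query body is whatever statement you’re investigating):

```sql
set timing on
set serveroutput off

-- Run 1: baseline - no hint, default statistics_level (typical).
select count(*) from ( ... );	-- your test query, unhinted

-- Run 2: hint-based collection of rowsource statistics.
select /*+ gather_plan_statistics */ count(*) from ( ... );
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

-- Run 3: full collection at the session level.
alter session set statistics_level = all;
select count(*) from ( ... );
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
alter session set statistics_level = typical;
```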

Comparing the three sets of results tells you the true impact of the overheads – which in some cases may be insignificant, hence ignorable. Then, comparing the second and third sets, you may see a correlation which suggests fairly strongly where most of the overhead is appearing and where most of the useful work is actually being done.

Beyond that, of course, the starts and buffers (and disk reads and memory operations, if they occur) may give you a very good indicator anyway. To date, when it’s been important, I’ve found that I’ve been able to get enough information from the hint-based approach, and not needed to worry about completely self-consistent times.

Footnote 1: The comparative tests above have only been carried out on a system running 10.2.0.3 on Windows XP Pro.  Results may vary with version and operating system. In particular, the operating system may make a huge difference to the overheads. If you try this out on a different platform, and see different results, let me know. All we need to see are version, O/S details, and the three run times (not the plan outputs).

Footnote 2: You may recall that from 9.2.0.2 through to 9.2.0.5, enabling sql_trace gave you some row source execution statistics in the STAT# lines of trace files. This feature disappeared in 9.2.0.6 because of the overheads, re-appearing in 10g when the parameter to set a sampling frequency was introduced.


10 Comments »

  1. Maybe we can use a Heisenberg compensator from Star Trek the next generation…

    LOL! ^_^

    Comment by Antonio — May 2, 2007 @ 12:12 pm BST May 2,2007 | Reply

  2. The result of mytest
    ——————–

    ******************************************************
    ** TEST Environment: HP-UX 11.23 10.2.0.3
    ******************************************************
    
    ** CASE #1
    ** without GATHER_PLAN_STATISTICS hint
    ** STATISTICS_LEVEL=TYPICAL (default)
    SQL> @run_test_sql1
    SQL> @check_result
    -----------------------------------------------
    | Id  | Operation             | Name | E-Rows |
    -----------------------------------------------
    |   1 |  SORT AGGREGATE       |      |      1 |
    |   2 |   VIEW                |      |      1 |
    |*  3 |    FILTER             |      |        |
    |   4 |     TABLE ACCESS FULL | EMP  |     14 |
    |   5 |     SORT AGGREGATE    |      |      1 |
    |*  6 |      TABLE ACCESS FULL| EMP  |      5 |
    -----------------------------------------------
    
    ** CASE #2
    ** with GATHER_PLAN_STATISTICS Hint 
    ** STATISTICS_LEVEL=TYPICAL (default)
    
    SQL> @run_test_sql2
    SQL> @check_result
    -------------------------------------------------------------------------------------------------
    | Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
    -------------------------------------------------------------------------------------------------
    |   1 |  SORT AGGREGATE       |      |      1 |      1 |      1 |00:00:25.80 |   21672 |  21545 |
    |   2 |   VIEW                |      |      1 |      1 |    393K|00:00:16.14 |   21672 |  21545 |
    |*  3 |    FILTER             |      |      1 |        |    393K|00:00:14.96 |   21672 |  21545 |
    |   4 |     TABLE ACCESS FULL | EMP  |      1 |     14 |    917K|00:00:01.85 |    5418 |   5381 |
    |   5 |     SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:14.29 |   16254 |  16164 |
    |*  6 |      TABLE ACCESS FULL| EMP  |      3 |      5 |    917K|00:00:13.82 |   16254 |  16164 |
    -------------------------------------------------------------------------------------------------
    
    
    ** CASE #3
    ** without GATHER_PLAN_STATISTICS Hint 
    ** SQL> alter session set STATISTICS_LEVEL=ALL
    SQL> @run_test_sql1
    SQL> @check_result
    -------------------------------------------------------------------------------------------------
    | Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
    -------------------------------------------------------------------------------------------------
    |   1 |  SORT AGGREGATE       |      |      1 |      1 |      1 |00:00:24.24 |   21672 |  21609 |
    |   2 |   VIEW                |      |      1 |      1 |    393K|00:00:23.27 |   21672 |  21609 |
    |*  3 |    FILTER             |      |      1 |        |    393K|00:00:21.70 |   21672 |  21609 |
    |   4 |     TABLE ACCESS FULL | EMP  |      1 |     14 |    917K|00:00:04.60 |    5418 |   5398 |
    |   5 |     SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:13.06 |   16254 |  16211 |
    |*  6 |      TABLE ACCESS FULL| EMP  |      3 |      5 |    917K|00:00:10.09 |   16254 |  16211 |
    -------------------------------------------------------------------------------------------------
    
    ** CASE #4
    ** without GATHER_PLAN_STATISTICS Hint 
    ** SQL> alter session set STATISTICS_LEVEL=ALL
    ** SQL> alter session set "_rowsource_statistics_sampfreq"=128
    SQL> @run_test_sql1
    SQL> @check_result
    
    -------------------------------------------------------------------------------------------------
    | Id  | Operation             | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
    -------------------------------------------------------------------------------------------------
    |   1 |  SORT AGGREGATE       |      |      1 |      1 |      1 |00:00:26.07 |   21672 |  21523 |
    |   2 |   VIEW                |      |      1 |      1 |    393K|00:00:21.49 |   21672 |  21523 |
    |*  3 |    FILTER             |      |      1 |        |    393K|00:00:20.31 |   21672 |  21523 |
    |   4 |     TABLE ACCESS FULL | EMP  |      1 |     14 |    917K|00:00:02.78 |    5418 |   5410 |
    |   5 |     SORT AGGREGATE    |      |      3 |      1 |      3 |00:00:14.93 |   16254 |  16113 |
    |*  6 |      TABLE ACCESS FULL| EMP  |      3 |      5 |    917K|00:00:06.42 |   16254 |  16113 |
    -------------------------------------------------------------------------------------------------
    
    

    Comment by steve.kim — November 21, 2007 @ 2:07 am BST Nov 21,2007 | Reply

  3. Steve,

    Thanks for sending in some results.

    Unfortunately you’ve grown the data set to a size that’s resulted in the tablescans going to disc (see the reads column). As a consequence, the variation in CPU usage due to the execution stats collection has been swamped by the differences in CPU spent on things like LRU latch activity and I/O response time.

    Comment by Jonathan Lewis — November 21, 2007 @ 7:48 am BST Nov 21,2007 | Reply

  4. [...] Performance, Statspack, Troubleshooting — Jonathan Lewis @ 7:27 pm UTC Nov 25,2007 I wrote an article some time ago about how 10g gave you a very convenient way to capture run-time information about the work done [...]

    Pingback by gather_plan_statistics « Oracle Scratchpad — November 25, 2007 @ 7:27 pm BST Nov 25,2007 | Reply

  5. [...] the query text, check the index definitions, run the query on a recent backup of production with plan statistics enabled.  It turns out that a very large fraction of the work goes into one of the three parts of the [...]

    Pingback by Scalability Conflict « Oracle Scratchpad — September 6, 2010 @ 1:06 pm BST Sep 6,2010 | Reply

  6. [...] parameters STATISTICS_LEVEL and _ROW_SOURCE_SAMPFREQ to generate more accurate timing values (see Jonathan Lewis’ “Heisenberg” post for a discussion of the [...]

    Pingback by Execution plans in Mumbai | Marcus Mönnig's Oracle and Mumbai Blog — December 30, 2011 @ 4:49 pm BST Dec 30,2011 | Reply

  7. [...] instead of the hint or using _rowsource_statistics_sampfreq hidden parameter (see Jonathan Lewis’ post for [...]

    Pingback by DBMS_XPLAN.DISPLAY_CURSOR « Oracle Diagnostician — May 14, 2012 @ 5:16 am BST May 14,2012 | Reply

  8. Hi Jonathan. I have a query plan where I cannot explain how the time adds up. I did the ALTER SESSION trick but it changed nothing. I ran this, and got the following plan. I have two questions (I apologize for not being able to format this code and plan, but I saw no formatting buttons on the insert box).

    1) At step #8 in the plan, the query jumps to 3 and 1/2 minutes. This step says VIEW but gives no indication of what it did that actually took 3 and 1/2 minutes. Can you explain, or give me some idea how to find out, what is being done on this line that takes that long? Especially with so few rows.

    2) The total elapsed time of the query was 3 minutes 45 seconds. This jives with step #1, which says 3:44.54. But if A-TIME is supposed to be, among other things, the sum of the child steps, then how do we explain step #2? Step #2 says 3 minutes 44 seconds, but the sum of its child steps (#3 and #8) is 7 minutes and 25 seconds. Did this query do these steps “in parallel”, so to speak? Or is there something really off in the numbers, and if so how do I prove it?

    This is not something that is super pressing, but I sure would like to be able to explain the differences to people. It is making it somewhat harder to get people to believe I know what I am doing when I cannot explain such obvious issues as these two things.

    Thanks, Kevin Meade

    ALTER SESSION SET STATISTICS_LEVEL=ALL
    /
    
    drop table kevtemp1
    /
    
    create table kevtemp1
    nologging
    as
    SELECT SRCE_EFF_START_TMSP, 
           AGGREGATECLAIMUID 
    FROM (
         SELECT  A.AGGREGATECLAIMUID, 
                 A.SNAPSHOT_DT, 
                 MAX(A.SRCE_EFF_START_TMSP) SRCE_EFF_START_TMSP
         FROM (SELECT PI.AGGREGATECLAIMUID, 
                      PI.SRCE_EFF_START_TMSP, 
                      (SELECT DISTINCT BW.SNAPSHOT_DT
                       FROM RRS_SHR.RRS_ETL_BATCH_WINDOW BW, RRS_SHR.RRS_LOAD_STATUS RLS
                       WHERE RLS.ASTG_LOAD_STATUS = 'STARTED'   
                       AND RLS.SNAPSHOT_DT = BW.SNAPSHOT_DT
                       AND BW.RPTG_WINDOW_TYPE = 'D'    
                       AND BW.OBJECT_NM = 'R_AGGREGATE_CLAIM_SEED'    
                       AND PI.SRCE_EFF_START_TMSP 
                           BETWEEN 
                           BW.BEGIN_TMSP 
                           AND 
                           BW.END_TMSP
                      ) AS SNAPSHOT_DT
               FROM RRS_PSTG.P_PCD_AGGREGATECLAIM PI
               WHERE PI.BATCH_ID IN 
                     (SELECT src_batch_id
                      FROM batch_processing_queue
                      WHERE dm_batch_id =
                            batch_id_pkg.get_current_batch_id('PCD','ATM')
                     )
              ) A
              GROUP BY A.AGGREGATECLAIMUID, A.SNAPSHOT_DT
         )
    /
    
    Table created.
    
    Elapsed: 00:03:45.14
    
    Plan hash value: 3515505567
    
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                       | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    |   0 | CREATE TABLE STATEMENT          |                             |      1 |        |      0 |00:03:44.54 |     246K|   8389 |    128 |       |       |          |
    |   1 |  LOAD AS SELECT                 |                             |      1 |        |      0 |00:03:44.54 |     246K|   8389 |    128 |   530K|   530K|  530K (0)|
    |   2 |   HASH GROUP BY                 |                             |      1 |  84487 |  48482 |00:03:44.41 |     245K|   8386 |      0 |  3128K|  1284K| 3538K (0)|
    |   3 |    HASH UNIQUE                  |                             |  33524 |      1 |  33524 |00:03:40.98 |     236K|     12 |      0 |  1270K|  1270K|  523K (0)|
    |   4 |     NESTED LOOPS                |                             |  33524 |      1 |  33524 |00:00:08.35 |     236K|     12 |      0 |       |       |          |
    |*  5 |      TABLE ACCESS BY INDEX ROWID| RRS_ETL_BATCH_WINDOW        |  33524 |     11 |  33524 |00:00:07.84 |     203K|      9 |      0 |       |       |          |
    |*  6 |       INDEX RANGE SCAN          | IDX1_RRS_ETL_BATCH_WINDOW   |  33524 |     41 |   5414K|00:00:04.09 |   86482 |      5 |      0 |       |       |          |
    |*  7 |      INDEX UNIQUE SCAN          | IUK2_RRS_LOAD_STATUS        |  33524 |      1 |  33524 |00:00:00.31 |   33535 |      3 |      0 |       |       |          |
    |   8 |    VIEW                         |                             |      1 |  84487 |  48484 |00:03:43.99 |     245K|   8386 |      0 |       |       |          |
    |*  9 |     HASH JOIN                   |                             |      1 |  84487 |  48484 |00:00:02.63 |    9186 |   8374 |      0 |  1023K|  1023K| 1243K (0)|
    |* 10 |      INDEX RANGE SCAN           | IAK1_BATCH_PROCESSING_QUEUE |      1 |      5 |    102 |00:00:00.22 |     855 |     44 |      0 |       |       |          |
    |  11 |      TABLE ACCESS FULL          | P_PCD_AGGREGATECLAIM        |      1 |   1536K|   1536K|00:00:01.47 |    8331 |   8330 |      0 |       |       |          |
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       5 - filter(("BW"."BEGIN_TMSP"=:B1)
       7 - access("RLS"."SNAPSHOT_DT"="BW"."SNAPSHOT_DT" AND "RLS"."ASTG_LOAD_STATUS"='STARTED')
       9 - access("PI"."BATCH_ID"="SRC_BATCH_ID")
      10 - access("DM_BATCH_ID"="BATCH_ID_PKG"."GET_CURRENT_BATCH_ID"('PCD','ATM'))
    
    
    46 rows selected.
    
    Elapsed: 00:00:01.75
    

    Comment by Kevin Meade — November 19, 2012 @ 5:29 pm BST Nov 19,2012 | Reply

  9. [...] copied the following question up from a recent comment because it’s a useful generic type of question, and I think I can answer it in the few [...]

    Pingback by Plan timing « Oracle Scratchpad — November 19, 2012 @ 10:02 pm BST Nov 19,2012 | Reply

