Oracle Scratchpad

June 7, 2013

Same Plan

Filed under: CBO,Execution plans,Oracle,Tuning — Jonathan Lewis @ 5:11 pm BST Jun 7,2013

An interesting little problem appeared on the Oracle-L mailing list earlier on this week – a query ran fairly quickly when statistics hadn’t been collected on the tables, but then ran rather slowly after stats collection even though the plan hadn’t changed, and the tkprof results were there to prove the point. Here are the two outputs (edited slightly for width – the original showed three sets of row stats, the 1st, avg and max, but since the query had only been run once the three columns showed the same results in each case):


 Rows (max)  Row Source Operation
 ----------  ---------------------------------------------------
          0  UPDATE  CXT_FAKT_PROVISIONSBUCHUNG (cr=2039813 pr=3010 pw=0 time=47745718 us)
      15456   TABLE ACCESS FULL CXT_FAKT_PROVISIONSBUCHUNG (cr=1328 pr=1325 pw=0 time=40734 us cost=370 size=880992 card=15456)
      11225   VIEW  CXV_HAUPT_VU_SPARTE (cr=2038477 pr=1684 pw=0 time=47297497 us cost=10 size=4293 card=1)
      11225    SORT UNIQUE (cr=2038477 pr=1684 pw=0 time=47284436 us cost=10 size=824 card=1)
     126457     VIEW  (cr=2038167 pr=1669 pw=0 time=27753402 us cost=9 size=824 card=1)
     126457      WINDOW SORT (cr=2038167 pr=1669 pw=0 time=27667835 us cost=9 size=853 card=1)
     126457       WINDOW SORT (cr=2038167 pr=1669 pw=0 time=26884699 us cost=9 size=853 card=1)
     126457        NESTED LOOPS  (cr=2038167 pr=1669 pw=0 time=26342292 us)
     126457         NESTED LOOPS  (cr=1995241 pr=1615 pw=0 time=26192173 us cost=7 size=853 card=1)
     141581          NESTED LOOPS  (cr=642683 pr=1066 pw=0 time=3039331 us cost=5 size=499 card=1)
      11225           TABLE ACCESS BY INDEX ROWID POP_INFO (cr=22473 pr=32 pw=0 time=109767 us cost=2 size=16 card=1)
      11225            INDEX UNIQUE SCAN PK_POP_INFO (cr=11248 pr=4 pw=0 time=51796 us cost=1 size=0 card=1)(object id 1790009)
     141581           TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=620210 pr=1034 pw=0 time=2889850 us cost=3 size=483 card=1)
    1732978            INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=140952 pr=204 pw=0 time=2094982 us cost=2 size=0 card=1)(object id 1795724)
     126457          INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=1352558 pr=549 pw=0 time=23078816 us cost=1 size=0 card=1)(object id 1795724)
     126457         TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=42926 pr=54 pw=0 time=94791 us cost=2 size=354 card=1)

 Rows (max)  Row Source Operation
 ----------  ---------------------------------------------------
          0  UPDATE  CXT_FAKT_PROVISIONSBUCHUNG (cr=89894995 pr=1701 pw=0 time=318766975 us)
      15456   TABLE ACCESS FULL CXT_FAKT_PROVISIONSBUCHUNG (cr=1328 pr=1031 pw=0 time=46975 us cost=370 size=880992 card=15456)
      11225   VIEW  CXV_HAUPT_VU_SPARTE (cr=89893656 pr=670 pw=0 time=1553653734 us cost=11 size=4293 card=1)
      11225    SORT UNIQUE (cr=89893656 pr=670 pw=0 time=1553640071 us cost=11 size=419 card=1)
     126457     VIEW  (cr=89893656 pr=670 pw=0 time=1533733864 us cost=10 size=419 card=1)
     126457      WINDOW SORT (cr=89893656 pr=670 pw=0 time=1533646166 us cost=10 size=155 card=1)
     126457       WINDOW SORT (cr=89893656 pr=670 pw=0 time=1532847028 us cost=10 size=155 card=1)
     126457        NESTED LOOPS  (cr=89893656 pr=670 pw=0 time=1532238656 us)
    3501658         NESTED LOOPS  (cr=89167767 pr=665 pw=0 time=1529652013 us cost=8 size=155 card=1)
    5300707          NESTED LOOPS  (cr=1339312 pr=480 pw=0 time=9657987 us cost=5 size=81 card=1)
      11225           TABLE ACCESS BY INDEX ROWID POP_INFO (cr=22473 pr=32 pw=0 time=119070 us cost=2 size=16 card=1)
      11225            INDEX UNIQUE SCAN PK_POP_INFO (cr=11248 pr=3 pw=0 time=54707 us cost=1 size=0 card=1)(object id 1790009)
    5300707           TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=1316839 pr=448 pw=0 time=8359987 us cost=3 size=65 card=1)
    5300707            INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=140971 pr=87 pw=0 time=3603882 us cost=2 size=0 card=1)(object id 1795724)
    3501658          INDEX RANGE SCAN IDX_TMP_VU_SPARTE (cr=87828455 pr=185 pw=0 time=1518016475 us cost=2 size=0 card=1)(object id 1795724)
     126457         TABLE ACCESS BY INDEX ROWID TMP_VU_SPARTE (cr=725889 pr=5 pw=0 time=1829196 us cost=3 size=74 card=1)

As you can see, the first run took 48 seconds (time=47,745,718 us in the first line) while the second execution took 319 seconds (time=318766975 us). If you check the execution plans carefully they appear to be the same plan and, in fact, the trace file showed that the two plans had the same plan hash value. Clearly, though, they do vastly different amounts of work – the most eye-catching detail, perhaps, is the way the bad plan blows the row count up to 5 million before collapsing it back to 126,000). What do you have to do to get the change in performance (and it’s a totally reproducible change) – create or drop stats: if you have stats on the tables you get a slow execution, if you delete the stats you get the fast execution.

So what’s the problem ? Look carefully at the plan(s) – they’re not actually the same plan, but you can’t see the difference you can only see clues that they must be different. Notice in the last four lines that you access the same table (TMP_VU_SPARTE) twice using the same index (IDX_TMP_VU_SPARTE) – when you collect stats you access the two tables in the opposite order, and it makes a difference to the work you do.

To demonstrate the point I’ve created a simplified model of the problem, based on some extra information supplied in the mail thread. The model requires a correlated update, based on a view which joins a table to itself, and range-based predicates. Here’s the data generation:


execute dbms_random.seed(0)

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	trunc(dbms_random.value(0,1e4)) contract,
	sysdate - 300 + trunc(dbms_random.value(0,300))	date_from,
	sysdate - 300 + trunc(dbms_random.value(0,300))	date_to,
	sysdate - 300 + trunc(dbms_random.value(0,300))	created,
	sysdate - 300 + trunc(dbms_random.value(0,300))	replaced
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

create index t1_i1 on t1(contract, date_from, date_to);

create or replace view v1
as
select
	t1a.*
from
	t1	t1a,
	t1	t1b
where
	t1b.contract    = t1a.contract
and	t1b.date_from  <= t1a.date_from
and	t1b.date_to    >  t1a.date_from
and	t1b.created    <  t1a.replaced
and	t1b.replaced   >  t1a.created
;

rem
rem	We are going to update t2 from v1 using
rem	equality on all columns in the index
rem

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	trunc(dbms_random.value(0,1e4)) 		contract,
	sysdate - 300 + trunc(dbms_random.value(0,300))	date_from,
	sysdate - 300 + trunc(dbms_random.value(0,300))	date_to,
	sysdate - 300 + trunc(dbms_random.value(0,300))	replaced
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5 ; 

begin 	dbms_stats.gather_table_stats(
 		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 =>'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 =>'for all columns size 1'
	);

end;
/

I haven’t crafted my data particularly carefully and from the point of view of realism the content is a little bizarre (I’ve got “to” dates earlier than “from” dates, for example) – but all I’m interested in is getting the right sort of shape to demonstrate a point about the plan, I’m not trying to model the actual variation in activity.

I’ve created a view with a self-join that starts with equality on first column of the index I’ve created, but uses range-based predicates on the other columns in the table – that’s probably quite important as far as the real-world performance was concerned – partly because of the nature of the data (interesting skews when comparing “valid to/from dates”) and partly because Oracle is going to have to use its 5% guess for join selectivities for range based selectivities.

And now the update – two versions with their execution plans; starting with the plan where the correlated subquery visits t1 aliased as t1a first:

explain plan for
update t2 set
	replaced = (
		select
			/*+ leading(v1.t1a v1.t1b) */
			max(replaced)
		from	v1
		where
			v1.contract = t2.contract
		and	v1.date_from = t2.date_from
		and	v1.date_to = t2.date_to
	)
;

select * from table(dbms_xplan.display(null,null,'alias'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 2328412027

-----------------------------------------------------------------------------------------
| Id  | Operation                       | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                |       |   100K|  2734K|   500K (20)| 00:41:41 |
|   1 |  UPDATE                         | T2    |       |       |            |          |
|   2 |   TABLE ACCESS FULL             | T2    |   100K|  2734K|    64   (8)| 00:00:01 |
|   3 |   SORT AGGREGATE                |       |     1 |    72 |            |          |
|   4 |    NESTED LOOPS                 |       |       |       |            |          |
|   5 |     NESTED LOOPS                |       |     1 |    72 |     4   (0)| 00:00:01 |
|   6 |      TABLE ACCESS BY INDEX ROWID| T1    |     1 |    36 |     2   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN          | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN           | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
|*  9 |     TABLE ACCESS BY INDEX ROWID | T1    |     1 |    36 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - UPD$1
   2 - UPD$1        / T2@UPD$1
   3 - SEL$F5BB74E1
   6 - SEL$F5BB74E1 / T1A@SEL$2
   7 - SEL$F5BB74E1 / T1A@SEL$2
   8 - SEL$F5BB74E1 / T1B@SEL$2
   9 - SEL$F5BB74E1 / T1B@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T1A"."CONTRACT"=:B1 AND "T1A"."DATE_FROM"=:B2 AND
              "T1A"."DATE_TO"=:B3)
   8 - access("T1B"."CONTRACT"=:B1 AND "T1B"."DATE_TO">:B2 AND
              "T1B"."DATE_FROM"<=:B3)
       filter("T1B"."DATE_TO">:B1 AND "T1B"."CONTRACT"="T1A"."CONTRACT" AND
              "T1B"."DATE_FROM"<="T1A"."DATE_FROM" AND "T1B"."DATE_TO">"T1A"."DATE_FROM")
   9 - filter("T1B"."CREATED"<"T1A"."REPLACED" AND
              "T1B"."REPLACED">"T1A"."CREATED")

Notice that there’s nothing in the body of the plan that tells you which copy of t1 is visited first – you can’t tell unless you look carefully at the predicate information, or unless you’ve requested the alias section that lets you see very easily that the t1 at line 6 (the first one accessed) is aliased as t1a and the t1 at line 9 is aliased as t1b. While you’re at it, check the plan hash value from the top of the plan output.

Now the plan when we hint the two tables into the reverse order:

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 2328412027

-----------------------------------------------------------------------------------------
| Id  | Operation                       | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                |       |   100K|  2734K|   600K (17)| 00:50:01 |
|   1 |  UPDATE                         | T2    |       |       |            |          |
|   2 |   TABLE ACCESS FULL             | T2    |   100K|  2734K|    64   (8)| 00:00:01 |
|   3 |   SORT AGGREGATE                |       |     1 |    72 |            |          |
|   4 |    NESTED LOOPS                 |       |       |       |            |          |
|   5 |     NESTED LOOPS                |       |     1 |    72 |     5   (0)| 00:00:01 |
|   6 |      TABLE ACCESS BY INDEX ROWID| T1    |     1 |    36 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN          | T1_I1 |     1 |       |     2   (0)| 00:00:01 |
|*  8 |      INDEX RANGE SCAN           | T1_I1 |     1 |       |     1   (0)| 00:00:01 |
|*  9 |     TABLE ACCESS BY INDEX ROWID | T1    |     1 |    36 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - UPD$1
   2 - UPD$1        / T2@UPD$1
   3 - SEL$F5BB74E1
   6 - SEL$F5BB74E1 / T1B@SEL$2
   7 - SEL$F5BB74E1 / T1B@SEL$2
   8 - SEL$F5BB74E1 / T1A@SEL$2
   9 - SEL$F5BB74E1 / T1A@SEL$2

Predicate Information (identified by operation id):
---------------------------------------------------
   7 - access("T1B"."CONTRACT"=:B1 AND "T1B"."DATE_TO">:B2 AND
              "T1B"."DATE_FROM"<=:B3)
       filter("T1B"."DATE_TO">:B1)
   8 - access("T1B"."CONTRACT"="T1A"."CONTRACT" AND "T1A"."DATE_FROM"=:B1 AND
              "T1A"."DATE_TO"=:B2)
       filter("T1A"."CONTRACT"=:B1 AND "T1B"."DATE_FROM"<="T1A"."DATE_FROM" AND
              "T1B"."DATE_TO">"T1A"."DATE_FROM")
   9 - filter("T1B"."CREATED"<"T1A"."REPLACED" AND
              "T1B"."REPLACED">"T1A"."CREATED")

The plan hash value is the same – but if you look at the alias section you can see that the t1 we access first is the one with alias t1b.
Now compare the predicate sections, just for line 7 (the initial index access line):

First Plan
   7 - access("T1A"."CONTRACT"=:B1 AND "T1A"."DATE_FROM"=:B2 AND "T1A"."DATE_TO"=:B3)

Second Plan
   7 - access("T1B"."CONTRACT"=:B1 AND "T1B"."DATE_TO">:B2 AND "T1B"."DATE_FROM"<=:B3)
       filter("T1B"."DATE_TO">:B1)

Although my model data is random garbage, it’s easy to imagine that with a large data set the selectivities of these two predicates could be dramatically different, and there is clearly some “real-world” meaning to the date_to/date_from columns for a given row that the optimizer is unlikely to recognise when looking at the individual column stats for the table. It’s not surprising that a plan with no stats (which results in dynamic sampling) could find a better plan than a table with stats that leaves the optimizer using its default “call it 5% and hope for the best” strategy for range-based joins.

Conclusion

When you reference a table more than once in an execution plan, make sure you look very carefully at the predicate section – or even call for the alias section – so that you know exactly which copy of the table appears at which point in the plan.

Footnote

Although the results don’t mean much for my example, here’s my output from tracing my queries – rather than report the whole query in each case, I’ve just given the hint to show the table order:


/*+ leading(v1.t1a v1.t1b) */

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      6.01       0.00          0     200722     204119      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      6.01       0.00          0     200722     204119      100000

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  T2 (cr=200773 pr=0 pw=0 time=0 us)
    100000     100000     100000   TABLE ACCESS FULL T2 (cr=459 pr=0 pw=0 time=0 us cost=64 size=2800000 card=100000)
    100000     100000     100000   SORT AGGREGATE (cr=200248 pr=0 pw=0 time=0 us)
         0          0          0    NESTED LOOPS  (cr=200248 pr=0 pw=0 time=0 us)
         0          0          0     NESTED LOOPS  (cr=200248 pr=0 pw=0 time=0 us cost=4 size=72 card=1)
         0          0          0      TABLE ACCESS BY INDEX ROWID T1 (cr=200248 pr=0 pw=0 time=0 us cost=2 size=36 card=1)
         0          0          0       INDEX RANGE SCAN T1_I1 (cr=200248 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 81082)
         0          0          0      INDEX RANGE SCAN T1_I1 (cr=0 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 81082)
         0          0          0     TABLE ACCESS BY INDEX ROWID T1 (cr=0 pr=0 pw=0 time=0 us cost=2 size=36 card=1)

/*+ leading(v1.t1b v1.t1a) */

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      6.09       0.00          0     613372     200000      100000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      6.09       0.00          0     613372     200000      100000

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  UPDATE  T2 (cr=613372 pr=0 pw=0 time=1 us)
    100000     100000     100000   TABLE ACCESS FULL T2 (cr=459 pr=0 pw=0 time=0 us cost=64 size=2800000 card=100000)
    100000     100000     100000   SORT AGGREGATE (cr=612913 pr=0 pw=0 time=1 us)
         0          0          0    NESTED LOOPS  (cr=612913 pr=0 pw=0 time=1 us)
         0          0          0     NESTED LOOPS  (cr=612913 pr=0 pw=0 time=1 us cost=5 size=72 card=1)
    166078     166078     166078      TABLE ACCESS BY INDEX ROWID T1 (cr=368051 pr=0 pw=0 time=0 us cost=3 size=36 card=1)
    166078     166078     166078       INDEX RANGE SCAN T1_I1 (cr=202108 pr=0 pw=0 time=0 us cost=2 size=0 card=1)(object id 81082)
         0          0          0      INDEX RANGE SCAN T1_I1 (cr=244862 pr=0 pw=0 time=0 us cost=1 size=0 card=1)(object id 81082)
         0          0          0     TABLE ACCESS BY INDEX ROWID T1 (cr=0 pr=0 pw=0 time=0 us cost=2 size=36 card=1)

There isn’t a lot of difference in the run-time (which is mostly due to the tablescan) – but there’s an obvious difference in the number of buffer visits and the amount of data found for the first t1 access.
Just like the OP my plan varied with stats – though in my case I got the better plan when I had stats, and the worse plan when I deleted stats and the optimizer used dynamic sampling.

8 Comments »

  1. Coincidentally Kerry Osborne is writing up an example of a plan not changing, the plan hash value not changing, but a significant change in performance caused by the predicate order changing; part 1 is here: http://kerryosborne.oracle-guy.com/2013/06/sql-gone-bad-but-plan-not-changed , with part 2 to follow.

    Comment by Jonathan Lewis — June 7, 2013 @ 7:35 pm BST Jun 7,2013 | Reply

  2. What would be ways of fixing these up when it happens on Prod systems which has stats and optimizer does weird things: Passing Hints,Creating Profiles is that they way out.

    Thank you for detailing out the problem in such a clear way

    Comment by Mah — June 8, 2013 @ 1:44 am BST Jun 8,2013 | Reply

  3. I wondered if this might be a case where the plan_hash_value was the same but the v$sql_plan.other_xml.plan_hash_2 might have been different – it wasn’t.

    Comment by Dom Brooks — June 10, 2013 @ 9:07 am BST Jun 10,2013 | Reply

  4. I had a similar case in that the predicates were different but the plan_hash_values where the same. It was very confusing until I looked at the predicates:

    http://www.bobbydurrettdba.com/2012/12/04/index-causes-poor-performance-in-query-that-doesnt-use-it/

    – Bobby

    Comment by Bobby Durrett — June 11, 2013 @ 11:59 pm BST Jun 11,2013 | Reply

    • Bobby,

      Thanks for the link.
      It’s surprising how comomonly you can find optimizer anomalies when you start mixing Oracle features. Your example of mixing function-based indexes and IOTs looks like a typical case of a developer not thinking through all the variations and boundary conditions. (There’s a degree of similarity between your example and an example – fixed in 10.2 – that I posted a while back that didn’t include IOTs: http://jonathanlewis.wordpress.com/2011/12/30/fbi-bug-2/ )

      Comment by Jonathan Lewis — June 17, 2013 @ 8:11 am BST Jun 17,2013 | Reply

  5. access(“T1B”.”CONTRACT”=:B1 AND “T1B”.”DATE_TO”>:B2 AND “T1B”.”DATE_FROM”:B1)

    just wondering, why same bind variable :B1 is use for 2 different columns (“T1B”.”CONTRACT”=:B1 during access to Index and than later in filter “T1B”.”DATE_TO”=:B1) even though they are of different data type? or i miss something

    Thank you,

    Comment by Henish — June 18, 2013 @ 3:51 am BST Jun 18,2013 | Reply

    • Henish,

      I hadn’t spotted that little detail. My first thought was that it was a simple artefact of the way “explain plan” works, so I re-ran the test and pulled the plans from v$sql_plan (dbms_xplan.display_cursor) just to cross-check – and found that the same effect appeared.

      The “bind variables” aren’t really bind variables, of course, even though they look like the bind variables you see when pl/sql variables are substituted into embedded static SQL; they’re merely place-holders representing the correlated columns from the outer query (and the “peeked_binds” option won’t show any values). I’d guess that the code to produce the predicate section simply doesn’t “bother” to work out if a particular correlated column has been used elsewhere in the subquery, it’s just using something to show that there is a link.

      Comment by Jonathan Lewis — June 19, 2013 @ 9:30 am BST Jun 19,2013 | Reply

  6. […] recently, two interesting blog articles, here and here, have been published to emphasize the possibility of having a same plan hash value for […]

    Pingback by Different sql id, different force matching signature, different rows processed with the same plan hash value | Mohamed Houri’s Oracle Notes — June 19, 2013 @ 9:07 am BST Jun 19,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,267 other followers