Oracle Scratchpad

February 3, 2011

Upgrade issues

Filed under: CBO,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 6:41 pm BST Feb 3,2011

Here’s an example of how a bug-fix can create problems. It’s a code change in 11.2.x.x and (I believe) 10.2.0.5 relating to the costing of queries involving (but perhaps not restricted to) composite partitioned tables. I first saw this change in an email from Doug Burns, who sent me the 10053 traces from a very simple query that had started using the wrong index after an upgrade from 10.2.0.4 to 11.2.0.2.

As part of his testing he had set the optimizer_features_enable parameter back to 10.2.0.4 and found that not only did the choice of index change back to the expected index, but the costs of the two indexes changed dramatically. (The cost of using the “right” index changed from 15 to something in excess of 9,000 on the upgrade!)

The example I’m going to show you demonstrates the principles of the cost change – but with the data set and statistics I’ve generated you won’t see a change of execution plan. This is typical of some of the models I create – it’s enough to establish a principle, after which it’s simple enough to recognise the problems that the principle can cause.

So here’s a data set. I created it in a default 11.2.0.2 install on Windows 32-bit, so it’s running with 8KB blocks, autoallocate extents, and ASSM – but I’ve disabled CPU costing:

create table t1(
	list_col	number,
	range_col	number,
	v100		varchar2(10),
	v10		varchar2(10),
	vid		varchar2(10),
	padding		varchar2(100)
)
partition by range(range_col)
subpartition by list (list_col)
subpartition template (
	subpartition s0		values(0),
	subpartition s1		values(1),
	subpartition s2		values(2),
	subpartition s3		values(3),
	subpartition s4		values(4),
	subpartition s5		values(5),
	subpartition s6		values(6),
	subpartition s7		values(7),
	subpartition s8		values(8),
	subpartition s9		values(9)
)
(
	partition p_10 values less than (10),
	partition p_20 values less than (20),
	partition p_30 values less than (30),
	partition p_40 values less than (40),
	partition p_50 values less than (50),
	partition p_60 values less than (60),
	partition p_70 values less than (70),
	partition p_80 values less than (80),
	partition p_90 values less than (90),
	partition p_100 values less than (100)
)
;

insert into t1
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	mod(rownum,10),
	mod(rownum,100),
	lpad(mod(rownum,10),10),
	lpad(mod(rownum,100),10),
	lpad(rownum,10),
	rpad('x',100)
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000
;

alter table t1 add constraint t1_pk primary key(v10, vid, v100, range_col, list_col) using index local;

create index t1_one_col on t1(v100) local;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'

	);
end;
/

You’ll notice that one of the indexes I’ve created has only one column, while the other has five columns (including the single column of the smaller index). Since I’ve created only 1,000,000 rows spread across 100 partitions every partition is small and the corresponding index partitions are also small, so I’m going to create use dbms_stats.get_index_stats() and dbms_stats.set_index_stats() to make the indexes appear much larger. Specifically I will set the blevel on the single column index to 2, and the blevel on the multi-column index to 3. (This difference in settings isn’t just whimsy, it helps to emulate Doug’s problem.) I’ve previously published the type of code to make this possible; in this case I only set the table-level stats because the queries I’m going to use will trigger Oracle to use just the table-level stats.

select
	/*+ index(t1(v100)) */
	count(*)
from
	t1
where
	v10 = '0000000005'
and	v100 = '0000000005'
;

select
	/*+ index(t1(v10, vid, v100)) */
	count(*)
from
	t1
where
	v10 = '0000000005'
and	v100 = '0000000005'
;

I’m going to show you two sets of execution plans. The first one is where I’ve set optimizer_features_enable to ’10.2.0.4′, the second it where I’ve left it to default.

10.2.0.4 execution plans:
===================
---------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Rows  | Bytes | Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |     1 |    22 |  2350 |       |       |
|   1 |  SORT AGGREGATE                      |            |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL                |            |  1000 | 22000 |  2350 |     1 |    10 |
|   3 |    PARTITION LIST ALL                |            |  1000 | 22000 |  2350 |     1 |    10 |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID| T1         |  1000 | 22000 |  2350 |     1 |   100 |
|*  5 |      INDEX RANGE SCAN                | T1_ONE_COL |   100K|       |   330 |     1 |   100 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("V10"='0000000005')
   5 - access("V100"='0000000005')

------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost  | Pstart| Pstop |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    22 |   100 |       |       |
|   1 |  SORT AGGREGATE      |       |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL|       |  1000 | 22000 |   100 |     1 |    10 |
|   3 |    PARTITION LIST ALL|       |  1000 | 22000 |   100 |     1 |    10 |
|*  4 |     INDEX RANGE SCAN | T1_PK |  1000 | 22000 |   100 |     1 |   100 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V10"='0000000005' AND "V100"='0000000005')
       filter("V100"='0000000005')

11.2.0.2 execution plans:
===================
Execution Plan
----------------------------------------------------------
Plan hash value: 3019183742

---------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name       | Rows  | Bytes | Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |            |     1 |    22 |  2530 |       |       |
|   1 |  SORT AGGREGATE                      |            |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL                |            |  1000 | 22000 |  2530 |     1 |    10 |
|   3 |    PARTITION LIST ALL                |            |  1000 | 22000 |  2530 |     1 |    10 |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID| T1         |  1000 | 22000 |  2530 |     1 |   100 |
|*  5 |      INDEX RANGE SCAN                | T1_ONE_COL |   100K|       |   510 |     1 |   100 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("V10"='0000000005')
   5 - access("V100"='0000000005')

------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost  | Pstart| Pstop |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    22 |   370 |       |       |
|   1 |  SORT AGGREGATE      |       |     1 |    22 |       |       |       |
|   2 |   PARTITION RANGE ALL|       |  1000 | 22000 |   370 |     1 |    10 |
|   3 |    PARTITION LIST ALL|       |  1000 | 22000 |   370 |     1 |    10 |
|*  4 |     INDEX RANGE SCAN | T1_PK |  1000 | 22000 |   370 |     1 |   100 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("V10"='0000000005' AND "V100"='0000000005')
       filter("V100"='0000000005')

Regardless of the setting of optimizer_features_enable, if I hint the single-column index I have a plan that visits each partition and subpartition in turn for the index then table; and if I hint the primary key I do that same walk through each partition and subpartition and visit just the index.

The big difference is in the cost. By default, both execution plans cost more in 11.2.0.2 – but the odd thing is that the change in costs is different. The cost of using the single-column index has gone up by 180, the cost of using the primary key index has gone up by 270; and the change in costs can be attributed completely to the indexes.

Is there a pattern in the change ? Yes, it’s not immediately obvious, of course, and needs a few extra tests to confirm it, but the change in cost can be calculated as: (number of range partitions * (number of list partitions – 1) * blevel of index).

      For the single column index this is: 10 * 9 * 2 = 180
    For the primary key index this is: 10 * 9 * 3 = 270

You don’t have to see an example of a change in plan actually happening once you’ve seen this change in arithmetic. In certain circumstances the cost of using composite partitions goes up when you upgrade to 11.2 – and the change in cost is proportional to the number of data segments (subpartitions) and the blevel of the indexes.

In Doug’s case he had a table with roughly 4,000 subpartitions in total – and he had a single column index with a blevel of 2, and a multi-column index with a blevel of 3: so the cost of using the (smaller, but less efficient) index went up by about 8,000 and the cost of using the (larger but efficient) index went up by about 12,000. The difference of 4,000 between these two increases was far higher than the original cost of using the bad index – so Oracle switched from the good index to the bad index.

The nasty thing about this problem is that it’s correcting an error – 11.2 is allowing for the cost of probing every single index subpartition, that’s what the extra multiples of the blevel represent; unfortunately the optimizer’s underlying paradigm of “every block visit is a disk I/O” makes this correction a threat.

Solutions and Workarounds

There is an /*+ optimizer_features_enable(‘x.x.x.x’) */ hint that could be used if the problem applies to just a couple of queries.

If the problem applies to classes of queries involving a couple of big tables and indexes you could use dbms_stats.set_index_stats() to adjust the blevel of the critical indexes.

If the problem appears all over the place you could set the optimizer_features_enable parameter to ’10.2.0.4′ in the parameter file, or in a session logon trigger.

If the problem appears all over the place, but there are other features of the 11.2.0.2 optimizer that are very helpful you could take advantage of the “_fix_control” parameter – after talking to Oracle support. This particular problem comes under bug fix 7132684, which appears in v$system_fix_control under 10.2.0.5, with the description “Fix costing for non prefixed local index”. If you’ve got the right symptoms, then the following statement will bypass the problem:

alter session set "_fix_control"='7132684:OFF';

Final Warning:

This note comes from 30 minutes looking at Doug’s trace files, one hour building a test case, and an hour spent writing this blog. This is not a complete determination of the problem it’s just an initial outline of what can happen and the symptoms to watch out for, posted as an early warning to help a few people save some time. Do not treat this note as a definitive reference.

7 Comments »

  1. it helps to emulate Doug’s problem

    I think it will take a much better test case than this to cover the full range of my problems ;-)

    But thanks a lot for looking at this and the detailed explanation.

    Comment by Doug Burns — February 3, 2011 @ 11:57 pm BST Feb 3,2011 | Reply

  2. Nicely done. One additional solution (with all the usual caveats about lying to the CBO to fix one problem may cause other, potentially worse and more numerous problems) is to set the blevel to match (probably setting the one column index UP rather than making the PK look excessively cheap, since if you’re that close on the cost estimate on a plan using the single column index versus a table scan it probably won’t hurt too much to do a table scan, and the PK winning over the single column index when it shouldn’t probably won’t hurt too much.)

    Oracle publishing changes in factor weightings and formulas used to generate the estimated cost from release to release and in patches sure would be a nice thing to do.

    mwf

    Comment by Mark W. Farnham — February 4, 2011 @ 11:19 am BST Feb 4,2011 | Reply

  3. Thank you for the early warning, Jonathan! I’m on the verge of migrating a DB with sub-partitioning to 11.2.0.2, and this note might just save my day.
    Kind regards,
    Uwe

    Comment by oraculix — February 4, 2011 @ 2:47 pm BST Feb 4,2011 | Reply

  4. [...] Of course, being Oracle, you may find that someone, somewhere, has exactly such a histogram that appears to work brilliantly for them – but that will be because the optimizer has messed up the arithmetic so much that they are getting a great execution plan for completely the wrong reason … so they need to watch out for the next upgrade or patch release in case the optimizer gets enhanced. [...]

    Pingback by Philosophy – 13 « Oracle Scratchpad — February 10, 2011 @ 2:58 pm BST Feb 10,2011 | Reply

  5. Thanks for the warning …

    Other than doing optimizer_features_enable, is there an expected patch from oracle to fix this issue?

    Comment by Val Kaviyarasan — February 28, 2011 @ 8:14 pm BST Feb 28,2011 | Reply

  6. I’ve just run this test on 10.2.0.5 and 11.2.0.2 – strangely 11.2.0.2 with optimizer features set to 10.2.0.5 gives me the new result, whereas the “real” 10.2.0.5 shows me the old-style, 10.2.0.4 result. I’m not sure if the 10.2.0.5 behaviour changes with some PSUs applied to it – nevertheless having no further PSUs or the like applied to both 11.2.0.2 and 10.2.0.5 there seems to be an inconsistency in results.

    Comment by Randolf Geist — July 26, 2011 @ 6:53 pm BST Jul 26,2011 | Reply

    • Randolf,

      Thanks for the note. It’s possible that Doug had various patches applied – I think I recall him moaning about the number of patches and fixes going on all the time.

      Comment by Jonathan Lewis — August 1, 2011 @ 11:08 am BST Aug 1,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,454 other followers