Oracle Scratchpad

March 9, 2012

Index Upgrades

Filed under: CBO,Oracle,Upgrades — Jonathan Lewis @ 6:05 pm GMT Mar 9,2012

Listening to a presentation by Paul Matuszyk on extended statistics yesterday, I learned something that I should have spotted ages ago. Here’s a little demo script to introduce the point:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1000)
select
	rownum			id,
	mod(rownum,100)		n1,
	mod(rownum,100)		n2,
	mod(rownum,100)		n3,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1000000;

create index t1_i1 on t1(n1, n2, n3);

-- collect stats, no histograms.


Notice that in any one row the values of n1, n2, and n3 are all the same, and given the way I’ve used the mod() function to generate the values there are only 100 possible combinations across the entire million rows. This fact will be echoed in the number of distinct keys in the (non-unique) index t1_i1.

Now take a look at the following queries:

select
	*
from
	t1
where
	n1 = 50
and	n2 = 50
and	n3 = 50
;

select
	n1, n2, n3, max(small_vc)
from
	t1
group by
	n1, n2, n3
;

select
	max(t1.small_vc),
	max(t2.small_vc)
from
	t1,
	t1	t2
where
	t2.n1 = t1.n1
and	t2.n2 = t1.n2
and	t2.n3 = t1.n3
;

Here are the execution plans for the queries – two plans per query, the first of each pair comes from 10.2.0.5, the second comes from 11.1.0.7. Spot the differences:

Simple select
----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   125 |  2761 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   125 |  2761 |
----------------------------------------------------------

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  1220K|  2761 |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  1220K|  2761 |
----------------------------------------------------------

Simple aggregate
-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       |  6012 |
|   1 |  HASH GROUP BY     |      |   500K|  9765K|    30M|  6012 |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|    19M|       |  2761 |
-------------------------------------------------------------------

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100 |  2000 |  5033 |
|   1 |  HASH GROUP BY     |      |   100 |  2000 |  5033 |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|    19M|  2761 |
-----------------------------------------------------------

Simple join
--------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    40 |       |  6495 |
|   1 |  SORT AGGREGATE     |      |     1 |    40 |       |       |
|*  2 |   HASH JOIN         |      |  1000K|    38M|    30M|  6495 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|    19M|       |  2761 |
|   4 |    TABLE ACCESS FULL| T1   |  1000K|    19M|       |  2761 |
--------------------------------------------------------------------

--------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    40 |       |  1006K|
|   1 |  SORT AGGREGATE     |      |     1 |    40 |       |       |
|*  2 |   HASH JOIN         |      |    10G|   372G|    30M|  1006K|
|   3 |    TABLE ACCESS FULL| T1   |  1000K|    19M|       |  2761 |
|   4 |    TABLE ACCESS FULL| T1   |  1000K|    19M|       |  2761 |
--------------------------------------------------------------------

In 11g, the optimizer can use the¬†number of distinct keys for an index in much the same way that it can take advantage of extended statistics (specifically in the “column group” variant)¬† to improve its estimates of cardinality for queries involving predicates on correlated columns. In this case the 10g code has seen three columns with 100 distinct values in each, from which is has inferred that there could be 1,000,000 combinations (100 * 100 * 100) in the table; but 11g has based its calculations on the known number of distinct keys in the index – which is just 100 – leading to dramatically different results.

Further Notes:

The optimizer has used an “index sanity” check when calculating join cardinalities since (at least) Oracle 8i – but the check has been restricted to unique indexes. From 11g it is effectively in use for any index.

The code has been back-ported to 10.2.0.4 (and 10.2.0.5) but requires you to set a value for the hidden _fix_control parameter (which you can do at the session or system level) to enable the feature. There are three levels for the setting (corresponding to my three queries) – single table cardinality, aggregation, and join cardinality. To enable the feature you need to execute a statement like:

alter session set "_fix_control"=''5765456:N';

The value for N can be 1, 3, or 7 (It’s a bitmap, but the effects are cumulative – the values are 1, 1 + 2, 1 + 2 + 4).

I’ve often demonstrated the effect of the index sanity check in my seminars and tutorials, with a follow-up comment to the effect that I don’t understand why the optimizer only applies the mechanism to unique indexes. From 11g, it doesn’t; the thing is, as I also point out in the seminar, this makes it harder to work out if it’s safe to drop an index. Even when an index doesn’t appear in an execution plan Oracle may be using its number of distinct keys to evaluate and choose an execution plan. So, to guard against changes in plans when you drop a multi-column index, you might want to think about creating extended stats on a set of columns that matches the index definition before dropping the index.

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,429 other followers