## March 9, 2012

Filed under: CBO,Oracle,Upgrades — Jonathan Lewis @ 6:05 pm BST Mar 9,2012

Listening to a presentation by Paul Matuszyk on extended statistics yesterday, I learned something that I should have spotted ages ago. Here’s a little demo script to introduce the point:

```create table t1
as
with generator as (
select	--+ materialize
rownum id
from dual
connect by
level <= 1000)
select
rownum			id,
mod(rownum,100)		n1,
mod(rownum,100)		n2,
mod(rownum,100)		n3,
from
generator	v1,
generator	v2
where
rownum <= 1000000;

create index t1_i1 on t1(n1, n2, n3);

-- collect stats, no histograms.

```

Notice that in any one row the values of n1, n2, and n3 are all the same, and given the way I’ve used the mod() function to generate the values there are only 100 possible combinations across the entire million rows. This fact will be echoed in the number of distinct keys in the (non-unique) index t1_i1.

Now take a look at the following queries:

```select
*
from
t1
where
n1 = 50
and	n2 = 50
and	n3 = 50
;

select
n1, n2, n3, max(small_vc)
from
t1
group by
n1, n2, n3
;

select
max(t1.small_vc),
max(t2.small_vc)
from
t1,
t1	t2
where
t2.n1 = t1.n1
and	t2.n2 = t1.n2
and	t2.n3 = t1.n3
;
```

Here are the execution plans for the queries – two plans per query, the first of each pair comes from 10.2.0.5, the second comes from 11.1.0.7. Spot the differences:

```Simple select
----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   125 |  2761 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   125 |  2761 |
----------------------------------------------------------

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  1220K|  2761 |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  1220K|  2761 |
----------------------------------------------------------

Simple aggregate
-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   500K|  9765K|       |  6012 |
|   1 |  HASH GROUP BY     |      |   500K|  9765K|    30M|  6012 |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|    19M|       |  2761 |
-------------------------------------------------------------------

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   100 |  2000 |  5033 |
|   1 |  HASH GROUP BY     |      |   100 |  2000 |  5033 |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|    19M|  2761 |
-----------------------------------------------------------

Simple join
--------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    40 |       |  6495 |
|   1 |  SORT AGGREGATE     |      |     1 |    40 |       |       |
|*  2 |   HASH JOIN         |      |  1000K|    38M|    30M|  6495 |
|   3 |    TABLE ACCESS FULL| T1   |  1000K|    19M|       |  2761 |
|   4 |    TABLE ACCESS FULL| T1   |  1000K|    19M|       |  2761 |
--------------------------------------------------------------------

--------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |TempSpc| Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    40 |       |  1006K|
|   1 |  SORT AGGREGATE     |      |     1 |    40 |       |       |
|*  2 |   HASH JOIN         |      |    10G|   372G|    30M|  1006K|
|   3 |    TABLE ACCESS FULL| T1   |  1000K|    19M|       |  2761 |
|   4 |    TABLE ACCESS FULL| T1   |  1000K|    19M|       |  2761 |
--------------------------------------------------------------------

```

In 11g, the optimizer can use the number of distinct keys for an index in much the same way that it can take advantage of extended statistics (specifically in the “column group” variant)  to improve its estimates of cardinality for queries involving predicates on correlated columns. In this case the 10g code has seen three columns with 100 distinct values in each, from which is has inferred that there could be 1,000,000 combinations (100 * 100 * 100) in the table; but 11g has based its calculations on the known number of distinct keys in the index – which is just 100 – leading to dramatically different results.

### Further Notes:

The optimizer has used an “index sanity” check when calculating join cardinalities since (at least) Oracle 8i – but the check has been restricted to unique indexes. From 11g it is effectively in use for any index. You may not notice the effect because the effect probably won’t show up unless you have indexes (or column group statistics) at both ends of the join; in my case this happened by “accident” because I was joining the table to itself.

The code has been back-ported to 10.2.0.4 (and 10.2.0.5) but in those versions it requires you to set a value for the hidden _fix_control parameter (which you can do at the session or system level) to enable the feature. There are three levels for the setting (corresponding to my three queries) – single table cardinality, aggregation, and join cardinality. To enable the feature you need to execute a statement like:

```alter session set "_fix_control"='5765456:N';
```

The value for N can range from 0 to 7; it’s a bitmap and the effects are cumulative. The effects are listed in MoS Doc ID 5040753.8:

• bit 0 – use it for single table cdn estimate
• bit 1 – for group by (#of groups) estimate
• bit 2 – join selectivity

In some simple tests I’ve found that I had to set bit 0 as well as bit 2 to get join selectivity to work, so levels 5 or 7.

I’ve often demonstrated the effect of the index sanity check in my seminars and tutorials, with a follow-up comment to the effect that I don’t understand why the optimizer applies the mechanism only to unique indexes. From 11g, the restriction disappears. The thing is, as I also point out in the seminar, this makes it harder to work out if it’s safe to drop an index. Even when an index doesn’t appear in an execution plan Oracle may be using its number of distinct keys to evaluate and choose an execution plan. So, to guard against changes in plans when you drop a multi-column index, you might want to think about creating extended stats on a set of columns that matches the index definition before dropping the index.

## 1 Comment »

1. […] together the num_distinct of the separate columns. In earlier versions of Oracle there were some restrictions about uniqueness, but the limitations were removed in […]

Pingback by Index Sanity | Oracle Scratchpad — June 28, 2016 @ 8:43 am BST Jun 28,2016