A couple of days ago I wrote an article about Oracle picking the “wrong index” after an index rebuild, and I mentioned that the sample data I had generated looked a little odd because it came from a script I had been using to investigate a completely different problem. This note describes that other problem, which appeared on the **Oracle-L** mailing list last month.

Stripped to a bare minimum, here’s the issue: we have a simple query against a single table with two indexes – *IDX2_AUFTRAG(arsAuftragsNr, dategAuftragsNr)* and *IDX7_AUFTRAG(arsAuftragsNr)* – and a predicate *“arsAuftragsNr = {constant}”*. Since the second column in the two-column index is irrelevant (we can’t use it to avoid visiting the table, and it’s not part of a *group by* or *order by* clause), and since adding a column to an index is likely to increase the **clustering_factor** and leaf block count of the index, we would probably expect to see Oracle pick the single column index as the path to the table – but it doesn’t, it picks the two-column index.

Here’s a little extract from the 10053 trace (supplied in the original posting) showing the proximal reason why Oracle selected the “wrong” index (I’ve added comments to remind you which index is which):

```
Access Path: index (RangeScan)
  Index: IDX2_AUFTRAG              -- two columns
  resc_io: 186.00  resc_cpu: 1585830
  ix_sel: 0.000034  ix_sel_with_filters: 0.000034
  Cost: 94.49  Resp: 94.49  Degree: 1
Access Path: index (AllEqRange)
  Index: IDX7_AUFTRAG              -- single column
  resc_io: 1027.00  resc_cpu: 8999338
  ix_sel: 0.000241  ix_sel_with_filters: 0.000241
  Cost: 521.97  Resp: 521.97  Degree: 1
```

Here’s the really surprising thing – the index selectivity (*ix_sel* – the selectivity of the predicates that can be used to identify the range of the index range scan, i.e. the fraction of the leaf blocks that Oracle has to walk through) varies dramatically between the indexes. Technically, of course, it is possible for this difference to be perfectly valid – and there’s always the possibility that one index has been rebuilt after a large change in the data – but the “instinctive” response to the difference (which looks particularly large) is that something strange has happened.
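As a quick sanity check on those figures you can plug the index statistics into the classic I/O-only approximation for the cost of an index range scan: blevel + ix_sel × leaf_blocks + ix_sel_with_filters × clustering_factor. This is only a sketch – the real calculation rounds intermediate terms and (with system statistics) adds a CPU component – so it lands close to, but not exactly on, the resc_io figures in the trace:

```python
# Sketch: classic I/O-only approximation for an index range scan cost.
# Stats come from the 10053 trace; expect small differences because the
# optimizer rounds intermediate terms and adds a CPU component.

def index_cost(blevel, leaf_blocks, clustering_factor, ix_sel, ix_sel_with_filters):
    # cost ~ blevel + ix_sel * leaf_blocks + ix_sel_with_filters * clustering_factor
    return blevel + ix_sel * leaf_blocks + ix_sel_with_filters * clustering_factor

# IDX2_AUFTRAG (two columns):  LVLS 2, #LB 21180, CLUF 5395080, ix_sel 0.000034
idx2 = index_cost(2, 21180, 5395080, 0.000034, 0.000034)
# IDX7_AUFTRAG (one column):   LVLS 2, #LB 16870, CLUF 4221470, ix_sel 0.000241
idx7 = index_cost(2, 16870, 4221470, 0.000241, 0.000241)

print(round(idx2, 1), round(idx7, 1))   # close to the resc_io values 186 and 1027
```

The dominant term in both cases is ix_sel_with_filters × clustering_factor – which is why the huge difference in ix_sel, not the index sizes, decides the winner.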

After debunking my initial “index rebuild” hypothesis I wondered about the possibility of there being a histogram on the column and Oracle doing something a little strange as a result – for example using the histogram stats when calculating *ix_sel* for the single column index, but using 1/**num_distinct** for the column when calculating the *ix_sel* for the two-column index; so I asked for the complete trace file, and this is the first thing I looked at:

```
Column (#5):
  NewDensity:0.000035, OldDensity:0.000294 BktCnt:254, PopBktCnt:0, PopValCnt:0, NDV:28290
Column (#5): ARSAUFTRAGSNR(
  AvgLen: 6  NDV: 28290  Nulls: 1900300  Density: 0.000035
  Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 254
```

Bingo – the critical column has a *height-balanced histogram*, and the old and new densities show a huge variation; and look at the coincidence when you compare the *ix_sel* values with the two different densities:

- single column: ix_sel = 0.000241, OldDensity = 0.000294
- two columns: ix_sel = 0.000034, NewDensity = 0.000035
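As an aside – and this is my understanding rather than anything documented – when a height-balanced histogram has no popular values (PopBktCnt = 0, PopValCnt = 0, as in this trace) the NewDensity calculation collapses to roughly 1/NDV, which is consistent with the figure reported:

```python
# Assumption: with no popular values in a height-balanced histogram,
# NewDensity is (approximately) 1/NDV.
ndv = 28290                      # NDV for ARSAUFTRAGSNR from the trace
new_density = 1 / ndv
print(f"{new_density:.6f}")      # 0.000035 - the NewDensity in the trace
```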

So as a first approximation it looks as if the optimizer is using the *newDensity* for the calculation for the two-column index and the *oldDensity* for the calculation for the single-column index. But why? And why were the figures not quite accurate? (I wasn’t particularly surprised by a small difference for the two-column example – it’s a fairly common, though always small, effect that I’d expect with nullable columns.)

I had all the statistics for the data, so I spent the next couple of hours trying to reproduce a data set that would show the same pattern of behaviour – the following extract from the trace file shows you the data pattern that I was attempting to model; note particularly the number of rows, column types, number of distinct values, number of nulls (often very important), and the size and clustering factor of the indexes.

```
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: AUFTRAG  Alias: AUFTRAG
  #Rows: 7274050  #Blks: 329126  AvgRowLen: 325.00  ChainCnt: 0.00
Index Stats::
  Index: IDX2_AUFTRAG  Col#: 5 8
    LVLS: 2  #LB: 21180  #DK: 237766  LB/K: 1.00  DB/K: 22.00  CLUF: 5395080.00
  Index: IDX7_AUFTRAG  Col#: 5
    LVLS: 2  #LB: 16870  #DK: 28290  LB/K: 1.00  DB/K: 149.00  CLUF: 4221470.00

Access path analysis for AUFTRAG
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for AUFTRAG[AUFTRAG]
  Column (#5):
    NewDensity:0.000035, OldDensity:0.000294 BktCnt:254, PopBktCnt:0, PopValCnt:0, NDV:28290
  Column (#5): ARSAUFTRAGSNR(
    AvgLen: 6  NDV: 28290  Nulls: 1900300  Density: 0.000035
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 254
```
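Before building a model it’s worth deriving a couple of working numbers from those statistics – nothing sophisticated, just the non-null entry count and the average number of rows per distinct value that any model data set would have to reproduce:

```python
# Figures taken from the Table Stats / column stats in the trace extract.
num_rows  = 7274050
num_nulls = 1900300
ndv       = 28290

non_null_rows  = num_rows - num_nulls    # entries that actually appear in IDX7_AUFTRAG
rows_per_value = non_null_rows / ndv     # average rows per distinct key value

print(non_null_rows, round(rows_per_value))   # 5373750 190
```

So roughly a quarter of the rows are null for the column, and each distinct value appears in about 190 rows.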

I just couldn’t get the same extreme behaviour while keeping the data set similar – my initial hypothesis must have been wrong, unless there was a particularly special set of circumstances that triggered the switch. Eventually I started checking through the whole trace file looking for any possible clues – I’d already checked the “parameters” section, of course, and the “fix_control” section, just to make sure that there were no funny features enabled or disabled. When I started looking at the whole trace file, though, it took me about 30 seconds to find the following (reformatted) text in the *“Unparsed Query”* section of the trace file:

```
SELECT /*+
         OPT_ESTIMATE (INDEX_SCAN   "AUFTRAG" "IDX7_AUFTRAG" MIN=1756.000000 )
         OPT_ESTIMATE (INDEX_FILTER "AUFTRAG" "IDX7_AUFTRAG" ROWS=1756.000000 )
       */
       {lots of columns}
FROM   "OPS$SCM"."AUFTRAG" "AUFTRAG"
WHERE  "AUFTRAG"."ARSAUFTRAGSNR"='000327'
etc...
```

Something had modified the incoming SQL at optimisation time by injecting a few hints that had explicitly changed the selectivity of the single-column index, and I had been wasting my time trying to create a data set to generate some numbers that were basically completely arbitrary! (There’s a moral there, somewhere; something about taking a quick look at the big picture before sticking your head down a rabbit-hole, or maybe not jumping to conclusions before considering a couple of possible alternatives.)

These hints are the type of hints that are injected by SQL Profiles or (in 11g) *cardinality feedback*. They simply correct the optimizer estimates as the optimizer is doing its thing. In this case I came to the conclusion that it had to be cardinality feedback – unless someone can give me a third option – because the *“other_xml”* section of the final plan didn’t say anything about using an SQL Profile.

You can interpret these hints as follows:

- OPT_ESTIMATE (INDEX_SCAN “AUFTRAG” “IDX7_AUFTRAG” MIN=1756.000000 ): The range scan will examine 1,756 index entries
- OPT_ESTIMATE (INDEX_FILTER “AUFTRAG” “IDX7_AUFTRAG” ROWS=1756.000000 ): after applying the index filter predicates, 1,756 rowids will be used to visit the table.

Given we have 7,274,050 rows in the table, the derived selectivity on the single-column index is 1756/7274050 = 0.0002414, resulting in a much higher cost than that of the two-column index, which (correctly) was using the *newDensity*, adjusted slightly for nulls.
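The arithmetic is easy to check – dividing the hinted row estimate by the table row count reproduces the reported ix_sel almost exactly:

```python
hinted_rows = 1756        # from the injected OPT_ESTIMATE hints
num_rows    = 7274050     # #Rows from the table statistics

ix_sel = hinted_rows / num_rows
print(f"{ix_sel:.7f}")    # 0.0002414 - matching ix_sel: 0.000241 in the trace
```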

**Footnote:** I am a little surprised that the derived density didn’t seem to be using *table.num_rows – column.num_nulls* as the denominator, since (for a single column index) we need the fraction of the index which represents only the non-null entries for the column.

I think I’ve said it before – even if you do nothing else with a 10053 trace, a quick glance at the “Unparsed Query” section may give you some ideas: I should have followed my own advice sooner in this case.

Comments and related questions are welcome.