I think your first couple of statements summarise the informal argument about why we generally assume a two-column index will have a lower clustering_factor than a three-column index. The inclusion of the rowid in the index entry is very important, because entries with the same key are stored in rowid order. So if we have six rows with the same (two-column) key, three rows in each of two blocks (call them B1 and B2), the block components of the index entries would be (B1, B1, B1, B2, B2, B2). Adding an extra column to the index could then very easily change the order of visiting the rows to (B1, B2, B1, B2, B1, B2).
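The two orderings can be reproduced with a simplified model of how clustering_factor is derived: walk the index entries in key order and count each change of table block. This is a sketch under stated assumptions, not Oracle's internal code. It models a rowid as a hypothetical (block, row_in_block) pair, ignores refinements such as the TABLE_CACHED_BLOCKS statistics preference, and the row values and the `clustering_factor` helper are made up for illustration.

```python
# Toy model: a rowid is (block, row_in_block); an index entry is the key columns followed by the rowid.
rows = [
    # (col1, col2, col3, rowid) -- hypothetical values
    ("x", "y", 1, ("B1", 0)),
    ("x", "y", 3, ("B1", 1)),
    ("x", "y", 5, ("B1", 2)),
    ("x", "y", 2, ("B2", 0)),
    ("x", "y", 4, ("B2", 1)),
    ("x", "y", 6, ("B2", 2)),
]


def clustering_factor(entries):
    """Walk the index entries in order and count how often the table block changes."""
    cf, previous_block = 0, None
    for entry in entries:
        block = entry[-1][0]  # block component of the rowid (last element of the entry)
        if block != previous_block:
            cf += 1
            previous_block = block
    return cf


# Equal keys sort by rowid, so the rowid acts as the tie-breaker.
two_col = sorted((c1, c2, rowid) for c1, c2, _c3, rowid in rows)
three_col = sorted((c1, c2, c3, rowid) for c1, c2, c3, rowid in rows)

print([e[-1][0] for e in two_col], clustering_factor(two_col))      # ['B1', 'B1', 'B1', 'B2', 'B2', 'B2'] 2
print([e[-1][0] for e in three_col], clustering_factor(three_col))  # ['B1', 'B2', 'B1', 'B2', 'B1', 'B2'] 6
```

In the two-column index the rowid tie-break keeps each block's rows together; the third column overrides that tie-break and makes the walk alternate between blocks.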

On the other hand, as you point out, there are patterns of data where an odd synchronisation of values and block locations could contradict the general intuition.
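A tiny, deliberately contrived case shows how that can happen. The third column can reorder rows within a key so that the last block visited for one key matches the first block of the next key, saving a block change at the boundary. The rows, their values, and the `clustering_factor` helper are hypothetical, using the same simplified counting model (one increment per change of table block).

```python
def clustering_factor(entries):
    """Count block changes while walking index entries (rowid = (block, slot) is last)."""
    cf, previous_block = 0, None
    for entry in entries:
        block = entry[-1][0]
        if block != previous_block:
            cf += 1
            previous_block = block
    return cf


# Hypothetical rows (col1, col2, col3, rowid) where col3 happens to line up
# with the block boundary between two adjacent key values.
rows = [
    ("x", "y", 2, ("B1", 0)),
    ("x", "z", 7, ("B1", 1)),
    ("x", "y", 1, ("B2", 0)),
]

two_col = sorted((c1, c2, rowid) for c1, c2, _c3, rowid in rows)
three_col = sorted((c1, c2, c3, rowid) for c1, c2, c3, rowid in rows)

print(clustering_factor(two_col))    # 3  (blocks B1, B2, B1)
print(clustering_factor(three_col))  # 2  (blocks B2, B1, B1)
```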

Your blog highlights an interesting side-effect of the flaw in the optimizer’s model when it comes to using the clustering_factor. (It also fit quite nicely with a little posting I did recently about enhancing index statistics to include figures about every prefix combination.)
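To make "figures about every prefix combination" concrete, here is a hypothetical sketch. It is not the method from that posting and not a DBMS_STATS feature. It computes the number of distinct keys and a simplified clustering factor for each leading prefix of a three-column index, using six made-up rows laid out like the B1/B2 example. The `prefix_stats` and `clustering_factor` helpers are illustrative names only.

```python
def clustering_factor(entries):
    """Count block changes while walking entries; the rowid (block, slot) is the last element."""
    cf, previous_block = 0, None
    for entry in entries:
        block = entry[-1][0]
        if block != previous_block:
            cf += 1
            previous_block = block
    return cf


def prefix_stats(rows, n_columns):
    """Hypothetical helper: (distinct keys, clustering factor) for each leading prefix length."""
    stats = {}
    for k in range(1, n_columns + 1):
        entries = sorted(row[:k] + (row[-1],) for row in rows)  # prefix columns + rowid
        distinct_keys = len({row[:k] for row in rows})
        stats[k] = (distinct_keys, clustering_factor(entries))
    return stats


rows = [  # (col1, col2, col3, rowid) -- hypothetical values
    ("x", "y", 1, ("B1", 0)),
    ("x", "y", 3, ("B1", 1)),
    ("x", "y", 5, ("B1", 2)),
    ("x", "y", 2, ("B2", 0)),
    ("x", "y", 4, ("B2", 1)),
    ("x", "y", 6, ("B2", 2)),
]

print(prefix_stats(rows, 3))  # {1: (1, 2), 2: (1, 2), 3: (6, 6)}
```

With per-prefix figures the jump from 2 to 6 is visible directly, whereas a single set of statistics for the full three-column index reports only the last pair.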

The mixture of local and global indexes is certainly a case where the side effects on clustering_factor can be very counter-intuitive.
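A toy model shows one way the two can disagree. Suppose rows are spread across partitions, and each partition fills its own blocks in key order. A global index walks all keys in order and hops between partitions on almost every entry, while each local index segment walks a single partition. This sketch rests on several assumptions. A simple modulo stands in for Oracle's hash partitioning function. The local index's figure is taken as the sum of per-partition figures. The block capacity, partition count, and helper names are all hypothetical.

```python
ROWS_PER_BLOCK = 4   # hypothetical block capacity
N_PARTITIONS = 2     # hypothetical partition count
N_ROWS = 16


def clustering_factor(blocks):
    """Count how often the block changes in a sequence of block ids."""
    cf, previous = 0, None
    for block in blocks:
        if block != previous:
            cf += 1
            previous = block
    return cf


# A simple modulo stands in for hash partitioning: key k goes to partition k % N_PARTITIONS.
partitions = {p: [k for k in range(N_ROWS) if k % N_PARTITIONS == p] for p in range(N_PARTITIONS)}

# Within each partition, rows arrive in key order and fill that partition's own blocks.
location = {}
for p, keys in partitions.items():
    for i, k in enumerate(keys):
        location[k] = (p, i // ROWS_PER_BLOCK)  # (partition, block within partition)

# Global index: one key-ordered structure across all partitions.
global_cf = clustering_factor(location[k] for k in sorted(location))
# Local index: one segment per partition; figures summed (assumed aggregation).
local_cf = sum(clustering_factor(location[k] for k in sorted(keys)) for keys in partitions.values())

print(global_cf, local_cf)  # 16 4
```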

On a completely different tack, creating a hash-partitioned index on a non-partitioned table could also result in less contention for popular blocks on insertion, which can avoid a bug that causes indexes to become much bigger than they need to be. There’s lots of room for discussion when looking at partitioned indexes.

I wasn’t thinking about bitmap indexes when I posed the question, but my first thought is that if ('x','y') is a key value in the two-column index then it may have one index entry; if you add a third column you may have N index entries like ('x','y','a'), ('x','y','b'), with a corresponding increase in the clustering_factor. This won’t change the cost of using the index, of course, since the clustering_factor of a bitmap index isn’t used in the cost.
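A rough count of the index entries makes the N concrete. This sketch assumes one bitmap entry per distinct key value; real bitmap indexes can split a key's bitmap into several pieces. It also assumes, as the comment implies, that the reported clustering_factor of a bitmap index tracks the number of entries. The rows and the `bitmap_entries` helper are hypothetical.

```python
# Hypothetical rows: only the indexed column values matter here.
rows = [
    ("x", "y", "a"),
    ("x", "y", "b"),
    ("x", "y", "a"),
    ("x", "y", "c"),
    ("x", "y", "b"),
]


def bitmap_entries(rows, n_columns):
    """Assume one bitmap entry per distinct key (ignores splitting of long bitmaps)."""
    return len({row[:n_columns] for row in rows})


print(bitmap_entries(rows, 2), bitmap_entries(rows, 3))  # 1 3
```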

As Valentin has pointed out, compression won’t affect the

Another good point: it is possible to find edge cases (particularly with a small number of index entries) where the order of data arrival can have a surprising impact. In your case the effect also depends heavily on the great length of the table rows, since long rows mean only a few rows per block.

srivenu

I’m working on a test case scenario (looks tricky!)

I posted something recently

http://srivenukadiyala.wordpress.com/2011/12/25/optimizer-might-ignore-a-more-suitable-superset-composite-index/

regards

srivenu

In a two-column index, you can compress the first column.

In a three-column index (depending on the density of the second column), you can compress the first two columns.
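Index key compression stores a repeated leading prefix once per leaf block rather than once per entry, so its benefit depends on how often the compressed prefix repeats. A minimal sketch counts distinct prefixes across the whole index and ignores leaf-block boundaries. The keys and the `stored_prefixes` helper are hypothetical.

```python
# Hypothetical sorted keys for a three-column index on (c1, c2, c3).
keys = [
    ("x", "y", "a"),
    ("x", "y", "b"),
    ("x", "y", "c"),
    ("x", "z", "a"),
    ("w", "y", "a"),
]


def stored_prefixes(keys, compress):
    """Count distinct leading prefixes of length `compress`, i.e. roughly how many
    prefix values a compressed index would store (ignoring leaf-block boundaries)."""
    return len({key[:compress] for key in keys})


# Two-column index on (c1, c2) compressed on c1: one prefix per distinct c1 value.
print(stored_prefixes(keys, 1))  # 2
# Three-column index compressed on (c1, c2): one prefix per distinct (c1, c2) pair.
print(stored_prefixes(keys, 2))  # 3
```

The more distinct values the second column contributes, the more prefixes the two-column compression has to store, which is why the saving depends on the data.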

Still, that will not be the deciding factor.

The fact is that the third column's value has to be stored in the index, and the clustering factor increases with the number of combinations.

Maybe I’m wrong, but that’s how I see it.

Alex

If the data is not bulk loaded and the third column increases with each insert (for example, an insertion date), I would expect a better CF for the 3-column index.