Here are a few thoughts on dbms_stats – in particular the procedure gather_index_stats.
The procedure counts the number of used leaf blocks and the number of distinct keys using a count distinct operation, which means you get an expensive aggregation operation when you gather stats on a large index. It would be nice efficiency feature if Oracle changed the code to use the new Approximate NDV mechanism for these counts.
Then there are a couple of additions I’d like to see to the actual stats gathered: the average space used by a key (including the rowid), calculated as sum(space used) / count(index entries), so that we could do a quick comparison of the index size compared to the space allocation without resorting to complex estimates. The code would, of course, have to handle the prefix and the tail for compressed indexes, so possibly it would be better to record the two figures for space used, and include the figure for number of prefix entries. Note, the number of prefix entries is not the same as the number of distinct values for prefixes, as the same combination of prefix values may appear in several consecutive leaf blocks – and this brings me to another feature enhancement, to match SQL Server.
For a multi-column index, how about a new structure to hold the number of distinct values for each of the possible prefixes for that index. This could piggy-back on the existing extended statistics feature, and wouldn’t be too expensive to operate if it used the approximate NDV mechanism; it could, of course, be a bit of a threat since this approach would be using some of the 1,000 column limit that you’re allowed for a single table; there are also a few messy details to handle relating to multiple indexes starting with the same combination of columns in different orders – and the side effects of dropping one index from a set with such an overlap.
Footnote 1: When gathering stats for a Unique index, the code copies num_rows to distinct_keys, so that’s one aggregation optimisation already in place.
Footnote 2: There may be some empty blocks which don’t get counted – see my book Cost Based Oracle – Fundamentals for the impact this can have on costing index fast full scans … I need to check if the optimizer still uses the number of used leaf blocks to cost the fast full scan, or whether it has switched to using the high water mark of the segment.