Warning – make sure you read to the end of this post.
Someone sent me an email this morning asking how Oracle calculates the index cardinality of an index range scan. Now, as I’ve often said, I don’t do private email about Oracle – unless it just happens to catch my attention and looks like something that is sufficiently generic to be worth publishing.
Today’s emailer was a lucky winner – he’d sent me a very short email that took about 30 seconds to read, contained a significant error, and (at first sight) probably had the right information in it for me to work with. Here’s the problem, as a cutnpaste from an SQL*Plus session:
SQL> explain plan for select * from admlxa.QRT_BENCH where QRT_BENCH_DATE < :a3; Explained. Elapsed: 00:00:00.01 SQL> select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT  Plan hash value: 2896103184   Id  Operation  Name  Rows  Bytes  Cost (%CPU) Time    0  SELECT STATEMENT   424K 81M 3170 (4) 00:00:05   1  TABLE ACCESS BY INDEX ROWID QRT_BENCH  424K 81M 3170 (4) 00:00:05  * 2  INDEX RANGE SCAN  IDX_QRT_BENCH_1  78876   303 (5) 00:00:01   Predicate Information (identified by operation id):  2  access("QRT_BENCH_DATE"<:A3)
Look closely at the “Rows” column – there’s clearly a logic error appearing here. If you select 78,876 rowids from an index you can’t possibly acquire 424,000 rows from the table – so where have those two numbers come from ?
The supporting information looked like this:
num_rows for QRT_BENCH table = 8480798 num_distinct for QRT_BENCH_DATE column = 458 num_rows for IDX_QRT_BENCH_1 = 8763975 distinct_keys for IDX_QRT_BENCH_1 = 537
Of course, I really needed to know whether this was a singlecolumn or multicolumn index if I wanted to model the problem correctly and do further checks on when the error appeared, but this was good enough to get started. We note, first, that the 424K for the table cardinality is just the standard “5% due to range predicate with unknown value”: round(0.05 * 8480798) = 424040.

Step 1: since I don’t know where the number 78876 comes from, let’s try to work backwards – use it in a bit if arithmetic and see what drops out. Let’s try dividing it into the table cardinality: 424040 / 78876 = 5.3760
Step 2: Does 5.376 look familiar — it may be a coincidence, but isn’t that really close to 1% of the number of distinct keys in the index ?
At this point I don’t have any time to investigate in detail, but a scratch hypothesis is that Oracle is calculating something like: 5 * (number of rows in table / (number of distinct keys in index)); and maybe that magic five appears through a piece of code that takes 5%, but for some reason then divides by the 1% associated with the selectivity normally associated with function(col).
If I had the time (and the data set) I’d start playing with dbms_stats.set_index_stats() et.al. to see the effect of changing num_distinct and num_rows to see if my initial guess was somewhere in the right ballpark. As it is I’ve just emailed this note back to the source.
Erratum:
Before I published this note I got a reply from the original correspondent, with the following comment:

I think the answer has already been given by you here: Chapter 4: Index Selectivity (14th Dec 2005)
Sure enough – five years ago I had published some details about exactly this “feature” – and that 537 vs. 5.376 really was just a coincidence.
I decided to publish the note anyway for three reasons –

one: to make sure you realise that I do make mistakes
two: to show you that simple games with numbers may give you a working hypothesis
three: to remind you that once you’ve got a working hypothesis it’s often easy to think of ways to demonstrate that your hypothesis is wrong. (A couple of hacks of the statistics would have shown me a constant 0.009 appearing, rather than anything like 5% divided by 1%.)
Your honesty in publishing your “mistake” is well appreciated !
(Better than some web/forums posts where the author *deletes* the text of his original post after he realizes his mistake/miscalculation/error).
BTW, if you hadn’t completed your post with the Erratum, you’d have lost some of my respect with the phrase “maybe that magic five appears through ..” . “magic” is not “scientific” (Whatever be the definition of “scientific” and “engineering” that has been bandied around for few years) {and with apologies to the late Arthur C Clarke}.
Then, again “think of ways to demonstrate that your hypothesis is wrong.” is the scientific method.
Comment by Hemant K Chitale — February 11, 2011 @ 8:11 am GMT Feb 11,2011 