As in – how come a unique (or primary key) index is predicted to return more than one row using a unique scan. Here’s and example (running on 10.2.0.3 – but the same type of thing happens on newer versions):
create table t1
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
level <= 1e4
)
select
rownum id,
rownum n1,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 1e4
;
create table t2
as
select *
from t1
where
mod(id,10) = 0
;
-- do something about stats
alter table t1 add constraint t1_pk primary key (id) using index (
create unique index t1_pk on t1(id)
)
;
select
t2.*
from
t2, t1
where
t2.id between 100 and 200
and t1.id = t2.n1
and t1.n1 is not null
;
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 24 | 3048 | 16 |
| 1 | NESTED LOOPS | | 24 | 3048 | 16 |
|* 2 | TABLE ACCESS FULL | T2 | 12 | 1428 | 4 |
|* 3 | TABLE ACCESS BY INDEX ROWID| T1 | 2 | 16 | 1 |
|* 4 | INDEX UNIQUE SCAN | T1_PK | 1 | | |
----------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("T2"."ID">=100 AND "T2"."ID"<=200)
3 - filter("T1"."N1" IS NOT NULL)
4 - access("T1"."ID"="T2"."N1")
Notice line 3 – the table access to t1 driven by line 4 (the unique scan of unique index t1_pk) which is predicted to return 2 rows. Clearly something has gone wrong with the optimizer since this is not a possible outcome at runtime.
There are probably a number of paths through the optimizer code that could result in unexpected results like this, and many of them are probably “emergent properties” rather than simple behaviour (for example, the combination of a subquery factoring clause that has been moved inline where it has then accepted a pushed join predicate might lead to an odd combination of calculations that doesn’t make sense). Before getting too excited, though, if you’re seeing an anomaly like this it’s worth taking a quick look at the statistics, you may simply have an unlucky variation in the timing and sample sizes used that happens to have produced a quirky bit of arithmetic.
Note in the example above the place where I commented: “do something about stats”. Normally that line would read “gather stats at this point”; in this case I did something a little bizarre to make sure that the critical table, index, and columns statistics were not self-consistent. In effect I managed to create some stats that said: “every distinct value for the id column will identify two rows – even though it’s got a unique constraint”. Naturally the optimizer arithmetic “believed” the statistics.
I don’t think the specific scenario I engineered in my example can actually happen in newer versions of Oracle – so I’m not going to tell you exactly what I did to force it – but do remember that odd effects can appear simply because the stats collection didn’t quite work the way you wanted.
In 11g, of course, you can test this hypothesis fairly simply by collecting pending stats, and then enabling them in your own session to see what effect they have, without running the risk of damaging public execution plans. It’s also possible that if you use the auto_sample_size in 11g with the approximate NDV feature enabled then this problem will be less likely to appear.

I don’t see any DDL in your script to create that index (T1_PK)?
Comment by Colin 't Hart — June 9, 2012 @ 2:25 pm UTC Jun 9,2012 |
Colin,
Thanks for that. In my scripts it’s after the calls to dbms_stats, and I forgot copy it.
Now updated.
Comment by Jonathan Lewis — June 10, 2012 @ 12:36 pm UTC Jun 10,2012 |
Reblogged this on lava kafle kathmandu nepal.
Comment by lkafle — June 10, 2012 @ 3:30 am UTC Jun 10,2012 |
> In 11g, of course, you can test this hypothesis fairly simply by collecting pending stats, and then enabling them in your own session to see what effect they have, without running the risk of damaging public execution plans
Just a word of caution: At least up to 11.2.0.2 not all DBMS_STATS call are “pending” statistics mode aware, which means that some statistics manipulations will still be immediately visible although PENDING statistics have been enabled for a particular table, More details on my post:
http://oracle-randolf.blogspot.de/2011/01/pending-statistics.html
Randolf
Comment by Randolf Geist — June 10, 2012 @ 1:34 pm UTC Jun 10,2012 |
Very interessting. It’s nice to see with a real exemple how much impact have the data statistics and system statistics on the execution plans.
Comment by Guillaume Goulet-Vallières — June 13, 2012 @ 8:36 pm UTC Jun 13,2012 |