One of the little myths of Oracle appeared on the Oracle-L list server a few days ago – the one that says: “you don’t need a histogram on a single column unique/primary key”.
Not only can a histogram be helpful on a column that’s declared to hold unique values, the optimizer may even spot the need automatically. It’s a little unusual (and probably the result of poor programming practice) but it does happen. Here’s an example demonstrating the principle:
rem rem Script: pk_histogram.sql rem Author: Jonathan Lewis rem rem Last tested rem 22.214.171.124 rem 126.96.36.199 rem 188.8.131.52 rem 10.2.0.5 rem create table t1 as with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 ) select rownum id from generator v1, generator v2 where rownum <= 1e4 ; insert into t1 select id + 1e6 from t1; insert into t1 select id + 1e7 from t1; alter table t1 add constraint t1_pk primary key(id); select /*+ dynamic_sampling(0) */ * from t1 where id between 12000 and 13000 ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1' ); end; / select column_name, sample_size, num_distinct, num_nulls, density, histogram, num_buckets from user_tab_cols where table_name = 'T1' order by column_name ;
I’ve created a small data set some large gaps in the ranges of values used, then queried the data with a range-based predicate that will return no rows. After that I’ve let Oracle do it’s own thing with gathering stats (I think all the preferences are at their defaults). This is the result I got:
COLUMN_NAME SAMPLE_SIZE NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM NUM_BUCKETS -------------------- ----------- ------------ ---------- ---------- --------------- ----------- ID 5555 40000 0 .000024752 HEIGHT BALANCED 254
Oracle will gather a histogram automatically if it can see that the distribution of the data you are querying is skewed; but people tend to think of “skewed” as meaning there is wide variation in the number of rows for a given value and forget that it can also mean a variation in the number of rows within a given size of range. All I’ve done with my example is cue Oracle to the fact that I have big gaps in the range of known values – so it has produced a histogram that let’s it know where the gaps are.
Note: The results above are from 184.108.40.206, in 220.127.116.11 I got a HYBRID histogram, on 10.2.0.5 the sample size was 40,000