One of the little myths of Oracle appeared on the Oracle-L list server a few days ago – the one that says: “you don’t need a histogram on a single column unique/primary key”.
Not only can a histogram be helpful on a column that’s declared to hold unique values, the optimizer may even spot the need automatically. It’s a little unusual (and probably the result of a bad design or of poor programming practices) but it does happen. Here’s an example demonstrating the principle:
rem rem Script: pk_histogram.sql rem Author: Jonathan Lewis rem rem Last tested rem 18.104.22.168 rem 22.214.171.124 rem 126.96.36.199 rem 10.2.0.5 rem create table t1 as with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 ) select rownum id from generator v1, generator v2 where rownum <= 1e4 ; insert into t1 select id + 1e6 from t1; insert into t1 select id + 1e7 from t1; alter table t1 add constraint t1_pk primary key(id); select /*+ dynamic_sampling(0) */ * from t1 where id between 12000 and 13000 ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1' ); end; / select column_name, sample_size, num_distinct, num_nulls, density, histogram, num_buckets from user_tab_cols where table_name = 'T1' order by column_name ;
I’ve created a small data with set some large gaps in the ranges of values used then queried the data with a range-based predicate that will return no rows. After that I’ve let Oracle do its own thing with gathering stats (I think all the dbms_stats preferences are at their defaults on this database). This is the result I got from the final query:
COLUMN_NAME SAMPLE_SIZE NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM NUM_BUCKETS -------------------- ----------- ------------ ---------- ---------- --------------- ----------- ID 5555 40000 0 .000024752 HEIGHT BALANCED 254
Oracle will gather a histogram automatically if it can see that the distribution of the data you are querying is skewed; but people tend to think of “skewed” as meaning there is wide variation in the number of rows for a given value and forget that it can also mean a variation in the number of rows within a given size of range. All I’ve done with my example is cue Oracle to the fact that I have big gaps in the range of known values – so it has produced a histogram that let’s it know where the gaps are.
Note: The results above are from 188.8.131.52; in 184.108.40.206 I got a HYBRID histogram; on 10.2.0.5 the sample size was 40,000