Oracle Scratchpad

September 26, 2016

PK Histogram

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 10:25 am BST Sep 26,2016

One of the little myths of Oracle appeared on the Oracle-L list server a few days ago – the one that says: “you don’t need a histogram on a single column unique/primary key”.

Not only can a histogram be helpful on a column that’s declared to hold unique values, the optimizer may even spot the need automatically. It’s a little unusual (and probably the result of a bad design or of poor programming practices) but it does happen. Here’s an example demonstrating the principle:


rem
rem     Script:         pk_histogram.sql
rem     Author:         Jonathan Lewis
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.1.0.7
rem             10.2.0.5
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum  id
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4
;

insert into t1 select id + 1e6 from t1;
insert into t1 select id + 1e7 from t1;

alter table t1 add constraint t1_pk primary key(id);

select
        /*+ dynamic_sampling(0) */
        *
from    t1
where
        id between 12000 and 13000
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1'
        );
end;
/


select
        column_name, sample_size,
        num_distinct, num_nulls, density,
        histogram, num_buckets
from   
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
;

I’ve created a small data with set some large gaps in the ranges of values used then queried the data with a range-based predicate that will return no rows. After that I’ve let Oracle do its own thing with gathering stats (I think all the dbms_stats preferences are at their defaults on this database). This is the result I got from the final query:

COLUMN_NAME          SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM       NUM_BUCKETS
-------------------- ----------- ------------ ---------- ---------- --------------- -----------
ID                          5555        40000          0 .000024752 HEIGHT BALANCED         254

Oracle will gather a histogram automatically if it can see that the distribution of the data you are querying is skewed; but people tend to think of “skewed” as meaning there is wide variation in the number of rows for a given value and forget that it can also mean a variation in the number of rows within a given size of range. All I’ve done with my example is cue Oracle to the fact that I have big gaps in the range of known values – so it has produced a histogram that let’s it know where the gaps are.

Note: The results above are from 11.2.0.4; in 12.1.0.2 I got a HYBRID histogram; on 10.2.0.5 the sample size was 40,000

1 Comment »

  1. […] of the little myths of Oracle appeared on the Oracle-L list server a few days ago – the one that says: “you don’t […]

    Pingback by Log Buffer #489: A Carnival of the Vanities for DBAs | Official Pythian® Blog – Cloud Data Architect — September 30, 2016 @ 10:18 am BST Sep 30,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Powered by WordPress.com.