Oracle Scratchpad

September 26, 2016

PK Histogram

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 10:25 am BST Sep 26,2016

One of the little myths of Oracle appeared on the Oracle-L list server a few days ago – the one that says: “you don’t need a histogram on a single column unique/primary key”.

Not only can a histogram be helpful on a column that’s declared to hold unique values, the optimizer may even spot the need automatically. It’s a little unusual (and probably the result of a bad design or of poor programming practices) but it does happen. Here’s an example demonstrating the principle:


rem
rem     Script:         pk_histogram.sql
rem     Author:         Jonathan Lewis
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.1.0.7
rem             10.2.0.5
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum  id
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4
;

insert into t1 select id + 1e6 from t1;
insert into t1 select id + 1e7 from t1;

alter table t1 add constraint t1_pk primary key(id);

select
        /*+ dynamic_sampling(0) */
        *
from    t1
where
        id between 12000 and 13000
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1'
        );
end;
/


select
        column_name, sample_size,
        num_distinct, num_nulls, density,
        histogram, num_buckets
from   
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
;

I’ve created a small data with set some large gaps in the ranges of values used then queried the data with a range-based predicate that will return no rows. After that I’ve let Oracle do its own thing with gathering stats (I think all the dbms_stats preferences are at their defaults on this database). This is the result I got from the final query:

COLUMN_NAME          SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM       NUM_BUCKETS
-------------------- ----------- ------------ ---------- ---------- --------------- -----------
ID                          5555        40000          0 .000024752 HEIGHT BALANCED         254

Oracle will gather a histogram automatically if it can see that the distribution of the data you are querying is skewed; but people tend to think of “skewed” as meaning there is wide variation in the number of rows for a given value and forget that it can also mean a variation in the number of rows within a given size of range. All I’ve done with my example is cue Oracle to the fact that I have big gaps in the range of known values – so it has produced a histogram that let’s it know where the gaps are.

Note: The results above are from 11.2.0.4; in 12.1.0.2 I got a HYBRID histogram; on 10.2.0.5 the sample size was 40,000

2 Comments »

  1. […] of the little myths of Oracle appeared on the Oracle-L list server a few days ago – the one that says: “you don’t […]

    Pingback by Log Buffer #489: A Carnival of the Vanities for DBAs | Official Pythian® Blog – Cloud Data Architect — September 30, 2016 @ 10:18 am BST Sep 30,2016 | Reply

  2. […] Histogram myth (Sept 2016): Histograms are just about “rows per value”, a demo of a useful histogram on a primary key column. […]

    Pingback by Histogram catalogue | Oracle Scratchpad — October 15, 2022 @ 9:38 am BST Oct 15,2022 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.