Oracle Scratchpad

August 2, 2018

Extended Histograms – 2

Filed under: extended stats,Histograms,Oracle,Statistics — Jonathan Lewis @ 2:13 pm GMT Aug 2,2018

Following on from the previous posting which raised the idea of faking a frequency histogram for a column group (extended stats), this is just a brief demonstration of how you can do this. It’s really only a minor variation of something I’ve published before, but it shows how you can use a query to generate a set of values for the histogram and it pulls in a detail about how Oracle generates and stores column group values.

We’ll start with the same table as we had before – two columns which hold only the combinations (‘Y’, ‘N’) or (‘N’, ‘Y’) in a very skewed way, with a requirement to ensure that the optimizer provides an estimate of 1 if a user queries for (‘N’,’N’) … and I’m going to go the extra mile and create a histogram that does the same when the query is for the final possible combination of (‘Y’,’Y’).

Here’s the starting code that generates the data, and creates histograms on all the columns (I’ve run this against 12.1.0.2 and 12.2.0.1 so far):


rem
rem     Script:         histogram_hack_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2018
rem
rem     Last tested 
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table t1
as
select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to deal with wordpress format issue.
union all
select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to deal with wordpress format issue.
;

variable v1 varchar2(128)

begin
        :v1 := dbms_stats.create_extended_stats(null,'t1','(c2,c3)');
        dbms_output.put_line(:v1);
end;
/

execute dbms_stats.gather_table_stats(null, 't1', method_opt=>'for all columns size 10');

In a variation from the previous version of the code I’ve used the “create_extended_stats()” function so that I can return the resulting virtual column name (also known as an “extension” name) into a variable that I can use later in an anonymous PL/SQL block.

Let’s now compare the values stored in the histogram for that column with the values generated by a function call that I first referenced a couple of years ago:


select
        endpoint_value
from 
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = :v1
;

select 
        distinct c2, c3, 
        mod(sys_op_combined_hash(c2,c3),9999999999) endpoint_value
from t1
;

ENDPOINT_VALUE
--------------
    4794513072
    6030031083

2 rows selected.


C C ENDPOINT_VALUE
- - --------------
N Y     4794513072
Y N     6030031083

2 rows selected.

So we have a method of generating the values that Oracle should store in the histogram; now we need to generate 4 values and supply them to a call to dbms_stats.set_column_stats() in the right order with the frequencies we want to see:


declare
        l_distcnt number;
        l_density number;
        l_nullcnt number;
        l_avgclen number;

        l_srec  dbms_stats.statrec;
        n_array dbms_stats.numarray;

begin
        dbms_stats.get_column_stats (
                ownname =>null,
                tabname =>'t1',
                colname =>:v1,
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                avgclen =>l_avgclen,
                srec    =>l_srec
        );

        l_srec.novals := dbms_stats.numarray();
        l_srec.bkvals := dbms_stats.numarray();

        for r in (
                select
                        mod(sys_op_combined_hash(c2,c3),9999999999) hash_value, bucket_size
                from    (
                        select 'Y' c2, 'Y' c3, 1 bucket_size from dual
                        union all
                        select 'N' c2, 'N' c3, 1 from dual
                        union all
                        select 'Y' c2, 'N' c3, 71482 from dual
                        union all
                        select 'N' c2, 'Y' c3, 1994 from dual
                        )
                order by hash_value
        ) loop
                l_srec.novals.extend;
                l_srec.novals(l_srec.novals.count) := r.hash_value;

                l_srec.bkvals.extend;
                l_srec.bkvals(l_srec.bkvals.count) := r.bucket_size;
        end loop;

        n_array := l_srec.novals;

        l_distcnt  := 4;
        l_srec.epc := 4;

--
--      For 11g rpcnts must not be mentioned
--      For 12c is must be set to null or you
--      will (probably) raise error:
--              ORA-06533: Subscript beyond count
--

        l_srec.rpcnts := null;

        dbms_stats.prepare_column_values(l_srec, n_array);

        dbms_stats.set_column_stats(
                ownname =>null,
                tabname =>'t1',
                colname =>:v1,
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                avgclen =>l_avgclen,
                srec    =>l_srec
        );

end;

The outline of the code is simply: get_column_stats, set up a couple of arrays and simple variables, prepare_column_values, set_column_stats. The special detail that I’ve included here is that I’ve used a “union all” query to generate an ordered list of hash values (with the desired frequencies), then grown the arrays one element at a time to copy them in place. (That’s not the only option at this point, and it’s probably not the most efficient option – but it’s good enough). In the past I’ve used this type of approach but used an analytic query against the table data to produce the equivalent of 12c Top-frequency histogram in much older versions of Oracle.

A couple of important points – I’ve set the “end point count” (l_srec.epc) to match the size of the arrays, and I’ve also changed the number of distinct values to match. For 12c to tell the code that this is a frequency histogram (and not a hybrid) I’ve had to null out the “repeat counts” array (l_srec.rpcnts). If you run this on 11g the reference to rpcnts is illegal so has to be commented out.

After running this procedure, here’s what I get in user_tab_histograms for the column:


select
        endpoint_value                          column_value,
        endpoint_number                         endpoint_number,
        endpoint_number - nvl(prev_endpoint,0)  frequency
from    (
        select
                endpoint_number,
                lag(endpoint_number,1) over(
                        order by endpoint_number
                )                               prev_endpoint,
                endpoint_value
        from
                user_tab_histograms
        where
                table_name  = 'T1'
        and     column_name = :v1
        )
order by endpoint_number
;

COLUMN_VALUE ENDPOINT_NUMBER  FREQUENCY
------------ --------------- ----------
   167789251               1          1
  4794513072            1995       1994
  6030031083           73477      71482
  8288761534           73478          1

4 rows selected.


It’s left as an exercise to the reader to check that the estimated cardinality for the predicate “c2 = ‘N’ and c3 = ‘N'” is 1 with this histogram in place.

2 Comments »

  1. […] Column group histograms (Aug 2018): Formal coding method […]

    Pingback by Column Group Catalog | Oracle Scratchpad — September 27, 2018 @ 5:16 pm GMT Sep 27,2018 | Reply

  2. […] Extended Histogram:  faking values into a histogram for a column group – only special because we need to derive the value stored. […]

    Pingback by Faking Histograms | Oracle Scratchpad — October 15, 2018 @ 1:37 pm GMT Oct 15,2018 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.