Following on from the previous posting which raised the idea of faking a frequency histogram for a column group (extended stats), this is just a brief demonstration of how you can do this. It’s really only a minor variation of something I’ve published before but it shows how you can use a query to generate a set of values for the histogram and it pulls in a detail about how Oracle generates and stores column group values.
We’ll start with the same table as we had before – two columns which hold only the combinations (‘Y’, ‘N’) or (‘N’, ‘Y’) in a very skewed way, with a requirement to ensure that the optimizer provides an estimate of 1 if a user queries for (‘N’,’N’) … and I’m going to go the extra mile and create a histogram that does the same when the query is for the final possible combination of (‘Y’,’Y’).
Here’s the starting code that generates the data, and creates histograms on all the columns:
rem rem Script: histogram_hack_2a.sql rem Author: Jonathan Lewis rem Dated: Jul 2018 rem rem Last tested rem 19.3.0.0 rem 12.2.0.1 rem 12.1.0.2 rem 11.2.0.4 rem create table t1 as select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to deal with wordpress format issue. union all select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to deal with wordpress format issue. ; variable v1 varchar2(128) begin :v1 := dbms_stats.create_extended_stats(null,'t1','(c2,c3)'); dbms_output.put_line(:v1); end; / execute dbms_stats.gather_table_stats(null, 't1', method_opt=>'for all columns size 10');
In a variation from the previous version of the code I’ve used the “create_extended_stats()” function so that I can return the resulting virtual column name (also known as an “extension” name) into a variable that I can use later in an anonymous PL/SQL block.
Let’s now compare the values stored in the histogram for that column with the values generated by a function call that I first referenced a couple of years ago:
select endpoint_value from user_tab_histograms where table_name = 'T1' and column_name = :v1 ; select distinct c2, c3, mod(sys_op_combined_hash(c2,c3),9999999999) endpoint_generated from t1 ; ENDPOINT_VALUE -------------- 4794513072 6030031083 2 rows selected. C C ENDPOINT_GENERATED - - ------------------ N Y 4794513072 Y N 6030031083 2 rows selected.
So we have a method of generating the values that Oracle should store in the histogram. Now we need to generate 4 values and supply them to a call to dbms_stats.set_column_stats() in the right order with the frequencies we want to see:
declare l_distcnt number; l_density number; l_nullcnt number; l_avgclen number; l_srec dbms_stats.statrec; n_array dbms_stats.numarray; begin dbms_stats.get_column_stats ( ownname =>null, tabname =>'t1', colname =>:v1, distcnt =>l_distcnt, density =>l_density, nullcnt =>l_nullcnt, avgclen =>l_avgclen, srec =>l_srec ); l_srec.novals := dbms_stats.numarray(); l_srec.bkvals := dbms_stats.numarray(); for r in ( select mod(sys_op_combined_hash(c2,c3),9999999999) hash_value, bucket_size from ( select 'Y' c2, 'Y' c3, 1 bucket_size from dual union all select 'N' c2, 'N' c3, 1 from dual union all select 'Y' c2, 'N' c3, 71482 from dual union all select 'N' c2, 'Y' c3, 1994 from dual ) order by hash_value ) loop l_srec.novals.extend; l_srec.novals(l_srec.novals.count) := r.hash_value; l_srec.bkvals.extend; l_srec.bkvals(l_srec.bkvals.count) := r.bucket_size; end loop; n_array := l_srec.novals; l_distcnt := 4; l_srec.epc := 4; -- -- For 11g rpcnts must not be mentioned -- For 12c is must be set to null or you -- will (probably) raise error: -- ORA-06533: Subscript beyond count -- l_srec.rpcnts := null; dbms_stats.prepare_column_values(l_srec, n_array); dbms_stats.set_column_stats( ownname =>null, tabname =>'t1', colname =>:v1, distcnt =>l_distcnt, density =>l_density, nullcnt =>l_nullcnt, avgclen =>l_avgclen, srec =>l_srec ); end;
The outline of the code is simply: get_column_stats(), set up a couple of arrays and simple variables, prepare_column_values(), set_column_stats(). The special detail that I’ve included here is that I’ve used a union all query to generate an ordered list of hash values (with the desired frequencies), then grown the arrays one element at a time to copy them in place. (That’s not the only option at this point, and it’s probably not the most efficient option – but it’s good enough). I’ve used a similar of approach in the past running an analytic query against the table data to produce the equivalent of the 12c “Top-Frequency” histogram in much older versions of Oracle.
A couple of important points – I’ve set the “end point count” (l_srec.epc) to match the size of the arrays, and I’ve also changed the number of distinct values to match. For 12c to tell the code that this is a frequency histogram (and not a hybrid) I’ve had to null out the “repeat counts” array (l_srec.rpcnts). If you run this on 11g the reference to rpcnts is illegal so has to be commented out.
After running this procedure, here’s what I get in user_tab_histograms for the column:
select endpoint_value column_value, endpoint_number endpoint_number, endpoint_number - nvl(prev_endpoint,0) frequency from ( select endpoint_number, lag(endpoint_number,1) over( order by endpoint_number ) prev_endpoint, endpoint_value from user_tab_histograms where table_name = 'T1' and column_name = :v1 ) order by endpoint_number ; COLUMN_VALUE ENDPOINT_NUMBER FREQUENCY ------------ --------------- ---------- 167789251 1 1 4794513072 1995 1994 6030031083 73477 71482 8288761534 73478 1 4 rows selected.
It’s left as an exercise to the reader to check that the estimated cardinality for the predicate “c2 = ‘N’ and c3 = ‘N'” is 1 with this histogram in place.
[…] Column group histograms (Aug 2018): Formal coding method […]
Pingback by Column Group Catalog | Oracle Scratchpad — September 27, 2018 @ 5:16 pm BST Sep 27,2018 |
[…] Extended Histogram: faking values into a histogram for a column group – only special because we need to derive the value stored. […]
Pingback by Faking Histograms | Oracle Scratchpad — October 15, 2018 @ 1:37 pm BST Oct 15,2018 |