This is a quick response to a question on an old blog post asking how you can adjust the high value if you’ve already got a height-balanced histogram in place. It’s possible that someone will come up with a tidier method, but this was just a quick sample I created and tested on 11.2.0.4 in a few minutes. (Note – this is specifically for height-balanced histograms, and it’s not appropriate for 12c which has introduced hybrid histograms that will require me to modify my “histogram faking” code a little).
rem rem Script: adjust_histogram.sql rem Author: Jonathan Lewis rem Dated: Jun 2014 rem Purpose: rem rem Last tested rem 11.2.0.4 rem Not tested rem 12.1.0.1 rem 11.1.0.7 rem 10.2.0.5 rem Outdated rem 9.2.0.8 rem 8.1.7.4 no WITH subquery rem rem Notes: rem Follow-on from a query on my blog about setting the high value rem when you have a histogram. We could do this by hacking, or by rem reading the user_tab_histogram values and doing a proper prepare rem start setenv set timing off execute dbms_random.seed(0) drop table t1; begin begin execute immediate 'purge recyclebin'; exception when others then null; end; begin dbms_stats.set_system_stats('MBRC',16); dbms_stats.set_system_stats('MREADTIM',10); dbms_stats.set_system_stats('SREADTIM',5); dbms_stats.set_system_stats('CPUSPEED',1000); exception when others then null; end; end; / create table t1 as with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 -- > comment to avoid WordPress format issue ) select trunc(sysdate,'YYYY') + trunc(dbms_random.normal * 100,1) d1 from generator v1, generator v2 where rownum <= 1e4 -- > comment to avoid WordPress format issue ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for all columns size 32' ); end; / spool adjust_histogram.lst prompt ================== prompt Current High Value prompt ================== select to_char(max(d1),'dd-Mon-yyyy hh24:mi:ss') from t1; prompt ============================== prompt Initial Histogram distribution prompt ============================== select endpoint_number, to_date(to_char(trunc(endpoint_value)),'J') + mod(endpoint_value,1) d_val, endpoint_value, lag(endpoint_value,1) over(order by endpoint_number) lagged_epv, endpoint_value - lag(endpoint_value,1) over(order by endpoint_number) delta from user_tab_histograms where table_name = 'T1' and column_name = 'D1' ; rem rem Note - we can't simply overwrite the last srec.novals rem because that doesn't adjust the stored high_value. rem We have to make a call to prepare_column_values, rem which means we have to turn the stored histogram rem endpoint values into their equivalent date types. rem prompt ================== prompt Hacking the values prompt ================== declare m_distcnt number; m_density number; m_nullcnt number; srec dbms_stats.statrec; m_avgclen number; d_array dbms_stats.datearray := dbms_stats.datearray(); ct number; begin dbms_stats.get_column_stats( ownname => user, tabname => 't1', colname => 'd1', distcnt => m_distcnt, density => m_density, nullcnt => m_nullcnt, srec => srec, avgclen => m_avgclen ); ct := 0; for r in ( select to_date(to_char(trunc(endpoint_value)),'J') + mod(endpoint_value,1) d_val from user_tab_histograms where table_name = 'T1' and column_name = 'D1' order by endpoint_number ) loop ct := ct + 1; d_array.extend; d_array(ct) := r.d_val; if ct = 1 then srec.bkvals(ct) := 0; else srec.bkvals(ct) := 1; end if; end loop; d_array(ct) := to_date('30-Jun-2015','dd-mon-yyyy'); dbms_stats.prepare_column_values(srec, d_array); dbms_stats.set_column_stats( ownname => user, tabname => 't1', colname => 'd1', distcnt => m_distcnt, density => m_density, nullcnt => m_nullcnt, srec => srec, avgclen => m_avgclen ); end; / prompt ============================ prompt Final Histogram distribution prompt ============================ select endpoint_number, to_date(to_char(trunc(endpoint_value)),'J') + mod(endpoint_value,1) d_val, endpoint_value, lag(endpoint_value,1) over(order by endpoint_number) lagged_epv, endpoint_value - lag(endpoint_value,1) over(order by endpoint_number) delta from user_tab_histograms where table_name = 'T1' and column_name = 'D1' ; spool off doc #
Hi Jonathan
First, thanks for doing this.
I ran tests on 10.2.0.5 and it works very well
The only question I have is when running my tests I noticed that there were no digits to the right of the decimal and the decimal was missing when I checked the stats, but that was because there was no time added in your example (d_array(ct) := to_date(’30-Jun-2015′,’dd-mon-yyyy’);). Adding a time did give me the decimal and digits to the right.
So my question is, is the time important to have or is just the date sufficient enough ?
Thanks
Jim
Comment by Jim — July 14, 2014 @ 11:53 pm BST Jul 14,2014 |
Jim,
The date is sufficient for the technique to work – all it means is that the time component is implicitly midnight. However the exact choice of date may then matter. In this case if the users were allowed to query for (say) 2:30 pm on 30th June 2015 then the optimizer would go into its arithmetic for “out of range” data and in some cases that can have a dramatic effect on the cardinality; but if you’re using 30th June 2015 simply to have a date that’s comfortably ahead of dates the users then the time component doesn’t matter.
Comment by Jonathan Lewis — July 15, 2014 @ 6:54 am BST Jul 15,2014 |
[…] often speak for themselves, hence I found some good resources to demonstrate what really histograms […]
Pingback by Histogram vs Bar charts – which one I should use? | Tabagus - The Web Portal — October 10, 2014 @ 2:19 am BST Oct 10,2014 |
Hi Jonathan,
First of all, thank you for doing this research, which I really appreciate.
I have the case that MREAD That would be OK.
For ex:, In a faster SAN system/EXadata/Flash storage etc, the MREADTIM can be lower than the SREADTIM, Because multiple blocks access is faster in those systems than reading single block. Based on the data you have provided in the previous update the lower MREADTIM is OK.
I am a little concerned, that with MREADTIM < SREADTIM the execution plans are bad, this is the experience from our supplier Avaloq.com
Is it correct that the MREAD / SREAD ratio is important for the generation of good execution plans?
The ratio MREAD / SREAD should be in the range 1.0 .. 3.0 ? Is that so?
Comment by Romeo — July 21, 2021 @ 3:54 pm BST Jul 21,2021 |
Romeo,
Thanks for the question, sorry for the late reply.
There are a couple of notes about system stats and mreadtim and sreadtim etc. on the blog which might have given you a better place to ask the question.
A key detail is that when mreadtim < sreadtim Oracle "doesn't believe" the figures, and changes algorithm in a way that depends on the version, so it's quite easy for plans to change from good to very bad when the figures cross that boundary.
There is no correct setting or ratio for the values; ideally you need to get some values in place that reflect what your system can do, which is why Oracle has an "EXADATA" option for gathering stats that basically says: tablescans are very fast using very big reads.
For reasons of stability it's best to get one set of figures in place and then not change them unless your hardware (or typical load on the hardware) changes significantly. For OLTP systems this often means that deleting any gathered system stats and letting the optimizer use the defaults. In your case you might (e.g.) want to set the MBRC to 128, the sreadtim to 1ms and the mreadtim to 1.001 simply to say that a big multiblock read is almost as fast as a single block read.
Regards
Jonathan Lewis
Comment by Jonathan Lewis — August 3, 2021 @ 6:10 pm BST Aug 3,2021 |