Adjusting Histograms

July 4, 2014

Adjusting Histograms

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 8:32 pm BST Jul 4,2014

This is a quick response to a question on an old blog post asking how you can adjust the high value if you’ve already got a height-balanced histogram in place. It’s possible that someone will come up with a tidier method, but this was just a quick sample I created and tested on 11.2.0.4 in a few minutes. (Note – this is specifically for height-balanced histograms, and it’s not appropriate for 12c which has introduced hybrid histograms that will require me to modify my “histogram faking” code a little).

rem
rem	Script:		adjust_histogram.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Jun 2014
rem	Purpose:
rem
rem	Last tested
rem		11.2.0.4
rem	Not tested
rem		12.1.0.1
rem		11.1.0.7
rem		10.2.0.5
rem	Outdated
rem		 9.2.0.8
rem		 8.1.7.4	no WITH subquery
rem
rem	Notes:
rem	Follow-on from a query on my blog about setting the high value
rem	when you have a histogram.  We could do this by hacking, or by
rem	reading the user_tab_histogram values and doing a proper prepare
rem

start setenv
set timing off

execute dbms_random.seed(0)

drop table t1;

begin
	begin		execute immediate 'purge recyclebin';
	exception	when others then null;
	end;

	begin
		dbms_stats.set_system_stats('MBRC',16);
		dbms_stats.set_system_stats('MREADTIM',10);
		dbms_stats.set_system_stats('SREADTIM',5);
		dbms_stats.set_system_stats('CPUSPEED',1000);
	exception
		when others then null;
	end;
end;
/

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4    -- > comment to avoid WordPress format issue
)
select
	trunc(sysdate,'YYYY') + trunc(dbms_random.normal * 100,1)	d1
from
	generator	v1,
	generator	v2
where
	rownum <= 1e4    -- > comment to avoid WordPress format issue
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 32'
	);

end;
/

spool adjust_histogram.lst

prompt	==================
prompt	Current High Value
prompt	==================

select to_char(max(d1),'dd-Mon-yyyy hh24:mi:ss') from t1;

prompt	==============================
prompt	Initial Histogram distribution
prompt	==============================

select
	endpoint_number,
	to_date(to_char(trunc(endpoint_value)),'J') + mod(endpoint_value,1) d_val,
	endpoint_value,
	lag(endpoint_value,1) over(order by endpoint_number) lagged_epv,
	endpoint_value -
		lag(endpoint_value,1) over(order by endpoint_number)  delta
from	user_tab_histograms
where
	table_name = 'T1'
and	column_name = 'D1'
;

rem
rem	Note - we can't simply overwrite the last srec.novals
rem	because that doesn't adjust the stored high_value.
rem	We have to make a call to prepare_column_values,
rem	which means we have to turn the stored histogram
rem	endpoint values into their equivalent date types.
rem

prompt	==================
prompt	Hacking the values
prompt	==================

declare

	m_distcnt		number;
	m_density		number;
	m_nullcnt		number;
	srec			dbms_stats.statrec;
	m_avgclen		number;

	d_array			dbms_stats.datearray := dbms_stats.datearray();
	ct			number;

begin

	dbms_stats.get_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'd1',
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec		=> srec,
		avgclen		=> m_avgclen
	); 

	ct := 0;
	for r in (
		select	to_date(to_char(trunc(endpoint_value)),'J') + mod(endpoint_value,1) d_val
		from	user_tab_histograms
		where	table_name = 'T1'
		and	column_name = 'D1'
		order by endpoint_number
	) loop

		ct := ct + 1;
		d_array.extend;
		d_array(ct) := r.d_val;
		if ct = 1 then
			srec.bkvals(ct) := 0;
		else
			srec.bkvals(ct) := 1;
		end if;

	end loop;

	d_array(ct) := to_date('30-Jun-2015','dd-mon-yyyy');

	dbms_stats.prepare_column_values(srec, d_array);

	dbms_stats.set_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'd1',
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec		=> srec,
		avgclen		=> m_avgclen
	);
end;
/

prompt	============================
prompt	Final Histogram distribution
prompt	============================

select
	endpoint_number,
	to_date(to_char(trunc(endpoint_value)),'J') + mod(endpoint_value,1) d_val,
	endpoint_value,
	lag(endpoint_value,1) over(order by endpoint_number) lagged_epv,
	endpoint_value -
		lag(endpoint_value,1) over(order by endpoint_number)  delta
from	user_tab_histograms
where
	table_name = 'T1'
and	column_name = 'D1'
;

spool off

doc

#

Comments (5)

5 Comments »

Hi Jonathan

First, thanks for doing this.

I ran tests on 10.2.0.5 and it works very well

The only question I have is when running my tests I noticed that there were no digits to the right of the decimal and the decimal was missing when I checked the stats, but that was because there was no time added in your example (d_array(ct) := to_date(’30-Jun-2015′,’dd-mon-yyyy’);). Adding a time did give me the decimal and digits to the right.

So my question is, is the time important to have or is just the date sufficient enough ?

Thanks
Jim

Comment by Jim — July 14, 2014 @ 11:53 pm BST Jul 14,2014 | Reply
- Jim,
  
  The date is sufficient for the technique to work – all it means is that the time component is implicitly midnight. However the exact choice of date may then matter. In this case if the users were allowed to query for (say) 2:30 pm on 30th June 2015 then the optimizer would go into its arithmetic for “out of range” data and in some cases that can have a dramatic effect on the cardinality; but if you’re using 30th June 2015 simply to have a date that’s comfortably ahead of dates the users then the time component doesn’t matter.
  
  Comment by Jonathan Lewis — July 15, 2014 @ 6:54 am BST Jul 15,2014 | Reply
[…] often speak for themselves, hence I found some good resources to demonstrate what really histograms […]

Pingback by Histogram vs Bar charts – which one I should use? | Tabagus - The Web Portal — October 10, 2014 @ 2:19 am BST Oct 10,2014 | Reply
Hi Jonathan,

First of all, thank you for doing this research, which I really appreciate.

I have the case that MREAD That would be OK.
For ex:, In a faster SAN system/EXadata/Flash storage etc, the MREADTIM can be lower than the SREADTIM, Because multiple blocks access is faster in those systems than reading single block. Based on the data you have provided in the previous update the lower MREADTIM is OK.

I am a little concerned, that with MREADTIM < SREADTIM the execution plans are bad, this is the experience from our supplier Avaloq.com
Is it correct that the MREAD / SREAD ratio is important for the generation of good execution plans?
The ratio MREAD / SREAD should be in the range 1.0 .. 3.0 ? Is that so?

Comment by Romeo — July 21, 2021 @ 3:54 pm BST Jul 21,2021 | Reply
- Romeo,
  
  Thanks for the question, sorry for the late reply.
  There are a couple of notes about system stats and mreadtim and sreadtim etc. on the blog which might have given you a better place to ask the question.
  
  A key detail is that when mreadtim < sreadtim Oracle "doesn't believe" the figures, and changes algorithm in a way that depends on the version, so it's quite easy for plans to change from good to very bad when the figures cross that boundary.
  
  There is no correct setting or ratio for the values; ideally you need to get some values in place that reflect what your system can do, which is why Oracle has an "EXADATA" option for gathering stats that basically says: tablescans are very fast using very big reads.
  
  For reasons of stability it's best to get one set of figures in place and then not change them unless your hardware (or typical load on the hardware) changes significantly. For OLTP systems this often means that deleting any gathered system stats and letting the optimizer use the defaults. In your case you might (e.g.) want to set the MBRC to 128, the sreadtim to 1ms and the mreadtim to 1.001 simply to say that a big multiblock read is almost as fast as a single block read.
  
  Regards
  Jonathan Lewis
  
  Comment by Jonathan Lewis — August 3, 2021 @ 6:10 pm BST Aug 3,2021 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome. Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Oracle Scratchpad

July 4, 2014