Oracle Scratchpad

September 13, 2012

Histogram Generation

Filed under: Histograms,Oracle,Statistics — Jonathan Lewis @ 5:21 pm BST Sep 13,2012

This really could be published in the Errata and Addenda of “Cost Based Oracle – Fundamentals”, but it’s more convenient to publish the whole thing here and just add a link to the Errata pages.

In chapter 7, on page 156, I gave an example of the type of SQL that Oracle runs (in the dbms_stats package) to generate a histogram. A sample of the code, and the plan from the 9.2.0.8 tkprof output, is listed below:

select
	min(minbkt),maxbkt,
	substrb(dump(min(val),16,0,32),1,120) minval,
	substrb(dump(max(val),16,0,32),1,120) maxval,
	sum(rep) sumrep, sum(repsq) sumrepsq,
	max(rep) maxrep, count(*) bktndv,
	sum(case when rep=1 then 1 else 0 end) unqrep
from
	(
	select
		val,min(bkt) minbkt, max(bkt) maxbkt,
		count(val) rep, count(val) *count(val) repsq
	from	(
		select
			/*+
				noparallel(t) noparallel_index(t)
				cursor_sharing_exact dynamic_sampling(0)
				no_monitoring
			*/
			"LN100" val,
			ntile(200) over (order by "LN100") bkt
		from
			sys.ora_temp_1_ds_616 t
		where
			"LN100" is not null
		)
	group by val
	)
group by
	maxbkt
order by
	maxbkt
;

Rows     Row Source Operation
-------  ---------------------------------------------------
    170  SORT GROUP BY
    825   VIEW
    825    SORT GROUP BY
 200028     VIEW
 200028      WINDOW SORT
 200028       TABLE ACCESS FULL ORA_TEMP_1_DS_616

The code in this case references a table called ora_temp_1_DS_616, which is a temporary table dynamically created by the package to hold a sample of just the columns that I was interested in. I had specified a histogram of 200 buckets, and you can see the ntile(200) analytic function in the code that Oracle has used to generate this – and “window sort”  in the execution plan that supports it; it’s a window sort of the entire sampled data set which could be quite large and, as we’ve seen elsewhere, could be very resource-intensive.

In a demonstratin of how important it is (and how nearly impossible it is) to keep revisiting the things you think you know, I’ve recently discovered – while constructing a demonstration of a problem in 11.2.0.3 – that the implementation had changed by  10.2.0.3 (and it may have changed even earlier, of course – but that’s the earliest version of 10g I have acces to). Here’s the generated code for the same data set, with plan, from 10.2.0.3:

select
	substrb(dump(val,16,0,32),1,120) ep, cnt
from
	(
	select
		/*+
			no_parallel(t) no_parallel_index(t) dbms_stats
			cursor_sharing_exact use_weak_name_resl
			dynamic_sampling(0) no_monitoring
		*/
		"LN100" val, count(*) cnt
	from
		sys.ora_temp_1_ds_1470 t
	where
		"LN100" is not null
	group by "LN100"
	)
order by
	val

Rows     Row Source Operation
-------  ---------------------------------------------------
    824  SORT GROUP BY (cr=499 pr=0 pw=0 time=42484 us)
 199451   TABLE ACCESS FULL ORA_TEMP_1_DS_1470 (cr=499 pr=0 pw=0 time=199490 us)

As you can see it doesn’t have any functionality built into it that could possibly provide the 200 buckets that I finally want to see. You’ll also notice that the sample size is slightly different from the 9i sample size – that’s one of the threats of sampled histogram creation, of course, you could get a slightly (but dangerously) different histogram every time you call dbms_stats even if the data hasn’t changed.

Based on the differences in the query, I think we can safely assume that Oracle is using some procedural code to count its way through the results from this query (ordered by column value) to construct the histogram details that we are used to seeing. I think it’s worth noting that this demonstrates the fact that using a simpler SQL statement with some surrounding PL/SQL can sometimes be more efficient than writing a devilishly clever piece of SQL that gets you the answer directly.

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,530 other followers