This really could be published in the Errata and Addenda of “Cost Based Oracle – Fundamentals”, but it’s more convenient to publish the whole thing here and just add a link to the Errata pages.
In chapter 7, on page 156, I gave an example of the type of SQL that Oracle runs (in the dbms_stats package) to generate a histogram. A sample of the code, and the plan from the 126.96.36.199 tkprof output, is listed below:
select min(minbkt),maxbkt, substrb(dump(min(val),16,0,32),1,120) minval, substrb(dump(max(val),16,0,32),1,120) maxval, sum(rep) sumrep, sum(repsq) sumrepsq, max(rep) maxrep, count(*) bktndv, sum(case when rep=1 then 1 else 0 end) unqrep from ( select val,min(bkt) minbkt, max(bkt) maxbkt, count(val) rep, count(val) *count(val) repsq from ( select /*+ noparallel(t) noparallel_index(t) cursor_sharing_exact dynamic_sampling(0) no_monitoring */ "LN100" val, ntile(200) over (order by "LN100") bkt from sys.ora_temp_1_ds_616 t where "LN100" is not null ) group by val ) group by maxbkt order by maxbkt ; Rows Row Source Operation ------- --------------------------------------------------- 170 SORT GROUP BY 825 VIEW 825 SORT GROUP BY 200028 VIEW 200028 WINDOW SORT 200028 TABLE ACCESS FULL ORA_TEMP_1_DS_616
The code in this case references a table called ora_temp_1_DS_616, which is a temporary table dynamically created by the package to hold a sample of just the columns that I was interested in. I had specified a histogram of 200 buckets, and you can see the ntile(200) analytic function in the code that Oracle has used to generate this – and “window sort” in the execution plan that supports it; it’s a window sort of the entire sampled data set which could be quite large and, as we’ve seen elsewhere, could be very resource-intensive.
In a demonstratin of how important it is (and how nearly impossible it is) to keep revisiting the things you think you know, I’ve recently discovered – while constructing a demonstration of a problem in 188.8.131.52 – that the implementation had changed by 10.2.0.3 (and it may have changed even earlier, of course – but that’s the earliest version of 10g I have acces to). Here’s the generated code for the same data set, with plan, from 10.2.0.3:
select substrb(dump(val,16,0,32),1,120) ep, cnt from ( select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring */ "LN100" val, count(*) cnt from sys.ora_temp_1_ds_1470 t where "LN100" is not null group by "LN100" ) order by val Rows Row Source Operation ------- --------------------------------------------------- 824 SORT GROUP BY (cr=499 pr=0 pw=0 time=42484 us) 199451 TABLE ACCESS FULL ORA_TEMP_1_DS_1470 (cr=499 pr=0 pw=0 time=199490 us)
As you can see it doesn’t have any functionality built into it that could possibly provide the 200 buckets that I finally want to see. You’ll also notice that the sample size is slightly different from the 9i sample size – that’s one of the threats of sampled histogram creation, of course, you could get a slightly (but dangerously) different histogram every time you call dbms_stats even if the data hasn’t changed.
Based on the differences in the query, I think we can safely assume that Oracle is using some procedural code to count its way through the results from this query (ordered by column value) to construct the histogram details that we are used to seeing. I think it’s worth noting that this demonstrates the fact that using a simpler SQL statement with some surrounding PL/SQL can sometimes be more efficient than writing a devilishly clever piece of SQL that gets you the answer directly.