It has taken much longer than I anticipated to get around to writing part 3 of this mini-series on what Oracle has done about histograms in 12c.
In part 1 I gave a thumbnail sketch of the three types of histogram available in 12c
In part 2 I described in some detail the improvements in performance and accuracy for the frequency and top-frequency histograms
In part 3 of this mini-series I’ll be describing how the implementation of the “hybrid” histogram that Oracle produces if the “approximate NDV” mechanism has been enabled and you’ve left the estimate_percent to auto_sample_size. There is little difference between the work needed to create a hybrid histogram and the work needed to generate the old “height-balanced” histogram, but the degree of information captured by the hybrid is much greater than that of the height-balanced.
A three-part series on histograms up to and including 11g – written for Allthings Oracle. With Chinese transalation:
And a three part series on histograms in 12c on this blog
A comment I’ve made many times in the past about distributed queries is that Oracle doesn’t try to retrieve histogram information from remote databases when optimizing a query. Checking back through previous posts, though, I don’t think I’ve ever mentioned it on the blog – so here’s an example demonstrating the point.
I posted a couple of examples some time ago of oddities and boundary cases for frequency histograms on character columns. Part of the process of playing around with the 12c Beta was to re-run such cases to see if newer code made any difference. Looking at these examples, one was fixed (or improved, at least) the other wasn’t, so I’ve added a footnote to each and produced this little note to highlight the changes:
- Effects of long characters strings in frequency histograms is better: improved.
- When stats gathering misses the values that would cause endpoint_actual_values to be collected: no change.
In part 2 of this mini-series I’ll be describing the new mechanism for the simple frequency histogram and the logic of the Top-N frequency histogram. In part 3 I’ll be looking at the new hybrid histogram.
You need to know about the approximate NDV before you start examining the 12c implementation of the frequency and top-frequency histograms – but there’s a thumbnail sketch at the end of the posting if you need a quick reminder.
There are a few enhancements in 12c that might make a big difference to performance for a small investment in effort. One of the important enhancements comes from changes in histograms – which improve speed of collection with accuracy of results. The changes are so significant that I chose the topic as my presentation at OpenWorld last year.
This really could be published in the Errata and Addenda of “Cost Based Oracle – Fundamentals”, but it’s more convenient to publish the whole thing here and just add a link to the Errata pages.
In chapter 7, on page 156, I gave an example of the type of SQL that Oracle runs (in the dbms_stats package) to generate a histogram. A sample of the code, and the plan from the 184.108.40.206 tkprof output, is listed below:
A recent comment on a note I wrote some time ago about faking histograms asked about the calculations of selectivity in the latest versions of Oracle. As I read the question, I realised that I had originally supplied a formula for calculating cardinality, rather than selectivity, so I thought I’d supply a proper example.
We’ll start with a script to create some data and stats – and I’m going to start with a script I wrote in Jan 2001 (which is why it happens to use the analyze command rather than dbms_stats.gather_table_stats, even though this example comes from an instance of 220.127.116.11).
In an earlier post on frequency histograms I described how Oracle creates an approximate histogram when dealing with character columns, and warned you that the strategy could lead to a couple of anomalies if you were unlucky. I’ve already published a note about one such anomaly that can occur with fairly long character strings, this note describes another anomaly that could appear in less extreme cases. Again, we start by constructing a data set.
In an earlier post on frequency histograms I described how Oracle creates an approximate histogram when dealing with character columns, and warned you that the strategy could lead to a couple of anomalies if you were unlucky. This note describes one such anomaly. We start with a slightly curious data set:
In an earlier note on interpreting the content of frequency histograms I made a throwaway comment about the extra complexity of interpreting frequency histograms on character-based columns. This note starts to examine some of the complications.
The driving problem behind character columns is that they can get quite large – up to 4,000 bytes – so the content of an “accurate histogram” could become quite large, and Oracle seems to have taken a strategic decision (at some point in history) to minimise this storage. As a result we can see an algorithm that works roughly as follows:
Here’s the output I get from querying dba_tab_histograms for a column that has been given a frequency histogram by a call to dbms_stats.gather_table_stats().
I find it convenient occasionally to “translate” a frequency histogram into a report of the underlying data (sometimes to see how closely the histogram matches the real data). To demonstrate the type of query I use I’ve created a data set with a small number of distinct values and generated a frequency histogram on the data set. This is what the data and histogram look like:
I got an email a few days ago asking me how to create a “height-balanced” histogram using the dbms_stats.set_column_stats() procedure.
I posted an example of creating a frequency histogram a few months ago – and it doesn’t take much to turn this into an example for the height-balanced type; the key change apppears in three critical lines.
In a recent article on changes to the calculation of cardinality with frequency histograms I made a comment about writing programs to create fake but realistic and complete frequency histograms to work around the problems of dramatic changes in execution plan that can occur if you let Oracle create the histograms by gathering stats.
As a follow-up to that comment, here’s a little script to demonstrate the method (for a numeric column), and a few points to remember when using this method.