In an earlier note on interpreting the content of frequency histograms I made a throwaway comment about the extra complexity of interpreting frequency histograms on character-based columns. This note starts to examine some of the complications.
The driving problem behind character columns is that they can get quite long – up to 4,000 bytes – so an “accurate” histogram could take a lot of storage, and Oracle seems to have taken a strategic decision (at some point in its history) to minimise that storage. As a result we can see an algorithm that works roughly as follows:
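In outline (this is my reconstruction of the behaviour, so treat the details as version-dependent): Oracle converts the first few bytes of the string into a number by treating (up to) the first 15 bytes as a base-256 integer and storing that integer as the endpoint value. Since an Oracle number holds only about 15 significant decimal digits, only the first six or seven bytes of the string actually survive the conversion. A PL/SQL sketch of the conversion:

```sql
create or replace function fake_endpoint_value(p_str in varchar2)
return number
is
        m_val   number := 0;
begin
        --
        -- Treat the first 15 bytes of the string as a base-256 number;
        -- bytes beyond the end of the string are padded with zeros.
        --
        for i in 1 .. 15 loop
                m_val := 256 * m_val +
                         nvl(ascii(substrb(p_str, i, 1)), 0);
        end loop;
        --
        -- An Oracle number carries roughly 15 significant decimal digits,
        -- so in practice only the first 6 - 7 bytes are distinguishable.
        --
        return m_val;
end;
/
```

One consequence of the limited precision: two strings that agree on their first six or seven bytes (think “Smithson” and “Smithers0123”) can end up with the same endpoint value.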
Here’s the output I get from querying dba_tab_histograms for a column that has been given a frequency histogram by a call to dbms_stats.gather_table_stats().
I find it convenient occasionally to “translate” a frequency histogram into a report of the underlying data (sometimes to see how closely the histogram matches the real data). To demonstrate the type of query I use I’ve created a data set with a small number of distinct values and generated a frequency histogram on the data set. This is what the data and histogram look like:
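The endpoint_number in a frequency histogram is a cumulative row count, so the frequency of each value is the difference between consecutive endpoint numbers. A query of the following shape does the translation (the table and column names are just placeholders):

```sql
select
        endpoint_value                          column_value,
        endpoint_number
        - nvl(
                lag(endpoint_number)
                over (order by endpoint_number),
                0
          )                                     frequency
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'N1'
order by
        endpoint_number
;
```

The lag() call subtracts the previous cumulative count from the current one, turning the running total back into the per-value frequency that the histogram recorded.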
In case you don’t follow the link to Martin Widlake’s blog very often, he’s done a couple of recent posts on dba_tab_modifications that are worth reading.
(And I’ve just discovered the ‘gutter=”false”;’ option for the ‘sourcecode’ tag in one of the comments on Martin’s blog – and that’s also a helpful feature.)
If you’ve run the scripts from Analyze This, I hope you found that the query gave you two different execution plans. This is my output from the test (with a little cosmetic tidying):
Here’s a little script I wrote a few years ago to make a point about using the dbms_stats package. I’ve just re-run it on 10.2.0.3 to see if it still behaves the way it used to – and it does. If you want to be just a little bit baffled, set up a database with an 8KB block size and a locally managed tablespace with a uniform extent size of 1MB and freelist management, then run the script:
I got an email a few days ago asking me how to create a “height-balanced” histogram using the dbms_stats.set_column_stats() procedure.
I posted an example of creating a frequency histogram a few months ago – and it doesn’t take much to turn this into an example for the height-balanced type; the key change appears in three critical lines.
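As a sketch of the approach (all the numbers here are invented for illustration): calling dbms_stats.prepare_column_values() with srec.bkvals left null tells Oracle to treat the supplied values as bucket endpoints for a height-balanced histogram, rather than as distinct values with attached frequencies:

```sql
declare
        srec            dbms_stats.statrec;
        n_array         dbms_stats.numarray;
begin
        --
        -- Bucket endpoint values for a 4-bucket height-balanced histogram
        --
        n_array     := dbms_stats.numarray(0, 10, 20, 40, 80);
        srec.epc    := 5;
        srec.bkvals := null;    -- null bkvals => height-balanced, not frequency
        dbms_stats.prepare_column_values(srec, n_array);

        dbms_stats.set_column_stats(
                ownname => user,
                tabname => 'T1',        -- illustrative table name
                colname => 'N1',        -- illustrative column name
                distcnt => 100,
                density => 1/100,
                nullcnt => 0,
                avgclen => 3,
                srec    => srec
        );
end;
/
```

Compare this with the frequency-histogram case, where bkvals carries a count for every value in the array – the null bkvals is the critical difference.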
It’s possible to spend ages talking about the best ways of collecting, or creating, statistics on partitioned tables.
The possible strategies for maintaining partitioned tables (exchange partition, split partition, drop partition, etc.), the types of partitioning available, and the way that the optimizer plays with the stats as you do so, have kept changing over the years. I’ve got a large set of examples designed to test what happens to the stats as you do things to the table – but it’s impossible to keep it up to date.
Doug Burns has written a series of articles about the trials, tribulations, and successes of working with partitioned tables and statistics. The series is well worth reading and will give you an insight into the problems you may have to address, so I’ve produced a catalogue to make it easy to visit the individual chapters in order. Make sure you also read the comments and related links.
- Part 1 – In which we see a simple example and do a default stats collection
- Part 2 – In which we consider Global Stats
- Part 3 – In which subpartitions and aggregation cause problems
- Part 4 – In which our hero fights his way through stats aggregation woes
- Part 5 – In which we encounter a partition exchange
- Part 6a – In which we start to use dbms_stats.copy_table_stats()
- Part 6b – In which we see how reputable individuals handle their mistakes
- Part 6c – In which we hear about 10.2.0.5 and lots of bugs
- Part 6d – In which we revisit earlier errors and discuss the benefit of discussion
- Part 6e – In which we revisit the earlier problem again and talk about a bug
- Part 7 – Not in the original series, but an interesting (slow) experience in 11g
I thought I’d collate a few other items on partition stats and optimizer behaviour – mainly from Randolf Geist’s blog:
And one from Kerry Osborne – which lists a new granularity option, and a patch for 10.2.0.4
- Feb 2009: Maintaining statistics on a large partitioned table. (See also Metalink Doc ID: 6526370.8)
A couple (as pdf files) from David Kurtz, with a particular view to optimising PeopleSoft.
And an investigation into an oddity with the optimizer when using partitioned indexes
- Feb 2011: Jokes of the CBO with local indexes (10.2.0.4, 18.104.22.168)
In a blog that’s now three years old I pointed out the issue of the automatic stats collection job and the way it could affect performance by changing object-level statistics behind your back. There’s another reason it could have an annoying side effect, though – it actually takes time to collect statistics.
The automatic job is, by default, allowed to run from 10:00 pm to 6:00 am on weekdays and for even longer over the weekend – and the time it needs to complete can vary dramatically from day to day. So it’s nice to have a little script to cross-check when it started and how long it ran. All you have to do is query the view dba_scheduler_job_run_details:
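For example (in 10g the job is called GATHER_STATS_JOB; in 11g the autotask runs under generated names like ORA$AT_OS_OPT_SY_nnn, so adjust the predicate to suit your version):

```sql
select
        log_date,
        status,
        actual_start_date,
        run_duration
from
        dba_scheduler_job_run_details
where
        job_name = 'GATHER_STATS_JOB'           -- 10g auto stats job
--      job_name like 'ORA$AT_OS_OPT_SY%'       -- 11g equivalent
order by
        actual_start_date
;
```

A glance down the run_duration column is usually enough to spot the days when the job ran dramatically longer than usual.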
A few days ago I did a presentation on SQL Server. This probably sounds a little strange given my status as an Oracle specialist – but the nice people at Microsoft asked me if I would contribute to one of their seminars, so I downloaded and installed the 180-day free trial copy of the Enterprise version, then downloaded the “Books Online” manuals and started to play.
It was an interesting experience – and I think the audience (and organisers) found my presentation useful. The title was “What the Enterprise needs in an RDBMS” – and that’s something I do know about – and the presentation was about whether or not you could find everything you needed in SQL Server 2008, where you’d have to look in the manuals, and supplementary questions you’d have to ask.
In a recent posting on OTN, someone was having trouble copying stats from one index to another using the import_index_stats and export_index_stats procedures from the dbms_stats package, modifying the contents of their “stat table” between the export and import.
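For reference, the usual mechanics look something like this. The index and stat-table names are invented, and bear in mind that the layout of the stat table is undocumented and has changed across versions, so the update of column c1 (which holds the object name) is strictly an unsupported hack:

```sql
begin
        --
        -- Create a stat table and export the source index stats into it
        --
        dbms_stats.create_stat_table(user, 'STAT_TAB');
        dbms_stats.export_index_stats(
                ownname => user,
                indname => 'T1_I1',             -- source index (illustrative)
                stattab => 'STAT_TAB'
        );
end;
/

--
-- The object name is held in column c1 of the stat table, so
-- renaming it there makes the import apply to the other index
--
update  stat_tab
set     c1 = 'T2_I1'                            -- target index (illustrative)
where   c1 = 'T1_I1';
commit;

begin
        dbms_stats.import_index_stats(
                ownname => user,
                indname => 'T2_I1',
                stattab => 'STAT_TAB'
        );
end;
/
```

If the update of c1 is skipped (or gets the name wrong) the import quietly finds nothing to import – which is exactly the sort of trouble the OTN poster had run into.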
There was a question about the notorious dba_extents view on the comp.databases.oracle.server newsgroup a little while ago.
This is a view that has been popularly abused over the years despite warnings such as a note of mine in 2006 and, as far back as 2001, this one from Connor McDonald.
There are many reasons why this is a nasty view – and the number of reasons has grown as Oracle has evolved – but David Fitzjarrell has highlighted an important point in one of his blog postings that is worth a mention. So here it is.
In a recent article on changes to the calculation of cardinality with frequency histograms I made a comment about writing programs to create fake but realistic and complete frequency histograms to work around the problems of dramatic changes in execution plan that can occur if you let Oracle create the histograms by gathering stats.
As a follow-up to that comment, here’s a little script to demonstrate the method (for a numeric column), and a few points to remember when using this method.
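The heart of the method is dbms_stats.prepare_column_values() followed by set_column_stats(). Everything below is illustrative (a notional 1,000-row table with four distinct values); the density of 1/(2 * num_rows) is one reasonable choice for controlling the cardinality estimate Oracle produces for values that are not in the histogram:

```sql
declare
        srec            dbms_stats.statrec;
        n_array         dbms_stats.numarray;
begin
        --
        -- The distinct values, and a fake but realistic frequency for each;
        -- the frequencies sum to the (notional) 1,000 rows in the table.
        --
        n_array     := dbms_stats.numarray(1, 2, 5, 10);
        srec.epc    := 4;
        srec.bkvals := dbms_stats.numarray(400, 300, 200, 100);
        dbms_stats.prepare_column_values(srec, n_array);

        dbms_stats.set_column_stats(
                ownname => user,
                tabname => 'T1',        -- illustrative table name
                colname => 'N1',        -- illustrative column name
                distcnt => 4,
                density => 1/2000,      -- 1/(2 * num_rows)
                nullcnt => 0,
                avgclen => 3,
                srec    => srec
        );
end;
/
```

Because the histogram is complete (every value that appears in the column is in the array), the optimizer never has to interpolate between popular values – which is the whole point of building the histogram by hand rather than letting a gather job sample its way into trouble.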
One of my favourite presentations at Collaborate 09 was a quick tip (30 minutes) on Tuning by Cardinality Feedback – i.e. comparing the optimizer’s predictions of cardinality with the actual rowcounts returned when you run the query.
The strategy is one that I first saw demonstrated in a presentation by Wolfgang Breitling a few years ago at one of the Hotsos seminars – but this latest example demonstrated an important variation on the theme in a short, precise, lesson.
There are some ideas about Oracle that are so fundamental that they should be lodged in your brain and floating behind your eyes whenever you want to investigate SQL performance problems. Here’s one of them:
Histograms and bind variables exist for diametrically opposed reasons – they won’t work well together without help.
You use bind variables because you want everyone to share the same child cursor for a SQL statement: the statement is going to be used very frequently, everyone is going to do the same (small) amount of work with it, the same execution plan should be ideal for everyone, and you don’t want to re-optimise it every time you use it because the cost of optimisation would probably be greater than the resources needed to run the statement. Typically we want to make great (but not exclusive) use of bind variables in OLTP systems.
You create histograms because statements that are virtually identical do enormously different amounts of work, need different execution plans, and the work done in optimising is tiny compared to the work done by the statements, and getting the wrong execution plan would lead to a huge waste of resources. Typically you need to generate histograms in data warehouse or decision support systems where the queries can be brutal and expensive.
Spot the contradiction – one technology is supposed to give you one plan that everyone uses; the other technology is supposed to give each individual the plan that’s best for them.
Remember that and you will remember to be cautious about sticking histograms into OLTP systems, and won’t be tempted to turn absolutely every literal into a bind variable.
Footnote: It seems that this article should have been called “Philosophy – 2”, because I’ve just rediscovered a short note called “Philosophy”.
[The Philosophy Series]