Occasionally I come across complaints that dbms_stats is not obeying the estimate_percent when sampling data and is therefore taking more time than it “should” when gathering stats. The complaint, when I have seen it, always seems to be about the sample size Oracle chose for indexes.
There is a simple but (I believe) undocumented reason for this: because indexes are designed to collate similar data values they are capable of accentuating any skew in the data distribution, which means a sample taken from a small number of leaf blocks can be highly misleading as a guide to the whole index – so Oracle aims for a minimum sample size for gathering index stats.
I’ve found remnants of a note I wrote on comp.databases.oracle.server in December 2004 which claims that this limit (as of Oracle 9.2) was 919 leaf blocks – and I have a faint memory of discovering this figure in an official Metalink (MOS) note. I can’t find the note any more, but it’s easy enough to set up a test to see if the requirement still exists and if the limit is still the same. Here’s a test I ran recently on 184.108.40.206 using an 8KB block size:
A recent post on Oracle-l complained about an oddity when deleting through a function-based index.
I have a function based index but the CBO is not using it. The DML that I expect to have a plan with index range scan is doing a FTS. Its a simple DML that deletes 1000 rows at a time in a loop and is based on the column on which the FBI is created.
Although execution plans are mentioned, we don’t get to see the statement or the plan – and it’s always possible that there will be some clue in the (full) plan that tells us something about the code that the OP has forgotten to mention. However, function-based indexes have a little history of not doing quite what you expect, so I thought I’d take a quick look at the problem, starting with the simplest possible step – do function-based indexes and “normal” b-tree indexes behave differently on a delete. Here’s the data set I created for my test:
I’ve commented in the past about the strange stories you can find on the internet about how Oracle works and how sometimes, no matter how daft those stories seem, there might be something behind them. Here’s one such remark I came across a little while ago – published in two or three places this year:
“An index that enforces referential integrity cannot be rebuilt online.”
There are a couple of problems with this statement – first, of course, indexes don’t enforce referential integrity, though they may help to enforce uniqueness, and the so-called “foreign key” index may avoid a locking issue related to referential integrity: that’s splitting hairs a little bit, though, and we can probably guess what the author means by “indexes enforcing referential integrity”. (An example demonstrating the problem would have been useful, though – it would have saved me from writing this note, and it might save other people from jumping to the wrong conclusion and taking unsuitable action as a consequence.)
So here’s a simple test (run under 220.127.116.11):
The Enkitec Extreme Exadata Expo (E4) event is over, but I still have plenty to say about the technology. The event was a great success, with plenty of interesting speakers and presentations. I was particularly keen to hear Frits Hoogland’s comments on Exadata and OLTP, Richard Foote on Indexes, and Maria Colgan’s comments on how Oracle is making changes to the optimizer to understand Exadata a little better.
All three presentations were interesting – but Maria’s was possiby the most important (and entertaining). In particular she told us about two patches for 18.104.22.168, one current and one that is yet to be released (unfortunately I forgot to take note of the patch numbers – ed: but they’ve been supplied by readers’ comment below).
When I arrived in Edinburgh for the UKOUG Scotland conference a little while ago Thomas Presslie, one of the organisers and chairman of the committee, asked me if I’d sign up on the “unconference” timetable to give a ten-minute talk on something. So I decided to use Hybrid Columnar Compression to make a general point about choosing and testing features. For those of you who missed this excellent conference, here’s a brief note of what I said.
For those not familiar with Richard Foote’s extensive blog about indexes (and if you’re not you should be) – the title of this note is a blatant hi-jacking of his preferred naming mechanism.
It’s just a short note to remind myself (and my readers) that anything you know about Oracle, and anything published on the Internet – even by Oracle Corp. and its employees – is subject to change without notice (and sometimes without being noticed). I came across one such change today while reading the Expert Oracle Exadata book by Kerry Osborne, Randy Johnson and Tanel Poder. It was just a little throwaway comment on page 429 to the effect that:
In NOARCHIVELOG mode all bulk operations (such as INSERT, APPEND, index REBUILD and ALTER TABLE MOVE) are automatically nologging.
I was in a discussion recently about how to estimate the size of a bitmap index before you build it, and why it’s much harder to do this for bitmap indexes than it is for B-tree indexes. Here’s what I wrote in “Practical Oracle 8i”:
There are many little bits and pieces lurking in the Oracle code set that would be very useful if only you had had time to notice them. Here’s one that seems to be virtually unknown, yet does a wonderful job of eliminating calls to decode().
The nvl2() function takes three parameters, returning the third if the first is null and returning the second if the first is not null. This is convenient for all sorts of example where you might otherwise use an expression involving case or decode(), but most particularly it’s a nice little option if you want to create a function-based index that indexes only those rows where a column is null.
If you drop a unique or primary key constraint the index that supports it may be dropped at the same time – but this doesn’t always happen. Someone asked me recently if it was possible to tell whether or not an index would be dropped without having to find out the hard way by dropping the constraint. The answer is yes – after all, Oracle has to make a decision somehow, so if we can find out how it makes the decision we can predict the decision.
So here’s my best theory so far – along with the observations that led to it. First, run a trace while dropping a primary key constraint and see if this gives you any clues; on an instance running 10gR2 I noticed the following statement appearing in the trace file immediately after the delete from cdef$ (constraint definitions).
Here’s a model of a little problem I came across recently. It’s something I wrote about many years ago, and I thought I’d seen a note on Metalink (probably Note 73167.1 – see comments 2 and 4 below) explaining that the issue had been addressed; but the problem is still there, even in 22.214.171.124.
We start with a little data set (and it’s my standard setup of 8KB blocks, LMTs, 1MB uniform extents, and no ASSM):
You might think from the title that this little note is going to be about the index hash join – you would be half right, it’s also about how the optimizer seems to make a complete hash of dealing with index hash joins.
Let’s set up a simple data set and a couple of indexes so that we can take a closer look:
In my previous post, I made the comment:
In general, if you have a three-column index that starts with the same columns in the same order as the two-column index then the three-column index will be bigger and have a higher clustering_factor.
So what scenarios can you come up with that fall outside the general case ?
Alternatively, what argument could you put forward that justifies the general claim ?
I’ll try to respond to comments on this post a little more quickly than the last one, but I still have quite a lot of other comments to catch up on.
Browsing a little history recently I came across a note I’d written about the new-style index hint. In that note I claimed that:
… the index has to start with the columns (product_group, id) in that order – with preference given to an exact match, otherwise using the lowest cost index that starts the right way.
On reading this statement I suddenly realised that I hadn’t actually proved (to myself, even) that if I had the indexes (product_group, id) and (product_group, id, other_col) then a two-column hint forced Oracle to use the two column index in all (legal) circumstances.
So, tonight’s quiz – are there any edge cases, and what easy ways can you think of to prove (or disprove) the claim for the general case.
Footnote: you don’t have to demonstrate the method, just a brief outline of the idea will be sufficient.
Here’s a follow-up to a post I did some time ago about estimating the size of an index before you create it. The note describes dbms_stats.create_index_cost() procedure, and how it depends on the results of a call to explain plan. A recent question on the OTN database forum highlighted a bug in explain plan, however, which I can demonstrate very easily. I’ll start with a small amount of data to demonstrate the basic content that is used to calculate the index cost.
Here’s a funny little optimizer bug – though one that seems to have been fixed by at least 10.2.0.3. It showed up earlier on today in a thread on the OTN database forum. We’ll start (in 126.96.36.199) with a little table and two indexes – one normal, the other descending.