I think anyone who has read Wolfgang Breitling’s material about the optimizer will be familiar with the concept of Cardinality Feedback and one particular detail that when Oracle gets a cardinality estimate of one for a “driving” table then there’s a good chance that the execution plan will go wrong. (That’s not rule, by the way, just a fairly common observation after things have gone wrong.)
A recent note on OTN reminded me of a particular scenario where this specific problem can occur. It’s not particularly common, but it may hit people who are building data warehouses from multiple different sources. We start with an unlikely looking data set and very simple query:
From time to time I’ve warned people that subquery factoring should be used with a little care if all you’re trying to do is make a query more readable by extracting parts of the SQL into “factored subqueries” (or Common Table Expressions – CTEs – if you want to use the ANSI term for them). In principle, for example, the following two queries should produce the same execution plan:
I’ve previously published a couple of notes (hereand here) about the use of the driving_site() hint with distributed queries. The first note pointed out that the hint was deliberately ignored if you write a local CTAS or INSERT that did a distributed query. I’ve just found another case where the hint is ignored – this time in a simple SELECT statement.
Try running an ordinary distributed query from the SYS account, and then try using the driving_site() hint to make it run at the remote site. When I tried this a few days ago I ended up wasting half an hour translating some SQL from ANSI to Oracle dialect because I thought that the ANSI was making Oracle transform the query in a way that lost the hint – then I discovered that both versions of the code worked correctly if I logged in as a different user.
I was running my queries between two databases using 184.108.40.206 – I won’t guarantee you get the same results on other versions, but it looks like SYS doesn’t honour the driving_site() hint. I can’t think of a robust argument why this should be the case, but if I were forced to do some vague hand-waving I’d probably mumble something about potential security loopholes.
Footnote: I should, of course, have mentioned that there are all sorts of things that behave in unexpected ways if you are logged on as SYS, and that you shouldn’t be logged on as SYS – especially in a production system.
[Further reading on “ignoring hints”]
In the latest Quiz Night, I asked how you could make a query more efficient by changing a two table join into a three table join – with the clue that my third table was a repeat of the first table. Gary Myers, in comment 4, provided the type of answer I was looking for. Sometimes it is more efficient to get a small amount of data from a table on a first pass then go back and get the rest of the data on a second pass – especially if the first pass is an ‘index only’ operation.
Here’s a simple piece of code demonstrating an irritating problem. I’ve created a table, a function-based index, collected stats (without histograms), and then run a query that should use that index – but doesn’t.
Greg Rahn has been writing a short series on “Core Performance Fundamentals of Oracle Data Warehousing”. Here’s his catalogue of the first four or five articles in the series.
Here’s a little script I wrote a few years ago to make a point about using the dbms_stats package. I’ve just re-run it on 10.2.0.3 to see if it still behaves the way it used to – and it does. If you want to be just a little bit baffled, set up a database with an 8KB blocks size, a tablespace that is locally managed, uniform extent size of 1MB, using freelist management, then run the script:
If you run a query using first_rows_N optimisation you could run into a massive performance problem in cases where the optimizer thinks the complete result set is quite large when it is actually very small.
If both conditions are true the optimizer may choose a very resource-intensive execution path “expecting” to stop (or at least pause between fetches) after N rows – hoping to give the impression that it can respond very quickly – but find that the query has to run to completion because the N rows simply don’t exist.
I posted a little holiday quiz – timed to appear just before midnight (GMT) on 24th December – that asked about the number of rows sorted and the memory used for queries like:
rownum <= 10
The number and variety of the responses was gratifying. It’s always interesting to see how many important little details appear as people start to tackle even fairly straight-forward questions like this.
In a recent ‘philosophy’ post I focused on the critical mental image that should be adopted when comparing B-tree and bitmap indexes. I was a little surprised, however, to discover that the idea I proposed needed further explanation. So here’s a note that expands on the original comment.
A brief note on reading execution plans.
| Id | Operation | Name | Time |
| 0 | SELECT STATEMENT | | 00:00:17 |
|* 1 | HASH JOIN | | 00:00:17 |
| 2 | VIEW | VW_SQ_1 | 00:00:01 |
| 3 | HASH GROUP BY | | 00:00:01 |
| 4 | TABLE ACCESS FULL | EMP | 00:00:01 |
| 5 | VIEW | | 00:00:17 |
| 6 | TABLE ACCESS BY INDEX ROWID| EMP | 00:00:17 |
| 7 | INDEX FULL SCAN | E_D | 00:00:01 |
Predicate Information (identified by operation id):
1 - access("DEPT_NO"="OUTER"."DEPT_NO")
There was a question about the notorious dba_extents view on the comp.databases.oracle.server newsgroup a little while ago.
This is a view that has been popularly abused over the years despite warnings such as a note of mine in 2006 and even back as far as 2001 this one from Connor McDonald.
There are many reasons why this is a nasty view – and the number of reasons has grown as Oracle has evolved – but David FitzJarell has highlighted an important point in one of his blog postings that was worth a mention. So here it is.
Some time ago I posted an extract from a short presentation I had given at the UKOUG annual conference about 12 years previously.
When I found that set of slides, I also found the paper copies of another set of slides I had used at another UKOUG SIG event in 1995 where I had been explaining the mechanisms used by the cost based optimizer to decide whether or not to use an index. I thought it might be quite revealing to reproduce those slides to show how much (or how little) things have changed since then.
In a recent article on changes to the calculation of cardinality with frequency histograms I made a comment about writing programs to create fake but realistic and complete frequency histograms to work around the problems of dramatic changes in execution plan that can occur if you let Oracle create the histograms by gathering stats.
As a follow-up to that comment, here’s a little script to demonstrate the method (for a numeric column), and a few points to remember when using this method.
How do you find indexes that might be worth the effort of the rebuild.
One option is to report indexes that take up much more space than they need to – and there are some fairly easy ways to find those indexes. 10g, for example, gave us the create_index_cost procedure in the dbms_space package so, for a single index, we can write a little routine that does something like the following example – running on a 10.2.0.3 database: