The UKOUG conference is over for another year – but it has left me with plenty to do and lots of things to investigate. Here’s just one little point that I picked up during one of the 10 minute “Oak Talks” that members of the Oak Table Network were doing in the lunch breaks.
There is a fairly well-known strategy for generating a list of numbers by using a “select from dual … connect by …” query, but I hadn’t realised that there were two ways of using it. The code I’ve usually used is this:
rownum <= 4000
A question about partitioning came up on OTN a few days ago – but the really interesting part of the thread was about the choice of indexes, and how the choice of partitioning, combined with the data patterns, could make a big difference to the choice of indexes. I think the comments I made about indexes are worth seeing, so I’ve linked to the thread.
Here’s a video of the presentation that I did as the keynote for the Turkish Oracle Users’ Group conference earlier on this year. Recording courtesy of the people at Formspider.
The topic is about tuning a two-table join, and the video is about an hour long (so there’s a lot more that could be said on the topic, and one day I might do a half day “masterclass” on it).
A question came up on Oracle-L recently about the difference in work done by the following two queries:
SELECT /*+ RULE */
TABLE(CAST(:B1 AS DOMAIN_LIST)) DL
DOM_NAME = DL.COLUMN_VALUE
DOM_NAME IN (
FROM TABLE(CAST(:B1 AS DOMAIN_LIST))
Before saying anything else, I should point out that these two queries are NOT logically equivalent unless you can guarantee that the table() operator returns a unique set of values – and Oracle doesn’t allow uniqueness to be enforced on collections.
After publishing my note about emulating star transformations when there was something blocking the optimizer’s ability to do them automatically I ended up having an interesting email conversation with Mark Farnham about variations on the theme, concluding with a completely different (and possibly more intuitive) SQL statement to achieve the same aim.
A little while ago I published a note explaining how it was possible to find queries which ran faster if you manually de-coupled the index and table accesses. Here’s a further example that came up in discussion on a client site recently. The query looks something like this (at least in concept, although it was a little more complex, with some messy bits around the edges):
Browsing through some postings on Tony Hasler’s blog a little while ago I found this response to a note he had posted on some anomalies (i.e. bugs) in the costing of the “(min/max)” index scans:
My current understanding is it is not valid to try to compare costs across different queries (even if you just alter it by adding a hint). In general a better plan will have a lower cost but you cannot rely on this metric. The metric is really for the CBO to choose between alternative plans for this specific query, not to compare plans generated for different queries.
In a recent note I wrote about index joins I made a passing comment about limitations in the optimizer’s available strategies that might make you choose to write your code to emulate an index join through explicit SQL references.
Here are two SQL similar SQL statements (with execution plans) that demonstrate the initial problem – the first is just a restatement of the basic example I supplied in the first article:
Despite the title and content, this article is more about thinking and testing than it is about list partitions as such. It’s just based on a little thought experiment with list partitioning.
I have a list-partitioned table, with no indexes, defined as follows:
I’ve spent the last week in Spain – sightseeing, rather than working – with a minimum amount of access to the Internet.
Inevitably I now have to work hard to catch up with my email. As a moment of light relief in an otherwise irritating chore I thought I’d respond to an emailed request for help. (Regular readers of the blog will know that I don’t usually respond to private email requests for solutions, but sometimes someone gets lucky.)
The question was basically this: why do I get different execution plans for the following two statements:
In an earlier post on frequency histograms I described how Oracle creates an approximate histogram when dealing with character columns, and warned you that the strategy could lead to a couple of anomalies if you were unlucky. I’ve already published a note about one such anomaly that can occur with fairly long character strings, this note describes another anomaly that could appear in less extreme cases. Again, we start by constructing a data set.
I’ve often been heard to warn people of the accidents that can happen when they forget about the traps that appear when you start allowing columns to be NULL – but sometimes NULLs are good, especially when it helps Oracle understand where the important (e.g. not null) data might be.
An interesting example of this came up on OTN a few months ago where someone was testing the effects of changing a YES/NO column into a YES/NULL column (which is a nice idea because it allows you to create a very small index on the YESes, and avoid creating a histogram to tell the optimizer that the number of YESes is small).
They were a little puzzled, though, about why their tests showed Oracle using an index to find data in the YES/NO case, but not using the index in the YES/NULL case. I supplied a short explanation on the thread, and was planning to post a description on the blog, but someone on the thread supplied a link to AskTom where Tom Kyte had already answered the question, so I’m just going to leave you with a link to his explanation.
Here’s an example of “creative SQL” that I wrote in response to a question on OTN about combining data from two indexes to optimise access to a table. It demonstrates the principle that you can treat an index as a special case of a table – allowing you to make a query go faster by referencing the same table more times.
Unfortunately you shouldn’t use this particular example in a production system because it relies on the data appearing in the right order without having an “order by” clause. This type of thing makes me really keen to have a hint that says something like: /*+ qb_name(my_driver) assume_ordered(@my_driver) */ so that you could tell the optimizer that it can assume that the rowset from a given query block will appear in the order of the final “order by” clause as it does, for example, with sorted hash clusters.
There’s a thread on OTN that talks about a particular deletion job taking increasing amounts of time each time it is run.
It looks like an example where some thought needs to go into index maintenance and I’ve contributed a few comments to the thread – so this is a lazy link so that I don’t have to repeat myself on the blog.
I’ve written before about the effects of subquery factoring (common table expressions – or CTEs) on the optimizer, and the way that the optimizer can “lose” some strategies when you start factoring out subquery expressions. Here’s another example I came across quite recently. It involved a join of about 15 tables so I’ve only extracted a few lines from the SQL and resulting execution plans.
We start with the original query, which had factored out an aggregate subquery then used it in place of an inline view: