I’ve written notes about the different join mechanisms in the past – but such things are always worth revisiting, so here’s an accumulated bundle of comments about hash joins.
A hash join takes two inputs that (in most of the Oracle literature) are referred to as the “build table” and the “probe table”. These rowsources may be extracts from real tables or indexes, or might be result sets from previous joins. Oracle uses the “build table” to build a hash table in memory, consuming and using the rowsource in a single call; it then consumes the “probe table” one row at a time, probing the in-memory hash table to find a match.
Access to the hash table is made efficient by use of a hashing function applied to the join columns – rows with the same value on the join column end up hashing to the same place in the hash table. It is possible for different input values to produce the same hash value (a hash collision) so Oracle still has to check the actual values once it has identified “probable” joins in the hash table. Because the comparison is based on a hashing mechanism, hash joins can only be used for join predicates that are equality predicates.
There are a couple of posts on the blog describing problems with updateable join views or, to be more precise, join views which were key-preserved but which the optimizer did not recognize as key-preserved. Both scenarios are addressed in 12c:
I’ve written a few notes about anomalies in subquery factoring (with subquery) in the past, principally making a fuss about the fact that moving an inline view into a “with subquery” can cause a plan to change even when the internal code moves the subquery back in line. With the arrival of 12c one of my first sets of tests was to rerun all the examples to see how many of them had been addressed. I hadn’t written about as many examples as I had thought, and some of them had been fixed before 12c, but here are few references to a couple of outstanding items that I thought worth a mention:
[Further reading on “subquery factoring”]
There have been a couple of nice posts about the “Top N” (or First N / Next N) syntax that has appeared in 12c, here and here, for example. I particularly like the first set of examples because they include some execution plans that give you a good idea of what’s going on under the covers. “Under the covers” is important, because if you don’t actually have a large data set to test on you might not realise what impact a “Top N” query might have on a production data set.
An interesting little problem appeared on the Oracle-L mailing list earlier on this week – a query ran fairly quickly when statistics hadn’t been collected on the tables, but then ran rather slowly after stats collection even though the plan hadn’t changed, and the tkprof results were there to prove the point. Here are the two outputs (edited slightly for width – the original showed three sets of row stats, the 1st, avg and max, but since the query had only been run once the three columns showed the same results in each case):
Possibly an item to file under “philosophy”, but a question came up in the seminar I was presenting today that prompted me to blog (very briefly) about why I manage to be so good at avoiding errors and inventing workarounds to problems. You probably know that you may see an execution plan change when you add a foreign key constraint to a table – but today someone in the class asked me if this would still work if the constraint were to be disabled. The immediate response that sprang to my mind was “surely not” – but the second thought was that I didn’t really know the answer and would have to check; and the third thought was that maybe it wouldn’t if disabled, but what about novalidate; and the fourth thought was whether the setting for query_rewrite_integrity would make a difference; and the fifth thought was to wonder if there were other conditions that mattered.
So hey-ho for the weekend, when I have to set up a test case for a query that changes plan when I add a foreign key constraint – and then try (at least) eight different combinations of circumstances to check what it really takes to make the change happen.
You might have expected the following query ought to run reasonably efficiently, after all it seems to be targeted very accurately at precisely the few rows of information I’m interested in:
owner = 'TEST_USER'
and table_name = 'TEST_COMP'
and subpartition_name = 'P_MAX_D'
I have a small collection of postings where I’ve described anomalies or limitations in subquery factoring (the “with subquery”, or Common Table Expression (CTE) to give it the official ANSI name). Here’s another example of Oracle’s code not behaving consistently. You may recognise the basic query from yesterday’s example of logical tuning – so I won’t reprint the code to generate the data sets. This examples in this note were created on 18.104.22.168 – we start with a simple query and its execution plan:
Here’s a model of a problem I solved quite recently at a client site. The client’s query was much more complex and the volume of data much larger, but this tiny, two table, example is sufficient to demonstrate the key principle. (Originally I thought I’d have to use three tables to model the problem, which is why you may find my choice of table names a little odd). I ran this example on 22.214.171.124 – which was the client version:
My note on “NOT IN” subqueries is one of the most popular on my blog, staying in the top 5 hits for the last five years – but it’s getting a bit old, so it’s about time I said something new about “NOT IN” – especially since the Null Aware Anti Join has been around such a long time. The example I want to talk about is, as so often, something that came up as a problem on a customer site. Here’s a bit of SQL to model the situation, which is currently running under Oracle 126.96.36.199:
When I wrote a note last week about the fixes to the subquery factoring optimizer code in 188.8.131.52, I finished with a comment about having more to say on the test case if I materialized the subquery. Today’s the day to talk about it. As a reminder, here’s the query, but with the /*+ materialize */ hint in place:
Here’s an interesting little conundrum about subquery factoring that hasn’t changed in the recent (184.108.40.206) patch for subquery factoring. It came to me from Jared Still (a fellow member of Oak Table Network) shortly after I’d made some comments about the patch. It’s an example based on the scott/tiger schema – which I’ve extracted from the script $ORACLE_HOME/rdbms/admin/utlsampl.sql (though the relevant scripts may be demobld.sql or scott.sql, depending on version).
It’s always worth browsing through the list of Oracle’s bug fixes each time a new release or patch comes out because it can give you clues about where to look for problems in your production release – and how to anticipate problems on the upgrade. This article is an example of a fix that I found while while looking at the note for 220.127.116.11 (MOS licence required for link) quite recently.
Last year I wrote a few articles for Simpletalk, a web service created by Red Gate for users of SQL Server. This year, Red Gate is setting up a similar service called “All things Oracle” (I’ve added a link in my blogroll) for Oracle users, and I’ve volunteered to write articles for them occasionally.
Some of the stuff they publish will be complete articles on their website, some will be short introductions with links to the authors’ own websites. My first article for them has just been posted – it’s an article that captures a couple of key points from the optimizer presentation I did at the UKOUG conference a couple of weeks ago.
The UKOUG conference is over for another year – but it has left me with plenty to do and lots of things to investigate. Here’s just one little point that I picked up during one of the 10 minute “Oak Talks” that members of the Oak Table Network were doing in the lunch breaks.
There is a fairly well-known strategy for generating a list of numbers by using a “select from dual … connect by …” query, but I hadn’t realised that there were two ways of using it. The code I’ve usually used is this:
rownum <= 4000