Oracle Scratchpad

Joins – MJ

Filed under: Execution plans,Performance — Jonathan Lewis @ 5:50 pm BST Aug 15,2010

The final join mechanism in my “all joins are nested loop joins” argument is the Merge Join – a join mechanism that depends on both its row sources being pre-sorted on the join columns. (For a quick reference list of URLs to all three articles see: Joins.)

The note on hash joins pointed out that a “traditional” nested loop join may result in repeated visits to data blocks that make up the second (inner) table as we work through the data from the first (outer) table – and argued that there were cases where it made sense for Oracle to copy a subset of the second table into private memory and structure the copy as a single table hash cluster. Towards the end of the note, though, there was a brief mention of the increased complexity and I/O cost when that subset was too large to fit completely into memory. This is where the merge join may turn out to be better than the hash join as a way of removing the stress of random I/O and buffer visits.

We’ll start with a simple, visual example of how the merge join works. Take the following (pre-sorted) data sets, in which I’ve made visible only the join column values. The first data set, in order, is:

	(1, ...) (3, ...) (3, ...) (4, ...) (5, ...) (8, ...) (10, ...)

The second data set, again in order, is:

	(0, ...) (1, ...) (1, ...) (2, ...) (2, ...) (3, ...) (3, ...) (3, ...)
	(5, ...) (5, ...) (5, ...) (7, ...) (7, ...) (8, ...) (8, ...) (9, ...)

Our target is to join items from set 1 that match items from set 2. To do this we read the first value (1, …) from the first data set and scan the second data set to find matches, then we read the second value from the first data set and scan the second data set to find matches and so on. Immediately you can see that we’re doing a nested loop join – for each row in the (sorted) first data set we scan the second data set.

Where, then, is the special feature of the merge join that gives it a separate name and execution plan? Most of it is in the benefit we get from the two data sets being sorted in the same order as we repeatedly scan the second data set. (There’s extra benefit, which we also saw in the hash join, because the data has been copied from the buffer cache into private memory allowing us to avoid latching and pinning overheads as we re-visit data).

Think about what happens as we hit the (4, …) in the first data set. We have to scan the second data set, but how much of it do we need to scan? Because the first data set is sorted we can take advantage of remembering what we did with the previous element of the first data set. At worst we will need to start in the second data set at the point where we found the first match for the previous element of the first data set; and because the second data set is sorted we can stop the scan the moment we overshoot our target value.

Assuming (for this example) that our test is for equality, we could imagine the code doing the following:

  • Select (4,…) from the first set
    Start at the 3’s in the second set (because that’s where we got the first match last time)

… (3,…) too small
… (3,…) too small
… (3,…) too small
… (5,…) too big – stop, no matches for 4

  • Select (5,…) from the first set
    Start at the 3’s in the second set (because we didn’t get any matches for 4, the start point hasn’t changed)

… (3,…) too small
… (3,…) too small
… (3,…) too small
… match, match, match
… (7,…) too big

  • Select (8,…) from the first set
    Start at the 5’s in the second set (where we found the first match for the previous value)

… (5,…) too small
… (5,…) too small
… (5,…) too small
… (7,…) too small
… (7,…) too small
… match, match
… (9,…) too big

For each value in the first set (apart from the very first one – where we may simply start at the lowest value in the second set) we have a way of minimising the scan on the second set. Moreover, if the second set is too large to keep entirely in memory this “windowing” effect means the extra I/O that we do on the merge is also kept to a minimum.
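To make that “windowing” concrete, here’s a minimal sketch in Python – purely illustrative, not Oracle’s internal code – run against the two sample data sets above. The start variable remembers where the first match for the previous value was found, and a probe pointer walks forward from there until it overshoots:

def merge_join(first, second, key=lambda row: row[0]):
    # Equality merge join of two row sources pre-sorted on the join column.
    results = []
    start = 0                       # where the previous value's first match was found
    for row1 in first:
        k1 = key(row1)
        probe = start               # the "third pointer": scan forward from start
        while probe < len(second) and key(second[probe]) < k1:
            probe += 1              # too small - keep walking
        if probe < len(second) and key(second[probe]) == k1:
            start = probe           # remember the first match for the next value
            while probe < len(second) and key(second[probe]) == k1:
                results.append((row1, second[probe]))
                probe += 1
        # the moment we overshoot k1 we stop - no need to read any further
    return results

first_set  = [(1,), (3,), (3,), (4,), (5,), (8,), (10,)]
second_set = [(0,), (1,), (1,), (2,), (2,), (3,), (3,), (3,),
              (5,), (5,), (5,), (7,), (7,), (8,), (8,), (9,)]

for pair in merge_join(first_set, second_set):
    print(pair)

For each value in the first set the code starts its scan where the previous value’s first match was found and stops as soon as it overshoots – exactly the walk-through above.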

So where are the costs and benefits compared to the nested loop join and the hash join?

The benefit is the same as the benefit we got from the hash join – we avoid repeated access into data blocks that are in public memory and may even be on disc; so random I/O, latch acquisition, and buffer pinning costs are reduced.

The cost is that we need to have the data in the right order as we operate the join – so we may have to sort both sets of data first and sorting can be expensive, particularly if it spills to disc.

These observations may give you some way to compare nested loop joins with merge joins – but why bother with merge joins at all when you’ve got the hash join mechanism? (One feature, of course, is that merge joins need not be restricted to equality predicates as hash joins are, but we’ll ignore that for the present.) What’s the critical comparison there? The answer lies in the closing comment I made about hash joins – the complications that arise when the hash table doesn’t fit in memory.

If the hash table fits in memory the only I/O that Oracle does for the second data set (probe table) is to read it once. If the hash table doesn’t fit in memory Oracle will have to copy some of it to disc, and then copy a similar proportion of the probe table to disc as well. Then it will have to re-read matching portions of the two data sets to join them. The larger the first data set is (relative to the available memory), the larger the fraction of the second data set that has to be copied to disc, and the more times some parts of it may have to be re-read to complete the join.
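If you want to picture that spill mechanism, here is a conceptual Python sketch of the classic one-pass (“grace”) hash join strategy. It is an illustration under stated assumptions, not Oracle’s implementation: the partition lists stand in for partitions that would really be written to disc and re-read.

from collections import defaultdict

def grace_hash_join(build, probe, key, n_partitions):
    # Phase 1: split BOTH inputs into matching partitions on the join key.
    # In a real one-pass join these partitions would be written to disc.
    build_parts = defaultdict(list)
    probe_parts = defaultdict(list)
    for row in build:
        build_parts[hash(key(row)) % n_partitions].append(row)
    for row in probe:
        probe_parts[hash(key(row)) % n_partitions].append(row)

    # Phase 2: re-read each pair of partitions and join it in memory.
    results = []
    for p in range(n_partitions):
        table = defaultdict(list)
        for row in build_parts[p]:
            table[key(row)].append(row)          # small in-memory hash table
        for row in probe_parts[p]:
            for match in table[key(row)]:        # rows can only match within a partition
                results.append((match, row))
    return results

The point to notice is that a matching fraction of the probe input has to be partitioned – written and re-read – along with the build input, which is the extra I/O described above.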

If the sorted data sets for a merge join are too large to fit into memory Oracle dumps them to disc, but only reads through them (at most) once each for equality joins and the commoner types of range-based joins. (Of course, if the data sets are large then the work done in sorting them may be expensive on CPU and I/O – and there are several variations in mechanism that you may have to consider when thinking about forcing a merge join).

You can probably find lots of books and internet notes that say something like “hash joins are good if one of the data sets is small, merge joins are better if both data sets are large”. There is an element of truth in this statement – but words like “small” and “large” don’t have any absolute meaning in this context, so it’s important to have an understanding of how the hash join and merge join work so you can recognise what the trade-off is between them.

There’s plenty more to say about the workings of hash joins and merge joins and I have several draft notes about them that I will probably complete and publish eventually. In the meantime, if you’re prepared to spend a little cash on understanding more of the mechanisms you could always buy a copy of “Cost Based Oracle – Fundamentals”.

I’ll just close with three examples of execution plans (from 11g) for merge joins:

rem
rem     Script:         c_merge_join.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2001
rem 

select 
	count(t1.v1)	ct_v1,
	count(t2.v2)	ct_v2
from
	t1, t2
where
	t2.n2 = t1.n1
;

-------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost  |
-------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |  1508 |  1055 |
|   1 |  SORT AGGREGATE      |      |     1 |  1508 |       |
|   2 |   MERGE JOIN         |      |  1000 |  1472K|  1055 |
|   3 |    SORT JOIN         |      |  1000 |   980K|   184 |
|   4 |     TABLE ACCESS FULL| T1   |  1000 |   980K|    25 |
|*  5 |    SORT JOIN         |      | 10000 |  4921K|   871 |
|   6 |     TABLE ACCESS FULL| T2   | 10000 |  4921K|   113 |
-------------------------------------------------------------


-----------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |  1508 |  1017 |
|   1 |  SORT AGGREGATE               |       |     1 |  1508 |       |
|   2 |   MERGE JOIN                  |       |  1000 |  1472K|  1017 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    |  1000 |   980K|   146 |
|   4 |     INDEX FULL SCAN           | T1_I1 |  1000 |       |     3 |
|*  5 |    SORT JOIN                  |       | 10000 |  4921K|   871 |
|   6 |     TABLE ACCESS FULL         | T2    | 10000 |  4921K|   113 |
-----------------------------------------------------------------------


------------------------------------------------------------------------
| Id  | Operation                      | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |     1 |  1508 |  1640 |
|   1 |  SORT AGGREGATE                |       |     1 |  1508 |       |
|   2 |   MERGE JOIN                   |       |  1000 |  1472K|  1640 |
|   3 |    TABLE ACCESS BY INDEX ROWID | T1    |  1000 |   980K|   146 |
|   4 |     INDEX FULL SCAN            | T1_I1 |  1000 |       |     3 |
|*  5 |    SORT JOIN                   |       | 10000 |  4921K|  1494 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2    | 10000 |  4921K|   736 |
|   7 |      INDEX FULL SCAN           | T2_I2 | 10000 |       |    21 |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T2"."N2"="T1"."N1")
       filter("T2"."N2"="T1"."N1")

The predicate section, shown just once above, was the same in all three plans.

It is important to note that you can avoid sorting the first table (see the 2nd and 3rd plans) if a suitable index exists – but even when there is an appropriate index on the second table Oracle still reports a sort operation (3rd plan). Mind you, as I’ve pointed out in the past, sorts aren’t necessarily what they seem to be.

12 Comments

  1. Hi Jonathan,

    Thanks for this Article.

    Just one doubt: will the execution plan decide to go for a merge join depending on the SQL we write? Is this correct? Unlike a hash join, I guess we need not create separate hash tables, so please confirm whether this determination is based entirely on the select statement plus the objects accessed.

    Secondly, the book you have mentioned – is it available in India? If yes, kindly let me know the publisher details, as I am not finding it in book stores here.

    Thanks,
    Kartik

    Comment by kartik — August 19, 2010 @ 11:32 am BST Aug 19,2010 | Reply

  2. Kartik,

    We (the designers/DBAs/programmers) do not have to create a hash table to get a hash join. I introduced the real single table hash cluster in a nested loop as a way of showing the EFFECT of what Oracle does internally when it chooses to do a hash join.

    The choice of hash join, merge join, or nested loop join is determined entirely by the optimizer based on its assessment of data volume, data scatter, and available machine resources.

    I do not know if Cost Based Oracle Fundamentals is easily available in India.

    Comment by Jonathan Lewis — August 19, 2010 @ 7:08 pm BST Aug 19,2010 | Reply

  3. Hello Sir,

    Nice article.

    “If the sorted data sets for a merge join are too large to fit into memory Oracle dumps them to disc, but only reads through them (at most) once each”

    And in the last part of your article you show that if a suitable index exists on the join column(s) of the tables involved in the join, the optimizer tends to avoid sorting, which saves memory and CPU cycles.

    Is it correct to say that in such a scenario the merge join outperforms the hash join?

    Thanks

    Comment by henish — August 26, 2010 @ 8:26 pm BST Aug 26,2010 | Reply

  4. Henish,

    It’s certainly correct to say that there are cases where, for the same volume of data and same amount of memory used, the merge join CAN outperform the hash join for exactly these reasons.

    Comment by Jonathan Lewis — August 26, 2010 @ 9:44 pm BST Aug 26,2010 | Reply

  5. You claim here that a merge join is a form of nested loop join. This is clear with the hash join, where the inner loop must check all rows that match a particular hash. But I don’t see it here.

    Your example at the top of the post (scanning the second set) is artificial. A properly written merge join algorithm keeps one pointer into each dataset and advances one or the other until either terminates. Certainly no looping is needed on the second dataset! The whole algorithm is a single loop – there shouldn’t be any nesting. Certainly when I designed and wrote merge joins for algorithms classes there weren’t nested loops.

    The caveat of course would be that both datasets are presorted. And of course SORTING requires some form of nesting and can’t beat O(n lg n). But if the input sets are truly presorted then the join itself should not require a loop and should execute in O(n) time, right?

    Comment by Jeremy Schneider — September 28, 2010 @ 3:30 pm BST Sep 28,2010 | Reply

  6. Jeremy,

    In what way is the example artificial? I have rows which are non-unique in the first data set and some of those rows correspond to multiple rows in the second data set. The data is NOT artificial.

    Your point about “two pointers” and the sequential nature of walking the data is valid – and it’s the reason why we usually go to the expense of sorting and buffering the two sets of data.

    But to do the merge join we need a third pointer that starts from the position dictated by the second pointer (when it is in position) and walks the data set until it comes to a row which no longer matches the current item in the first data set. At that point, the first pointer will move forward, the second pointer may, or may not, move forward, and the third pointer has to be set to the latest value of the second pointer.

    Irrespective of any thoughts about optimising the efficiency of the mechanism, though, the merge join at this point is still following the pattern: for each row in the first rowsource locate the matches in the second source.

    Comment by Jonathan Lewis — September 29, 2010 @ 2:28 pm BST Sep 29,2010 | Reply

  7. Aargh… ok I see it now.

    Many-to-many join on non-unique datasets… I wasn’t visualizing that quite right. I think that merge sort and merge join algorithms got a little jumbled in my head. It seemed “artificial” to me because I thought you unnecessarily invented the third pointer. The third pointer is needed for a merge join (though not for a merge sort).

    My apologies for accusing you of being artificial. :) Thanks for clearing up my misunderstanding.

    Comment by Jeremy Schneider — September 29, 2010 @ 5:03 pm BST Sep 29,2010 | Reply

  8. […] We now run a “report” that generates data for a number-crunching tool that extracts all the data from the tables – using an outer join so that parent rows don’t get lost. For various reasons the tool wanted the data sorted in a certain order – so there’s also an order by clause in the query. I’m going to show you the original query – first unhinted, and then hinted to use a merge join: […]

    Pingback by Join Surprise « Oracle Scratchpad — December 15, 2010 @ 8:56 pm GMT Dec 15,2010 | Reply

  9. […] the work by “remembering” where it started from on the previous probe – see, for example this blog post of mine on merge […]

    Pingback by Execution Plans Part 4: Precision and Timing – All Things Oracle — May 14, 2014 @ 6:21 pm BST May 14,2014 | Reply

  10. […] are only three join mechanisms used by Oracle: merge join, hash join and nested loop […]

    Pingback by Joins | Oracle Scratchpad — June 5, 2014 @ 8:02 am BST Jun 5,2014 | Reply

  11. […] that appeared in 12.2 for SQL is the “band join” that makes certain types of merge join much more efficient. Consider the following query (I’ll supply the SQL to create the […]

    Pingback by Band Join 12c | Oracle Scratchpad — February 13, 2017 @ 1:54 pm GMT Feb 13,2017 | Reply

  12. […] anti-join with an index full scan (not FAST full scan). It’s interesting to note that the merge join anti hasn’t been as clever as the nested loop anti in avoiding probes of the second data […]

    Pingback by Validate FK | Oracle Scratchpad — July 10, 2018 @ 10:42 am BST Jul 10,2018 | Reply

