Oracle Scratchpad

February 13, 2017

Band Join 12c

Filed under: 12c,Execution plans,Oracle,Performance,Upgrades — Jonathan Lewis @ 1:53 pm GMT Feb 13,2017

One of the optimizer enhancements that appeared in 12.2 for SQL is the “band join”. that makes certain types of merge join much more  efficient.  Consider the following query (I’ll supply the SQL to create the demonstration at the end of the posting) which joins two tables of 10,000 rows each using a “between” predicate on a column which (just to make it easy to understand the size of the result set)  happens to be unique with sequential values though there’s no index or constraint in place:

select
        t1.v1, t2.v1
from
        t1, t2
where
        t2.id between t1.id - 1
                  and t1.id + 2
;

This query returns nearly 40,000 rows. Except for the values at the extreme ends of the range each of the 10,000 rows in t2 will join to 4 rows in t1 thanks to the simple sequential nature of the data. In 12.2 the query, with rowsource execution stats enabled, completed in 1.48 seconds. In 12.1.0.2 the query, with rowsource execution stats OFF, took a little over 14 seconds. (With rowsource execution stats enabled it took 12.1.0.2 a little over 1 minute to return the first 5% of the data – I didn’t bother to wait for the rest, though the rate would have improved over time.)

Here are the two execution plans – spot the critical difference:


12.1.0.2
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |    25M|   715M|  1058  (96)| 00:00:01 |
|   1 |  MERGE JOIN          |      |    25M|   715M|  1058  (96)| 00:00:01 |
|   2 |   SORT JOIN          |      | 10000 |   146K|    29  (11)| 00:00:01 |
|   3 |    TABLE ACCESS FULL | T1   | 10000 |   146K|    27   (4)| 00:00:01 |
|*  4 |   FILTER             |      |       |       |            |          |
|*  5 |    SORT JOIN         |      | 10000 |   146K|    29  (11)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   | 10000 |   146K|    27   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ID"<="T1"."ID"+2)   -- > had to add GT here to stop WordPress spoiling the format 
   5 - access("T2"."ID">="T1"."ID"-1)
       filter("T2"."ID">="T1"."ID"-1)

12.2.0.1
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 40000 |  1171K|    54  (12)| 00:00:01 |
|   1 |  MERGE JOIN         |      | 40000 |  1171K|    54  (12)| 00:00:01 |
|   2 |   SORT JOIN         |      | 10000 |   146K|    27  (12)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   146K|    25   (4)| 00:00:01 |
|*  4 |   SORT JOIN         |      | 10000 |   146K|    27  (12)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| T2   | 10000 |   146K|    25   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."ID">="T1"."ID"-1)
       filter("T2"."ID"<="T1"."ID"+2 AND "T2"."ID">="T1"."ID"-1)

Notice how operation 4, the FILTER, that appeared in 12.1 has disappeared in 12.2 and the filter predicate that it used to hold is now part of the filter predicate of the SORT JOIN that has been promoted to operation 4 in the new plan.

As a reminder – the MERGE JOIN operates as follows: for each row returned by the SORT JOIN at operation 2 it calls operation 4. In 12.1 this example will then call operation 5 so the SORT JOIN there happens 10,000 times. It’s important to know, though, that the name of the operation is misleading; what’s really happening is that Oracle is “probing a sorted result set in local memory” 10,000 times – it’s only on the first probe that it finds it has to call operation 6 to read and move the data into local memory in sorted order.

So in 12.1 operation 5 probes (accesses) the in-memory data set starting at the point where t2.id >= t1.id – 1; I believe there’s an optimisation here because Oracle will recall where it started the probe last time and resume searching from that point; having found the first point in the in-memory set where the access predicate it true Oracle will walk through the list passing each row back to the FILTER operation as long as the access predicate is still true, and it will be true right up until the end of the list. As each row arrives at the FILTER operation Oracle checks to see if the filter predicate there is true and passes the row up to the MERGE JOIN operation if it is. We know that on each cycle the FILTER operation will start returning false after receiving 4 rows from SORT JOIN operation – Oracle doesn’t.  On average the SORT JOIN operation will send 5,000 rows to the FILTER operation (for a total of 50,000,000 values passed and discarded).

In 12.2, and for the special case here where the join predicate uses constants to define the range, Oracle has re-engineered the code to eliminate the FILTER operation and to test both parts of the between clause in the same subroutine it uses to probe and scan the rowsource. In 12.2 the SORT JOIN operation will pass 4 rows up to the MERGE JOIN operation and stop scanning on the fifth row it reaches. In my examples that’s an enormous (CPU) saving in subroutine calls and redundant tests.

Footnote:

This “band-join” mechanism only applies when the range is defined by constants (whether literal or bind variable). It doesn’t work with predicates like (e.g.):

where t2.id between t1.id - t1.step_back and t1.id + t1.step_forward

The astonishing difference in performance due to enabling rowsource execution statistics is basically due to the number of subroutine calls eliminated – I believe (subject to a hidden parameter that controls a “sampling frequency”) that Oracle will call the O/S clock twice each time it calls the second SORT JOIN operation from the FILTER operation to acquire the next row. In 12.1 we’re doing roughly 50M redundant calls to that SORT JOIN.

The dramatic difference in performance even when rowsource execution statistics isn’t enabled is probably something you won’t see very often in a production system – after all, I engineered a fairly extreme data set and query for the purposes of demonstration. Note, however, the band join does seemt to introduce a change in cost, so it’s possible that on the upgrade you may find a few cases where the optimizer will switch from a nested loop join to a merge join using a band-join.

6 Comments »

  1. So sort-merge join hasn’t been completely supplanted by hash join after all…

    In your opinion (and I know this would probably be just a gross generalization anyway), what is the likelihood of SMJ being useful instead of nested loops with an index? Most of the time that I’ve ever needed something like this I just rely on an index, building one if necessary.

    Comment by Jason Bucata — February 13, 2017 @ 8:28 pm GMT Feb 13,2017 | Reply

    • Jason, hash joins only work with equality joins so the SMJ has been the only bulky alternative to the nested loop join for non-equijoins.
      The question you need to ask is, is it worth me building up a sorted set of data that can be probed or are the join filters enough to reduce the work required to read the data from your table (and sort smaller chunks)?
      If you feel you need to build the index and it’s a one-off SQL then the work done in building the index is going to be pretty comparable to the work done in building the sorted row source – except that your SMJ might be able to apply other filters to your table before doing a sort. SMJ probably wins this scenario. If it’s a statement you’ll be repeating often then NL with the new index may be the way to go.
      IMO it is the sort of join operation you’d only see in one-off statements, or possibly ETL work where your joins are not always simple equalities.

      Comment by Andrew Sayer — February 14, 2017 @ 1:20 pm GMT Feb 14,2017 | Reply

    • Jason,

      I think the only possibly answer is so generic that it probably doesn’t help much. The key thing, as so often, is to identify which resource is (a) adding most time or (b) causing most contention. In the case of the SMJ you access shared memory (i.e. the blocks in the buffer cache) data just once and then burn up CPU thrashing through it, in the NLJ version you (presumably) access the buffer cache repeatedly allowing for the possibility of latch contention affecting all the other users.

      There’s also the consideration that, for example, with a predicate like: col1 = … and col2 = … and cola = .. and colb = … with some vaguely relevant indexes in place, it may be better to use the index on (cola, colb) to acquire the data once and then use the (col1, col2) path to do the SMJ because a nested loop on (col1, col2) (or on (cola, colb)) would be using inefficient indexes many times. Sometimes you just have to pick the least worst path because you don’t want to pay for another index.

      I have seen SMJs from time to time – but, though I can’t say this with certainty, possibly only when nested loops were clearly going to be inefficient.

      Comment by Jonathan Lewis — February 14, 2017 @ 1:20 pm GMT Feb 14,2017 | Reply

  2. […] of the optimizer enhancements that appeared in 12.2 for SQL is the “band join” that makes certain types of merge join much more […]

    Pingback by Log Buffer #504: A Carnival of the Vanities for DBAs | Official Pythian® Blog – Cloud Data Architect — February 15, 2017 @ 10:37 am GMT Feb 15,2017 | Reply

  3. Hello Jonathan,

    I realize the purpose of this 12.2 enhancement is to do better with the same code, but I can’t help wondering what different code might do.

    Would you be so kind as to try this alternative in your environment to compare timings? Without 12.2 I can’t do the comparison myself.

    select T1V1, T2V1 from (
      select id-1 id, 1 is_t1, v1 from t1
      union all
      select id, null, v1 from t2
    )
    match_recognize(
      order by id, is_t1
      measures first(V1) as T1V1, V1 as T2V1
      all rows per match
      after match skip to next row
      pattern({-T1-} ( T2 | {-T1-} )* T2)
        define T1 as is_t1 = 1,
               T2 as is_t1 is null and id &lt;= first(id) + 3
    );
    

    Thanks in advance, Stew

    Comment by stewashton — February 15, 2017 @ 5:15 pm GMT Feb 15,2017 | Reply

    • Stew,

      Interesting idea.

      With _rowsource_execution_statistics = typical (which I forgot to report originally) the run time of the merge join varied from 0.17 to 0.36 seconds. The match_recognize query varied from 0.26 to 0.56 seconds.

      With the stats enabled the original varied from 1.79 to 1.91, the match_recognize from 2.30 to 2.79. It’s hard to say where the time went but some of it seemed to be in the task of sorting 20,000 rows against sorting 10,000 rows twice, and some of it due to the difference in the merge compared to the match_recognize.

      In practise variations in row widths might make different data sets behave dramatically differently so I don’t think we could draw any sort of general conclusion from the result.

      Comment by Jonathan Lewis — February 15, 2017 @ 7:10 pm GMT Feb 15,2017 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Powered by WordPress.com.