Oracle Scratchpad

September 7, 2013

Hash Joins

Filed under: CBO,Execution plans,Hints,Oracle,Tuning — Jonathan Lewis @ 12:53 pm GMT Sep 7,2013

I’ve written notes about the different join mechanisms in the past – but such things are always worth revisiting, so here’s an accumulated bundle of comments about hash joins.

A hash join takes two inputs that (in most of the Oracle literature) are referred to as the “build table” and the “probe table”. These rowsources may be extracts from real tables or indexes, or might be result sets from previous joins. Oracle uses the “build table” to build a hash table in memory, consuming and using the rowsource in a single call; it then consumes the “probe table” one row at a time, probing the in-memory hash table to find a match.

Access to the hash table is made efficient by use of a hashing function applied to the join columns – rows with the same value on the join column end up hashing to the same place in the hash table. It is possible for different input values to produce the same hash value (a hash collision) so Oracle still has to check the actual values once it has identified “probable” joins in the hash table. Because the comparison is based on a hashing mechanism, hash joins can only be used for join predicates that are equality predicates.

The hint to force a hash join is /*+ use_hash(rowsource_alias) */. This tells the optimizer that the join method to be used when “rowsource_alias” is the next rowsource in the join order should be a hash join; however it does not tell the optimizer whether that rowsource should be used as the build table or the probe table. To specify how the rowsource is used you need a second hint: no_swap_join_inputs(rowsource_alias) if you want Oracle to use the rowsource as the probe table, or swap_join_inputs(rowsource_alias) if you want Oracle to use it as the build table; for example, with a little cosmetic editing:


select  /*+ leading(table_1 table_2) use_hash(table_2) no_swap_join_inputs(table_2) */ *
from    t1 table_1, t2 table_2
where   table_2.n1 = table_1.n1
;

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 45000 |    16M|    44 |
|*  1 |  HASH JOIN         |      | 45000 |    16M|    44 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 |   547K|    14 |
|   3 |   TABLE ACCESS FULL| T2   |  3000 |   547K|    14 |
-----------------------------------------------------------

select  /*+ leading(table_1 table_2) use_hash(table_2) swap_join_inputs(table_2) */ *
from    t1 table_1, t2 table_2
where   table_1.n1 = table_1.n1
;

-----------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost  |
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 45000 |    16M|    44 |
|*  1 |  HASH JOIN         |      | 45000 |    16M|    44 |
|   2 |   TABLE ACCESS FULL| T2   |  3000 |   547K|    14 |
|   3 |   TABLE ACCESS FULL| T1   |  3000 |   547K|    14 |
-----------------------------------------------------------

If you were to look at the second execution plan without analysing the optimizer trace file you would probably assume that the join order (very specifically the thing that Oracle labels “join order” for the purposes of examining execution paths in a structured sequence) was (table_2, table_1) – it is important to remember that the “internal” join order and the apparent join order are not necessarily the same because the optimizer may have done some “side-swapping” for a plan using hash joins.

A common misunderstanding is that a hint of the form /*+ use_hash(table_1 table_2) */ is a directive to Oracle to do a hash join with table_1 as the build table and table_2 as the probe table. This is not the case; the hint in this form is simply a shorthand for a pair of single-table hints, viz: use_hash(table_1) use_hash(table_2). In the simplest case this translates as: “if table_1 is the second table in the join order then use a hash join to access it, if table_2 is the second table in the join order then use a hash join to access it”; given this interpretation it’s not too surprising that the hint appears to mean more than it really does.

There is a third hint that you can associate with hash joins, and this is used to dictate how one set of slaves should distribute data to the next set of slaves when running parallel queries – the pq_distribute() hint. The description in the manuals for this hint reports something like the following: pq_distribute( tablespec, inner_distribution, outer_distribution). I think this leaves some room for confusion – which table, for example, should be supplied in the tablespec, and how should you interpret inner and outer ?  (The manual helpfully tells us that the outer_distribution is the distribution for the outer table, but that doesn’t really explain anything.) The “outer” table is the one specified in the hint, the “inner” table is the previous rowsource – and this doesn’t change even if you’ve used the swap_join_inputs() hint. Basically, when you hint a hash join for a table in a parallel query you need three hints to describe the hash join and for clarity you might as well make them three consecutive hints:


/*+
        use_hash(table_X)
        [no_]swap_join_inputs(table_X)
        pq_distribute(table_X {distribution for previous rowsource} {distribution for table_X})
*/

I won’t go into the details of the possible distribution methods and the effects they have. but here’s a little example demonstrating the hint. I’ve supplied an SQL statement (with the pq_distribute() hint commented out) and shown two plans – the first is the unhinted plan, the second shows the effect of my chosen distribution:

select  /*+
                leading(table_1 table_2)
                use_hash(table_2)
                swap_join_inputs(table_2)
--              pq_distribute(table_2 broadcast none)
        */ *
from    t1 table_1, t2 table_2
where   table_2.n1 = table_1.n1
;

-------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost  |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          | 45000 |    16M|    19 |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10002 | 45000 |    16M|    19 |  Q1,02 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED   |          | 45000 |    16M|    19 |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE          |          |  3000 |   547K|     8 |  Q1,02 | PCWP |            |
|   5 |      PX SEND HASH       | :TQ10000 |  3000 |   547K|     8 |  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR |          |  3000 |   547K|     8 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| T2       |  3000 |   547K|     8 |  Q1,00 | PCWP |            |
|   8 |     PX RECEIVE          |          |  3000 |   547K|     8 |  Q1,02 | PCWP |            |
|   9 |      PX SEND HASH       | :TQ10001 |  3000 |   547K|     8 |  Q1,01 | P->P | HASH       |
|  10 |       PX BLOCK ITERATOR |          |  3000 |   547K|     8 |  Q1,01 | PCWC |            |
|  11 |        TABLE ACCESS FULL| T1       |  3000 |   547K|     8 |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost  |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          | 45000 |    16M|    19 |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)    | :TQ10001 | 45000 |    16M|    19 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN             |          | 45000 |    16M|    19 |  Q1,01 | PCWP |            |
|   4 |     PX BLOCK ITERATOR    |          |  3000 |   547K|     8 |  Q1,01 | PCWC |            |
|   5 |      TABLE ACCESS FULL   | T2       |  3000 |   547K|     8 |  Q1,01 | PCWP |            |
|   6 |     BUFFER SORT          |          |       |       |       |  Q1,01 | PCWC |            |
|   7 |      PX RECEIVE          |          |  3000 |   547K|     8 |  Q1,01 | PCWP |            |
|   8 |       PX SEND BROADCAST  | :TQ10000 |  3000 |   547K|     8 |  Q1,00 | P->P | BROADCAST  |
|   9 |        PX BLOCK ITERATOR |          |  3000 |   547K|     8 |  Q1,00 | PCWC |            |
|  10 |         TABLE ACCESS FULL| T1       |  3000 |   547K|     8 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------

I’ve dictated the join order (t1, t2) and a hash join into t2, but I’ve also told the optimizer to swap the join inputs so t2 becomes the build table and when you look at the execution plan the join order appears to be (t2, t1).

By default – i.e. in the first plan – the optimizer used the (hash hash) distribution – so the first set of parallel execution slaves scanned t2 (sharing the task of the full tablescan fairly, we hope) and distributed the data “pseudo-randomly” (the hashing algorithm on the join column) between the slaves in the second set; then it scanned t1 and distributed that data using the same hashing algorithm on the join column. The second set of slaves built and probed with the partial data sets they were given.

In the second plan the first set of slaves scanned its selected portion of t2 and built a hash table from it, after which [see footnote] the second set of slaves started scanning t1 - but each slave in the second set sent every row it had scanned to every slave in the first set. This example is a little unusual, I’ve chosen it simply to demonstrate the principle; normally you might expect to broadcast the hash table if it produced a “small” data set as this tends to minimise the number of PX messages passed between slaves.

It’s worth pointing out that there’s a lot more you can do to investigate parallelism and hash joins – especially if you’re working with the Exadata database machine – and I will be publishing further notes on the topic. I’ll leave you with one warning, though. If you change the join in the example to an outer join:  “table_2.n1(+) = table_1.n1″ the (broadcast none) distribution will become invalid so the pq_distribute() hint will be “ignored”, even though the use_hash() and swap_join_inputs() will still be followed – you have to be very careful when trying to engineer parallel queries to follow a specific path.

Footnote: taking a closer look at the set of trace files generated in the broadcast test, I discovered that the first set of slave start their parallel tablescan of t1 first, but stops after just one read from each slave, then the second set of slaves scans and builds the hash table before calling for further data from the first set.

 

10 Comments »

  1. Hello Jonathan

    You say:
    If you were to look at the second execution plan without analysing the optimizer trace file you would probably assume that the join order (very specifically the thing that Oracle labels “join order” for the purposes of examining execution paths in a structured sequence) was (table_2, table_1)

    Is this then only valid for the second execution plan? If so why is this not true for the first execution plan?

    Regards Hans-Peter

    Comment by Hans-Peter — September 10, 2013 @ 7:20 am GMT Sep 10,2013 | Reply

    • Hans-Peter,
      I’m not sure I understand the thinking behind your question.

      In the serial execution plans, the first plan “looks” as if the join order examined by the optimizer was [t1, t2] – although, technically the trace file might show that the plan came from considering the join order [t2. t1]. Since we can see the hint, and the plan looks as if it is obeying we might not consider reviewing the trace file.

      The second plan “looks” as if the join order examined by the optimizer was [t2, t1], even though we have a leading() hint that requires the optimizer to consider only the join order [t1, t2] – so, if we didn’t examine the trace file, we might conclude that the optimizer had ignored the leading() hint if we weren’t aware of the significance of the swap_join_inputs() hint.

      Comment by Jonathan Lewis — September 10, 2013 @ 8:03 am GMT Sep 10,2013 | Reply

      • Hi Jonathan,

        This article is quite nice.

        However, I am not able to understand the second line of the second paragraph of your response to Hans-Peter’s query – “although, technically the trace file might show that the plan came from considering the join order [t2. t1]. ”

        My question is from where the join order [t2. t1] is coming in picture as far as the first SQL is concerned ?

        Regards,
        Sudipta.

        Comment by sudiptabiswas2 — October 5, 2013 @ 9:52 am GMT Oct 5,2013 | Reply

        • Sudipta,

          If you read the whole comment again, paying most attention to the second half, you will see that I’ve said about the SECOND plan that it looks as if Oracle has produced it by examining the join order (t2, t1) even though the hint told the optimizer to consider only the join order (t1, t2) – the point is that when considering the join order (X, Y) for a hash join the optimizer is allowed to decide which of X and Y should be the build table and which the hash unless explicitly hinted.

          If you apply the same thinking to the FIRST plan, we can see that it appears to be from the join order (t1, t2) — but you cannot see in the plan whether or not it appeared because Oracle was considering join order (t1, t2) or join order (t2, t1). The point is that the join order is NOT visible in the picture, and you have to check the trace file to discover the actual join order (or, in this particular case, believe that the hints are correct).

          Comment by Jonathan Lewis — October 5, 2013 @ 10:27 am GMT Oct 5,2013

  2. Hi,
    You have understood my correctly.
    Thanks for the clarification.
    Regards Hans-Peter

    Comment by Hans-Peter — September 11, 2013 @ 11:29 am GMT Sep 11,2013 | Reply

  3. Thanks Jonathan for the reply. But again I have a question which stems from your reply.

    Consider the first line “If you apply the same thinking to the FIRST plan, we can see that it appears to be from the join order (t1, t2) — but you cannot see in the plan whether or not it appeared because Oracle was considering join order (t1, t2) or join order (t2, t1)” of the 2nd paragraph in your reply (10:27 am BST Oct 5,2013)

    My question here is : The 1st plan clearly shows the join order is (t1,t2). Then why have you said “but you cannot see in the plan whether or not it appeared because Oracle was considering join order (t1, t2) or join order (t2, t1)” ? If Oracle was considering a join order of (t2,t1) then why in the plan will a join order of (t1,t2) appear ?

    Regards,
    Sudipta.

    Comment by sudiptabiswas2 — October 5, 2013 @ 12:29 pm GMT Oct 5,2013 | Reply

    • Sudipta,

      Note this fragment in the article:

    • “… (very specifically the thing that Oracle labels “join order” for the purposes of examining execution paths in a structured sequence) …”
    • Both execution plans are the result of Oracle examining only a single join order, namely the join order T1 -> T2, and yet the second execution appears to be following the join order T2 -> T1.

      I could get the first plan by forcing Oracle to examine only the join order T2 -> T1 and swapping the roles of build and probe tables – so the first plan doesn’t “… clearly show the join order is t1->t2 …”.

      You might like to review this posting which shows 8 different execution plans that are all based on limiting the optimizer to same single join order.

      Comment by Jonathan Lewis — October 5, 2013 @ 12:41 pm GMT Oct 5,2013 | Reply

  • Hello Jonathan,

    Seems a bit clear now.

    Question 1 : By the phrase “I could get the first plan by forcing Oracle to examine only the join order T2 -> T1 and swapping the roles of build and probe tables – so the first plan doesn’t” do you mean that at the first place you could have got the first plan to consider a join order T2->T1 by using only onehint /*+ leading(table_1 table_2) */ (no other hints are there) but once you add the hints (no_swap_join_inputs(table_2)) that swaps/changes the roles of build and probe tables then the join order of T1->T2 would be implemented ?

    Comment by sudiptabiswas2 — October 5, 2013 @ 1:12 pm GMT Oct 5,2013 | Reply


  • RSS feed for comments on this post. TrackBack URI

    Leave a Reply

    Fill in your details below or click an icon to log in:

    WordPress.com Logo

    You are commenting using your WordPress.com account. Log Out / Change )

    Twitter picture

    You are commenting using your Twitter account. Log Out / Change )

    Facebook photo

    You are commenting using your Facebook account. Log Out / Change )

    Google+ photo

    You are commenting using your Google+ account. Log Out / Change )

    Connecting to %s

    The Rubric Theme. Blog at WordPress.com.

    Follow

    Get every new post delivered to your Inbox.

    Join 4,524 other followers