Oracle Scratchpad

December 8, 2020

Hash Joins

Filed under: Hash Join,Joins,Oracle — Jonathan Lewis @ 11:55 am GMT Dec 8,2020

This is a follow-up to a note that’s 10 years old, prompted by a thread on the Oracle Developer Community forum asking about the order of operations in an execution plan, and following up with a question about PGA memory use that I had left open in the original note.

The original note pointed out that the serial execution plan for a 4 table join that used hash joins for every join and had a leading() hint dictating a single join order could still have 8 distinct execution plans (which I then corrected to 4, because half of them were eliminated by an odd little inconsistency of the leading() hint).

The source of the number 8 was the fact that when you hint a hash join with /*+ use_hash(next_alias) */ the optimizer will consider a /*+ swap_join_inputs(next_alias) */ – in other words, it will choose which of the two inputs should be the “build” table and which the “probe” table, even though the human eye will probably assume that next_alias was clearly intended as the probe table. (The “little inconsistency” in the leading() hint is that it blocks the option for a swap for the first – and only the first – join unless it’s explicitly hinted.)

In the article I showed all 8 possible plans, and described how the two plans at the extreme ends of my list would operate at run-time. I’ve reproduced these two plans below, preceded by the hints that produced them:

leading(t1 t2 t3 t4)
use_hash(t2) no_swap_join_inputs(t2)
use_hash(t3) no_swap_join_inputs(t3)
use_hash(t4) no_swap_join_inputs(t4)
 
--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|*  3 |    HASH JOIN          |      |    70 |  1260 |     8 |
|*  4 |     HASH JOIN         |      |    70 |   840 |     5 |
|   5 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
|   6 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   7 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|   8 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
--------------------------------------------------------------

leading(t1 t2 t3 t4)
use_hash(t2)    swap_join_inputs(t2)
use_hash(t3)    swap_join_inputs(t3)
use_hash(t4)    swap_join_inputs(t4)
 
--------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    24 |    11 |
|   1 |  SORT AGGREGATE       |      |     1 |    24 |       |
|*  2 |   HASH JOIN           |      |    70 |  1680 |    11 |
|   3 |    TABLE ACCESS FULL  | T4   |    70 |   420 |     2 |
|*  4 |    HASH JOIN          |      |    70 |  1260 |     8 |
|   5 |     TABLE ACCESS FULL | T3   |    70 |   420 |     2 |
|*  6 |     HASH JOIN         |      |    70 |   840 |     5 |
|   7 |      TABLE ACCESS FULL| T2   |    70 |   420 |     2 |
|   8 |      TABLE ACCESS FULL| T1   |    70 |   420 |     2 |
--------------------------------------------------------------

Side note: one of the comments on the original post raised the point that (e.g.) the second of the plans above could have been generated by the join order represented by the hint leading(t2 t1 t3 t4), and that’s absolutely correct. In fact, if you see the plan for a 4 table join consisting of nothing but 3 hash joins you cannot be certain what join order the optimizer was examining when it produced that plan unless you look at the CBO trace file (or check to see if it’s been explicitly and completely hinted in the code).
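If you do want to check which join order the optimizer was working from, the long-standing (though undocumented) 10053 trace event will dump every join order it evaluated. A minimal sketch – bearing in mind that the trace is only written when the statement is actually optimised, so you may need to force a hard parse:

alter session set tracefile_identifier = 'join_order_check';
alter session set events '10053 trace name context forever, level 1';

-- run (or re-parse, e.g. by adding a unique comment to the text) the query

alter session set events '10053 trace name context off';

The resulting trace file contains lines like “Join order[1]:  T1[T1]#0  T2[T2]#1 …” showing each join order the optimizer considered.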

Having set the scene, we come to the question that prompted this note. The question related to the second plan above, and said:

“However, this one is the plan that takes the most memory in the PGA because it builds 3 hash tables before joining any table, right?”

The quick answer to the question is: “Not necessarily.”

In the original article I had pointed out that plans following the pattern of the first plan above with N tables and N – 1 joins would have at most two build tables in memory at any one moment, while plans following the pattern of the second would create N – 1 build tables in memory before any join results could be created. This does rather invite the inference that the second plan has to be the one that is going to use the most PGA memory, but what I had said in the original article was (emphasis added):

“Notice: the number of in-memory hash (build) tables we have in the first of the 8 plans at any instant after the first join starts is two and (no matter how many tables are involved in this pattern) the number of in-memory hash tables will always be two. The actual size of the two hash tables is a little unpredictable and, as a very crude guideline, you might expect the size to grow as more tables are joined into the result set.

As a thought experiment, consider 3 small dimension tables and one big fact table. If Oracle were to create in-memory hash tables from the three dimension tables and then start scanning the fact table (following the pattern of the second plan above with t1 in the role of the fact table) probing each of the dimension tables in turn, it could deliver the first result row very quickly without requiring more memory to store intermediate results.

Conversely if Oracle were to create a tiny in-memory hash table from the first dimension and probe it with the fact table (following the pattern of the first plan above with t2 in the role of the fact table) Oracle would then have to build a very large in-memory hash table before probing it with the second dimension table, and as that second join takes place it would be generating a new result set that would become the next big in-memory hash table.

In this thought experiment we would probably find that the optimizer did the right thing without prompting and constructed three tiny in-memory hash tables – but it’s not always so clear-cut, and even the “right” decision can result in very large intermediate build tables (and if those intermediate build tables spill to disc in an Exadata environment the change in performance can be huge).

To finish off, here’s a script to turn the thought experiment into a concrete example (by careful, but deliberately silly, hinting).

rem
rem     Script:         c_treblehash_3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2020
rem     Purpose:        
rem
rem     Last tested 
rem             19.3.0.0
rem 

create table t1 
as
select
        rownum          id,
        to_char(rownum) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t1 
        add constraint t1_pk primary key(id)
;

create table t2
as
select
        rownum          id,
        to_char(rownum) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t2
        add constraint t2_pk primary key(id)
;

create table t3
as
select
        rownum          id,
        to_char(rownum) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 70
;

alter table t3
        add constraint t3_pk primary key(id)
;

create table t4
nologging
as
select
        t1.id                   id1,
        t2.id                   id2,
        t3.id                   id3,
        rpad(rownum,10)         small_vc,
        rpad('x',100)           padding
from
        t1, t2, t3
;

connect test_user/test
set linesize 156
set serveroutput off

select
        /*+ 
                leading(t4 t1 t2 t3)
                full(t4) 
                use_hash(t1) full(t1) swap_join_inputs(t1)
                use_hash(t2) full(t2) swap_join_inputs(t2)
                use_hash(t3) full(t3) swap_join_inputs(t3) 
        */
        count(t1.small_vc),
        count(t2.small_vc),
        count(t3.small_vc),
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

column pga_max_mem format 99,999,999

select pga_max_mem from v$process where addr = (
                select paddr from v$session where sid = (
                        select sid from v$mystat where rownum = 1
                )
        )
;
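
rem
rem     Alternatively (just a sketch): sys_context('userenv','sid')
rem     returns the current session's sid directly from 10g onwards,
rem     so the nested subqueries could be replaced with a simple join:
rem
rem     select  prc.pga_max_mem
rem     from    v$process prc, v$session ses
rem     where   ses.sid  = to_number(sys_context('userenv','sid'))
rem     and     prc.addr = ses.paddr
rem     ;
rem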

connect test_user/test
set linesize 156
set serveroutput off

select
        /*+ 
                leading(t4 t1 t2 t3)
                full(t4) 
                use_hash(t1) full(t1) no_swap_join_inputs(t1)
                use_hash(t2) full(t2) no_swap_join_inputs(t2)
                use_hash(t3) full(t3) no_swap_join_inputs(t3) 
        */
        count(t1.small_vc),
        count(t2.small_vc),
        count(t3.small_vc),
        count(t4.small_vc)
from
        t4,     
        t1,     
        t2,     
        t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

column pga_max_mem format 99,999,999

select pga_max_mem from v$process where addr = (
                select paddr from v$session where sid = (
                        select sid from v$mystat where rownum = 1
                )
        )
;

All I’ve done is create three small “dimension” tables of 70 rows each, then create a table which is their Cartesian join, producing a “fact” table of 343,000 rows. Then I’ve written a simple query to join the three dimension tables to the fact table.
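As a quick sanity check of the volume (this query isn’t in the original script, and the arithmetic is simple):

select count(*) from t4;        -- 70 * 70 * 70 = 343,000 rows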

I’ve used swap_join_inputs() for all the joins in one version of the query, and no_swap_join_inputs() in the other; and I’ve reconnected to the database before running each query to make it easier to see the different impact on the PGA of the two plans.

I’ve used dbms_xplan.display_cursor() to pull the execution plans from memory after running the queries, and since I’ve not set statistics_level to all, or added the hint /*+ gather_plan_statistics */ to the queries, the only rowsource execution statistics I’ll get are the PGA usage figures: estimates and actuals.
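If you did want the full rowsource execution statistics (starts, actual rows, buffer gets and timing, as well as memory) you could add the hint to the test query at a small CPU cost – a sketch based on the first version of the query:

select
        /*+
                gather_plan_statistics
                leading(t4 t1 t2 t3)
                full(t4)
                use_hash(t1) full(t1) swap_join_inputs(t1)
                use_hash(t2) full(t2) swap_join_inputs(t2)
                use_hash(t3) full(t3) swap_join_inputs(t3)
        */
        count(t1.small_vc),
        count(t2.small_vc),
        count(t3.small_vc),
        count(t4.small_vc)
from
        t4, t1, t2, t3
where
        t1.id = t4.id1
and     t2.id = t4.id2
and     t3.id = t4.id3
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));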

The final step of each test reports the maximum PGA memory usage the session reached in the course of the test.

Here are the two plans, each followed by the PGA maximum memory size.

--------------------------------------------------------------------------
| Id  | Operation             | Name | E-Rows |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |        |       |       |          |
|   1 |  SORT AGGREGATE       |      |      1 |       |       |          |
|*  2 |   HASH JOIN           |      |    343K|  2171K|  2171K| 1684K (0)|
|   3 |    TABLE ACCESS FULL  | T3   |     70 |       |       |          |
|*  4 |    HASH JOIN          |      |    343K|  2171K|  2171K| 1681K (0)|
|   5 |     TABLE ACCESS FULL | T2   |     70 |       |       |          |
|*  6 |     HASH JOIN         |      |    343K|  2171K|  2171K| 1711K (0)|
|   7 |      TABLE ACCESS FULL| T1   |     70 |       |       |          |
|   8 |      TABLE ACCESS FULL| T4   |    343K|       |       |          |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T3"."ID"="T4"."ID3")
   4 - access("T2"."ID"="T4"."ID2")
   6 - access("T1"."ID"="T4"."ID1")

PGA_MAX_MEM
-----------
 13,859,925

--------------------------------------------------------------------------
| Id  | Operation             | Name | E-Rows |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |        |       |       |          |
|   1 |  SORT AGGREGATE       |      |      1 |       |       |          |
|*  2 |   HASH JOIN           |      |    343K|    23M|  3727K|   29M (0)|
|*  3 |    HASH JOIN          |      |    343K|    23M|  3727K|   29M (0)|
|*  4 |     HASH JOIN         |      |    343K|    23M|  3667K|   29M (0)|
|   5 |      TABLE ACCESS FULL| T4   |    343K|       |       |          |
|   6 |      TABLE ACCESS FULL| T1   |     70 |       |       |          |
|   7 |     TABLE ACCESS FULL | T2   |     70 |       |       |          |
|   8 |    TABLE ACCESS FULL  | T3   |     70 |       |       |          |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T3"."ID"="T4"."ID3")
   3 - access("T2"."ID"="T4"."ID2")
   4 - access("T1"."ID"="T4"."ID1")

PGA_MAX_MEM
-----------
 52,984,917

As you can see, each of the hash joins in the first plan required roughly 1.7MB of memory. All three hash tables would have been in existence at the same time, giving a total of about 5.1MB of memory for the query. The session’s maximum PGA usage shows up as 13MB, of which 5MB was my basic “startup” PGA, leaving 3MB “unaccounted”.

In comparison, each of the hash joins in the second plan required roughly 29MB, although only two of the hash tables would have been in existence at any one moment. That’s still an allocation of 58MB for the same basic 4 table join. In fact things aren’t quite as bad as they seem in this case since the maximum PGA allocated was only about 52MB (again with 5MB of startup PGA). The apparent contradiction may be due to the way that Oracle allocates PGA in increasing chunks – the 29MB reported may have been the result of the session doing something like: “I’ve reached 23MB, my next allocation will be 4MB – oops, I only needed another 128KB.”
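As a cross-check on the v$process figure, the session statistics report the same high-water mark. A quick sketch (the exact statistic names could, in principle, vary with version):

select  snm.name, mst.value
from    v$statname snm, v$mystat mst
where   mst.statistic# = snm.statistic#
and     snm.name in ('session pga memory', 'session pga memory max')
;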

As a final check of activity, I’ve also run a couple of queries against V$sql_workarea – a convenient memory structure you can examine to get some “recent history” of queries that have been using large amounts of memory, or spilling to disc. In this case I’ve queried the structure by sql_id for the two queries, reporting just a little detail about the last execution and memory usage.

SQL> select operation_id, operation_type, last_memory_used, last_execution  from V$sql_workarea where sql_id = 'dcc01q28gcbmy';

OPERATION_ID OPERATION_TYPE                   LAST_MEMORY_USED LAST_EXECUTION
------------ -------------------------------- ---------------- ----------------------------------------
           2 HASH-JOIN                                 1724416 OPTIMAL
           4 HASH-JOIN                                 1721344 OPTIMAL
           6 HASH-JOIN                                 1752064 OPTIMAL


SQL> select operation_id, operation_type, last_memory_used, last_execution  from V$sql_workarea where sql_id = 'b52uwjz07fwhk';

OPERATION_ID OPERATION_TYPE                   LAST_MEMORY_USED LAST_EXECUTION
------------ -------------------------------- ---------------- ----------------------------------------
           2 HASH-JOIN                                30930944 OPTIMAL
           3 HASH-JOIN                                30945280 OPTIMAL
           4 HASH-JOIN                                30650368 OPTIMAL

As you can see, the view reports each plan operation (with id and type) that needed memory for an SQL workarea – and the numbers confirm the 1.7MB and 29MB reported by the execution plans. (Which is not surprising since it’s exactly these figures that are used to supply the details in the plans.)

You need to be a little cautious with this view in older versions of Oracle – it used to do a “full tablescan” of the library cache, which resulted in a lot of library cache latch activity and could cause some contention if you ran it frequently. Nevertheless it’s a very useful view for finding queries that are doing unexpectedly large sorts or hash joins, and one you might query occasionally if you see any space management threats in the temporary tablespace.
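For example, a variation on the queries above – not part of the original note – lists the ten largest recent workareas and flags any that spilled to temp (last_execution reports ONE PASS or MULTI-PASS rather than OPTIMAL when a workarea didn’t fit in memory, and last_tempseg_size shows how much temp space the spill needed):

select  *
from    (
        select
                sql_id,
                operation_type,
                operation_id,
                last_memory_used,
                last_tempseg_size,
                last_execution
        from
                v$sql_workarea
        order by
                last_memory_used desc
        )
where
        rownum <= 10
;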

3 Comments

  1. […] I’ve just written a note in response to a question on the Oracle developer forum about this article. The question was checking whether it was correct to assume that the plan that created all the in-memory hash tables before doing any joining would be the most memory-intensive plan. The answer is no. […]

    Pingback by Quiz Night | Oracle Scratchpad — December 8, 2020 @ 1:13 pm GMT Dec 8,2020

  2. Thanks for this follow-up. I’m sorry in advance, but there’s nothing really useful in the comment below – just some reflections :)

    I recall I once had an idea that, for this particular case, could be expressed like this:

    (leading(t4, t1, t2, t3) + swap_join_inputs) = (leading(t3, t2, t1, t4) + no_swap_join_inputs)

    In my mind, with this you would always be able to use no_swap_join_inputs and get more readable code. Unfortunately, this leads to a completely different join order. Out of interest I tried it in the Cloud on 19.5 and, predictably, the optimizer switches to a merge join Cartesian on t3, t2, t1 and reports the use_hash hints as unused (which is no surprise, since it has no join conditions on these 3 tables).

    It would be cool to have use_hash working something like a function of 2 arguments, for example:

    use_hash(t3, use_hash(t2, (use_hash(t1, t4)))) = leading(t4, t1, t2, t3) + swap_join_inputs + use_hash

    This resembles the actual plan, uses the first parameter as the build table and the second one as the probe table, and should probably work fine for other join methods too. This could produce more readable outlines, and an easier way to express what you need from the optimizer. Though, given Oracle’s determination to maintain backwards compatibility, nothing like it could ever be implemented, I believe.

    On the other hand, it looks like optimizer_ignore_hints=true is the default in the Cloud DWH. Which, in my opinion, may suggest that in several years user hints might fall out of favour altogether.

    Comment by Viacheslav Andzhich — December 17, 2020 @ 9:37 pm GMT Dec 17,2020

    • Viacheslav,

      Thanks for the comment, it raises an interesting discussion point.

      I think part of the messiness of expressing the desired plans through hints comes from the fact that Oracle only considers left-deep plans (with the enhancement in 12.2 that “bushy” joins are now possible) and uses the swap-join-inputs option for hash joins as a nod in the direction of right-deep trees and the full search space.

      I imagine it would be quite messy to try and modify the use_hash() hint to ensure that it is correctly parsed and is a legal match for the leading() hint (or for the first N tables in the leading() hint, as a minor extra detail). A possible strategy might be to have a new hint, e.g.

       leading (t1 t2 t3 t4) catalan(t3 ( t2 (t1 t4))) 

      with the rule that the leading() and catalan() hints must specify exactly the same set of aliases, and that the first of any bracketed pair in the catalan() hint is to be used as the build table. Then the optimizer could transform the catalan() hint into the equivalent set of use_hash() and [no_]swap_join_inputs() hints (and maybe modify the leading() hint to match) before doing anything else.
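
      In this example the (purely hypothetical) catalan(t3 ( t2 (t1 t4))) would, in effect, be shorthand for the hint combination I used in the first test query of the note:

       leading(t4 t1 t2 t3)
       use_hash(t1) swap_join_inputs(t1)
       use_hash(t2) swap_join_inputs(t2)
       use_hash(t3) swap_join_inputs(t3)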

      Regards
      Jonathan Lewis

      P.S. I think I may have to write a short note about optimizer search spaces for joins and Catalan numbers.

      Comment by Jonathan Lewis — December 18, 2020 @ 11:27 am GMT Dec 18,2020


Comments and related questions are welcome.
