Oracle Scratchpad

December 8, 2013

Parallel Execution – 3

Filed under: Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 10:09 pm BST Dec 8,2013

It’s finally time to take a close look at the parallel versions of the execution plan I produced a little while ago for a four-table hash join. In this note I’ll examine the broadcast parallel distribution. First, here’s a list of the hints I’m going to use to get the effect I want:

	/*+
		leading(t4 t1 t2 t3)
		full(t4) parallel(t4, 2)
		use_hash(t1) swap_join_inputs(t1) pq_distribute(t1 none broadcast)
		full(t1) parallel(t1, 2)
		use_hash(t2) swap_join_inputs(t2) pq_distribute(t2 none broadcast)
		full(t2) parallel(t2, 2)
		use_hash(t3) swap_join_inputs(t3) pq_distribute(t3 none broadcast)
		full(t3) parallel(t3, 2)
		monitor
	*/

If you compare this set of hints to the hints I used for the serial plan you’ll see that the change involves two features:

I’ve added a parallel(alias, 2) for each table to the full(alias) hint – this simply ensures that the full tablescans I’ve demanded should operate as parallel tablescans – then I’ve added a pq_distribute(alias none broadcast) hint to the use_hash(alias) and swap_join_inputs(alias) pair of hints – this completes the specification of a hash join when running parallel: not only do we have to specify the join method, and which of the two data sources should be treated as the build (‘first’)table and which the probe (‘second’) table, we also have to specify the method a slave set uses to pass data on to the next. In combination the set of hash-related hints give the following instructions to the optimizer:

“when you get to the point in the code path for ‘now joining {alias}’, you should do a hash join, using {‘alias’} as the build table and the current result set as the probe table; {‘alias’} should be broadcast to every slave, but the current result set does not need distribution”

This (with a number of blank lines inserted) is the resulting execution plan from 11.2.0.4:


-----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost  |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |   345 |        |      |            |
|   1 |  SORT AGGREGATE             |          |     1 |    38 |       |        |      |            |
|   2 |   PX COORDINATOR            |          |       |       |       |        |      |            |
|   3 |    PX SEND QC (RANDOM)      | :TQ10003 |     1 |    38 |       |  Q1,03 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE          |          |     1 |    38 |       |  Q1,03 | PCWP |            |
|*  5 |      HASH JOIN              |          |    26 |   988 |   345 |  Q1,03 | PCWP |            |
|   6 |       PX RECEIVE            |          |     3 |    18 |     2 |  Q1,03 | PCWP |            |

|   7 |        PX SEND BROADCAST    | :TQ10000 |     3 |    18 |     2 |  Q1,00 | P->P | BROADCAST  |
|   8 |         PX BLOCK ITERATOR   |          |     3 |    18 |     2 |  Q1,00 | PCWC |            |
|*  9 |          TABLE ACCESS FULL  | T3       |     3 |    18 |     2 |  Q1,00 | PCWP |            |

|* 10 |       HASH JOIN             |          |   612 | 19584 |   343 |  Q1,03 | PCWP |            |
|  11 |        PX RECEIVE           |          |     3 |    18 |     2 |  Q1,03 | PCWP |            |

|  12 |         PX SEND BROADCAST   | :TQ10001 |     3 |    18 |     2 |  Q1,01 | P->P | BROADCAST  |
|  13 |          PX BLOCK ITERATOR  |          |     3 |    18 |     2 |  Q1,01 | PCWC |            |
|* 14 |           TABLE ACCESS FULL | T2       |     3 |    18 |     2 |  Q1,01 | PCWP |            |

|* 15 |        HASH JOIN            |          | 14491 |   367K|   341 |  Q1,03 | PCWP |            |
|  16 |         PX RECEIVE          |          |     3 |    18 |     2 |  Q1,03 | PCWP |            |

|  17 |          PX SEND BROADCAST  | :TQ10002 |     3 |    18 |     2 |  Q1,02 | P->P | BROADCAST  |
|  18 |           PX BLOCK ITERATOR |          |     3 |    18 |     2 |  Q1,02 | PCWC |            |
|* 19 |            TABLE ACCESS FULL| T1       |     3 |    18 |     2 |  Q1,02 | PCWP |            |

|  20 |         PX BLOCK ITERATOR   |          |   343K|  6699K|   339 |  Q1,03 | PCWC |            |
|* 21 |          TABLE ACCESS FULL  | T4       |   343K|  6699K|   339 |  Q1,03 | PCWP |            |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T3"."ID"="T4"."ID3")
   9 - access(:Z>=:Z AND :Z<=:Z)
       filter((TO_NUMBER("T3"."SMALL_VC")=1 OR TO_NUMBER("T3"."SMALL_VC")=2 OR
               TO_NUMBER("T3"."SMALL_VC")=3))
   10 - access("T2"."ID"="T4"."ID2")
   14 - access(:Z>=:Z AND :Z<=:Z)
        filter((TO_NUMBER("T2"."SMALL_VC")=1 OR TO_NUMBER("T2"."SMALL_VC")=2 OR
               TO_NUMBER("T2"."SMALL_VC")=3))
   15 - access("T1"."ID"="T4"."ID1")
   19 - access(:Z>=:Z AND :Z<=:Z)
        filter((TO_NUMBER("T1"."SMALL_VC")=1 OR TO_NUMBER("T1"."SMALL_VC")=2 OR
               TO_NUMBER("T1"."SMALL_VC")=3))
   21 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER_LIST(SYS_OP_BLOOM_FILTER(:BF0000,"T4"."ID1"),SYS_OP_BLOOM_F
              ILTER(:BF0000,"T4"."ID2"),SYS_OP_BLOOM_FILTER(:BF0000,"T4"."ID3")))

The serial plan had 9 lines, this plan has 22 – which makes it a little more intimidating.

Each “TABLE ACCESS FULL” is the child of a “PX BLOCK ITERATOR” – telling us that we are doing parallel tablescans, breaking each table into “rowid ranges” (called granules – you’ll frequently see 13 granules per slave, though very large objects can have more) and sharing the pieces out between the set of slaves.

There are pairs of “PX SEND / PX RECEIVE” showing us where one set of parallel slaves passes rows to the next set of slaves. In this case we can see that (most of) our sending slave sets are using “PX SEND BROADCAST”, echoed as “BROADCAST” in the PQ Distrib column. Taking line 17 as an example, we get a clue about how slave sets communicate – the Name column references something called “:TQ10002″, this is a “table queue (TQ)” or virtual table; to send data, one set of slaves writes to a virtual table, the other set of slaves reads that virtual table.

A key point to reading parallel execution plans is that the names of the virtual tables reflect the order in which they are written to, so in this case we know that the virtual table in line 7 (:TQ10000) is populated before the virtual table in line 12 (:TQ10001) which is populated before the virtual table in line 17 (:TQ10002) which is populated before the virtual table in line 3 (:TQ10003).

In fact the plan shows us quite a lot about each of the virtual tables – the Name column gives us the name of a virtual table, the TQ column with a little cosmetic massage highlights the lines of the plan that populate that virtual table. So we can see that :TQ10000 at line 7 is populated by a “PX BLOCK ITERATOR” (line 8) which implements the parallelism of the “TABLE SCAN FULL” (line 9). Following this pattern we can see that the plan shows us doing parallel tablescans of T3, T2, and T1 in that order, populating virtual tables :TQ10000, :TQ10001, and :TQ10002 in that order.

It is not immediately obvious (until you’ve had a little practice) how the two sets of slaves are interacting as this goes on – but if you now look for the lines that populate virtual table :TQ10003 you will see that we have a single activity which drains all three of the previous virtual tables AND does a parallel tablescan of T4. (Check the TQ column of lines 5,6, 10/11, 15/16, 20/21 to see all the inputs to :TQ10003)

Putting the pieces together: there is a single slave set that operates a hash join at line 5 by reading a virtual table :TQ1000 and building an in-memory hash table; after which it reads a second virtual table :TQ10001 and builds a second in-memory hash table, then it reads a third virtual table :TQ10002 building a final in-memory hash table, before reading the real table T4 (using rowid ranges) which it uses to probe each of the three in-memory hash tables in turn. This plan is really the closest equivalent of the original serial execution plan that we could get.

As each slave in the slave set generates a result row from the three consecutive hash joins it keeps a running aggregate (the query is a simple count(col), remember) so that it can eventually forward a single row to the query co-ordinator. That’s what’s going into virtual table :TQ10003 at line 3. The query co-ordinator (the end-user’s shadow process) reads all the rows from the virtual table and aggregates them to the final result that will be sent to the end-user.

There’s a little more that I can say about this execution plan – but I think that’s enough for a first dose; I’ll add an update to this article in a couple of days covering a few details about the virtual tables, parallel execution message size, and Bloom filters.

Footnote:

As often happens, a newer version of Oracle produces a more informative representation of the plan; in this case 12c shows operation lines for Bloom filter creation and usage.  It also happened, in this case, to bypass the redundant broadcasting of the small tables by doing all the scans and joins inside a single set of slaves – so every slave scanned every block of the previously broadcast tables, a strategy that you can force with the 12c /*+ pq_replicate(alias) */ hint and block with the /*+ no_pq_replicate(alias) */ hint.

4 Comments »

  1. […] aware that in the previous article in this series I said I’d continue “in a few days” and it has now been more like 11 weeks […]

    Pingback by Parallel Execution – 4 | Oracle Scratchpad — February 26, 2014 @ 1:58 pm BST Feb 26,2014 | Reply

  2. […] “broadcast” distribution option of parallel queries. I mentioned this in a footnote to a longer article a couple of months ago; this note simply expands on that brief comment with an example. We’ll […]

    Pingback by 12c pq_replicate | Oracle Scratchpad — March 5, 2014 @ 5:40 pm BST Mar 5,2014 | Reply

  3. […] in my code. Here are two sets of hints showing what I’ve done; the first is a repeat from the third article showing the broadcast example, the second shows the small change needed to get the hash […]

    Pingback by Parallel Execution – 5 | Oracle Scratchpad — March 10, 2014 @ 1:30 pm BST Mar 10,2014 | Reply

  4. […] Parallel hash join with Broadcast strategy (and a note on 12c pq_replicate) […]

    Pingback by Parallel Plans | Oracle Scratchpad — March 11, 2014 @ 7:39 am BST Mar 11,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,508 other followers