One of the changes that appeared in 12c was a change to the broadcast distribution option for parallel execution. I mentioned this in a footnote to a longer article a couple of months ago; this note simply expands on that brief comment with an example. We’ll start with a simple two-table hash join – which I’ll first construct and demonstrate in 11.2.0.4:
rem
rem     Script: pq_replicate.sql
rem     Dated:  March 2014
rem     Author: J P Lewis
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum  id
        from    all_objects
        where   rownum <= 3000
)
select
        rownum                  n1,
        lpad(rownum,6,'0')      small_vc,
        lpad(rownum,200,'0')    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1000
;

create table t2
as
with generator as (
        select  --+ materialize
                rownum  id
        from    all_objects
        where   rownum <= 3000
)
select
        1 + mod(rownum,10000)                   n1,
        lpad(1 + mod(rownum,10000),6,'0')       small_vc,
        lpad(rownum,500,'0')                    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 20000
;

-- collect stats, no histograms.

select
        /*+
                leading(t1 t2)
                parallel(t1 2)
                parallel(t2 2)
                use_hash(t2)
        */
        t1.padding,
        t2.padding
from
        t1, t2
where
        t2.n1 = t1.n1
and     t2.small_vc = t1.small_vc
;

-------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost  |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |  1000 |   707K|   135 |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001 |  1000 |   707K|   135 |  Q1,01 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN            |          |  1000 |   707K|   135 |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |          |  1000 |   207K|     4 |  Q1,01 | PCWP |            |
|   5 |      PX SEND BROADCAST  | :TQ10000 |  1000 |   207K|     4 |  Q1,00 | P->P | BROADCAST  |
|   6 |       PX BLOCK ITERATOR |          |  1000 |   207K|     4 |  Q1,00 | PCWC |            |
|   7 |        TABLE ACCESS FULL| T1       |  1000 |   207K|     4 |  Q1,00 | PCWP |            |
|   8 |     PX BLOCK ITERATOR   |          | 20000 |     9M|   131 |  Q1,01 | PCWC |            |
|   9 |      TABLE ACCESS FULL  | T2       | 20000 |     9M|   131 |  Q1,01 | PCWP |            |
-------------------------------------------------------------------------------------------------
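For reference, plans like the one above can be generated with the usual explain plan / dbms_xplan combination; a sketch (the hint set is the one from the query above):

```sql
explain plan for
select
        /*+ leading(t1 t2) parallel(t1 2) parallel(t2 2) use_hash(t2) */
        t1.padding, t2.padding
from    t1, t2
where   t2.n1 = t1.n1
and     t2.small_vc = t1.small_vc
;

-- report the predicted plan, including the TQ / IN-OUT / PQ Distrib columns
select * from table(dbms_xplan.display);
```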
In this plan slave set 2 scans table t1 in parallel and broadcasts the result set to slave set 1 (lines 5 – 7). The significance of the broadcast option is that each slave in slave set 2 sends all the rows it has read to every slave in slave set 1. For a fairly large table with a high degree of parallelism this could be a lot of inter-process communication; the total number of rows passing through the PX message pool is “DOP x number of rows filtered from t1”.
After a slave in slave set 1 has received the whole of the t1 result set it builds an in-memory hash table and starts scanning rowid ranges (PX BLOCK ITERATOR) from table t2, probing the in-memory hash table to effect the join (lines 3, 4, 8, 9). Since each slave has a copy of the whole result set from t1 it can scan any chunk of t2 and handle the contents locally. Moreover, because slave set 1 isn’t reading its second input from a virtual table, it is able to write its output immediately to the virtual table (:TQ10001) that feeds the query coordinator with the result (lines 1, 2) – so we haven’t had to do a “hash join buffered” operation and buffer the entire second input before starting to execute the join.
So how does 12c change things? With the same starting data and query, here’s the execution plan:
-----------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost  |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |  1000 |   707K|   135 |        |      |            |
|   1 |  PX COORDINATOR       |          |       |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM) | :TQ10000 |  1000 |   707K|   135 |  Q1,00 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN          |          |  1000 |   707K|   135 |  Q1,00 | PCWP |            |
|   4 |     TABLE ACCESS FULL | T1       |  1000 |   207K|     4 |  Q1,00 | PCWP |            |
|   5 |     PX BLOCK ITERATOR |          | 20000 |     9M|   131 |  Q1,00 | PCWC |            |
|   6 |      TABLE ACCESS FULL| T2       | 20000 |     9M|   131 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------
Notice, in particular, that we have only one virtual table (table queue :TQ10000) rather than two, and that it sits between a parallel query slave set and the query co-ordinator, running parallel to serial (P->S): the query uses only one set of parallel query slaves.
Until you run the query with rowsource execution statistics enabled and look at the output from v$pq_tqstat it’s not going to be immediately obvious what has happened, but we should see that somehow Oracle is no longer broadcasting the first table even though it’s still doing something in parallel with both tables.
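If you want to look at v$pq_tqstat for yourself, a query like the following is a reasonable sketch (note that the view is populated only in the session that has just run the parallel statement):

```sql
-- Rows and bytes passed through each table queue, by producer/consumer process.
-- Run this immediately after the parallel query, in the same session.
select
        dfo_number, tq_id, server_type, process, num_rows, bytes
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, process
;
```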
The run-time statistics confirm that we’ve only used one set of slaves, and tell us that each slave in the slave set has scanned the whole of table t1. This means each slave can build the full hash table and then go on to read rowid ranges from table t2. We’ve managed to get the benefit of broadcasting t1 (every slave has the whole of t1 so we don’t have to scan and distribute the big table t2 through the PX message pool) but we haven’t had to clone it multiple times through the PX message pool.
Clearly there’s a trade-off here that Oracle Corp. has decided is worth considering. I’m guessing it’s biased towards Exadata where you might run queries with a very high degree of parallelism. In that case the overhead of task switching as large numbers of messages are passed around may (and this is pure supposition) be greater than the added cost of loading the table into the buffer cache (of each instance) and having each slave scan it from there. (Reminder – 11g introduced two “opposite” changes to tablescans: “serial direct reads” and “parallel in-memory scans”.)
This is a cost-based decision, though. Given a “large enough” build table the optimizer will switch back to the plan that I reported for 11.2.0.4 with two slave sets and the scan of the build table being shared across the first set with the extracted rows being broadcast to the second set.
There’s one little oddity [Update: not so odd, as explained by Franck Pachot in comment #2] in this replication – there’s a pair of hints: pq_replicate and no_pq_replicate that you can use to control the effect if you think the optimizer is making the wrong choice. I would have guessed that in my example the hint would read: /*+ pq_replicate(t1) */ as it’s table t1 that is read by every single slave. Strangely, though, this is what the outline section of the execution plan showed:
/*+
    BEGIN_OUTLINE_DATA
    PQ_REPLICATE(@"SEL$1" "T2"@"SEL$1")
    PQ_DISTRIBUTE(@"SEL$1" "T2"@"SEL$1" BROADCAST NONE)
    USE_HASH(@"SEL$1" "T2"@"SEL$1")
    LEADING(@"SEL$1" "T1"@"SEL$1" "T2"@"SEL$1")
    FULL(@"SEL$1" "T2"@"SEL$1")
    FULL(@"SEL$1" "T1"@"SEL$1")
    OUTLINE_LEAF(@"SEL$1")
    ALL_ROWS
    OPT_PARAM('_optimizer_cost_model' 'io')
    DB_VERSION('12.1.0.1')
    OPTIMIZER_FEATURES_ENABLE('12.1.0.1')
    IGNORE_OPTIM_EMBEDDED_HINTS
    END_OUTLINE_DATA
*/
Notice how the hint specifies table t2, not table t1 !
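As a sketch of how you might apply the hints manually (based on the outline above, and remembering that the argument names the table being joined to, not the table being replicated):

```sql
-- Ask for the 12c mechanism: each slave reads the whole of t1
-- and builds its own hash table before scanning rowid ranges of t2.
select
        /*+
                leading(t1 t2) use_hash(t2)
                pq_distribute(t2 broadcast none)
                pq_replicate(t2)
                parallel(t1 2) parallel(t2 2)
        */
        t1.padding, t2.padding
from    t1, t2
where   t2.n1 = t1.n1
and     t2.small_vc = t1.small_vc
;

-- Swap pq_replicate(t2) for no_pq_replicate(t2) to push the optimizer
-- back to the 11g-style plan with two slave sets and a broadcast of t1.
```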
Footnote
Here’s a little anomaly, and a generic warning about “optimizer_features_enable”: I found that if I used the hint /*+ optimizer_features_enable('11.2.0.4') */ in 12c I could still get the pq_replicate() hint to work. Unfortunately there are a few places where the hint (or parameter) isn’t guaranteed to take the optimizer code backwards the full 100%.
Hi Jonathan,
I was just thinking: if we have segments which are left over with a default parallel degree greater than 1, then querying/joining such tables/segments, or using CTAS on them, would certainly lead to parallel distribution. So it may replicate the same behaviour on 11.2.0.4 and 12c.
Comment by Pavan Kumaran — March 6, 2014 @ 4:29 am GMT Mar 6,2014 |
Hi Jonathan,
Why do you think the hint syntax is an oddity? I think that the PQ_REPLICATE argument identifies the join rather than the table, as with USE_HASH and PQ_DISTRIBUTE.
If you prefer to see PQ_REPLICATE(T1) you can run the query with LEADING(T2 T1) USE_HASH(T1) SWAP_JOIN_INPUTS(T1). Same plan, just different way to hint (and possible here because we have only 2 tables).
PQ_REPLICATE(T2) just means it uses PQ_REPLICATE when joining to T2. The table is determined because it will be the build table of the hash join.
Of course, if we inverse the LEADING order we will have PQ_DISTRIBUTE(T1 NONE BROADCAST) instead of PQ_DISTRIBUTE(T2 BROADCAST NONE).
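As a sketch, the alternative hint set described above would look like this (same plan, with the hints naming t1 instead of t2):

```sql
-- t2 leads, but swap_join_inputs(t1) makes t1 the build table again,
-- so the replicate hint now names t1.
select
        /*+
                leading(t2 t1) use_hash(t1) swap_join_inputs(t1)
                pq_replicate(t1)
                parallel(t1 2) parallel(t2 2)
        */
        t1.padding, t2.padding
from    t1, t2
where   t2.n1 = t1.n1
and     t2.small_vc = t1.small_vc
;
```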
Regards,
Franck.
Comment by Franck Pachot — March 6, 2014 @ 7:31 pm GMT Mar 6,2014 |
Franck,
I’ll certainly buy that as an explanation – but there’s a different oddity to explain … why have the hint at all?
If you have to look at all the hash-join related hints to work out which table is going to be replicated, why not change the pq_distribute hint instead? After all, with your interpretation the pq_replicate() hint is essentially saying: “when looking at the pq_distribute hint, change BROADCAST into REPLICATE”. So why not make (alias none replicate) and (alias replicate none) legal options for pq_distribute? (I don’t think replication occurs for anything other than changing broadcast – but I could be wrong.)
Comment by Jonathan Lewis — March 6, 2014 @ 9:42 pm GMT Mar 6,2014 |
[…] PQ_REPLICATE […]
Pingback by Oracle SQL | 12c: New SQL PLAN OPERATIONS and HINTS — July 7, 2016 @ 7:54 pm BST Jul 7,2016 |