Oracle Scratchpad

February 28, 2014

Empty Hash

Filed under: Bugs,CBO,Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 6:45 pm GMT Feb 28,2014

A little while ago I highlighted a special case with the MINUS operator (that one of the commentators extended to include the INTERSECT operator) relating to the way the second subquery would take place even if the first subquery produced no rows. I’ve since had an email from an Oracle employee letting me know that the developers looked at this case and decided that it wasn’t feasible to address it because – taking a wider view point – if the query were to run parallel they would need a mechanism that allowed some synchronisation between slaves so that every slave could find out that none of the slaves had received any rows from the first subquery, and this was going to lead to hanging problems.

The email reminded me that there’s another issue of the same kind that I discovered several years ago – I thought I’d written it up, but maybe it was on a newsgroup or forum somewhere, I can’t find it on my blog or old website). The problem can be demonstrated by this example (which I ran under 11.2.0.4):

rem
rem     Script:         parallel_hash_empty.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2002
rem

create table t1
as
select
        rownum                  id,
        mod(rownum,25)          rep_col,
        rpad('x',50)            padding
from
        all_objects
where
        rownum <= 3000 -- > comment to avoid WordPress format issue
;

delete from t1 where rep_col = 12;

create table t2
as
select
        rownum                  id,
        mod(rownum,25)          rep_col,
        rpad('x',50)            padding
from
        all_objects
where
        rownum <= 10000 -- > comment to avoid WordPress format issue
;

-- collect stats, no histograms

select
        /*+
                leading(t1 t2)
                use_hash(t2)
                no_swap_join_inputs(t2)
                pq_distribute(t2, hash, hash)
        */
        count(*)
from
        t1,t2
where
        t1.rep_col = 12
and     t2.id = t1.id
;


You’ll notice that I’ve created a data set for table t1 where the values 12 does not appear in column rep_col; so my query will return no rows. However, for the purposes of demonstration, I’ve hinted an execution path that will scan t1 and distribute it by hash, then scan t2 to distribute that by hash before doing the join. Here’s the plan – which I’ve generated with rowsource execution statistics enabled and pulled from memory after executing the query:


-------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name     | Starts | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |          |      1 |      1 |00:00:00.01 |       6 |      2 |
|   1 |  SORT AGGREGATE            |          |      1 |      1 |00:00:00.01 |       6 |      2 |
|   2 |   PX COORDINATOR           |          |      1 |      2 |00:00:00.01 |       6 |      2 |
|   3 |    PX SEND QC (RANDOM)     | :TQ10002 |      0 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE         |          |      2 |      2 |00:00:00.01 |       0 |      0 |
|*  5 |      HASH JOIN             |          |      2 |      0 |00:00:00.01 |       0 |      0 |
|   6 |       JOIN FILTER CREATE   | :BF0000  |      2 |      0 |00:00:00.01 |       0 |      0 |
|   7 |        PX RECEIVE          |          |      2 |      0 |00:00:00.01 |       0 |      0 |
|   8 |         PX SEND HASH       | :TQ10000 |      0 |      0 |00:00:00.01 |       0 |      0 |
|   9 |          PX BLOCK ITERATOR |          |      2 |      0 |00:00:00.01 |      54 |     27 |
|* 10 |           TABLE ACCESS FULL| T1       |     27 |      0 |00:00:00.01 |      54 |     27 |
|  11 |       PX RECEIVE           |          |      0 |      0 |00:00:00.01 |       0 |      0 |
|  12 |        PX SEND HASH        | :TQ10001 |      0 |      0 |00:00:00.01 |       0 |      0 |
|  13 |         JOIN FILTER USE    | :BF0000  |      2 |      2 |00:00:00.01 |     116 |     87 |
|  14 |          PX BLOCK ITERATOR |          |      2 |      2 |00:00:00.01 |     116 |     87 |
|* 15 |           TABLE ACCESS FULL| T2       |     29 |      2 |00:00:00.01 |     116 |     87 |
-------------------------------------------------------------------------------------------------

Since this was 11.2.0.4 Oracle has used Bloom filtering to reduce the traffic between slave sets – but you can see that despite returning no rows from t1 (lines 7 – 10), Oracle still performed the parallel tablescan of t2 (lines 12 – 15). Thanks to the Bloom filter we don’t transfer 10,000 rows between slave sets, but we can see from the Buffers and Reads columns that we really did do the tablescan – and if we take a snapshot of instance activity we would have seen 10,000 rows fetched by tablescan at that point.

If I ran this query serially Oracle would stop after discovering that the first tablescan returned no rows – why bother scanning the probe table when the hash table is empty ? But because the query is running in parallel, a single slave that receives no data from the first tablescan cannot assume that every other slave in the same set has also received no data – there’s no cross-chat that allows the slaves to discover that every slave has no data – so the second scan goes ahead.

I was a little surprised by this when I first found it since I thought (from looking at some of the 1039x trace information) that the parallel slaves were telling the query coordinator how many rows they had acquired on each granule – which would allow the coordinator to spot the zero total. But it looks as if I was misinterpreting the trace.

On the plus side for this example – it’s probably the case that if zero is a possible volume of data returned by the query then the optimizer will have decided that it was going to get a “small” data set for the build table and therefore do a broadcast distribution – and if that happens the second tablescan won’t occur – as we see below (note the zero Reads, zero Buffers):


------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Starts | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |      1 |      1 |00:00:00.01 |       6 |      2 |
|   1 |  SORT AGGREGATE           |          |      1 |      1 |00:00:00.01 |       6 |      2 |
|   2 |   PX COORDINATOR          |          |      1 |      2 |00:00:00.01 |       6 |      2 |
|   3 |    PX SEND QC (RANDOM)    | :TQ10001 |      0 |      0 |00:00:00.01 |       0 |      0 |
|   4 |     SORT AGGREGATE        |          |      2 |      2 |00:00:00.01 |       0 |      0 |
|*  5 |      HASH JOIN            |          |      2 |      0 |00:00:00.01 |       0 |      0 |
|   6 |       PX RECEIVE          |          |      2 |      0 |00:00:00.01 |       0 |      0 |
|   7 |        PX SEND BROADCAST  | :TQ10000 |      0 |      0 |00:00:00.01 |       0 |      0 |
|   8 |         PX BLOCK ITERATOR |          |      2 |      0 |00:00:00.01 |      54 |     27 |
|*  9 |          TABLE ACCESS FULL| T1       |     27 |      0 |00:00:00.01 |      54 |     27 |
|  10 |       PX BLOCK ITERATOR   |          |      0 |      0 |00:00:00.01 |       0 |      0 |
|* 11 |        TABLE ACCESS FULL  | T2       |      0 |      0 |00:00:00.01 |       0 |      0 |
------------------------------------------------------------------------------------------------

Unfortunately the same type of argument can’t be used to dismiss the minus/intersect cases.

Update June 2019

Searching MoS for an issue relating to “Cursor: Pin S wait X” I discovered Doc ID 21293073.8: Parallel queries hangs on ‘cursor: pin s wait on x’ in 12c. Strangely the summary description of the bug (which is not public) seemed to have nothing to do with mutex pins, instead the description was about how a hash (anti)join with an empty build table would still scan the probe data instead of short-circuiting. According to the note this has been addressed in 18.1 but requests for a backport to 12.1.0.2 or 12.2.0.1 have been rejected as unsafe.

If the fix in 18.1 really is about the empty table detail then it must be related to a more general class of cases than my basic example since, even in 12.2.0.1, I found that the query as it stands did not scan table t2. I had to take an extra step to ensure that t2 would still be scanned with the hash/hash distribution, and that was to disable the “hybrid hash” option, which I did by setting setting parameter “_px_adaptive_dist_method” to off. The hybrid hash (part 1, part 2)counts the incoming rows for the build table and decides how to proceed from there onwards – presumably if Oracle decides to use this strategy there’s some short-circuit code for when the count is zero.

 

 

1 Comment »

  1. […] case where the second table produces no data at all. (For reference you might also want to see an older note on how a parallel hash join doesn’t skip the probe table when the build table is […]

    Pingback by Hash Optimisation- | Oracle Scratchpad — March 4, 2019 @ 9:54 am GMT Mar 4,2019 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.