Oracle Scratchpad

July 11, 2013

12c Top N (px)

Filed under: 12c,Oracle,Parallel Execution — Jonathan Lewis @ 9:04 am GMT Jul 11,2013

A comment from Greg Rahn in response to my posting yesterday prompted me to do a quick follow-up (test time ca. 3 minutes, write-up time ca. 50 minutes – thanks for the temptation, Greg ;). Greg asked if the “Top N” would push down for a parallel query, so all I had to do was re-run my script with a parallel hint in place. (Such is the joy of constructing test cases – when you get a new idea you may not need to do much work to test it.)
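
The statement itself would have been something like the following. This is a minimal sketch rather than my actual script: I'm assuming the t1 table from yesterday's posting and a hypothetical ordering column n1, while the offset, the 1 percent, the “with ties” and the degree of 4 can all be inferred from the plan and figures below.

select  /*+ parallel(t1 4) */
        *
from    t1
order by
        n1
offset  10 rows fetch next 1 percent rows with ties
;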

Here’s the resulting execution plan (from explain plan, but v$sql_plan showed the same structure):

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |   100K|    14M|       |   191   (3)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |   100K|    14M|       |   191   (3)| 00:00:01 |  Q1,01 | P->S | QC (ORDER) |
|*  3 |    VIEW                  |          |   100K|    14M|       |   191   (3)| 00:00:01 |  Q1,01 | PCWP |            |
|   4 |     WINDOW SORT          |          |   100K|    11M|    13M|   191   (3)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE          |          |   100K|    11M|       |   188   (2)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND RANGE      | :TQ10000 |   100K|    11M|       |   188   (2)| 00:00:01 |  Q1,00 | P->P | RANGE      |
|   7 |        PX BLOCK ITERATOR |          |   100K|    11M|       |   188   (2)| 00:00:01 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| T1       |   100K|    11M|       |   188   (2)| 00:00:01 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("from$_subquery$_002"."rowlimit_$$_rownumber">10 AND "from$_subquery$_002"."rowlimit_$$_rank"<=CASE
                 WHEN (10>=0) THEN 10 ELSE 0 END +CEIL("from$_subquery$_002"."rowlimit_$$_total"*1/100))
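
Before looking at the parallel details, it's worth decoding that predicate: rowlimit_$$_rownumber, rowlimit_$$_rank and rowlimit_$$_total are columns that appear when the row-limiting clause is transformed into an analytic query. Here's a sketch of the equivalent rewrite, again using the hypothetical ordering column n1 (the CASE expression in the predicate simply guards against a negative offset, so it reduces to 10 here):

select  *
from    (
        select
                t1.*,
                row_number() over (order by n1) rowlimit_$$_rownumber,
                rank()       over (order by n1) rowlimit_$$_rank,
                count(*)     over ()            rowlimit_$$_total
        from    t1
        )
where
        rowlimit_$$_rownumber > 10
and     rowlimit_$$_rank     <= 10 + ceil(rowlimit_$$_total * 1/100)
;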

Two key things to notice. First, the WINDOW SORT has been pushed down to line 4 of the plan, and its results are passed up to line 2, where the parallel slaves send the final result to the query coordinator – so the sorting for the Top N, at least, has been pushed through to the slaves.

The second thing, though, is that the parallel distribution method (line 6) is RANGE, which means that each slave in the second set will receive a subset of the data based on its value for the ordering columns. In my case, thanks to the size of the data and the values for OFFSET and NEXT, this means that only one of the slaves is going to end up with any data that will have to be forwarded to the query co-ordinator. (I ran parallel 4, so each slave will – we hope – get roughly one quarter of the data set, and my query was after only 1 percent of the data.)

To confirm this behaviour I can query v$pq_tqstat after running the query (or, if I had the license, I could use OEM SQL Monitoring for the graphic display).
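
The query I used was along the following lines: a minimal sketch selecting just the columns that appear in the output (my full script, with its column formatting, is the one published in the catalogue mentioned in the footnote). Ordering server_type in descending order happens to list the Ranger, then the Producers, then the Consumers for each table queue.

select
        dfo_number, tq_id, server_type, process,
        num_rows, bytes, waits, timeouts, avg_latency, instance
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, process
;

This is the result I got: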

DFO_NUMBER      TQ_ID SERVER_TYPE PROCESS   NUM_ROWS      BYTES      WAITS   TIMEOUTS AVG_LATENCY   INSTANCE
---------- ---------- ----------- ------- ---------- ---------- ---------- ---------- ----------- ----------
         1          0 Ranger      QC              12       2126          5          2           0          1
                      Producer    P004         28296    3539339         13          4           0          1
                                  P005         21324    2662501          4          0           0          1
                                  P006         23259    2910843          8          2           0          1
                                  P007         27136    3394190         18          5           0          1
                      Consumer    P000         31336    3917059        639        634           0          1
                                  P001         41399    5177643        637        632           0          1
                                  P002         25117    3141394        637        632           0          1
                                  P003          2148     268651        637        632           0          1

                    1 Producer    P000          1006     124820          0          0           0          1
                                  P001             0         24          0          0           0          1
                                  P002             0         24          0          0           0          1
                                  P003             0         24          0          0           0          1
                      Consumer    QC            1006     124892          4          2           0          1

As you can see in the lines for the second table queue (TQ_ID 1), slave set 1 process p000 produced 1,006 rows which it sent to the query co-ordinator (QC); the other slaves in slave set 1 didn’t supply any data to the QC (it didn’t ask for any because it got the 1 percent it needed from slave p000).

You’ll notice, by the way, that the RANGE distribution required the QC to do a “ranging” exercise before starting the query – and it didn’t do it very well. In the Ranger line we see that it selected just 12 rows from the table to decide how to split the range of values for my “order by” clause across the 4 slaves. Then in the Producer lines for the first table queue we see that slaves p004 to p007 each did a fairly reasonable job of scanning a quarter of the table (to produce ca. 25,000 rows each), but the Consumer lines show that the distribution of data from then on was pretty uneven – one slave got virtually no data, another got more than 40%; that’s the effect of the error in the ranging exercise.

At this point, each slave (as a consumer) has to sort ALL the data it has received before it becomes a producer and can forward data to the QC. The QC knows that the data is sorted by the slave set, so will request slave p000 to send its data first – and in my test it needs only the first 1,006 rows (of 31,336) to be sent before the required result set is complete and it can shut down all the slaves.

One little detail that isn’t really visible in this example is that the QC and the slaves co-operate over the offset. When I changed the code to “offset 33000 rows fetch next 3 percent rows with ties” (sketched below), my query against v$pq_tqstat showed that slave p000 didn’t forward ANY data, and slave p001 forwarded 3,004 rows. This tells me that the QC knows how many rows each slave has, and works out how to tell the relevant slave how many rows to skip from its starting point before forwarding data. With my second example the QC would have said: “p000, I don’t need your data; p001, set your offset to 33,000 - 31,336 rows and start sending”.
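
In full, the modified statement would have been (the same hedged sketch as before):

select  /*+ parallel(t1 4) */
        *
from    t1
order by
        n1
offset  33000 rows fetch next 3 percent rows with ties
;

The arithmetic is consistent: p000 held only 31,336 rows, fewer than the 33,000 to be skipped, so it had nothing to send; p001 skipped its first 33,000 - 31,336 = 1,664 rows, then forwarded 3 percent of 100,000 = 3,000 rows plus (presumably) 4 rows tying at the boundary, giving the 3,004 reported.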

Conclusion: During parallel execution of a Top N query, the optimizer can push both the sorting and the offset/next requirements down to the slaves to minimise the work done.

Footnote: I have just published an example of the code I use to produce the output from v$pq_tqstat in my script catalogue. (There are also a few notes on the view on my old website.)
