Oracle Scratchpad

June 6, 2013

Parallel DML

Filed under: Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 8:06 am BST Jun 6,2013

Parallel DML is not enabled by default, and it’s easy to forget this and think you’ve made an operation “as parallel as possible” by declaring objects parallel or putting in hints for parallelism.
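
If you're not sure whether a session currently has parallel DML enabled, a quick sanity check is to look at the session's own row in v$session (just a sketch, assuming you have the privilege to query v$session; pdml_status reports ENABLED, DISABLED or FORCED):

select  pdml_status, pddl_status, pq_status
from    v$session
where   sid = sys_context('userenv','sid')
;

alter session enable parallel dml;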

A recent question on OTN asked about speeding up a materialized view refresh that seemed to be spending a lot of its time waiting on “PX Deq Credit: send blkd”. The manuals describe this as an “idle event”, but that’s not always true. The OP had supplied the tkprof output for one of the sessions showing the “insert as select” that did the (complete) refresh, and it was clear that the select was running in parallel but the insert wasn’t – and that’s one case in which the “PX Deq Credit: send blkd” is arguably an “idle” wait (with a timeout of 2 seconds). It’s possible that the refresh could go faster if the OP enabled parallel DML.
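
If you want to see how much of that wait a session has already accumulated, a query like the following is one way to check (again just a sketch, assuming access to v$session_event; time_waited is reported in centiseconds):

select  sid, event, total_waits, total_timeouts, time_waited
from    v$session_event
where   event = 'PX Deq Credit: send blkd'
order by
        time_waited desc
;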

The argument for calling this wait idle (in this case) is that N-1 of the parallel slaves that are trying to feed the query co-ordinator are waiting because the query co-ordinator is soaking up data from the Nth slave as fast as it can – the query co-ordinator can’t go any faster and the slaves are being told to wait until the query co-ordinator is ready for their input. On the other hand, if you don’t need those waits to happen at all you could argue that they aren’t idle because they are affecting the end-user response time. To stop them happening, you can minimise the messages from the PX slaves to the query co-ordinator by making the insert run in parallel; and since it’s important to recognise the difference in plans between a parallel and non-parallel insert I thought I’d give you a little model to test.

create table t1 as select * from all_objects;
create table t2 as select * from t1 where rownum <= 1;

alter table t1 parallel (degree 2);
alter table t2 parallel (degree 2);

explain plan for
insert /*+ append */ into t2 select * from t1;

select * from table(dbms_xplan.display(null,null,'-note -cost -rows -bytes'));

commit;
alter session enable parallel dml;

explain plan for
insert /*+ append */ into t2 select * from t1;

select * from table(dbms_xplan.display(null,null,'-note -cost -rows -bytes'));

This code creates a couple of tables, declared as parallel, then inserts (twice) from one to the other. In the first case only the select can run parallel so all the data from the parallel slaves will be passed to the query co-ordinator to be inserted. In the second case both the select and the insert can take place in parallel, so each slave would build its own data segments, and the only messages sent to the query co-ordinator would be about the list of extents in each “private” segment that have to be merged. Here are the two execution plans:

-----------------------------------------------------------------------
| Id  | Operation             | Name     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------
|   0 | INSERT STATEMENT      |          |        |      |            |
|   1 |  LOAD AS SELECT       | T2       |        |      |            |
|   2 |   PX COORDINATOR      |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)| :TQ10000 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL| T1       |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------

-----------------------------------------------------------------------
| Id  | Operation             | Name     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------
|   0 | INSERT STATEMENT      |          |        |      |            |
|   1 |  PX COORDINATOR       |          |        |      |            |
|   2 |   PX SEND QC (RANDOM) | :TQ10000 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT     | T2       |  Q1,00 | PCWP |            |
|   4 |     PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL| T1       |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------

Both plans are so simple in shape that you basically read them from the bottom up. The first plan shows the data passing from the PX slaves to the QC (PX coordinator), which does the LOAD AS SELECT. The second plan shows the PX slaves loading as they select – the LOAD AS SELECT is marked PCWP (parallel combined with parent) and belongs to the same slave set (Q1,00) as the tablescan – and information is sent to the QC only after the load has completed.

Bottom line: if the PX SEND QC operation appears above the LOAD AS SELECT (i.e. has the lower Id) the slaves are doing the insert; if the LOAD AS SELECT appears above the PX SEND QC the query coordinator is doing the insert – and in the latter case you’ll probably see lots of “PX Deq Credit: send blkd” waits as the PX slaves wait for the QC to load data into blocks.
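
If you want to double-check at run time (rather than by reading the plan) that the insert really did run as parallel DML, one option, continuing the model above with parallel DML still enabled, is to look at v$pq_sesstat immediately after the statement. This is just a sketch, assuming you can query that view; the session_total for the “DML Parallelized” statistic should go up by one if the insert was parallelised:

insert /*+ append */ into t2 select * from t1;

select  statistic, last_query, session_total
from    v$pq_sesstat
;

commit;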

6 Comments »

  1. Finally got to this point in my reading list: Thanks for the clear and concise explanation Jonathan.

    Comment by Jared — June 11, 2013 @ 1:55 pm BST Jun 11,2013 | Reply

  2. Thanks. This post helped me explain how to read a Parallel INSERT in an Explain Plan.

    Comment by Hemant K Chitale — September 25, 2013 @ 7:25 am BST Sep 25,2013 | Reply

  3. Hi Jonathan.

    But then again, how do you make a fast refresh work with parallel DML?
    I had tested it a few times without success … forcing parallel DML, parallel query, altering all the related objects to parallel … nothing really worked.
    (11.2.0.3 on exadata x2-2 quarter rack)

    Comment by sagizvi — October 6, 2013 @ 8:48 pm BST Oct 6,2013 | Reply

    • Sagizvi,

      I have to admit that I don’t feel particularly surprised that you’re having trouble getting a FAST refresh running in parallel – the underlying technology rather assumes that you’re trying to bring a small amount of data incrementally from a very large data set into a summary, so I would expect to see a “default” strategy that was all about “small” jobs rather than “big” jobs.

      As a first step I would trace a fast refresh at the right scale (there may be different code paths depending on various aspects of the view definition and usage) to see which SQL statements might benefit from parallelism, and then try to figure out what generates them and whether (and how) they could be switched to run parallel.
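
      In outline a traced test might look something like this; it's only a sketch, mv_sales is just a placeholder for your materialized view name, and the test data would need to be at a realistic scale:

      rem  mv_sales is a placeholder name, not a real object

      exec dbms_monitor.session_trace_enable(waits => true, binds => false)
      exec dbms_mview.refresh(list => 'MV_SALES', method => 'F')
      exec dbms_monitor.session_trace_disable

      Running tkprof on the resulting trace file should show you where the time goes and which (if any) of the recursive SQL statements ran parallel.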

      Note that in the original example the target was a full refresh.

      Comment by Jonathan Lewis — October 7, 2013 @ 9:07 am BST Oct 7,2013 | Reply

  4. Hi Jonathan,

    I have a large amount of data (70 million rows) being inserted into an Index Organized Table. This is done serially and is a huge performance problem.

    Is there any way to parallelize this effort other than a CTAS? Reading the docs it appears I can’t, but hoping you know of some way to.

    Thanks

    Comment by Justin — February 24, 2014 @ 1:24 am BST Feb 24,2014 | Reply

    • Justin,

      I think you meant “can’t” in the sentence about reading the docs – so I’ve edited the sentence.

      The only way I know of to parallelize the load is to partition the IOT. This allows you to run the load with a degree of parallelism that matches the number of partitions. The sorting for the index component can then run in parallel, but the load itself is still a conventional insert, which means you get a large volume of undo and redo combined with “writes from cache” rather than direct path writes.

      Of course, this would only be appropriate if (a) you had already licensed the partitioning option and (b) the partitioning approach didn’t result in undesirable side effects on the performance of other parts of your system.
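
      As a rough sketch of the shape of the idea (the table, its columns, the source table and the degree are all hypothetical, and remember that the partitioning columns of an IOT have to be a subset of the primary key):

      create table t_iot (
              id        number,
              padding   varchar2(100),
              constraint t_iot_pk primary key (id)
      )
      organization index
      partition by hash (id) partitions 4
      ;

      alter session enable parallel dml;

      rem  source_table is a stand-in for the real 70M row source

      insert  /*+ parallel(t_iot, 4) */ into t_iot
      select  /*+ parallel(s, 4) */ id, padding
      from    source_table s
      ;

      commit;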

      An alternative strategy which may be more effective in your case is to load through a named pipe – the select and sort could be made parallel and the insert could then run in direct path mode with no sorting. It’s a technique I wrote about some time ago: http://jonathanlewis.wordpress.com/2012/11/27/iot-load/

      Comment by Jonathan Lewis — February 24, 2014 @ 10:02 am BST Feb 24,2014 | Reply

