Oracle Scratchpad

June 6, 2013

Parallel DML

Filed under: Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 8:06 am BST Jun 6,2013

Parallel DML is not enabled by default, and it’s easy to forget this and think you’ve made an operation “as parallel as possible” by declaring objects parallel or putting in hints for parallelism.

A recent question on OTN asked about speeding up a materialized view refresh that seemed to be spending a lot of its time waiting on “PX Deq Credit: send blkd”. The manuals describe this as an “idle event”, but that’s not always an accurate description.

The OP had supplied the output from tkprof for one of the sessions showing the “insert as select” that was the (complete) refresh, and it was clear that the select was running in parallel while the insert wasn’t – and that’s one case in which the “PX Deq Credit: send blkd” is arguably NOT an “idle” wait (with a timeout of 2 seconds). It’s possible (probable, even) that the refresh could go faster if the OP enabled parallel DML.

The argument for calling this wait idle (in this case) is that N-1 of the parallel slaves feeding the query co-ordinator are waiting because the query co-ordinator is soaking up data from the Nth slave as fast as it can – the query co-ordinator can’t go any faster and the slaves are being told to wait until the query co-ordinator is ready for their input.

On the other hand, if you don’t need those waits to happen at all you could argue that they aren’t idle because they are affecting the end-user response time (i.e. time for the refresh to complete). To stop them happening you can minimise the messages from the PX slaves to the query co-ordinator by making the insert run in parallel; and since it’s important to recognise the difference in plans between a parallel and non-parallel insert I thought I’d give you a little model to test.

create table t1 as select * from all_objects;
create table t2 as select * from t1 where rownum <= 1;

alter table t1 parallel (degree 2);
alter table t2 parallel (degree 2);

explain plan for
insert /*+ append */ into t2 select * from t1;

select * from table(dbms_xplan.display(null,null,'-note -cost -rows -bytes'));

alter session enable parallel dml;

explain plan for
insert /*+ append */ into t2 select * from t1;

select * from table(dbms_xplan.display(null,null,'-note -cost -rows -bytes'));

This code creates a couple of tables, declared as parallel, then inserts (twice) from one to the other. In the first case only the select can run parallel so all the data from the parallel slaves will be passed to the query co-ordinator to be inserted. In the second case both the select and the insert can take place in parallel, so each slave would build its own data segments, and the only messages sent to the query co-ordinator would be about the list of extents in each “private” segment that have to be merged. Here are the two execution plans:

| Id  | Operation             | Name     |    TQ  |IN-OUT| PQ Distrib |
|   0 | INSERT STATEMENT      |          |        |      |            |
|   1 |  LOAD AS SELECT       | T2       |        |      |            |
|   2 |   PX COORDINATOR      |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)| :TQ10000 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL| T1       |  Q1,00 | PCWP |            |

| Id  | Operation             | Name     |    TQ  |IN-OUT| PQ Distrib |
|   0 | INSERT STATEMENT      |          |        |      |            |
|   1 |  PX COORDINATOR       |          |        |      |            |
|   2 |   PX SEND QC (RANDOM) | :TQ10000 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT     | T2       |  Q1,00 | PCWP |            |
|   4 |     PX BLOCK ITERATOR |          |  Q1,00 | PCWC |            |
|   5 |      TABLE ACCESS FULL| T1       |  Q1,00 | PCWP |            |

Both plans are so simple in shape that you basically read them from the bottom up. The first plan shows the data passing from the PX slaves to the QC (PX coordinator), which does the LOAD AS SELECT. The second plan shows the PX slaves loading as they select (the load is marked PCWP – parallel combined with parent – so each slave loads the rows from its own tablescan), and information is sent to the QC only after the load has completed.

Bottom line: if the PX SEND QC appears above (at a lower line number than) the LOAD AS SELECT, the slaves are doing the insert; if the LOAD AS SELECT appears above the PX SEND QC, the query coordinator is doing the insert – and you’ll probably see lots of “PX Deq Credit: send blkd” waits at that point as the PX slaves wait for the QC to load data into blocks.
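As a quick sanity check before running the insert (my addition, not part of the original post) you can ask the database whether parallel DML is actually enabled for your session – a minimal sketch, assuming you have select privileges on v$session:

```sql
-- Check the parallel DML / DDL / query status for the current session.
-- PDML_STATUS reports ENABLED, DISABLED, or FORCED.
select pdml_status, pddl_status, pq_status
from   v$session
where  sid = sys_context('userenv', 'sid');

-- If PDML_STATUS is DISABLED, enable it before the insert:
alter session enable parallel dml;
```

If PDML_STATUS still shows DISABLED when you run the insert, the second (fully parallel) plan shape cannot appear, no matter how the objects are declared or hinted.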


  1. Finally got to this point in my reading list: Thanks for the clear and concise explanation Jonathan.

    Comment by Jared — June 11, 2013 @ 1:55 pm BST Jun 11,2013 | Reply

  2. Thanks. This post helped me explain how to read a Parallel INSERT in an Explain Plan.

    Comment by Hemant K Chitale — September 25, 2013 @ 7:25 am BST Sep 25,2013 | Reply

  3. Hi Jonathan.

    But then again, how do you make a fast refresh work with parallel DML?
    I had tested it a few times without success... forcing parallel DML, parallel query, declaring all the related objects parallel... nothing really worked.
    (On an Exadata X2-2 quarter rack.)

    Comment by sagizvi — October 6, 2013 @ 8:48 pm BST Oct 6,2013 | Reply

    • Sagizvi,

      I have to admit that I don’t feel particularly surprised that you’re having trouble getting a FAST refresh running in parallel – the underlying technology rather assumes that you’re trying to bring a small amount of data incrementally from a very large data set into a summary, so I would expect to see a “default” strategy that was all about “small” jobs rather than “big” jobs.

      As a first step I would trace a fast refresh at the right scale (there may be different code paths depending on various aspects of the view definition and usage) to see which SQL statements might benefit from parallelism, and then try to figure out what generates them and how (and whether) they could be switched to run parallel.

      Note that in the original example the target was a full refresh.

      Comment by Jonathan Lewis — October 7, 2013 @ 9:07 am BST Oct 7,2013 | Reply

  4. Hi Jonathan,

    I have a large amount of data (70 million rows) being inserted into an Index Organized Table. This is done serially and is a huge performance problem.

    Is there any way to parallelize this effort other than a CTAS? Reading the docs it appears I can’t, but hoping you know of some way to.


    Comment by Justin — February 24, 2014 @ 1:24 am GMT Feb 24,2014 | Reply

    • Justin,

      I think you meant “can’t” in the sentence about reading the docs – so I’ve edited the sentence.

      The only way I know of to parallelize the load is to partition the IOT. This allows you to run the load with a degree of parallelism that matches the number of partitions. The sorting for the index component can then run in parallel, but the load is still done as a conventional load, which means you get a large volume of undo and redo combined with “writes from cache” rather than direct path writes.

      Of course, this would only be appropriate if (a) you had already licensed the partitioning option and (b) the partitioning approach didn’t result in undesirable side effects on the performance of other parts of your system.
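As an illustration of the partitioning idea (my sketch, not part of the original comment – the table, column, and partition count are invented), a hash-partitioned IOT declared parallel might look like this:

```sql
-- Hypothetical hash-partitioned IOT: with parallel DML enabled,
-- each partition can be loaded by its own PX slave, so the sort
-- for the index structure runs in parallel.  Note: the load into
-- an IOT is still a conventional (undo/redo generating) insert.
create table orders_iot (
        order_id        number          not null,
        customer_id     number          not null,
        order_date      date,
        constraint orders_iot_pk primary key (order_id)
)
organization index
partition by hash (order_id) partitions 8
parallel (degree 8);

alter session enable parallel dml;

insert /*+ parallel(orders_iot, 8) */ into orders_iot
select order_id, customer_id, order_date
from   orders_staging;
```

The degree of parallelism is capped by the partition count, which is why the partition count has to be chosen with the load in mind as well as the queries.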

      An alternative strategy which may be more effective in your case is to load through a named pipe – the select and sort could be made parallel and the insert could then run in direct path mode with no sorting. It’s a technique I wrote about some time ago.

      Comment by Jonathan Lewis — February 24, 2014 @ 10:02 am GMT Feb 24,2014 | Reply

  5. Thanks Jonathan – this post helped me see that a current issue in my DB is a query not running parallel even though the app team used a parallel hint.

    I need your advice on the issue below. The query spends almost all of its time on CPU, and the number of logical reads is huge, so I believe it may be doing a conventional load instead of a direct path load. If I’m right, why might the query have used a conventional load even though it has the APPEND hint?

           Elapsed                  Elapsed Time
            Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
    ---------------- -------------- ------------- ------ ------ ------ -------------
            29,990.3            465         64.50   88.8   84.7   14.2 gf2wh54z7v2wq
    Module: SQL*Plus
    Load Profile              Per Second    Per Transaction   Per Exec   Per Call
    ~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
          DB Time(s):                4.7               23.8       0.00       2.13
           DB CPU(s):                4.0               20.4       0.00       1.83
           Redo size:        3,691,827.5       18,805,623.2
       Logical reads:           39,640.7          201,924.0
    Explain plan from the SQLID is below
    select * from table(
    	dbms_xplan.display_cursor('gf2wh54z7v2wq',1,'outline peeked_binds')
    );
    | Id  | Operation                | Name | Cost  |
    |   0 | INSERT STATEMENT         |      |     1 |
    |   1 |  LOAD TABLE CONVENTIONAL |      |       |
       - cpu costing is off (consider enabling it)

    Comment by RakeshDBA — March 12, 2015 @ 3:25 pm GMT Mar 12,2015 | Reply

    • The plan suggests you have an “insert values()” clause, not insert as select.
      You need to read the documentation on the APPEND hint and the APPEND_VALUES hint and their relevance (or not) to inserting values.

      Update (Jan 2016): I’ve just come back to this blog note and re-read your posting (after formatting it properly). I didn’t realise originally that you must have deleted loads of lines from your plan output. There are various reasons why the /*+ append */ hint has to be ignored – for example, if there’s a trigger on the table, or if the table is the child end of a foreign key constraint.
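For reference (my sketch, not part of the original reply – the column choice is invented, reusing t2 from the article): the APPEND hint applies only to insert ... select, while insert ... values needs the APPEND_VALUES hint introduced in 11.2, normally used with a FORALL bulk insert:

```sql
-- APPEND is honoured only for insert ... select:
insert /*+ append */ into t2 select * from t1;
commit;

-- For insert ... values, 11.2 introduced APPEND_VALUES,
-- typically combined with a FORALL bulk insert:
declare
        type num_tab is table of number index by pls_integer;
        m_ids   num_tab;
begin
        for i in 1..1000 loop
                m_ids(i) := i;
        end loop;
        forall i in 1..1000
                insert /*+ append_values */ into t2 (object_id)
                values (m_ids(i));
        commit;
end;
/
```

A plain `insert /*+ append */ into t2 values (...)` silently falls back to a conventional load, which is consistent with the LOAD TABLE CONVENTIONAL line in the plan above.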

      Comment by Jonathan Lewis — March 24, 2015 @ 7:35 am GMT Mar 24,2015 | Reply

  6. […] is DDL. Parallel DDL is automatically enabled, parallel DML has to be enabled explicitly otherwise the select will run in parallel but the insert will be serialized. Look at operations 1 – 4 of the insert – the query co-ordinator does the “load […]

    Pingback by Parallel DML | Oracle Scratchpad — February 5, 2016 @ 1:02 pm GMT Feb 5,2016 | Reply
