Oracle Scratchpad

July 15, 2015

PQ Index anomaly

Filed under: Indexing,Oracle,Parallel Execution — Jonathan Lewis @ 8:42 am BST Jul 15,2015

Here’s an oddity prompted by a question that appeared on Oracle-L last night. The question was basically – “Why can’t I build an index in parallel when it’s single column with most of the rows set to null and only a couple of values for the non-null entries”.

That’s an interesting question, since the description of the index shouldn’t produce any reason for anything to go wrong, so I spent a few minutes on trying to emulate the problem. I created a table with 10M rows and a column that was 3% ‘Y’ and 0.1% ‘N’, then created and dropped an index in parallel a few times. The report I used to prove that the index build had run in parallel showed an interesting waste of resources. Here’s the code to build the table and index:

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        case
                when mod(rownum,100) < 3 then 'Y'
                when mod(rownum,1000) = 7 then 'N'
        end                     flag,
        rownum                  id,
        rpad('x',30)            padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7
;

-- gather stats here

explain plan for
create index t1_i1 on t1(flag) parallel 4 nologging;

select * from table(dbms_xplan.display);

create index t1_i1 on t1(flag) parallel 4 nologging;

select index_name, degree, leaf_blocks, num_rows from user_indexes;
alter index t1_i1 noparallel;

As you can see, I’ve used explain plan to get Oracle’s prediction of the cost and size, then I’ve created the index, then checked its size (and set it back to serial from its parallel setting). Here are the results of the various queries. It’s interesting to note that Oracle thinks there will be 10M index entries when we know that “completely null entries don’t go into the index”:

| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
|   0 | CREATE INDEX STATEMENT   |          |    10M|    19M|  3073   (3)| 00:00:16 |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (ORDER)     | :TQ10001 |    10M|    19M|            |          |  Q1,01 | P->S | QC (ORDER) |
|   3 |    INDEX BUILD NON UNIQUE| T1_I1    |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     SORT CREATE INDEX    |          |    10M|    19M|            |          |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE          |          |    10M|    19M|  2158   (4)| 00:00:11 |  Q1,01 | PCWP |            |
|   6 |       PX SEND RANGE      | :TQ10000 |    10M|    19M|  2158   (4)| 00:00:11 |  Q1,00 | P->P | RANGE      |
|   7 |        PX BLOCK ITERATOR |          |    10M|    19M|  2158   (4)| 00:00:11 |  Q1,00 | PCWC |            |
|   8 |         TABLE ACCESS FULL| T1       |    10M|    19M|  2158   (4)| 00:00:11 |  Q1,00 | PCWP |            |

   - estimated index size: 243M bytes

INDEX_NAME           DEGREE                                   LEAF_BLOCKS   NUM_ROWS
-------------------- ---------------------------------------- ----------- ----------
T1_I1                4                                                562     310000
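A quick cross-check of the 310,000 figure against the table itself is possible, because count(column) ignores nulls in the same way a single-column B-tree index does. This is only a sketch against the t1 table built above; the column aliases are my own.

```sql
-- count(*) counts every row; count(flag) counts only rows where flag is not null,
-- which is the set of rows that can appear in an index on (flag) alone.
select
        count(*)        table_rows,
        count(flag)     indexable_rows
from
        t1
;
```

With the data generation shown earlier (3% 'Y' plus 0.1% 'N' out of 10M rows) the second figure should agree with the num_rows reported in user_indexes.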

Although the plan says it’s going to run parallel, and even though the index says it’s a parallel index, we don’t have to believe that the creation ran as a parallel task – so let’s check v$pq_tqstat, the “parallel query table queue” statistics – and this is the result I got:

DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- ---------- ---------- -----------
         1          0 Ranger                 1 QC                      12        528          4          0           0
                      Producer               1 P004               2786931   39161903          9          1           0
                                             1 P005               2422798   34045157         11          1           0
                                             1 P006               2359251   33152158         12          1           0
                                             1 P007               2431032   34160854         14          2           0
                      Consumer               1 P000               3153167   44520722          3          0           0
                                             1 P001               1364146   19126604          4          1           0
                                             1 P002               2000281   28045742          3          0           0
                                             1 P003               3482406   48826476          3          0           0

                    1 Producer               1 P000                     1        298          0          0           0
                                             1 P001                     1        298          0          0           0
                                             1 P002                     1        298          0          0           0
                                             1 P003                     1         48          0          0           0
                      Consumer               1 QC                       4       1192          2          0           0

Check the num_rows column – the first set of slaves distributed 10M rows and roughly 140MB of data to the second set of slaves – and we know that most of those rows will hold (null, rowid) which are not going to go into the index. 97% of the data that went through the message queues would have been thrown away by the second set of slaves, and “should” have been discarded by the first set of slaves.
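For anyone wanting to repeat the check, a report like the one above can be produced with a query along the following lines. Treat it as a sketch rather than the exact script used for this post: the column list and ordering are my assumption, based on the columns visible in the output. Note that v$pq_tqstat only reports on the most recent parallel statement run by your own session, so it has to be queried from the session that created the index, immediately after the build.

```sql
-- Run from the session that executed the parallel statement, straight afterwards.
-- SQL*Plus formatting only: suppress repeated values so each table queue reads as a group.
break on dfo_number on tq_id on server_type skip 1

select
        dfo_number, tq_id, server_type, instance, process,
        num_rows, bytes, waits, timeouts, avg_latency
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, instance, process
;
```

Ordering by server_type descending puts the Ranger and Producer rows ahead of the Consumer rows for each table queue, which makes it easy to compare what was sent with what was received.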

As for the original question about the index not being built in parallel – maybe it was, but not very parallel. You’ll notice that the parallel distribution at operation 6 in the plan is “RANGE”. If 97% of your data is null and only 3% of your data is going to end up in the index then you’d need to run at higher than parallel 33 to see any long lasting executions – because at parallel 33 just one slave in the second set will get all the real data and do all the work of sorting and building the index while the other slaves will (or ought to) be just throwing their data away as it arrives. When you’ve got 500M rows with only 17M non-null entries (as the OP had) to deal with, maybe the only thing happening by the time you get to look might be the one slave that’s building a 17M row index.
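The “parallel 33” figure is simple arithmetic, and can be sketched as below. The big assumption (mine, for illustration) is that the range distribution gives each of N consumers an equal 1/N share of the rows with all the non-null keys adjacent at one end; the v$pq_tqstat figures above show the real split is less even than that (the four consumers received between 1.36M and 3.48M rows), so these numbers are only a rough guide.

```sql
-- Highest degree of parallelism at which one consumer's 1/N share
-- is still large enough to hold every non-null key.
select
        floor(1 / 0.03)         max_dop_one_busy_slave,         -- test case: 3% non-null
        floor(500e6 / 17e6)     max_dop_one_busy_slave_op       -- OP: 17M non-null out of 500M
from
        dual
;
```

This returns 33 for the test case and 29 for the OP’s figures, so under the even-split assumption the OP would have needed a degree of 30 or more before a second slave started to share the real work.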

Of course, one of the reasons I wanted to look at the row distribution in v$pq_tqstat was that I wanted to check whether I was going to see all the data going to one slave, or a spread across 2 slaves (Noes to the left, Ayes to the right – as they used to say in the UK House of Commons), or whether Oracle had been very clever and decided to distribute the rows by key value combined with rowid to get a nearly even spread. I’ll have to set up a different test case to check whether that last option is possible.


There was another little oddity that might be a simpler explanation of why the OP’s index creation might actually have run serially. I dropped and recreated the index in my test case several times and at one point I noticed (from view v$pq_slave) that I had 16 slave processes live (though, at that point, IDLE). Since I was the only user of the instance my session should probably have been re-using the same set of slaves each time I ran the test; instead, at some point, one of my test runs had started up a new set of slaves. Possibly something similar had happened to the OP, and over the course of building several indexes one after the other his session had reached the stage where it tried to start “yet another” set of slaves, failed, and decided to run serially rather than reuse any of the slaves that were nominally available and IDLE.
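A minimal sketch of the sort of check described above: list the slave processes with their current status, then look at the instance-level counters for downgraded parallel operations. The choice of columns and statistics is mine; the figures in v$sysstat are cumulative since instance startup, so it is the change across a test run that matters.

```sql
-- Which slaves exist right now, and are they BUSY or IDLE?
select
        slave_name, status, sessions, idle_time_cur, busy_time_cur
from
        v$pq_slave
order by
        slave_name
;

-- Has anything been downgraded (possibly all the way to serial)?
select
        name, value
from
        v$sysstat
where
        name like 'Parallel operations%'
or      name = 'DDL statements parallelized'
order by
        name
;
```

Taking a snapshot of the second query before and after each index build would show whether a particular build was counted as “Parallel operations downgraded to serial”.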


It gets worse. I decided to query v$px_sesstat (joined to v$statname) while the query was running, and caught some statistics just before the build completed. Here are a few critical numbers taken from the 4 sessions that received the 10M rows and built the final index:

Coord   Grp Deg    Set  Sno   SID
264/1     1 4/4      1    1   265
            physical writes direct                            558
            sorts (memory)                                      1
            sorts (rows)                                2,541,146

264/1     1 4/4      1    2    30
            sorts (memory)                                      1
            sorts (rows)                                2,218,809

264/1     1 4/4      1    3    35
            physical writes direct                          7,110
            physical writes direct temporary tablespace     7,110
            sorts (disk)                                        1
            sorts (rows)                                2,886,184

264/1     1 4/4      1    4   270
            sorts (memory)                                      1
            sorts (rows)                                2,353,861

Not only did Oracle pass 10M rows from one slave set to the other, the receiving slave set sorted those rows before discarding them. One of the slaves even ran short of memory and spilled to disc to do the sort. And we can see (physical writes direct = 558) that one slave was responsible for handling all the “real” data for that index.
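The figures above came from v$px_sesstat joined to v$statname; a query of roughly the following shape would produce them. This is a sketch, not the original script: the filter on statistic names is my own choice, made to match the statistics shown, and the query has to be run from another session while the build is still in progress because the rows disappear once the slaves are released.

```sql
-- One row per (slave session, statistic) for the statistics of interest.
select
        ss.qcsid, ss.server_group, ss.degree, ss.req_degree,
        ss.server_set, ss.server#, ss.sid,
        sn.name, ss.value
from
        v$px_sesstat    ss,
        v$statname      sn
where
        sn.statistic# = ss.statistic#
and     ss.value != 0
and     (
                sn.name like 'sorts%'
             or sn.name like 'physical writes direct%'
        )
order by
        ss.qcsid, ss.server_group, ss.server_set, ss.server#, sn.name
;
```

Comparing degree with req_degree in the same output is also a quick way to see whether the statement was given the degree of parallelism it asked for.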


Update 2

A couple of follow-ups on the thread have introduced some other material that’s worth reading: an item from Mohamed Houri about what happens when a parallel slave is still assigned to an executing statement but isn’t given any work to do for a long time, and an item from Stefan Koehler about _px_trace and tracking down why the degree of parallelism of a statement was downgraded.

1 Comment »

  1. I, too, have suspected a number of times that IDLE slaves are not being used and a session is spawning (or attempting to spawn till parallel_max_servers is hit) more slaves. PARALLEL_MIN_SERVERS is of no help here. Is there any “knob” to force reuse of IDLE slaves, or is there some “auto” intelligence that decides when an IDLE slave is re-used?

    Comment by Hemant K Chitale, July 16, 2015 @ 4:08 am BST
