Oracle Scratchpad

May 29, 2007

Autoallocate and PX

Filed under: Infrastructure,Oracle,Parallel Execution,Performance,Troubleshooting — Jonathan Lewis @ 9:22 pm BST May 29,2007

Here’s a quirky little set of results from trace files. First, a 10391 trace at level 64 – this shows the way in which an object was broken into granules for a parallel tablescan. I’ve listed just the first few granules:

        Distributors associated to that generator:
            dist:0 nbgra:52 nid:65535 nbslv:4 size:5131
            filedist:0 filno:0 nbgra:0
            filedist:1 filno:6 nbgra:52
               gnum:0    sz:99 pnum:1048576 rid:(file:6 blk:12..130)
               gnum:1    sz:99 pnum:1048576 rid:(file:6 blk:131..235)
               gnum:2    sz:99 pnum:1048576 rid:(file:6 blk:236..341)
               gnum:3    sz:99 pnum:1048576 rid:(file:6 blk:342..446)

Now, with just a little cosmetic enhancement, here’s the 10046 trace at level 8 from the parallel server process that was given the first granule to process. You can tell from the parameter naming that this is a 10g trace file. Note especially the order in which the blocks are visited.

WAIT #1: nam='PX Deq: Execution Msg' ela= 24415 sleeptime/senderid=268566527 passes=1 p3=0 obj#=-1 tim=1730298979
WAIT #1: nam='direct path read' ela=  18913 file number=6 first dba= 12 block cnt= 5 obj#=59196 tim=1730318924
WAIT #1: nam='direct path read' ela=   5431 file number=6 first dba= 37 block cnt= 4 obj#=59196 tim=1730324753
WAIT #1: nam='direct path read' ela=   5017 file number=6 first dba= 49 block cnt= 8 obj#=59196 tim=1730330528
WAIT #1: nam='direct path read' ela=   6651 file number=6 first dba= 82 block cnt= 7 obj#=59196 tim=1730338312
WAIT #1: nam='direct path read' ela=   5033 file number=6 first dba= 97 block cnt= 8 obj#=59196 tim=1730343998
WAIT #1: nam='direct path read' ela=   2711 file number=6 first dba=114 block cnt= 7 obj#=59196 tim=1730348009
WAIT #1: nam='direct path read' ela=   8746 file number=6 first dba=129 block cnt= 2 obj#=59196 tim=1730357846
WAIT #1: nam='direct path read' ela=  66938 file number=6 first dba= 69 block cnt=12 obj#=59196 tim=1730425271
WAIT #1: nam='direct path read' ela=  80266 file number=6 first dba= 90 block cnt= 7 obj#=59196 tim=1730506969
WAIT #1: nam='direct path read' ela=   4849 file number=6 first dba=105 block cnt= 8 obj#=59196 tim=1730512609
WAIT #1: nam='direct path read' ela=   2831 file number=6 first dba=122 block cnt= 7 obj#=59196 tim=1730516537
WAIT #1: nam='direct path read' ela= 150077 file number=6 first dba= 29 block cnt= 4 obj#=59196 tim=1730667253
WAIT #1: nam='direct path read' ela=  47274 file number=6 first dba= 41 block cnt= 8 obj#=59196 tim=1730715604
WAIT #1: nam='direct path read' ela=      6 file number=6 first dba= 21 block cnt= 4 obj#=59196 tim=1730716822
WAIT #1: nam='direct path read' ela=  42396 file number=6 first dba= 57 block cnt= 8 obj#=59196 tim=1730759816
WAIT #1: nam='PX Deq: Execution Msg' ela= 103 sleeptime/senderid=268566527 passes=1 p3=0 obj#=59196 tim=1730760814

As you can see, this stream of waits starts with a PX slave waiting to be told to process the first granule (PX Deq: Execution Msg), covers the direct path reads needed to scan that granule, and ends with the slave waiting to be told which granule to process next.
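
For anyone who wants to experiment, both trace files can be produced with session-level diagnostic events. The following is only a minimal sketch: the table name t1 and the degree of 4 are made up, and I’m assuming the usual 10g behaviour in which the PX slaves pick up the 10046 setting from the query coordinator. The names and locations of the resulting trace files will depend on your configuration.

        alter session set events '10391 trace name context forever, level 64';
        alter session set events '10046 trace name context forever, level 8';

        select /*+ parallel(t1, 4) */ count(*) from t1;

        alter session set events '10391 trace name context off';
        alter session set events '10046 trace name context off';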

But why are the direct path reads so small and in such a funny order? After all, the slave was given a stream of 99 consecutive blocks – and even that’s a little odd, because if you count from block 12 to block 130 you get 119 blocks, rather more than 99. Moreover, there seem to be some blocks that didn’t get read at all.

The answer lies in the title of this piece. I have created the table in a tablespace with system managed extents (the auto allocation type for locally managed tablespaces). In a “clean” datafile, the effect of this is that if you create a table and start to insert data into it, the first 16 extents will be allocated at 64KB (taking the segment to 1MB), the next 63 at 1MB (taking it to 64MB), followed by 120 at 8MB (taking it to 1GB), before Oracle finally switches to 64MB extents. (These are current observations, not figures documented by Oracle.)
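
If you want to check the allocation pattern on your own system, a query along the following lines will show it. This is just a sketch: t1 is a hypothetical table that has been loaded serially into an autoallocate tablespace, and the counts assume an 8KB block size, so 8 blocks = 64KB, 128 blocks = 1MB, 1,024 blocks = 8MB, and 8,192 blocks = 64MB.

        select
                blocks, count(*) extent_count
        from
                user_extents
        where
                segment_name = 'T1'
        group by
                blocks
        order by
                blocks
        ;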

However, if you run a parallel “insert /*+ append */”, every parallel execution slave thinks it is populating its own private data segment, so each slave will generate 16 extents of 64KB, then 63 extents of 1MB, and so on. Moreover, if you use three separate “insert /*+ append */” statements to load the table, each statement starts the cycle all over again.
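
As a sketch of the sort of load I mean (the table names and the degree of 4 are purely illustrative): with a statement like the following, each of the four slaves starts its own 64KB/1MB/8MB cycle, so a single load can leave you with 64 extents of 64KB before any slave has even reached its first 1MB extent.

        alter session enable parallel dml;

        insert  /*+ append parallel(t1, 4) */
        into    t1
        select  /*+ parallel(t2, 4) */ *
        from    t2
        ;

        commit;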

On top of that, when you use autoallocate, the last stage in the insertion process does extent trimming to release the trailing free space from the last extent allocated by each slave.

Oracle tries very hard to avoid problems with this extent trimming – it’s done to the nearest “unit size”, i.e. to the next 64KB, 1MB, or 8MB boundary above the actual limit of the data. This avoids leaving arbitrarily sized holes all over the tablespace, but it will still leave a number of holes whose sizes are multiples of those unit sizes.
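
You can see the resulting holes by listing the free space along the file. Again this is just a sketch with a made-up tablespace name; with an 8KB block size the smaller gaps will typically show up as multiples of 8 blocks (i.e. multiples of 64KB).

        select
                file_id, block_id, blocks
        from
                dba_free_space
        where
                tablespace_name = 'TEST_AUTO'
        order by
                file_id, block_id
        ;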

Now assume you have a very large tablespace, you are running with 32 CPUs and default parallelism, and you frequently drop, re-create, and populate objects, sometimes even dropping and recreating several objects concurrently. If you do this you will probably find that you gradually accumulate lots of little holes of all sorts of sizes all over the tablespace.

Oracle does try to use up these holes as fast as possible – but only by allocating extents of the legal unit sizes, starting from the start of the file and working along it in order. So if you have lots of little holes scattered through the tablespace, and lots of parallel slaves busily soaking them up, you end up with situations like the one above: 108 consecutive blocks in one file consisting of 14 chunks of 8 blocks, allocated by 14 different parallel execution slaves and read in order of extent_id when a parallel tablescan starts.
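
The effect is easy to see from the extent map of the final object. The following sketch (with a hypothetical table name again) lists the extents in extent_id order, which is the order in which the slave worked through the pieces of its granule above; on a system that has been through a few of these cycles you will see lots of small extents whose block_id values jump backwards and forwards along the file.

        select
                extent_id, file_id, block_id, blocks
        from
                dba_extents
        where
                owner        = user
        and     segment_name = 'T1'
        order by
                extent_id
        ;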

Mixing very large autoallocate tablespaces with repeated parallel create/insert at high degrees of parallelism can result in objects acquiring a very large number of small extents – which can have a surprising impact on subsequent parallel queries.

Update August 2009:

Christian Antognini reports an enhancement arriving (silently) in 11.1.0.7 that addresses this issue.

5 Comments »

  1. Fantastic,
    This clearly explains some of the issues I’ve been seeing and experimenting with over the last week.
    Work’s done now…. Thanks Jonathan.

    Comment by Ted — May 30, 2007 @ 6:20 pm BST May 30,2007

  2. […] type of operation then these are just some of the questions you need to answer. (See, for example, a note I wrote three years ago about some of the anomalies of I/O sizes when running parallel query, and a related enhancement in […]

    Pingback by Fragmentation 1 « Oracle Scratchpad — July 13, 2010 @ 8:34 pm BST Jul 13,2010

  3. […] Suppose every extent could only be 64KB: would that limit the size of the “db file multiblock read” requests we issue, or can those requests read across extent boundaries? If the tablespace is made up of two (or more) data files, with extents allocated to the files in a “round-robin” fashion, does that affect the way the reads are done? If we try a parallel tablescan, are these restrictions any different for “direct path read”? If your production system is a data warehouse that spends a lot of time running this type of operation, then these are just some of the questions you need to answer. (See, for example, a note I wrote three years ago about some of the anomalies of I/O sizes when running parallel query, and a related enhancement in Oracle 11g described by Christian Antognini a couple of years later.) […]

    Pingback by 碎片(Fragmentation)–介绍 « a db thinker's home — July 27, 2010 @ 7:21 am BST Jul 27,2010

  4. Jonathan,

    That solves our problem. We have a huge data warehouse with a 32K block size and quite a good I/O system on Oracle 11gR2, but we never manage to get db_file_multiblock_read_count=128 and we also have a lot of holes in the tablespace. Now we are setting initial=8M but we are not able to decide on the next value. A value of db_file_multiblock_read_count=256 is also under discussion; our system admin says that we can change this value by setting the maxphys tunable parameter of Solaris.

    Comment by Amir Riaz — October 24, 2010 @ 9:01 am BST Oct 24,2010

  5. […] few years ago (2007) I wrote about a problem that could appear when you mixed parallel execution with system managed extent allocation. A couple of years later I added a note that Christian Antognini had observed a patch in 11.1.0.7 […]

    Pingback by PX and system allocation « Oracle Scratchpad — June 14, 2012 @ 5:01 pm BST Jun 14,2012



