Oracle Scratchpad

January 3, 2010

Pseudo-parallel

Filed under: Infrastructure,Parallel Execution — Jonathan Lewis @ 11:32 am GMT Jan 3,2010

There’s a posting on the OTN database forum asking for suggestions about restricting a query to data from one datafile of a tablespace.  It seems to be a slightly odd thing to do, but it reminded me of a “pseudo-parallel” trick I once used (and described) more than 10 years ago on a system running Oracle version 7. It was a long time ago, but I think I needed to do this because I wanted to execute a massive update with as many parallel threads as possible in the days when parallel DML wasn’t possible.

The basic trick is simple – you need only remember that a single extent resides within a single file, so you can drive a loop through dba_extents, picking out all the extents for one file at a time. The code is a little easier in newer versions of Oracle than it was in the original v7, thanks to the dbms_rowid package and the introduction of subquery factoring.
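The driving loop can be sketched in PL/SQL along the following lines. This is only a sketch: update_t1_by_file is a hypothetical procedure (you'd write one that builds the rowid ranges for a given file and runs the update against them), and I'm assuming you submit one job per file through dbms_scheduler (dbms_job in older versions):

```sql
begin
	for f in (
		select	distinct file_id
		from	dba_extents
		where	owner		= 'TEST_USER'
		and	segment_name	= 'T1'
		and	segment_type	= 'TABLE'
	) loop
		--
		-- Submit one job per file; each job runs the update
		-- restricted to the rowid ranges built from the extents
		-- in its own file. (update_t1_by_file is hypothetical.)
		--
		dbms_scheduler.create_job(
			job_name	=> 'T1_UPDATE_FILE_' || f.file_id,
			job_type	=> 'PLSQL_BLOCK',
			job_action	=> 'begin update_t1_by_file(' || f.file_id || '); end;',
			enabled		=> true
		);
	end loop;
end;
/
```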

As a simple demonstration, here’s a little script to build a data set (3 rows per block in an 8KB block size when using freelist management rather than ASSM), running to a couple of dozen extents, then querying the data for a single extent.

create table t1
pctfree 95
pctused  5
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum <= 10000
)
select
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

column data_object_id new_value m_object

select
	data_object_id
from 	user_objects
where	object_name = 'T1'
;

select
	extent_id, file_id, block_id, blocks
from
	dba_extents
where
	owner = 'TEST_USER'
and	segment_name = 'T1'
and	segment_type = 'TABLE'
order by
	extent_id
;

set serveroutput off
set linesize 180

with rowid_range as (
	select
		/*+ materialize */
		dbms_rowid.rowid_create(
			1,
			&m_object,
			file_id,
			block_id,0
		)	low_rowid,
		dbms_rowid.rowid_create(
			1,
			&m_object,
			file_id,
			block_id+blocks-1,
			4095
		)	high_rowid
	from
		dba_extents
	where
		owner		= 'TEST_USER'
	and	segment_name	= 'T1'
	and	segment_type	= 'TABLE'
	and	extent_id	= 8
)
select
	/*+
		gather_plan_statistics
		ordered
		use_nl(t1)
		rowid(t1)
	*/
	t1.rowid,
	t1.small_vc
from
	rowid_range rr,
	t1
where
	t1.rowid between rr.low_rowid and rr.high_rowid
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

The final lines of the execution plan (10.2.0.3) as pulled from memory on my first test run were as follows (the first 140+ lines were from the horrendous query against dba_extents):

---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |
---------------------------------------------------------------------------------------------------------------------------------------------
| 143 |   NESTED LOOPS                      |                           |      1 |     50 |    384 |00:00:00.03 |     150 |     63 |      0 |
| 144 |    VIEW                             |                           |      1 |      2 |      1 |00:00:00.03 |       6 |      1 |      0 |
| 145 |     TABLE ACCESS FULL               | SYS_TEMP_0FD9D660C_88CA86 |      1 |      2 |      1 |00:00:00.03 |       6 |      1 |      0 |
|*146 |    TABLE ACCESS BY ROWID RANGE      | T1                        |      1 |     25 |    384 |00:00:00.01 |     144 |     62 |      0 |
---------------------------------------------------------------------------------------------------------------------------------------------

A few points to note:

The critical line of the execution plan is the “table access by rowid range” – this is exactly what we want to see.

Further tests with event 10046 enabled at level 8 (and a preceding “flush buffer_cache”) showed that Oracle really was reading only the single extent specified, and was doing so using multi-block reads. Here’s a bit of the tidied-up trace from a re-run of the query, preceded by a shortened list of extents (check extent 8 in the query output against the block ranges in the trace):


 EXTENT_ID    FILE_ID   BLOCK_ID     BLOCKS
---------- ---------- ---------- ----------
         0          5        265        128
         1          5       3849        128
         2          5       3977        128
         3          5       4105        128
         4          5       4233        128
         5          5       4489        128
         6          5       4617        128
         7          5       4745        128
         8          5       4873        128
         9          5       5001        128
...
        22          5       6665        128
        23          5       6793        128
        24          5       6921        128
        25          5       7049        128
        26          5       7177        128

27 rows selected.

WAIT #48: nam='db file scattered read' ela= 48127 file#=5 block#=4873 blocks=57 obj#=13744 tim=141509371101
FETCH #48:c=0,e=94556,p=58,cr=9,cu=1,mis=0,r=1,dep=0,og=1,tim=141509375781
FETCH #48:c=0,e=286,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509392875
FETCH #48:c=0,e=630,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509488422
FETCH #48:c=0,e=211,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509538318
FETCH #48:c=0,e=182,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509575541
FETCH #48:c=0,e=186,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509607322
FETCH #48:c=0,e=193,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509639153
WAIT #48: nam='db file scattered read' ela= 68589 file#=5 block#=4930 blocks=57 obj#=13744 tim=141509740153
FETCH #48:c=15625,e=70478,p=57,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509741827
FETCH #48:c=0,e=236,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509773991
FETCH #48:c=0,e=125,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509806662
FETCH #48:c=0,e=167,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509842109
FETCH #48:c=0,e=174,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509874456
FETCH #48:c=0,e=156,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509908995
FETCH #48:c=0,e=158,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509937588
WAIT #48: nam='db file scattered read' ela= 18046 file#=5 block#=4987 blocks=14 obj#=13744 tim=141509985862
FETCH #48:c=0,e=18687,p=14,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509986305
FETCH #48:c=0,e=175,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141510017060
FETCH #48:c=0,e=150,p=0,cr=5,cu=0,mis=0,r=8,dep=0,og=1,tim=141510050453
*** 2010-01-03 10:51:19.656
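For anyone repeating the test, the trace above came from a sequence along these lines (the event syntax is standard, but flushing the buffer cache needs suitable privileges, and you wouldn't do it on a production system):

```sql
alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 8';

-- re-run the rowid-range query here

alter session set events '10046 trace name context off';
```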

One important detail, by the way – it may not be obvious from the documentation, but the “object id” passed to dbms_rowid.rowid_create should be the data_object_id from dba_objects (or user_objects), not the object_id. You might also note that I’ve used 4095 as the “highest row in block” value to calculate the high rowid. (There’s an interesting consequence of this hard limit that could cause surprises in data warehouses and other systems (e.g. PeopleSoft) that use 32KB block sizes during certain kinds of ETL processes.)
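A quick sanity check on that point – dbms_rowid.rowid_object() extracts the object number embedded in an existing rowid, so on the demo table it should come back matching the data_object_id (not the object_id) from user_objects:

```sql
select	distinct
	dbms_rowid.rowid_object(rowid)	data_obj_in_rowid
from	t1
;

select	object_id, data_object_id
from	user_objects
where	object_name = 'T1'
;
```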

4 Comments »

  1. Is there a convenient way around using DBA_EXTENTS? I have a few scripts that, on paper, are simply lovely assistants for the busy DBA; in reality, they stink like Grimsby docks. I am thinking of a “not quite up-to-the-minute” pseudo DBA_EXTENTS-like idea. Have you any idea if such a beast could be created, without whacking the DB?

    Comment by Nigel — January 6, 2010 @ 4:01 pm GMT Jan 6,2010 | Reply

    • Nigel,
There was some discussion about damage limitation for queries relating to the information in dba_extents in a posting I did some time ago.

Ideally I think you’d want a trigger on seg$ that is kept up to date with the current number of extents in a segment – but you can’t create triggers on SYS objects. So maybe you could do something with a Log Miner session that kept looking for changes to seg$ instead. Access to dba_extents by way of a segment ID is (I think) one of the efficient access paths, so if you saw a segment change you might then have the option of keeping your flat copy in sync fairly efficiently.

      Investigation (or alternative ideas) left as exercise for interested readers.
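As a simple-minded interim measure – assuming a slightly stale copy is acceptable, and using the TEST_USER schema from the post purely as a placeholder – you could take a flat copy and refresh it on a schedule, paying the dba_extents cost only at refresh time:

```sql
create table my_extents as
select	owner, segment_name, segment_type,
	extent_id, file_id, block_id, blocks
from	dba_extents
where	owner = 'TEST_USER'
;

-- then, periodically (e.g. from a dbms_scheduler job):
-- truncate table my_extents;
-- insert into my_extents
-- select owner, segment_name, segment_type,
--        extent_id, file_id, block_id, blocks
-- from   dba_extents
-- where  owner = 'TEST_USER';
```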

      Comment by Jonathan Lewis — January 7, 2010 @ 7:44 am GMT Jan 7,2010 | Reply

  2. Hello, Jonathan,

why do you use explicit subquery factoring materialization in this case? I see no obvious reason for it; actually, IMO materialized subquery factoring should be avoided in production code (if possible).

It’s also good to note the new out-of-the-box 11gR2 feature for task parallelization – DBMS_PARALLEL_EXECUTE. I haven’t used it yet, but it looks nice and neat.
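A minimal sketch of how it might be used against the T1 table from the post (the task name and chunk size are made up for illustration, and the update statement is a deliberately harmless no-op):

```sql
begin
	dbms_parallel_execute.create_task('T1_TASK');

	-- chunk the table by rowid range, 128 blocks per chunk
	dbms_parallel_execute.create_chunks_by_rowid(
		task_name	=> 'T1_TASK',
		table_owner	=> 'TEST_USER',
		table_name	=> 'T1',
		by_row		=> false,
		chunk_size	=> 128
	);

	-- run the statement once per chunk, 4 jobs at a time
	dbms_parallel_execute.run_task(
		task_name	=> 'T1_TASK',
		sql_stmt	=> 'update t1 set small_vc = small_vc ' ||
				   'where rowid between :start_id and :end_id',
		language_flag	=> dbms_sql.native,
		parallel_level	=> 4
	);

	dbms_parallel_execute.drop_task('T1_TASK');
end;
/
```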

    Comment by Timur Akhmadeev — January 8, 2010 @ 8:03 pm GMT Jan 8,2010 | Reply

  3. [...] 12-How to force Oracle to scan only single datafile for manual parallelization ? Jonathan Lewis-Pseudo Parallel [...]

    Pingback by Blogroll Report 01/01/2009 – 08/01/2010 « Coskan’s Approach to Oracle — January 20, 2010 @ 3:46 pm GMT Jan 20,2010 | Reply

