There’s a posting on the OTN database forum asking for suggestions about restricting a query to data from one datafile of a tablespace. It seems to be a slightly odd thing to do, but it reminded me of a “pseudo-parallel” trick I once used (and described) more than 10 years ago on a system running Oracle version 7. It was a long time ago, but I think I needed to do this because I wanted to execute a massive update with as many parallel threads as possible in the days when parallel DML wasn’t possible.
The basic trick is simple – you need only remember that a single extent resides within a single file, then you can drive a loop through dba_extents picking all the extents for one file at a time. The code is made a little easier in newer versions of Oracle than it was in the original v7 thanks to the dbms_rowid() package, and the introduction of subquery-factoring.
As a simple demonstration, here’s a little script to build a data set (3 rows per block in an 8KB block size when using freelist management rather than ASSM), running to a couple of dozen extents, then querying the data for a single extent.
create table t1
pctfree 95
pctused 5
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
rownum <= 10000
)
select
rownum id,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 10000
;
column data_object_id new_value m_object
select
data_object_id
from user_objects
where object_name = 'T1'
;
select
extent_id, file_id, block_id, blocks
from
dba_extents
where
owner = 'TEST_USER'
and segment_name = 'T1'
and segment_type = 'TABLE'
order by
extent_id
;
set serveroutput off
set linesize 180
with rowid_range as (
select
/*+ materialize */
dbms_rowid.rowid_create(
1,
&m_object,
file_id,
block_id,0
) low_rowid,
dbms_rowid.rowid_create(
1,
&m_object,
file_id,
block_id+blocks-1,
4095
) high_rowid
from
dba_extents
where
owner = 'TEST_USER'
and segment_name = 'T1'
and segment_type = 'TABLE'
and extent_id = 8
)
select
/*+
gather_plan_statistics
ordered
use_nl(t1)
rowid(t1)
*/
t1.rowid,
t1.small_vc
from
rowid_range rr,
t1
where
t1.rowid between rr.low_rowid and rr.high_rowid
;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
The final lines of the execution plan (10.2.0.3) as pulled from memory on my first test run were as follows (the first 140+ were from the horrendous query against dba_extents):
--------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | --------------------------------------------------------------------------------------------------------------------------------------------- | 143 | NESTED LOOPS | | 1 | 50 | 384 |00:00:00.03 | 150 | 63 | 0 | | 144 | VIEW | | 1 | 2 | 1 |00:00:00.03 | 6 | 1 | 0 | | 145 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660C_88CA86 | 1 | 2 | 1 |00:00:00.03 | 6 | 1 | 0 | |*146 | TABLE ACCESS BY ROWID RANGE | T1 | 1 | 25 | 384 |00:00:00.01 | 144 | 62 | 0 | ---------------------------------------------------------------------------------------------------------------------------------------------
A few points to note:
The critical line of the execution plan is the “table access by rowid range” – this is exactly what we want to see. Further tests with event 10046 enabled at level 8 (and a preceding “flush buffer_cache”) showed that Oracle really was reading only the single extent specified, and was doing it using multi-block reads – here’s a bit of the tidied up trace from a re-run of the query, preceded by a shortened list of extents (check extent 8 in the query output and the block ranges in the trace):
EXTENT_ID FILE_ID BLOCK_ID BLOCKS
---------- ---------- ---------- ----------
0 5 265 128
1 5 3849 128
2 5 3977 128
3 5 4105 128
4 5 4233 128
5 5 4489 128
6 5 4617 128
7 5 4745 128
8 5 4873 128
9 5 5001 128
...
22 5 6665 128
23 5 6793 128
24 5 6921 128
25 5 7049 128
26 5 7177 128
27 rows selected.
WAIT #48: nam='db file scattered read' ela= 48127 file#=5 block#=4873 blocks=57 obj#=13744 tim=141509371101
FETCH #48:c=0,e=94556,p=58,cr=9,cu=1,mis=0,r=1,dep=0,og=1,tim=141509375781
FETCH #48:c=0,e=286,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509392875
FETCH #48:c=0,e=630,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509488422
FETCH #48:c=0,e=211,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509538318
FETCH #48:c=0,e=182,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509575541
FETCH #48:c=0,e=186,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509607322
FETCH #48:c=0,e=193,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509639153
WAIT #48: nam='db file scattered read' ela= 68589 file#=5 block#=4930 blocks=57 obj#=13744 tim=141509740153
FETCH #48:c=15625,e=70478,p=57,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509741827
FETCH #48:c=0,e=236,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509773991
FETCH #48:c=0,e=125,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509806662
FETCH #48:c=0,e=167,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509842109
FETCH #48:c=0,e=174,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509874456
FETCH #48:c=0,e=156,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509908995
FETCH #48:c=0,e=158,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509937588
WAIT #48: nam='db file scattered read' ela= 18046 file#=5 block#=4987 blocks=14 obj#=13744 tim=141509985862
FETCH #48:c=0,e=18687,p=14,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141509986305
FETCH #48:c=0,e=175,p=0,cr=9,cu=0,mis=0,r=25,dep=0,og=1,tim=141510017060
FETCH #48:c=0,e=150,p=0,cr=5,cu=0,mis=0,r=8,dep=0,og=1,tim=141510050453
*** 2010-01-03 10:51:19.656
One important detail, by the way – it may not be extremely obvious in the documentation but the “object id” referenced in the call to dbms_rowid() should be the data_object_id from dba_objects, not the object_id. You might also note that I’ve inserted 4095 as the “highest row in block” value to calculate the high rowid. (There’s an interesting consequence of this hard limit that could cause surprises in data warehouses and other system (e.g. Peoplesoft) that use 32KB block sizes during certain kinds of ETL processes).

Is there a convenient way around using DBA_EXTENTS? I have a few scripts that, on paper, are simply lovely assistants for the busy DBA; in reality, they stink like Grimsby docks. I am thinking of a “not quite up-to-the-minute” pseudo DBA_EXTENTS-like idea. Have you any idea if such a beast could be created, without whacking the DB?
Comment by Nigel — January 6, 2010 @ 4:01 pm UTC Jan 6,2010 |
Nigel,
There was some discussion about damage limitarion for queries relating to the information in dba_extents in a posting I did some time ago.
Ideally I think you’d want a trigger on seg$ which is kept up to date with the current number of extents in a segment – but you can’t create triggers on SYS objects. So maybe you could do something with a log miner session that kept looking for changes to seg$ instead. Access to dba_extents by way of a segment ID is (I think) one of the efficient access paths, so if you saw a segment change you may than have the option for keeping your flat copy in synch fairly efficiently.
Investigation (or alternative ideas) left as exercise for interested readers.
Comment by Jonathan Lewis — January 7, 2010 @ 7:44 am UTC Jan 7,2010 |
Hello, Jonathan,
why do you use explicit subquery factoring materialization in this case? I see no obvious reasons for doing this in your case; actually, IMO subquery factoring should be avoided in production-like code (if possible).
It’s also good to note the 11gR2 out-of-the-box new feature for tasks parallelization – DBMS_PARALLEL_EXECUTE. I haven’t used it yet, but it looks nice and neat.
Comment by Timur Akhmadeev — January 8, 2010 @ 8:03 pm UTC Jan 8,2010 |
[...] 12-How to force Oracle to scan only single datafile for manual parallelization ? Jonathan Lewis-Pseudo Parallel [...]
Pingback by Blogroll Report 01/01/2009 – 08/01/2010 « Coskan’s Approach to Oracle — January 20, 2010 @ 3:46 pm UTC Jan 20,2010 |