Desupport Note (Dec 2022): The basic content of this blog note is correct, but there is an updated version of the note, with an improved script to automate the process, available at this URL.
At one of the presentations I attended at RMOUG this year, the presenter claimed that if a row kept increasing in size and, as a consequence, had to migrate from block to block, then each migration would leave a pointer behind in the previous block, so that an indexed access to the row would start at the original table block and have to follow an ever-growing chain of pointers to reach the data.
This is not correct, and it’s worth making a little fuss about the error since it’s the sort of thing that can easily become an urban legend that results in people rebuilding tables “for performance” when they don’t need to.
Oracle behaves quite intelligently with migrated rows. First, the migrated row has a pointer back to its original location, and if the row has to migrate a second time the first place Oracle checks for space is the original block, so the row might "de-migrate" itself. Even if it can't move back to the original block, though, Oracle still revisits that block and changes its pointer to reference the block the row has moved on to – so the row is never more than one step away from its original location. As a quick demonstration, here's some code to generate and manipulate some data:
rem
rem     Script:   row_migration.sql
rem     Author:   Jonathan Lewis
rem     Dated:    Feb 2014
rem
rem     Last tested
rem             10.2.0.5
rem

create table t1 (
        id      number(6,0),
        v1      varchar2(1200)
)
pctfree 0
;

prompt  ==========================================
prompt  The following code fits 74 rows to a block
prompt  ==========================================

insert  into t1
select  rownum - 1, rpad('x',100)
from    all_objects
where   rownum <= 75
;
commit;

prompt  ======================================
prompt  Make the first row migrate and dump it
prompt  ======================================

update  t1 set v1 = rpad('x',400) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',2)

prompt  ======================================
prompt  Fill the block the long row is now in,
prompt  force the row to migrate again,
prompt  then dump the block again.
prompt  ======================================

insert  into t1
select  rownum + 75, rpad('x',100)
from    all_objects
where   rownum <= 75
;
commit;

update  t1 set v1 = rpad('x',800) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)

prompt  ========================================================
prompt  Fill the block the long row is now in and shrink the row
prompt  to see if it returns to its original block. (It won't.)
prompt  ========================================================

insert  into t1
select  rownum + 150, rpad('x',100)
from    all_objects
where   rownum <= 75
;
commit;

update  t1 set v1 = rpad('x',50) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)

prompt  ========================================================
prompt  Free a lot of space in the first block and force the row
prompt  to migrate again to see if it migrates back. (It does.)
prompt  ========================================================

delete  from t1 where id between 1 and 20;
commit;

update  t1 set v1 = rpad('x',1200) where id = 0;
commit;

alter system flush buffer_cache;
execute dump_seg('t1',3)
My test database was using 8KB blocks (hence the 74 rows per block), and 1MB uniform extents with freelist management. The procedure dump_seg() takes a segment name as its first parameter and a number of blocks as the second (then the segment type and starting block as the third and fourth) and dumps the first N data blocks of the segment. To demonstrate what goes on, I’ve extracted the content of the first row (id = 0) after each of the four dumps:
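The dump_seg() procedure itself isn't reproduced in this note. Purely as an illustration (not the original code), a procedure with that parameter order might look something like the following sketch; it assumes the caller has a direct grant on dba_segments and the privilege to run "alter system dump datafile", and that the first data block sits immediately after the segment header block (which is why the starting block defaults to 1):

create or replace procedure dump_seg (
        i_segment   in varchar2,
        i_blocks    in number   default 1,
        i_type      in varchar2 default 'TABLE',
        i_start     in number   default 1
)
as
        m_file   number;
        m_block  number;
begin
        --  Find the segment header, then dump the data blocks that follow it
        select  header_file, header_block
        into    m_file, m_block
        from    dba_segments
        where   owner        = user
        and     segment_name = upper(i_segment)
        and     segment_type = upper(i_type)
        ;

        execute immediate
                'alter system dump datafile ' || m_file              ||
                ' block min ' || (m_block + i_start)                 ||
                ' block max ' || (m_block + i_start + i_blocks - 1)
        ;
end;
/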
After the first update - the column count (cc) is zero and the "next rowid" (nrid) is row 1 of block 0x0140000b

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x0140000b.1

After the second update - the next rowid is row 7 of block 0x0140000c

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x1  cc: 0
nrid:  0x0140000c.7

After the third update (shrinking the row) the row hasn't moved from block 0x0140000c

tab 0, row 0, @0xb3
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x0140000c.7

After the fourth update (making space, and growing the row too much) the row moves back home

tab 0, row 0, @0x4c1
tl: 1208 fb: --H-FL-- lb: 0x2  cc: 2
col  0: [ 1]  80
col  1: [1200]
 78 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
My calls to dump blocks included the blocks where the row migrated to, so we’ll have a look at the target locations (as given by the original row location’s nrid) in those blocks over time. First we check block 0x0140000b, row 1 after the first two migrations:
tab 0, row 1, @0x1d7f
tl: 414 fb: ----FL-- lb: 0x2  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [400]

tab 0, row 1, @0x1d7f
tl: 2 fb: ---DFL-- lb: 0x1
After the first migration (the row arrives here) we have a “head rowid” (hrid) pointer telling us where the row came from. After the second migration, when the row has moved on, we simply have a typical “deleted stub” – two bytes reserving the row directory entry until the commit has been done and cleaned out.
Then we can examine the second target (block 0x0140000c, row 7) after the second, third, and fourth updates:
tab 0, row 7, @0x1966
tl: 814 fb: ----FL-- lb: 0x2  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [800]

tab 0, row 7, @0xb1
tl: 62 fb: ----FL-- lb: 0x1  cc: 2
hrid: 0x0140000a.0
col  0: [ 1]  80
col  1: [50]

tab 0, row 7, @0xb1
tl: 2 fb: ---DFL-- lb: 0x2
As you can see, on arrival this location gets the original rowid as its “head rowid” (hrid), and it knows nothing about the intermediate block where the row was briefly in transit. I’ve copied the length byte (in square brackets) of column 1 in the dumps so that you can see that the row stayed put as it shrank. We can then see on the last update that we are left with a deleted stub in this block as the row migrates back to its original location when we try to extend it beyond the free space in this block.
Migrated rows are only ever one step away from home. It’s not nice to have a lot of them but it’s not necessarily a disaster.
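As a footnote (not part of the original demonstration): if you want to see how many of your rows are affected, the old analyze command still reports it, though chain_cnt lumps genuinely chained rows in with migrated ones, so it only gives an upper bound on migration. A minimal sketch:

analyze table t1 compute statistics;

select  table_name, num_rows, chain_cnt
from    user_tables
where   table_name = 'T1'
;

rem     For the individual rowids, create the CHAINED_ROWS table
rem     with $ORACLE_HOME/rdbms/admin/utlchain.sql first.

analyze table t1 list chained rows into chained_rows;

select  head_rowid
from    chained_rows
where   table_name = 'T1'
;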
Migrated rows are still causing too many ORA-600s and too much unpredictable behaviour. Maybe that's a more motivating reason to get rid of them than any "legendary" performance degradation.
Comment by Pavol Babel — February 10, 2014 @ 9:59 pm GMT Feb 10,2014 |
Pavol,
I don’t think I’ve come across any ORA-00600’s due to migrated rows – but possibly that’s simply a case of not working regularly in an environment that makes them more likely. I could imagine that things like CDC, materialized views, or active standby (for example) would be more likely to hit unexpected side effects than some other features.
Do you have any indications of a particular mix of features where migration becomes a significant contributor of ORA-00600’s ? (I haven’t tried a generic search on MoS because the number of bugs you find under ANY heading is too worrying.)
Comment by Jonathan Lewis — February 11, 2014 @ 9:58 am GMT Feb 11,2014 |
I have hit a few of them recently; all the SRs were closed as "not a bug" since we were not able to reproduce the issue. An "alter table move" always fixed it…
Comment by Pavol Babel — February 12, 2014 @ 7:12 pm GMT Feb 12,2014 |
Well, I remember hitting an ORA-600 bug way back on 7.3.4, involving a corner case with migrated rows, after I’d added a large VARCHAR2 column that nearly doubled the row sizes.
I’ve managed to avoid similar problems since, but I remember thinking at the time, “Boy, Oracle did a good job checking for this corner case when inserting a new row into a block, but if it’s a migrated row from another block, they never bothered to check for that possibility here.”
Oh, I think I might remember what it was… if it couldn’t get an ITL entry to insert a row, for a normal insert it does something sensible (going to another block IIRC), but if it’s migrating a row and it couldn’t get an ITL entry in the chosen destination, it would ORA-600. Or some similar scenario like that.
Whatever it was, it was fixed in 8, and I had no hope of getting a patch out of them for 7.3.4, so I had to reorg to fix it (which I wanted to do anyway once I realized what an unholy mess of migrated rows I'd created). But the fact that there was a fixed bug for it implies that I wasn't the only one having trouble with migrated rows in ways Oracle didn't initially catch.
But anyway, while I only have modest (and very old) corroboration from my own experience, it doesn’t surprise me in the least that migrated rows could still be, how do you say, dodgy…
Comment by Jason Bucata — February 12, 2014 @ 11:17 pm GMT Feb 12,2014 |
Jason,
A variant of your bug still existed long after 8i – thanks, in particular, to the arrival of ASSM. I wouldn’t be surprised if some of Pavol’s ORA-00600’s relate to that sort of area. Remember this puzzle from 5 years ago.
Comment by Jonathan Lewis — February 13, 2014 @ 8:28 am GMT Feb 13,2014
If your only tool is a pointer, everything looks like a chain.
Comment by jgarry — February 11, 2014 @ 12:30 am GMT Feb 11,2014 |
Thanks for nipping a potential myth in the bud.
Comment by Jared — February 13, 2014 @ 1:39 am GMT Feb 13,2014 |
Hi Jonathan,
I have faced a negative impact of migrated rows recently. During a Data Pump export, which was very slow, all workers were performing mostly random IO (aka "db file sequential read"). It took a while to get to the bottom of the issue. I checked the growing statistics for the Data Pump worker process (which was exporting the table); here is a copy & paste of a few of them:
statistic# 351 IMU undo allocation size 672
statistic# 381 table scan rows gotten 126593
statistic# 382 table scan blocks gotten 25474
statistic# 384 table fetch continued row 9117
statistic# 428 index fetch by key 1
The interesting one was "table fetch continued row", which is weird for a full table scan. As far as I know, Oracle skips head pieces when performing an FTS, since it is sufficient to "read" a migrated row later.
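(For anyone who wants to repeat the check, a minimal sketch of the sort of query that reports this figure for a single session; the &worker_sid substitution variable is a placeholder for the SID of the Data Pump worker, which you could look up in v$session or dba_datapump_sessions.)

select  sn.name, st.value
from    v$statname  sn,
        v$sesstat   st
where   st.sid        = &worker_sid
and     st.statistic# = sn.statistic#
and     sn.name       = 'table fetch continued row'
;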
I had set the Data Pump parameter PARALLEL to a value > 1 and it helped: the random IO disappeared. Then I returned PARALLEL to 1 and set the undocumented parameter ACCESS_METHOD=EXTERNAL_TABLE, with the same effect.
Then I realized I should expect other people to have faced similar behaviour, and the expectation was correct: http://www.ora-solutions.net/web/2012/09/12/datapump-export-suffering-from-oracle-row-migration/ and even a MOS note, DataPump Export (EXPDP) For A Large Table Fails With Error ORA-1555 (Doc ID 1086414.1).
Do you have any clue why Data Pump follows migrated rows in DIRECT_PATH mode? Can you see any reason for that?
Regards
Pavol Babel
Comment by Pavol Babel — May 13, 2014 @ 8:38 pm BST May 13,2014 |
I’m surprised that parallel > 1 reduced the random I/O and parallel = 1 increased it; I could give you an argument about why exactly the opposite should happen. (viz. if a PX slave reads a head rowpiece it doesn’t know how many further rowpieces there are and which PX slaves might scan the blocks that hold them, so it has to go and find them; whereas if you’re running serially you KNOW that you’re going to get to the next piece).
I don’t know why access_method=external_table should be different, maybe it’s actually a statistical thing that appears as a side effect of Oracle using a different allocation algorithm for the PX slave granules. You could check that by enabling the appropriate _px_trace, or setting event 10391 (or maybe 10392).
Comment by Jonathan Lewis — May 13, 2014 @ 8:54 pm BST May 13,2014 |
Jonathan,
Good point; however, I think we can forget about parallel for the moment, since the difference in random IO shows up even between parallel=1 with access_method=direct_path and parallel=1 with access_method=external_table while exporting a single segment.
It is very interesting that the direct_path method does not skip migrated rows (i.e. head pieces flagged --H-----) while a full table scan does.
Regards
Pavol Babel
Comment by Pavol Babel — May 13, 2014 @ 11:06 pm BST May 13,2014 |
How about this:
Maybe access_method=direct_path follows the code path of "parallel" execution, even though it may be serial. (Compare this with insert /*+ append */ which does serial direct inserts but leads to: "ORA-12838: cannot read/modify an object after modifying it in parallel" if you try to query the data before committing.) Then my comment about parallel execution needing to follow the row because each slave doesn't know whether it will get to the migrated row makes sense. (Technically, of course, a rowpiece flagged as just H could be ignored completely; the one that has F but not L is the one that HAS to be followed; one with FL could be picked up by whichever slave was scanning the block.)
If access_method=external_table always reads through the buffer cache then it can just follow the standard procedure for migrated rows – pick them up later.
I’ve just looked at the blog link you sent me. The diagnostics there don’t seem to be consistent with any sort of reasonable explanation – re-reading the same block (db file sequential read) looks like a bug: it should be in the cache, so why does it have to be read? Does the parallel code path simply assume that it ALWAYS has to read a migrated block and completely ignore the cached copy? It’s possible, and testable, though it doesn’t seem sensible. The pattern of row lengths and flags from the “grep tl” looks interesting too – but it really needs the byte offsets for the rows before it’s possible to comment on it.
Comment by Jonathan Lewis — May 14, 2014 @ 11:54 am BST May 14,2014
Jonathan,
First of all, it seems you are not very experienced with Data Pump and its export modes :) (just joking)
1) When exporting a single segment in DIRECT_PATH mode, Oracle is not able to take advantage of any parallel execution (due to some implementation restrictions, I think). It is faster than EXTERNAL_TABLE mode; however, for bigger segments where it is reasonable to use parallel > 1, EXTERNAL_TABLE can be much faster thanks to parallel processing.
2) Even when a single segment is being exported in parallel mode, Oracle prefers to use multiple Data Pump workers (separate DWnn sessions), which divide the job up in a similar way to PX slaves: rowid range scans, of course.
3) In my test case, with PARALLEL=1 and ACCESS_METHOD=DIRECT_PATH|EXTERNAL_TABLE, Oracle was exporting with only a single worker and no PX processes were used either.
4) A direct path FULL SCAN was used in both cases. I also wondered whether the difference was due to "buffered" full scans; however, they didn't kick in and Oracle stuck to direct path reads (the exported segment was much larger than the buffer cache in my case).
5) To be honest, I didn’t read the blog link properly; I only noticed the migrated rows, the Data Pump export, and the EXTERNAL_TABLE mode that helped. Unfortunately the traces there are grepped, as you pointed out; however, I'd guess the only reasonable explanation for the db file sequential reads is that the blogger didn't have a huge buffer cache either, and had plenty of migrated rows, so blocks were repeatedly aged out of the cache by the LRU algorithm and re-read over and over again. I have never seen any code path where Oracle was able to bypass the buffer cache with a "db file sequential read" IO.
6) The following SQL is performed by the Data Pump worker to query the data from the database, shown with the addition of v$session_event (~1 min of export activity):
DIRECT_PATH
EXTERNAL_TABLE
Also, an increasing "table fetch continued row" statistic can be observed only in DIRECT_PATH mode. I tried to open the same cursor in PL/SQL, but was not able to reproduce the random IO.
Regards
Pavol Babel
Comment by Pavol Babel — May 16, 2014 @ 12:09 pm BST May 16,2014
Pavol,
If you enable sql_trace / 10046 etc. or look in v$sql you’ll find that the select statement you quoted for the direct_path access method is parsed but not executed: v$sql shows zero for executes, and the trace file goes straight from the PARSE line to WAIT lines with no EXEC or FETCH lines. So whatever code Data Pump is using, it’s bypassing the normal statement execution mechanism. As such it’s not so surprising that it should behave differently from a normal tablescan.
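(A quick sketch of the v$sql check; the &worker_sql_text substitution variable is just a placeholder for the opening text of the worker's statement, which isn't reproduced here.)

select  sql_id, parse_calls, executions, fetches
from    v$sql
where   sql_text like '&worker_sql_text%'
;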
Another interesting detail – if you check the trace file content in the blog you linked to, the average time between the repeated db file sequential reads of the single block is about 22 microseconds – not enough time for the block to be flushed normally. Moreover, when I watched an expdp dump which behaved in this way and kept monitoring the buffer cache, the blocks didn’t arrive in the cache!
Comment by Jonathan Lewis — May 16, 2014 @ 1:48 pm BST May 16,2014
Hi Jonathan, how do you know that the actual number of rows in a block is 74?
Comment by micheliafigo — May 15, 2014 @ 3:20 am BST May 15,2014 |
micheliafigo,
In theory I could have worked it out from the details of column lengths and the various column, row and block overheads – but I think I probably dumped a block and counted them.
Comment by Jonathan Lewis — May 15, 2014 @ 6:46 pm BST May 15,2014 |
Jonathan,
It seems you are right again. Now I'm trying to inspect pstack / oradebug dump shortstack. I have also enabled the 10046 trace and it seems you are absolutely right (again, I didn't notice the attached file on the blog). Now I have a 20GB buffer cache and two consecutive single-block random reads of the same block; I have to admit I'm simply amazed :)
The repeated random IO is fast (0.25ms), which is the latency of reading a block from the enterprise disk array's (HP XP 24000) DDRAM cache. I also wonder whether such a read is counted as a consistent get (so I will try turning on event 10200 as well; however, kcbgtcr() is present in the call stack so I would guess it will count). It seems like OCIPDirPathUnloadStream() has some very interesting magic in it :)
Comment by Pavol Babel — May 16, 2014 @ 4:37 pm BST May 16,2014 |
When playing with event 10200, I found six consecutive physical random reads of the same block. Here is a copy & paste of two of them. Do we really get random reads counted as LIOs without the block ever arriving in the cache? It seems ridiculous, but you are absolutely right, Jonathan (as always :) )
Comment by Pavol Babel — May 16, 2014 @ 5:00 pm BST May 16,2014 |
Hi Jonathan,
I still have no idea why DIRECT_PATH mode in expdp performs random reads; however, I am now facing a similar issue after upgrading to 11.2.0.4 for an important customer. It seems that an FTS of a table with a high number of migrated rows, when it appears inside a hash join, can also report a huge amount of random reads due to "table fetch continued row".
Please check the following tkprof output: a simple full scan performs only "direct path read", in contrast to the same table used in a hash join. Please note that more than half of the rows in the CL_DRAWDOWN table are migrated.
Regards
Pavol Babel
Comment by Pavol Babel — July 22, 2014 @ 12:37 am BST Jul 22,2014 |
Pavol,
Just catching up on comments here as quickly as I can:
I think the reason for the difference here is that count(*) only has to count row headers with the “First piece” flag set, so it doesn’t have to worry about whether the row is chained or not. A better test would be to select “count(last column)” with a full hint.
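(A minimal sketch of that suggested test; the column name last_col is hypothetical, standing in for whatever the final column of the table is, the point being that counting the last column forces Oracle to reach the final row piece of every row.)

select  /*+ full(t) */
        count(last_col)
from    cl_drawdown t
;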
Comment by Jonathan Lewis — August 4, 2014 @ 1:19 am BST Aug 4,2014 |
Hello, Jonathan.
Could you please clarify your phrase from the reply to comment #4 by Pavol Babel? You wrote: “viz. if a PX slave reads a head rowpiece it doesn’t know how many further rowpieces there are and which PX slaves might scan the blocks that hold them, so it has to go and find them; whereas if you’re running serially you KNOW that you’re going to get to the next piece”. Does this apply only to true chained rows in the case of a parallel FTS, or to migrated rows too? In the latter case I wonder why a PX slave cares which of its peers scans the only meaningful piece of a row. According to your article https://jonathanlewis.wordpress.com/2011/12/16/i-wish-3/ it is able to distinguish between chained and migrated rows by reading the appropriate flags.
I am investigating a case of parallel DML (say, a merge or delete) where the source table has a lot of migrated rows. In our database (version 11.2.0.4) the source table is scanned in parallel with lots of db file sequential reads due to the migrated rows. There is MOS Note Doc ID 1514011.1 “Performance decrease for parallel DML on compressed tables or regular tables after 11.2 Upgrade”. It says that “this is expected behaviour on 11.2 due to the changes which were made at the way the scan is performed… The performance decreased in 11.2 in comparison to 11.1 due to the change which was made at the scan operation in 11.2 (In 11.2 is used “Head piece Scan” but in 11.1 is used “First piece scan”) and in this specific situation the impact was high due to the presence of massive number of chained rows.” But at the same time there is a registered Bug 17264297 “Serial DML with Parallel scan performs single block reads during full table scan when table has chained rows in 11.2”. The description tells us: “The problem was caused by a fix for PDML (so in the plan the DML operation is below the PX COORDINATOR) that was getting activated even for serial DML with parallel scan”. The bug is fixed and we have happily applied the patch, and in the serial case the single block reads turned into multiblock reads (either direct path or scattered), as expected.
So I wonder what the difference is between the two (parallel DML, and serial DML with parallel scan) that makes them behave differently while working with migrated rows. Why might the strategy used for serial DML with parallel scan not work in the parallel DML case? I am going to raise a service request to get some clarification, but maybe I'm missing something obvious?
Comment by Vasiliy — January 15, 2019 @ 2:02 am GMT Jan 15,2019 |
[…] This note is a follow-up to a recent comment a blog note about Row Migration: […]
Pingback by DML Tablescans | Oracle Scratchpad — January 18, 2019 @ 1:26 pm GMT Jan 18,2019 |
[…] occurred to me that the vague memory might have been due to my blog note from 2014 explaining that this doesn’t happen. If a row migrates (i.e. the whole row gets moved to a […]
Pingback by Row Migration | Oracle Scratchpad — December 5, 2022 @ 12:11 pm GMT Dec 5,2022 |
[…] Row migration (Feb 2014): The original note, also pointing out the “deleted row” stubs left behind after each migration. […]
Pingback by Infrastructure Catalogue | Oracle Scratchpad — December 5, 2022 @ 12:16 pm GMT Dec 5,2022 |