Oracle Scratchpad

December 3, 2018

Row Migration

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 4:27 pm GMT Dec 3,2018

There’s a little detail of row migration that’s been bugging me for a long time – and I’ve finally found a comment on MOS explaining why it happens. Before saying anything else, though, I’m going to give you a little script (which I’ve run on 12.2.0.1 with an 8KB block size in a tablespace using manual (freelist) space management and system allocated extents) to demonstrate the anomaly.


rem
rem     Script:         migration_itl.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:
rem
rem     Last tested
rem             12.2.0.1
rem     Notes
rem     Under ASSM we can get 733 rows in the block,
rem     using freelist management it goes up to 734
rem

create table t1 (v1 varchar2(4000))
segment creation immediate
tablespace test_8k
pctfree 0
;

insert into t1
select  null from dual connect by level <= 734 -- > comment to avoid wordpress format issue
;

commit;

spool migration_itl.lst

column rel_file_no new_value m_file
column block_no    new_value m_block

select 
        dbms_rowid.rowid_relative_fno(rowid)    rel_file_no, 
        dbms_rowid.rowid_block_number(rowid)    block_no,
        count(*)                                rows_starting_in_block
from 
        t1
group by 
        dbms_rowid.rowid_relative_fno(rowid), 
        dbms_rowid.rowid_block_number(rowid) 
order by 
        dbms_rowid.rowid_relative_fno(rowid), 
        dbms_rowid.rowid_block_number(rowid)
;

update t1 set v1 = rpad('x',10);
commit;

alter system flush buffer_cache;

alter system dump datafile &m_file block &m_block;

column tracefile new_value m_tracefile

select
        tracefile 
from 
        v$process where addr = (
                select paddr from v$session where sid = (
                        select sid from v$mystat where rownum = 1
                )
        )
;

-- host grep nrid &m_tracefile

spool off

The script creates a single column table with pctfree set to zero, then populates it with 734 rows where every row has a null for its single column. The query using the calls to the dbms_rowid package will show you that all 734 rows are in the same block. In fact the block will be full (leaving only a handful of bytes of free space) because even though each row needs only 5 bytes (2 bytes row directory entry, 3 bytes row overhead, no bytes for data) Oracle’s arithmetic will allow for the 11 bytes that is the minimum needed by a row that has migrated – the extra 6 bytes being the “next rowid” (nrid) pointer to where the migrated row now lives. So 734 rows * 11 bytes = 8,074 bytes which, after allowing roughly 110 bytes of block and transaction layer overhead, leaves only a few bytes of the 8KB block unused.
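
If you want a quick sanity check of that arithmetic, here's a trivial query (a sketch only – the 110 bytes of overhead is the approximate figure quoted above, not something the query derives):

select
        734 * 11                bytes_reserved_for_rows,        -- 8,074
        8192 - 110 - 734 * 11   approx_bytes_unused             -- about 8
from
        dual
;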

After populating and reporting the table the script then updates every row to grow it by a few bytes, and since there’s no free space every row will migrate to a new location. By dumping the block (flushing the buffer cache first) I can check where each row has migrated to. (If you’re running a UNIX flavour and have access to the trace directory then the commented grep command will give you what you want to see.) Here’s a small extract from the dump on a recent run:

nrid:  0x05c00082.0
nrid:  0x05c00082.1
nrid:  0x05c00082.2
nrid:  0x05c00082.3
...
nrid:  0x05c00082.a4
nrid:  0x05c00082.a5
nrid:  0x05c00082.a6
nrid:  0x05c00083.0
nrid:  0x05c00083.1
nrid:  0x05c00083.2
nrid:  0x05c00083.3
...
...
...
nrid:  0x05c00085.a4
nrid:  0x05c00085.a5
nrid:  0x05c00085.a6
nrid:  0x05c00086.0
nrid:  0x05c00086.1
nrid:  0x05c00086.2
...
nrid:  0x05c00086.3e
nrid:  0x05c00086.3f
nrid:  0x05c00086.40
nrid:  0x05c00086.41
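
The hex value in front of the dot in each nrid is the data block address of the block the row has migrated to, and the value after the dot is the row directory entry within that block. If you don't want to do the hex arithmetic by hand you can translate an address with a couple of calls to dbms_utility – for example, for the first nrid above:

select
        dbms_utility.data_block_address_file(
                to_number('05c00082','XXXXXXXX')
        )       rel_file_no,            -- 23
        dbms_utility.data_block_address_block(
                to_number('05c00082','XXXXXXXX')
        )       block_no                -- 130
from
        dual
;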

My 734 rows have migrated to fill the next four blocks (23,130) to (23,133) of the table and taken up some of the space in the one after that (23,134). The first four blocks have used up row directory entries 0x00 to 0xa6 (0 to 166), and the last block has used up row directory entries 0x00 to 0x41 (0 to 65) – giving us the expected total: 167 * 4 + 66 = 734 rows. Let’s dump one of the full blocks – and extract the interesting bits:

alter system dump datafile 23 block 130;
Block header dump:  0x05c00082
 Object id on Block? Y
 seg/obj: 0x1ba1e  csc:  0x0000000001e0aff3  itc: 169  flg: -  typ: 1 - DATA
     fsl: 0  fnx: 0x0 ver: 0x01

 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0006.00f.000042c9  0x0240242d.08f3.14  --U-  167  fsc 0x0000.01e0affb
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0x04   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0x05   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0x06   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
...
0xa6   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0xa7   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0xa8   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000
0xa9   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000

nrow=167
frre=-1
fsbo=0x160
fseo=0x2ec
avsp=0x18c
tosp=0x18c

tab 0, row 0, @0xfe4
tl: 20 fb: ----FL-- lb: 0x1  cc: 1
hrid: 0x05c00081.0
col  0: [10]  78 20 20 20 20 20 20 20 20 20
tab 0, row 1, @0xfd0
tl: 20 fb: ----FL-- lb: 0x1  cc: 1
hrid: 0x05c00081.1

This block has 169 (0xa9) ITL entries – that’s one for each row migrated into the block (nrow = 167) plus a couple spare. The block still has some free space (avsp = tosp = 0x18c: available space = total space = 396 bytes), but it can’t be used for any more incoming migration because Oracle is unable to create any more ITL entries – it’s reached the ITL limit for 8KB blocks.
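
As a rough cross-check (assuming a UNIX flavour again, and that this second dump went to the same session trace file that the script captured in &m_tracefile) you can compare the ITL count reported in each block header with the number of inward migrated row pieces, each of which has an "hrid:" (head rowid) line in the dump:

host grep    "itc:"  &m_tracefile
host grep -c "hrid:" &m_tracefile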

So finally we come to the question that’s been bugging me for years – why does Oracle want an extra ITL slot for every row that has migrated into a block? The answer appeared in this sentence from MOS Doc ID 2420831.1: “Errors Noted in 12.2 and Above During DML on Compressed Tables”:

“It is a requirement during processing of parallel transactions that each data block row that does not have a header have a block ITL available.”

Rows that have migrated into a block do not have a row header – check the flag byte (fb) for the two rows I’ve listed: it’s “----FL--”, with no ‘H’ for Header. We have the First and Last row pieces of the row in this block and that’s it. So my original “why” question now becomes “What’s the significance of parallel DML?”

Imagine the general case where we have multiple processes updating rows at random across multiple blocks, and many different processes have forced rows to migrate into the same block at much the same time. The next parallel DML statement would dispatch multiple parallel execution slaves, each locking rows in its own separate block range – but several slaves could find that the rows they wanted to lock had all migrated into the same block, so every slave MUST be able to get an ITL entry in that block at the same time. For example, if 8 rows had migrated into a specific block from 8 different places, and 8 parallel execution slaves each followed a pointer from the region it was scanning to update a row that had migrated into this particular block, then all 8 slaves would need an ITL entry in that block (and if there were a ninth slave scanning this region of the table we’d need a 9th ITL entry). If the block didn’t have enough ITL entries for every single migrated row to be locked by a different process at the same time then (in principle, at least) parallel execution slaves could deadlock each other because they were visiting blocks in a different order as they locked the migrated rows. For example:

  1. PQ00 visits and locks a row that migrated to block (23,131)
  2. PQ01 visits and locks a row that migrated to block (23,132)
  3. PQ00 visits and tries to lock a row that migrated to block (23,132) — but if there were no “extra” ITL slots available, it would wait
  4. PQ01 visits and tries to lock a row that migrated to block (23,131) — but there were no “extra” ITL slots available so it would wait, and we’d be in a deadlock.

Oracle’s solution to this threat: when migrating a row into a block, add a new ITL entry if necessary to keep the number of ITL slots a couple ahead of the number of inward migrated rows – which is exactly what the dump above shows: 169 ITL entries for 167 migrated rows.

Footnote 1

The note was about problems with compression for OLTP, but the underlying message was about 4 Oracle errors of type ORA-00600 and ORA-00700, which report the discovery and potential threat of blocks where the number of ITL entries isn’t large enough compared to the number of inward migrated rows. Specifically:

  • ORA-00600 [PITL1]
  • ORA-00600 [kdt_bseg_srch_cbk PITL1]
  • ORA-00700: soft internal error, arguments: [PITL6]
  • ORA-00700: soft internal error, arguments: [kdt_bseg_srch_cbk PITL5]

 

Footnote 2

While drafting the SQL script above, I decided to check how many other scripts I had already written about migrated rows and ITL slots: there were 12 of the former and 10 of the latter, and reading through the embedded notes I found that one of the scripts (itl_chain.sql, dated December 2002) included the following:

According to a comment that came from Oracle support via Steve Adams the reason for the extra ITLs is to avoid a risk of parallel DML causing an internal deadlock.

So it turns out that about 16 years ago I knew what the “excess” ITL entries were for but at some point had managed to forget the reason.

11 Comments »

  1. If other DML statements are executed, there’s another reason for the extra ITL entries: to avoid ‘enq: TX – allocate ITL entry’ waits.

    Comment by zhwsh — December 5, 2018 @ 2:30 am GMT Dec 5,2018 | Reply

  2. Hi Jonathan

    Quick question about this sentence:

    “My 734 rows have migrated to fill the next four blocks (23,130) to (23,133) of the table and taken up some of the space in the one after that (23,134).”

    How did you find out from the above trace that the rows migrated to blocks 130 to 134? Please clarify – thanks

    Comment by Bhavani P Dhulipalla — December 9, 2018 @ 2:48 am GMT Dec 9,2018 | Reply

    • Bhavani,

      Sorry for the late reply; I thought I’d replied to this last week but I must have forgotten to do so.
      When a row migrates (or chains) Oracle adds a “next rowid” (nrid) to each row. That’s what I grep’ed out from the block dump, and you can see (for example) the nrid value 0x05c00082.0 for the first row in the block. This translates to file 23, block 0x82 = decimal 130, row number 0 within the block.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — December 15, 2018 @ 6:55 pm GMT Dec 15,2018 | Reply

  3. The forwarding addresses are ignored. We know that as we continue the full scan, we’ll eventually get to that row so we can ignore the forwarding address and just process the row when we get there. Hence, in a full scan migrated rows don’t cause us to really do any extra work — they are meaningless.

    Comment by Jameston — December 16, 2018 @ 1:12 pm GMT Dec 16,2018 | Reply

    • What about a full table scan using parallel servers? Does Oracle ignore the forwarding rowid?

      I found that a rowid range scan in a serial query (trying to force direct path serial reads) has awful performance when there is a significant number of migrated rows – it turns into lots of random db file sequential reads…

      Comment by Евгений Пузиков — May 1, 2024 @ 4:27 pm BST May 1,2024 | Reply

      • Thanks for the question.

        Is this when you are using a predicate like: “where rowid between {rowid1} and {rowid2}”? If so then (though I hadn’t previously thought about it – so thanks again … for a new idea) I’m not surprised. In that context the rowid is an identifier for the rows you’re interested in, so Oracle has to consider the content of any row in the supplied rowid range – which means following the nrid (next rowid) to the migrated data.
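
        For reference, here’s a minimal sketch of the usual way such per-extent rowid ranges are generated (TEST_USER and BIG_TABLE are placeholder names, and a production utility would also handle partitioned tables and merge adjacent extents):

        select
                dbms_rowid.rowid_create(
                        1, o.data_object_id, e.relative_fno,
                        e.block_id, 0
                )                               low_rowid,
                dbms_rowid.rowid_create(
                        1, o.data_object_id, e.relative_fno,
                        e.block_id + e.blocks - 1, 32767
                )                               high_rowid
        from
                dba_extents     e,
                dba_objects     o
        where
                o.owner         = 'TEST_USER'
        and     o.object_name   = 'BIG_TABLE'
        and     e.owner         = o.owner
        and     e.segment_name  = o.object_name
        order by
                e.file_id, e.block_id
        ;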

        There is also an Exadata feature to consider – a serial smart scan could be using several different processes to scan many 1MB chunks concurrently, and a migrated row could be (probably would be) in a different chunk from its origin. I’m wondering if a process that finds a migrated row blocks the smart scan of that chunk and passes the whole chunk to the database server to sort out following the nrid to get the rest of the row. (There used to be an issue with this type of behaviour when there were chained rows in a chunk, but the code for handling inserts of long rows was modified to increase the chances of the row being somewhere in the same chunk as its origin.)

        Regards

        Jonathan Lewis

        Comment by Jonathan Lewis — May 3, 2024 @ 10:23 am BST May 3,2024 | Reply

        • Yes, precisely: rowid between … with forced direct path reads in the session.
          I need to extract data from a 20TB table on AIX (unfortunately not Exadata) and load it onto a Linux machine. We have our proprietary export-import utility; we run multiple instances using rowid ranges to distribute the data and avoid network or single-file write bottlenecks. However, it seems ineffective for tables with approximately 20% migrated rows: exporting takes around 30 hours instead of the expected 3-4 hours. I’m planning to switch to Data Pump using parallel processing, the external tables method, and compression. Hopefully this approach will yield better results if Data Pump has good parallel distribution and ignores forwarding rowids…

          Comment by Евгений Пузиков — May 3, 2024 @ 10:51 am BST May 3,2024

      • There may still be a problem with expdp and direct path reads with chained/migrated rows in place. Here’s a link to a blog note I wrote a little while ago triggered by a problem that had come up on the MOS Community forum a couple of years ago: Hakan Factor | Oracle Scratchpad (wordpress.com) (The forum note has more information about the specific case but requires a MOS account: Problem with migrated rows — oracle-mosc).

        You may find that expdp still follows the rowids, although I’ve found a note (actually about chained rows with expdp) that seems to be related and claims to be fixed in 12.1.0.2 – Bug 17293498 : BAD EXPDP PERFORMANCE WITH ACCESS_METHOD=DIRECT_PATH + CHAINED ROWS

        The description of the fix says that it caches the single block reads done while in direct path mode – so you’ll still be following the links, but if the same block is “migrated to” multiple times it will only be read once. So that’s not necessarily going to help very much, you’ll still be doing the single block reads.

        Regards

        Jonathan Lewis

        Comment by Jonathan Lewis — May 3, 2024 @ 11:48 am BST May 3,2024 | Reply

        • Dear Jonathan,

           I’ve tried expdp with parallel and access_method=external_table and it worked. I managed to export the 20TB table with lots of migrated rows in 4.5 hours.

          Regards, Evgeny

          Comment by Евгений Пузиков — May 7, 2024 @ 2:56 pm BST May 7,2024

        • Evgeny,

          It’s nice to have the feedback. So it sounds as if you’ve got “bulk migration” i.e. several rows in a single block being migrated at the same time, hence ending up in the same target block which is now read once then found in the cache several times.

          This is a hint that the application may be inserting short rows and immediately updating them and has a PCTFREE setting that’s too small for the updates. It would be worth checking whether this hypothesis is correct and modifying the value to something more suitable if so.
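
          A quick way to check is the chained_rows table created by $ORACLE_HOME/rdbms/admin/utlchain.sql – a minimal sketch (BIG_TABLE is a placeholder name, the count includes genuinely chained rows as well as migrated ones, and on a table of this size you would want to test against a representative subset first):

          @?/rdbms/admin/utlchain.sql

          analyze table big_table list chained rows into chained_rows;

          select  count(*)        chained_or_migrated_rows
          from    chained_rows
          where   table_name = 'BIG_TABLE'
          ;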

          Regards

          Jonathan Lewis

          Comment by Jonathan Lewis — May 8, 2024 @ 9:43 am BST May 8,2024

  4. Jameston,

    That is generally correct – but not particularly relevant to the question of why row migration can result in large numbers of ITL entries.

    Even then there are (inevitably) special cases – for example a parallel update by tablescan will follow the pointer to the row immediately, and I have a blog note that talks about various effects of migrated rows during index creation or rebuild (https://jonathanlewis.wordpress.com/2010/06/08/continued-rows/ ) though that might be out of date by now.

    Comment by Jonathan Lewis — December 16, 2018 @ 7:08 pm GMT Dec 16,2018 | Reply

