Oracle Scratchpad

January 18, 2019

DML Tablescans

Filed under: deadlocks,Infrastructure,Locks,Oracle,Parallel Execution,Performance — Jonathan Lewis @ 1:25 pm BST Jan 18,2019

This note is a follow-up to a recent comment a blog note about Row Migration:

So I wonder what is the difference between the two, parallel dml and serial dml with parallel scan, which makes them behave differently while working with migrated rows. Why might the strategy of serial dml with parallel scan case not work in parallel dml case? I am going to make a service request to get some clarifications but maybe I miss something obvious?

The comment also referenced a couple of MoS notes:

  • Bug 17264297 “Serial DML with Parallel scan performs single block reads during full table scan when table has chained rows in 11.2”
  • Doc ID 1514011.1 “Performance decrease for parallel DML on compressed tables or regular tables after 11.2 Upgrade

The latter document included a comment to the effect that 11.2 uses a “Head Piece Scan” while 11.1 uses a “First Piece scan”, which is a rather helpful comment. Conveniently the blog note itself referenced an earlier note on the potential for differentiating between migrated and chained rows through a “flag” byte associated with each row piece. The flag byte has an H bit for the row head piece, an F bit for the row first piece, and L bit for the row last piece and {no bits set} for a row piece in the middle of a chained row.

Side note: A “typical” simple row will be a single row-piece with the H, F and L bits all set; a simple migrated row will start with an “empty” row-piece in one block with the H bit set and a pointer (nrid – next rowid) to a row in another block that will have the F and L bits set and a pointer (hrid – head rowid) back to the head piece. A chained row could start with a row piece holding a few columns and the H and F bits set and a pointer to the next row piece which might lead to a long chain of row pieces with no bits set each pointing to the next row piece until you get to a row piece with the L bit set.  Alternatively you might have row which had migrated and chained – which means it could start with an empty row piece with just the H bit and a pointer to the next row piece, then a row piece with the F bit set, a back pointer to the header, and a next pointer to the next row piece, which could lead to a long chain of row pieces with no bits set until you reach a row piece with the L bit set.

Combining the comments about “head piece” and “first piece” scans with the general principles of DML and locking it’s now possible to start makings some guesses about why the Oracle developers might want updates through tablescans to behave differently for serial and parallel tablescans. There are two performance targets to consider:

  • How to minimise random (single block) I/O requests
  • How to minimise the risk of deadlock between PX server processes.

Assume you’re doing a serial tablescan to find rows to update – assume for simplicity that there are no chained rows in the table. When you hit a migrated row (H bit only) you could follow the next rowid pointer (nrid) to find and examine the row. If you find that it’s a row that doesn’t need to be updated you’ve just done a completely redundant single block read; so it makes sense to ignore row pieces which are “H”-only row pieces and do a table scan based on “F” pieces (which will be FL “whole row” pieces thanks to our assumption of no chained rows). If you find a row which is an F row and it needs to be updated then you can do a single block read using the head rowid pointer (hrid) to lock the head row piece then lock the current row piece and update it; you only do the extra single block read for rows that need updates, not for all migrated rows. So this is (I guess) the “First Piece Scan” referenced in Doc ID 1514011.1. (And, conversely, if you scan the table looking only for row pieces with the H flag set this is probably the “Head Piece Scan”).

But there’s a potential problem with this strategy if the update is a parallel update. Imagine parallel server process p000 is scanning the first megabyte of a table and process p001 is scanning the second megabyte using the “first piece” algorithm.  What happens if p001 finds a migrated row (flags = FL) that needs to be updated and follows its head pointer back into a block in the megabyte being scanned by p000?  What if p000 has been busy updating rows in that block and there are no free ITLs for p001 to acquire to lock the head row piece? You have the potential for an indefinite deadlock.

On the other hand, if the scan is using the “head piece” algorithm p000 would have found the migrated row’s head piece and followed the next rowid pointer into a block in the megabyte being scanned by p001. If the row needs to be updated p000 can lock the head piece and the migrated piece.

At this point you might think that the two situations are symmetrical – aren’t you just as likely to get a deadlock because p000 now wants an ITL entry in a block that p001 might have been updating? Statistically the answer is “probably not”. When you do lots of updates it is possible for many rows to migrate OUT of a block; it is much less likely that you will see many rows migrate INTO a specific block. This means that in a parallel environment you’re more likely to see several PX servers all trying to acquire ITL entries in the same originating block than you are  to see several PX servers trying to acquire ITL entries in the same destination block. There’s also the feature that when a row (piece) migrates into a block Oracle adds an entry to the ITL list if the number of inwards migrated pieces is more than the current number of ITL entries.

Conclusion

It’s all guesswork of course, but I’d say that for a serial update by tablescan Oracle uses the “first piece scan” to minimise random I/O requests while for a parallel update by tablescan Oracle uses the “head piece scan” to minimise the risk of deadlocks – even though this is likely to increase the number of random (single block) reads.

Finally (to avoid ambiguity) if you’ve done an update which does a parallel tablescan but a serial update (by passing rowids to the query co-ordinator) then I’d hope that Oracle would use the “first piece scan” for the parallel tablescan because there’s no risk of deadlock when only the query co-ordinator is the only process doing the locking and updating, which makes it safe to use the minimum I/O strategy. (And a paralle query with serial update happens quite frequently because people forget to enable parallel dml.)

Footnote

While messing around to see what happened with updates and rows that were both migrated and chained I ran the following script to create one nasty row. so that I could dump a few table blocks to check for ITLs, pointers, and locks. The aim was to get a row with a head-only piece (“H” bit), an F-only piece, a piece with no bits set, then an L-only piece. With an 8KB block size and 4,000 byte maximum for varchar2() this is what I did:


rem
rem     Script:         migrated_lock.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2019
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem

create table t1 (
        n1 number,
        l1 varchar2(4000),
        s1 varchar2(200),
        l2 varchar2(4000),
        s2 varchar2(200),
        l3 varchar2(4000),
        s3 varchar2(200)
);

insert into t1 (n1,l1,s1) values(0,rpad('X',4000,'X'),rpad('X',200,'X'));
commit;

insert into t1 (n1,l1) values(1,null);
commit;

update t1 set
        l1 = rpad('A',4000),
        s1 = rpad('A',200),
        l2 = rpad('B',4000),
        s2 = rpad('B',200),
        l3 = rpad('C',4000),
        s3 = rpad('C',200)
where
        n1 = 1
;

commit;

execute dbms_stats.gather_table_stats(user,'t1');

update t1 set
        s1 = lower(s1),
        s2 = lower(s2),
        s3 = lower(s3)
where
        n1 = 1
;

alter system flush buffer_cache;

select
        dbms_rowid.rowid_relative_fno(rowid)    rel_file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no,
        count(*)                                rows_starting_in_block
from
        t1
group by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
order by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
;

The query with all the calls to dbms_rowid gave me the file and block number of the row I was interested in, so I dumped the block, then read the trace file to find the next block in the chain, and so on. The first block held just the head piece, the second block held the n1 and l1 columns (which didn’t get modified by the update), the third block held the s1 and l2 columns, the last block held the s2, l3 and s3 columns. I had been expecting to see the split as (head-piece(, (n1, l1, s1), (l2, s2), (l3, s3) – but as it turned out the unexpected split was a bonus.

Here are extracts from each of the blocks (in the order they appeared in the chain), showing the ITL information and the “row overhead” information. If you scan through the list you’ll see that three of the 4 blocks have an ITL entry for transaction id (xid) of 8.1e.df3, using three consecutive undo records in undo block 0x0100043d. My update has locked 3 of the 4 rowpieces – the header and the two that have changed. It didn’t need to “lock” the piece that didn’t change. (This little detail was the bonus of the unexpected split.)


Block 184
---------
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.00000ee1  0x01000bc0.036a.36  C---    0  scn  0x00000000005beb39
0x02   0x0008.01e.00000df3  0x0100043d.0356.2e  ----    1  fsc 0x0000.00000000

...

tab 0, row 1, @0xf18
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x00800089.0



Block 137       (columns n1, l1 - DID NOT CHANGE so no ITL entry acquired)
---------       (the lock byte relates to the previous, not cleaned, update) 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.00000ee1  0x01000bc0.036a.35  --U-    1  fsc 0x0000.005beb39
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000

...

tab 0, row 0, @0xfcb
tl: 4021 fb: ----F--- lb: 0x1  cc: 2
hrid: 0x008000b8.1
nrid:  0x00800085.0



Block 133 (columns s1, l2)
--------------------------
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.00000ee1  0x01000bc0.036a.34  C---    0  scn  0x00000000005beb39
0x02   0x0008.01e.00000df3  0x0100043d.0356.2f  ----    1  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000

...

tab 0, row 0, @0xf0b
tl: 4213 fb: -------- lb: 0x2  cc: 2
nrid:  0x008000bc.0



Block 188 (columns s2, l3, s3)
------------------------------
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.00000ee1  0x01000bc0.036a.33  C---    0  scn  0x00000000005beb39
0x02   0x0008.01e.00000df3  0x0100043d.0356.30  ----    1  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000

...

tab 0, row 0, @0xe48
tl: 4408 fb: -----L-- lb: 0x2  cc: 3

Note, by the way, how there are nrid (next rowid) entries pointing forward in every row piece (except the last), but it’s only the “F” (First) row-piece has the hrid (head rowid) pointer pointing backwards.

 

6 Comments »

  1. Hi, Jonathan.

    I am the author of the initial question. Nice guesswork! So if I am very-very unlucky I can get a deadlock and a rollback as a result doing the only parallel dml (and there is another dml on the same table in the same time) some day, can’t I? I’ll think how to build a testcase considering the feature you mentioned about spawning additional ITL entries.

    Actually, my case was different. Sorry if I wasn’t clear. I meant a case like _merge into DestinationTable using SourceTable…_ It’s SourceTable which is scanned with single block reads due to its migrated rows. If it were DestinationTable I would tolerate it as I guessed that it could be a concurrency issue between PX slaves which you described so clearly in this post. So we can guess that a PX slave is spawned with only one scan strategy in mind either Head Piece scans or First Piece scans and it is not able to switch between the two on demand. Or the developers just cut some corners while solving the issue with First Piece scans in a PDML case.

    Comment by Василий Антонов — January 18, 2019 @ 5:50 pm BST Jan 18,2019 | Reply

    • I wrote So if I am very-very unlucky I can get a deadlock and a rollback as a result doing the only parallel dml (and there is another dml on the same table in the same time) some day, can’t I?

      No I can’t. I have realized my mistake. I can’t do anything else while doing a parallel dml by design. And vice versa, I can’t do a parallel dml while doing anyting else. I have always connected this restriction with parallel insert (or serial append) only for no obvious reason (in the same time I have had a strong habit to commit after every parallel dml for the obvious reason :)).

      So the feature you mentioned (spawning additional ITL slots for immigrated rows) prevents deadlocks in parallel dml for the head piece scan strategy. I assume that the number of ITL slots should be strictly more than the number of immigrated rows so there is always enough room in the ITL table of a block for “host” PX slave and all possible “guest” PX slaves forwarded by head pieces.

      Comment by Vasiliy Antonov — January 21, 2019 @ 3:44 am BST Jan 21,2019 | Reply

  2. Hi,

    > So this is (I guess) the “First Piece Scan” referenced in Doc ID 1514011.1. (And, conversely, if you scan the table looking only for row pieces with the H flag set this is probably the “Head Piece Scan”).
    > Note, by the way, how there are nrid (next rowid) entries pointing forward in every row piece (except the last), but it’s only the “F” (First) row-piece has the hrid (head rowid) pointer pointing backwards.

    So I guess there is no corresponding optimization for chained rows (unless there is a p(rev)rid too?). Oracle plays the odds that locking is most likely to occur in F-pieces. If its not in the “F”-piece Oracle needs to single block read forward along the chain to the “”/”L”-pieces and presumably skips these during the main scan. So Oracle says if you design your table with massive chaining (not migrating) and want to do DML its your own fault. Better use a talespace with bigger blocks or split(redesign) your table.

    regards,

    Racer I.

    Comment by Racer I. — January 21, 2019 @ 6:56 am BST Jan 21,2019 | Reply

  3. Hi,

    This may make chained rows even more expensive, since each non-“H”-piece adds another ITL slot to its block.

    regards,

    Racer I.

    Comment by Racer I. — January 21, 2019 @ 7:22 am BST Jan 21,2019 | Reply

    • Racer I,

      24 bytes per ITL slot does not seem to be big overhead for a row which spans several Kb (>= 4Kb, 8Kb, 16Kb depending on the size of db block). And if there is heavy chaining in your table by design then the blocks contain only few rows.

      Comment by Vasiliy Antonov — January 21, 2019 @ 7:47 am BST Jan 21,2019 | Reply

  4. Hi,

    > update which does a parallel tablescan but a serial update (by passing rowids to the query co-ordinator) then I’d hope that Oracle would use the “first piece scan” for the parallel tablescan

    > So we can guess that a PX slave is spawned with only one scan strategy in mind either Head Piece scans or First Piece scans

    Maybe worth checking (could be the same or different cases) if the presence of DML (parallel or not, on the same or a different object and with/without the fix for Bug 17264297) “poisons” the scanning PX-slaves into the “H”-strategy (for any object). Well, should affect only objects with actual migrating/chaining.

    regards,

    Racer I.

    Comment by Racer I. — January 21, 2019 @ 7:37 am BST Jan 21,2019 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.