Oracle Scratchpad

January 18, 2019

DML Tablescans

Filed under: deadlocks,Infrastructure,Locks,Oracle,Parallel Execution,Performance — Jonathan Lewis @ 1:25 pm BST Jan 18,2019

This note is a follow-up to a recent comment a blog note about Row Migration:

So I wonder what is the difference between the two, parallel dml and serial dml with parallel scan, which makes them behave differently while working with migrated rows. Why might the strategy of serial dml with parallel scan case not work in parallel dml case? I am going to make a service request to get some clarifications but maybe I miss something obvious?

The comment also referenced a couple of MoS notes:

  • Bug 17264297 “Serial DML with Parallel scan performs single block reads during full table scan when table has chained rows in 11.2”
  • Doc ID 1514011.1 “Performance decrease for parallel DML on compressed tables or regular tables after 11.2 Upgrade

The latter document included a comment to the effect that 11.2 uses a “Head Piece Scan” while 11.1 uses a “First Piece scan”, which is a rather helpful comment. Conveniently the blog note itself referenced an earlier note on the potential for differentiating between migrated and chained rows through a “flag” byte associated with each row piece. The flag byte has an H bit for the row head piece, an F bit for the row first piece, and L bit for the row last piece and {no bits set} for a row piece in the middle of a chained row.

Side note: A “typical” simple row will be a single row-piece with the H, F and L bits all set; a simple migrated row will start with an “empty” row-piece in one block with the H bit set and a pointer (nrid – next rowid) to a row in another block that will have the F and L bits set and a pointer (hrid – head rowid) back to the head piece. A chained row could start with a row piece holding a few columns and the H and F bits set and a pointer to the next row piece which might lead to a long chain of row pieces with no bits set each pointing to the next row piece until you get to a row piece with the L bit set.  Alternatively you might have row which had migrated and chained – which means it could start with an empty row piece with just the H bit and a pointer to the next row piece, then a row piece with the F bit set, a back pointer to the header, and a next pointer to the next row piece, which could lead to a long chain of row pieces with no bits set until you reach a row piece with the L bit set.

Combining the comments about “head piece” and “first piece” scans with the general principles of DML and locking it’s now possible to start makings some guesses about why the Oracle developers might want updates through tablescans to behave differently for serial and parallel tablescans. There are two performance targets to consider:

  • How to minimise random (single block) I/O requests
  • How to minimise the risk of deadlock between PX server processes.

Assume you’re doing a serial tablescan to find rows to update – assume for simplicity that there are no chained rows in the table. When you hit a migrated row (H bit only) you could follow the next rowid pointer (nrid) to find and examine the row. If you find that it’s a row that doesn’t need to be updated you’ve just done a completely redundant single block read; so it makes sense to ignore row pieces which are “H”-only row pieces and do a table scan based on “F” pieces (which will be FL “whole row” pieces thanks to our assumption of no chained rows). If you find a row which is an F row and it needs to be updated then you can do a single block read using the head rowid pointer (hrid) to lock the head row piece then lock the current row piece and update it; you only do the extra single block read for rows that need updates, not for all migrated rows. So this is (I guess) the “First Piece Scan” referenced in Doc ID 1514011.1. (And, conversely, if you scan the table looking only for row pieces with the H flag set this is probably the “Head Piece Scan”).

But there’s a potential problem with this strategy if the update is a parallel update. Imagine parallel server process p000 is scanning the first megabyte of a table and process p001 is scanning the second megabyte using the “first piece” algorithm.  What happens if p001 finds a migrated row (flags = FL) that needs to be updated and follows its head pointer back into a block in the megabyte being scanned by p000?  What if p000 has been busy updating rows in that block and there are no free ITLs for p001 to acquire to lock the head row piece? You have the potential for an indefinite deadlock.

On the other hand, if the scan is using the “head piece” algorithm p000 would have found the migrated row’s head piece and followed the next rowid pointer into a block in the megabyte being scanned by p001. If the row needs to be updated p000 can lock the head piece and the migrated piece.

At this point you might think that the two situations are symmetrical – aren’t you just as likely to get a deadlock because p000 now wants an ITL entry in a block that p001 might have been updating? Statistically the answer is “probably not”. When you do lots of updates it is possible for many rows to migrate OUT of a block; it is much less likely that you will see many rows migrate INTO a specific block. This means that in a parallel environment you’re more likely to see several PX servers all trying to acquire ITL entries in the same originating block than you are  to see several PX servers trying to acquire ITL entries in the same destination block. There’s also the feature that when a row (piece) migrates into a block Oracle adds an entry to the ITL list if the number of inwards migrated pieces is more than the current number of ITL entries.

Conclusion

It’s all guesswork of course, but I’d say that for a serial update by tablescan Oracle uses the “first piece scan” to minimise random I/O requests while for a parallel update by tablescan Oracle uses the “head piece scan” to minimise the risk of deadlocks – even though this is likely to increase the number of random (single block) reads.

Finally (to avoid ambiguity) if you’ve done an update which does a parallel tablescan but a serial update (by passing rowids to the query co-ordinator) then I’d hope that Oracle would use the “first piece scan” for the parallel tablescan because there’s no risk of deadlock when only the query co-ordinator is the only process doing the locking and updating, which makes it safe to use the minimum I/O strategy. (And a paralle query with serial update happens quite frequently because people forget to enable parallel dml.)

Footnote

While messing around to see what happened with updates and rows that were both migrated and chained I ran the following script to create one nasty row. so that I could dump a few table blocks to check for ITLs, pointers, and locks. The aim was to get a row with a head-only piece (“H” bit), an F-only piece, a piece with no bits set, then an L-only piece. With an 8KB block size and 4,000 byte maximum for varchar2() this is what I did:


rem
rem     Script:         migrated_lock.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2019
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem

create table t1 (
        n1 number,
        l1 varchar2(4000),
        s1 varchar2(200),
        l2 varchar2(4000),
        s2 varchar2(200),
        l3 varchar2(4000),
        s3 varchar2(200)
);

insert into t1 (n1,l1,s1) values(0,rpad('X',4000,'X'),rpad('X',200,'X'));
commit;

insert into t1 (n1,l1) values(1,null);
commit;

update t1 set
        l1 = rpad('A',4000),
        s1 = rpad('A',200),
        l2 = rpad('B',4000),
        s2 = rpad('B',200),
        l3 = rpad('C',4000),
        s3 = rpad('C',200)
where
        n1 = 1
;

commit;

execute dbms_stats.gather_table_stats(user,'t1');

update t1 set
        s1 = lower(s1),
        s2 = lower(s2),
        s3 = lower(s3)
where
        n1 = 1
;

alter system flush buffer_cache;

select
        dbms_rowid.rowid_relative_fno(rowid)    rel_file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no,
        count(*)                                rows_starting_in_block
from
        t1
group by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
order by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
;

The query with all the calls to dbms_rowid gave me the file and block number of the row I was interested in, so I dumped the block, then read the trace file to find the next block in the chain, and so on. The first block held just the head piece, the second block held the n1 and l1 columns (which didn’t get modified by the update), the third block held the s1 and l2 columns, the last block held the s2, l3 and s3 columns. I had been expecting to see the split as (head-piece(, (n1, l1, s1), (l2, s2), (l3, s3) – but as it turned out the unexpected split was a bonus.

Here are extracts from each of the blocks (in the order they appeared in the chain), showing the ITL information and the “row overhead” information. If you scan through the list you’ll see that three of the 4 blocks have an ITL entry for transaction id (xid) of 8.1e.df3, using three consecutive undo records in undo block 0x0100043d. My update has locked 3 of the 4 rowpieces – the header and the two that have changed. It didn’t need to “lock” the piece that didn’t change. (This little detail was the bonus of the unexpected split.)


Block 184
---------
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.00000ee1  0x01000bc0.036a.36  C---    0  scn  0x00000000005beb39
0x02   0x0008.01e.00000df3  0x0100043d.0356.2e  ----    1  fsc 0x0000.00000000

...

tab 0, row 1, @0xf18
tl: 9 fb: --H----- lb: 0x2  cc: 0
nrid:  0x00800089.0



Block 137       (columns n1, l1 - DID NOT CHANGE so no ITL entry acquired)
---------       (the lock byte relates to the previous, not cleaned, update) 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.00000ee1  0x01000bc0.036a.35  --U-    1  fsc 0x0000.005beb39
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000

...

tab 0, row 0, @0xfcb
tl: 4021 fb: ----F--- lb: 0x1  cc: 2
hrid: 0x008000b8.1
nrid:  0x00800085.0



Block 133 (columns s1, l2)
--------------------------
Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.00000ee1  0x01000bc0.036a.34  C---    0  scn  0x00000000005beb39
0x02   0x0008.01e.00000df3  0x0100043d.0356.2f  ----    1  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000

...

tab 0, row 0, @0xf0b
tl: 4213 fb: -------- lb: 0x2  cc: 2
nrid:  0x008000bc.0



Block 188 (columns s2, l3, s3)
------------------------------
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00b.00000ee1  0x01000bc0.036a.33  C---    0  scn  0x00000000005beb39
0x02   0x0008.01e.00000df3  0x0100043d.0356.30  ----    1  fsc 0x0000.00000000
0x03   0x0000.000.00000000  0x00000000.0000.00  C---    0  scn  0x0000000000000000

...

tab 0, row 0, @0xe48
tl: 4408 fb: -----L-- lb: 0x2  cc: 3

Note, by the way, how there are nrid (next rowid) entries pointing forward in every row piece (except the last), but it’s only the “F” (First) row-piece has the hrid (head rowid) pointer pointing backwards.

 

May 9, 2016

RI Locks

Filed under: deadlocks,Indexing,IOT,Locks,Oracle,trace files,Troubleshooting — Jonathan Lewis @ 12:24 pm BST May 9,2016

RI = Referential Integrity: also known informally as parent/child integrity, and primary (or unique) key/foreign key checking.

I’m on a bit of a roll with things that I must have explained dozens or even hundreds of times in different environments without ever formally explaining them on my blog. Here’s a blog item I could have done with to response to  a question that came up on the OTN database forum over the weekend.

What happens in the following scenario:


-- session 1

create table parent (
        id        number(8,0),
        constraint par_pk primary key(id)
);

create table child  (
        id_p      number(8,0) not null references parent,
        id_c      number(8,0) not null,
        constraint child_pk primary key(id_p, id_c)
)
;

insert into parent values(1);

-- session 2
insert into child values(1,1);

Since the parent row corresponding to the child row doesn’t (yet) seem to exist as far as session 2 is concerned you might expect session 2 to respond immediately with an error message like:

ERROR at line 1:
ORA-02291: integrity constraint (TEST_USER.SYS_C0017926) violated - parent key not found

In fact, although the end-user is not allowed to see the uncommitted parent row, the user’s process can see the uncommitted row and will wait until session 1 commits or rolls back – so if you examine v$lock for the current locks for the two sessions you’d see something like this:

  1  select  sid, type, id1, id2, lmode, request, ctime, block
  2  from    V$lock
  3  where   sid in (select sid from V$session where username = 'TEST_USER')
  4  and     type != 'AE'
  5  order by
  6*         sid, type desc
  7  /

       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
         3 TX     327709      12584          6          0        283          1
           TM     143734          0          2          0        283          0
           TM     143732          0          3          0        283          0

       250 TX     589829      12877          6          0        240          0
           TX     327709      12584          0          4        240          0
           TM     143734          0          3          0        240          0
           TM     143732          0          3          0        240          0


7 rows selected.

In the above, SID 250 is session 2: it’s holding a transaction lock (TX) in mode 6 because it has acquired an undo segment and has generated some undo, it’s also waiting for a transaction lock in mode 4 (share) and – checking id1 and id2 – we can see that the transaction table entry it’s waiting for is held by session 3 in mode 6 (and we also note that the lock held by session 3 is marked as a blocker).

If session 3 commits (thus releasing the transaction lock) session 250 will continue processing the insert; if session 3 rolls back session 250 will raise error ORA-02291 and roll back its insert statement. (Note: if this were a multi-statement transaction it would only be the insert into child that would be rolled back; that’s another one of those details that is important but often isn’t stated explicitly, leaving people believing that the entire transaction would be rolled back.)

Updates and deletes can produce the same effects. Imagine that we have just created the two tables, and then run the following:


-- session 1
insert into parent values(1);
commit;
delete from parent where id = 1;

-- session 2
insert into child values(1,1);

Again session 2 will wait for session 1 to commit or roll back. In this case if session 1 commits session 2 will raise Oracle error ORA-02291, if session 1 rolls back session 2 will continue with the insert.

Deadlocks

Whenever you can demonstrate a way of producing a wait chain you can also manage to produce a deadlock. Consider the following (starting, again, from empty tables);


-- (1) session 1
insert into parent values(1);

-- (2) session 2
insert into parent values(2);

-- (3) session 1
insert into child values(2,2);

-- (4)session 2
insert into child values(1,1);

Session 1 will start waiting for session 2 to commit (or rollback) at step 3, then session 2 will start to wait for session 1 at step 4 – with the result that session 1 will recognise the deadlock after about three seconds and rollback its last statement, raising exception ORA-00060 and dumping a trace file. (Note: session 1 will not, as many people think, roll back the entire transaction, it will only roll back the statement that allowed the deadlock to develop). Session 2 will still be waiting for session 1 to commit or rollback its insert into parent. Contrary to the popular claim, Oracle will not “resolve” the deadlock, it will simply break the deadlock leaving one session waiting for the other session to respond appropriately to the deadlock error.

For reference, here’s the deadlock graph (from a 12c trace file) produced by session 1 (SID = 3) for this demo:


Deadlock graph:
                                          ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name                             process session holds waits  process session holds waits
TX-00010017-000026C7-00000000-00000000          6       3     X             33     250           S
TX-000A000D-000026F8-00000000-00000000         33     250     X              6       3           S

session 3: DID 0001-0006-00000004       session 250: DID 0001-0021-00000041
session 250: DID 0001-0021-00000041     session 3: DID 0001-0006-00000004

Rows waited on:
  Session 3: no row
  Session 250: no row

When you see a deadlock graph with TX waits of type S (share, mode 4) it’s a very good bet that the wait has something to do with indexes – which may mean referential integrity as discussed here, but may mean collisions on primary keys, and may mean something to do with simple collisions on index-organized tables. You’ll notice that the “Rows waited on:” section shows no row – unfortunately in earlier versions of Oracle you may find a spurious row entry here because the wait information from some other (block) wait has been left in the relevant columns in v$session.

October 22, 2014

Deadlocks

Filed under: deadlocks,Locks,Oracle — Jonathan Lewis @ 6:34 pm BST Oct 22,2014

A recent question on the OTN forum asked about narrowing down the cause of deadlocks, and this prompted me to set up a little example. Here’s a deadlock graph of a not-quite-standard type:


Deadlock graph:
                                          ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name                             process session holds waits  process session holds waits
TX-00040001-000008EC-00000000-00000000         50     249     X             48       9           X
TX-000A001F-000008BC-00000000-00000000         48       9     X             50     249           S


My session (the one that dumped the trace file) is 249, and I was blocked by session 9. The slight anomaly, of course, is that I was waiting on a TX lock in mode 4 (Share) rather than the more common mode 6 (eXclusive).

There are plenty of notes on the web these days to tell you that TX mode 4 relates in some way to a unique index (or some associated referential integrity) or an ITL wait. (Inevitably there are a couple of other less frequently occurring and less well documented reasons, such as waits for tablespaces to change state but I’m going to ignore those for now). The question is, how do I tell whether this example is related to uniqueness (indexing) or ITLs ? For recent versions of Oracle the answer is in the rest of the trace file which now holds the “Current Wait Stack” and “Session Wait History” for the session that dumped the trace file.

Reading down my trace file, past the line which says “Information for THIS session”, I eventually get to this:


    Current Wait Stack:
     0: waiting for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
        wait_id=80 seq_num=81 snap_id=1
 

So it didn’t take me long to find out I had an ITL problem (which should be a pretty rare occurrence in newer versions of Oracle); but there’s more:

...

    There is at least one session blocking this session.
      Dumping 1 direct blocker(s):
        inst: 1, sid: 9, ser: 40192
      Dumping final blocker:
        inst: 1, sid: 9, ser: 40192
    There are 2 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 357, ser: 7531
      wait event: 'enq: TX - allocate ITL entry'

...

    Session Wait History:
        elapsed time of 0.000035 sec since current wait
     0: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0x5000c, sequence=0xa39
        wait_id=79 seq_num=80 snap_id=1
        wait times: snap=5.002987 sec, exc=5.002987 sec, total=5.002987 sec
        wait times: max=5.000000 sec
        wait counts: calls=2 os=2
        occurred after 0.000047 sec of elapsed time
     1: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
        wait_id=78 seq_num=79 snap_id=1
        wait times: snap=1 min 4 sec, exc=1 min 4 sec, total=1 min 4 sec
        wait times: max=1 min 4 sec
        wait counts: calls=22 os=22
        occurred after 0.000032 sec of elapsed time

...
     8: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0x5000c, sequence=0xa39
        wait_id=71 seq_num=72 snap_id=1
        wait times: snap=5.001902 sec, exc=5.001902 sec, total=5.001902 sec
        wait times: max=5.000000 sec
        wait counts: calls=2 os=2
        occurred after 0.000042 sec of elapsed time
     9: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
        wait_id=70 seq_num=71 snap_id=1
        wait times: snap=4.005342 sec, exc=4.005342 sec, total=4.005342 sec
        wait times: max=4.000000 sec
        wait counts: calls=2 os=2
        occurred after 0.000031 sec of elapsed time

...

    Sampled Session History of session 249 serial 3931
    ---------------------------------------------------

    The history is displayed in reverse chronological order.

    sample interval: 1 sec, max history 120 sec
    ---------------------------------------------------
      [9 samples,                                          11:14:50 - 11:14:58]
        waited for 'enq: TX - allocate ITL entry', seq_num: 81
          p1: 'name|mode'=0x54580004
          p2: 'usn<= 8 sec (still in wait)
      [5 samples,                                          11:14:45 - 11:14:49]
        waited for 'enq: TX - allocate ITL entry', seq_num: 80
          p1: 'name|mode'=0x54580004
          p2: 'usn<<16 | slot'=0x5000c
          p3: 'sequence'=0xa39
          time_waited: 5.002987 sec (sample interval: 4 sec)
...

The little report that follows the initial wait state shows that the situation was a little messy – session 9 was my first and last blocker, but there was another session tangled up in the chain of waits, session 357.

Following this there’s a set of entries from my v$session_wait_history – and if you look carefully at the slot and sequence that appears on the second line of each wait you’ll notice that my waits have been alternating between TWO other sessions/transactions before I finally crashed.

Finally there’s a set of entries for my session extracted from v$active_session_history. (Question: I’m only allowed to query v$active_session_history if I’ve licensed the Diagnostic Pack – so should I shut my eyes when I get to this part of the trace file?) This breakdown also shows my session alternating between waits on the two different blockers, giving me a pretty good post-event breakdown of what was going on around the time of the deadlock.

March 13, 2014

Shrink Space

Filed under: Bugs,deadlocks,fragmentation,Index Rebuilds,Indexing,Locks,Oracle — Jonathan Lewis @ 7:08 am BST Mar 13,2014

Here’s an example of a nasty accident that can be seen in a slightly unusual output from v$lock (on 11.2.0.4)
(more…)

April 13, 2013

Deadlocks

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 11:36 am BST Apr 13,2013

Here’s a deadlock graph that might cause a little confusion:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
 Resource Name          process session holds waits  process session holds waits
 TX-001a0015-00014787        34      90     X             32       3           S
 TX-00190008-0000601b        32       3     X             34      90           S

 session 90: DID 0001-0022-00000327      session 3: DID 0001-0020-000009E9
 session 3: DID 0001-0020-000009E9       session 90: DID 0001-0022-00000327

 Rows waited on:
  Session 90: obj - rowid = 00030CE5 - AAAwzlAAGAABDiuAAA
  (dictionary objn - 199909, file - 6, block - 276654, slot - 0)
  Session 3: obj - rowid = 00030CE5 - AAAwzlAAGAABp8gAAA
  (dictionary objn - 199909, file - 6, block - 433952, slot - 0)

(more…)

March 20, 2013

Lock Bug

Filed under: Bugs,deadlocks,Locks,Oracle — Jonathan Lewis @ 12:08 pm BST Mar 20,2013

Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 11.2.0.2 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:
(more…)

February 22, 2013

Deadlock Detection

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 6:19 pm BST Feb 22,2013

By some strange coincidence, the “London Bus” effect perhaps, there have been three posts on the OTN database forum in the last couple of days relating to deadlocks; and they have prompted me to indulge in a little rant about the myth of Oracle and deadlock detection; it’s the one that goes:

“Oracle detects and resolves deadlocks automatically.”

Oracle may detect deadlocks automatically, but it doesn’t resolve them, it simply reports them (by raising error ORA-00060 and rolling back one statement) then leaves the deadlocked sessions stuck until the session that received the report resolves the problem or an external agent resolves the problem.

Consider the following example (which, I have to admit, I wrote without access to a live instance):

(more…)

August 29, 2011

Deadlock

Filed under: deadlocks,Indexing,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 5:25 pm BST Aug 29,2011

Here’s a deadlock graph the appeared on Oracle-L and OTN a couple of days ago.

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-001a0002-0002a0fe       196     197     X            166    1835           S
TM-0000c800-00000000       166    1835    SX            196     197    SX   SSX

It’s a little unusual because instead of the common TX mode 6 (eXclusive) crossover we have one TX and one TM lock, the TX wait is for mode 4 (S) and the TM wait is for a conversion from 3 (SX) to 5 (SSX).

(more…)

April 8, 2011

Deadlock

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 5:57 pm BST Apr 8,2011

Here’s an example of a slightly less common data deadlock graph (dumped from 11gR2, in this case):
(more…)

Powered by WordPress.com.