Oracle Scratchpad

October 22, 2014

Deadlocks

Filed under: deadlocks,Locks,Oracle — Jonathan Lewis @ 6:34 pm BST Oct 22,2014

A recent question on the OTN forum asked about narrowing down the cause of deadlocks, and this prompted me to set up a little example. Here’s a deadlock graph of a not-quite-standard type:


Deadlock graph:
                                          ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name                             process session holds waits  process session holds waits
TX-00040001-000008EC-00000000-00000000         50     249     X             48       9           X
TX-000A001F-000008BC-00000000-00000000         48       9     X             50     249           S


My session (the one that dumped the trace file) is 249, and I was blocked by session 9. The slight anomaly, of course, is that I was waiting on a TX lock in mode 4 (Share) rather than the more common mode 6 (eXclusive).

There are plenty of notes on the web these days to tell you that TX mode 4 relates in some way to a unique index (or some associated referential integrity) or an ITL wait. (Inevitably there are a couple of other less frequently occurring and less well documented reasons, such as waits for tablespaces to change state but I’m going to ignore those for now). The question is, how do I tell whether this example is related to uniqueness (indexing) or ITLs ? For recent versions of Oracle the answer is in the rest of the trace file which now holds the “Current Wait Stack” and “Session Wait History” for the session that dumped the trace file.

Reading down my trace file, past the line which says “Information for THIS session”, I eventually get to this:


    Current Wait Stack:
     0: waiting for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
        wait_id=80 seq_num=81 snap_id=1
 

So it didn’t take me long to find out I had an ITL problem (which should be a pretty rare occurrence in newer versions of Oracle); but there’s more:

...

    There is at least one session blocking this session.
      Dumping 1 direct blocker(s):
        inst: 1, sid: 9, ser: 40192
      Dumping final blocker:
        inst: 1, sid: 9, ser: 40192
    There are 2 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 357, ser: 7531
      wait event: 'enq: TX - allocate ITL entry'

...

    Session Wait History:
        elapsed time of 0.000035 sec since current wait
     0: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0x5000c, sequence=0xa39
        wait_id=79 seq_num=80 snap_id=1
        wait times: snap=5.002987 sec, exc=5.002987 sec, total=5.002987 sec
        wait times: max=5.000000 sec
        wait counts: calls=2 os=2
        occurred after 0.000047 sec of elapsed time
     1: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
        wait_id=78 seq_num=79 snap_id=1
        wait times: snap=1 min 4 sec, exc=1 min 4 sec, total=1 min 4 sec
        wait times: max=1 min 4 sec
        wait counts: calls=22 os=22
        occurred after 0.000032 sec of elapsed time

...
     8: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0x5000c, sequence=0xa39
        wait_id=71 seq_num=72 snap_id=1
        wait times: snap=5.001902 sec, exc=5.001902 sec, total=5.001902 sec
        wait times: max=5.000000 sec
        wait counts: calls=2 os=2
        occurred after 0.000042 sec of elapsed time
     9: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
        wait_id=70 seq_num=71 snap_id=1
        wait times: snap=4.005342 sec, exc=4.005342 sec, total=4.005342 sec
        wait times: max=4.000000 sec
        wait counts: calls=2 os=2
        occurred after 0.000031 sec of elapsed time

...

    Sampled Session History of session 249 serial 3931
    ---------------------------------------------------

    The history is displayed in reverse chronological order.

    sample interval: 1 sec, max history 120 sec
    ---------------------------------------------------
      [9 samples,                                          11:14:50 - 11:14:58]
        waited for 'enq: TX - allocate ITL entry', seq_num: 81
          p1: 'name|mode'=0x54580004
          p2: 'usn<= 8 sec (still in wait)
      [5 samples,                                          11:14:45 - 11:14:49]
        waited for 'enq: TX - allocate ITL entry', seq_num: 80
          p1: 'name|mode'=0x54580004
          p2: 'usn<<16 | slot'=0x5000c
          p3: 'sequence'=0xa39
          time_waited: 5.002987 sec (sample interval: 4 sec)
...

The little report that follows the initial wait state shows that the situation was a little messy – session 9 was my first and last blocker, but there was another session tangled up in the chain of waits, session 357.

Following this there’s a set of entries from my v$session_wait_history – and if you look carefully at the slot and sequence that appears on the second line of each wait you’ll notice that my waits have been alternating between TWO other sessions/transactions before I finally crashed.

Finally there’s a set of entries for my session extracted from v$active_session_history. (Question: I’m only allowed to query v$active_session_history if I’ve licensed the Diagnostic Pack – so should I shut my eyes when I get to this part of the trace file?) This breakdown also shows my session alternating between waits on the two different blockers, giving me a pretty good post-event breakdown of what was going on around the time of the deadlock.

9 Comments »

  1. It is really nice that oracle added snapshots of v$active_session_history to systemstate dump, which use to be part of a deadlock tracefile. I think it has been introduced somewhere in 10gR2 and is really helpful in many situations, for example instance hung (then just login with prelim option and take systemstate dump, optionaly with short stack trace of every session).

    Regards Pavol

    Comment by Pavol Babel — October 22, 2014 @ 10:57 pm BST Oct 22,2014 | Reply

  2. […] Deadlock trace (Oct 2014): an explanation of how you can find out more details about the source of a deadlock from the deadlock trace file. […]

    Pingback by Lock Catalogue | Oracle Scratchpad — February 19, 2022 @ 10:34 am GMT Feb 19,2022 | Reply

  3. Hello Jonathan

    As far as I know table blocks always have at least 2 ITL entries (even when INITRANS is set to 1). If this is correct, how can 2 sessions deadlock on ITL entries as indicated by the trace file from which you show an excerpt? All scenarios I can think of involve at least 3 sessions.

    Interestingly I found the following test from Christian Antognini https://antognini.ch/2013/05/itl-deadlocks-script/. There three sessions are used to produce an an ITL deadlock. I played a bit around with the test case and I can produce situations where three sessions appear in the trace file as well as cases where only two sessions appear in the trace file. The latter can happen when I omit the last step, i.e. “UPDATE t SET n=n*10 WHERE n = 4;” in ‘this’ session / session 1. In this case however, although Oracle RDBMS reports an ORA-00060, I believe there is no real deadlock situation in that session 1 (‘this’ session) is not blocked and once session 1 terminates the transaction, the other two sessions 2 and 3 would be unblocked as well. Is it possible that Oracle RDBMS reports a deadlock for a mere blocking scenario, i.e. without having a “real” deadlock and if so could this be considered a bug?

    thank you
    kind regards
    Martin

    Comment by Martin Maletinsky — October 29, 2022 @ 12:46 pm BST Oct 29,2022 | Reply

    • Martin,

      Thanks for the comment. My first thought when I read it was that I really ought to kept a note of the script I used to generate the demo, and mentioned the version of Oracle I was using.

      Once upon a time tables really did get only one slot in the ITL if you set initrans to 1, but I finally found a note that said this changed in 9i – so that wouldn’t be an explanation for a two-session deadlock in 2014. Then I realised that the example was a three session deadlock anyway.

      I finally found the script I’d used to create the model. According to the notes it was one I’d written in 2002 for 8i, then had to edit for 9i because of the minimum 2 ITL slots, then had to edit for 10g because 10g ignores maxtrans. And then the notes say that the code needs to be updated to make a deadlock certain because of the way luck of process ids that dictate the ITL slots chosen to wait on.

      The last time I tested it was 10.1.0.2 – and there have been changes in the algorithm Oracle uses to pick a starting ITL slot, how long it waits on each ITL slot, how many times it cycles through the slots, and which slot it chooses for its final “indefinite” wait. So I’d probably have to do a bit more work to produce a bullet-proof case – though my notes also describe something that (at the time) I thought was guaranteed to produce the deadlock.

      Regards
      Jonathan Lewis

      Note to self for future reference: the script was named c_itl_dead.sql

      P.S. I’ll have to take a little time to think about, and model for, your comment regarding whether or not this is a “real” deadlock. Given all the changes in the “test and wait” algorithm it’s possible that sometimes it is a “real” deadlock and other times it’s a see-saw (as with my comments in the note) that can resolve if the third transaction gets out of the way.

      Comment by Jonathan Lewis — October 31, 2022 @ 9:59 am GMT Oct 31,2022 | Reply

      • Martin,

        I couldn’t resist the temptation, so I set up a model to force a deadlock using 3 sessions and 3 blocks, using 3 different rows in each block, and no space for adding ITLs.

        I got a “real” deadlock on one session and the other two sessions (eventually) went into an indefinite wait for that session to commit or rollback. The wait patterns were not quite what I expected, and I will try to write up the model and results sometime over the next few days.

        Regards
        Jonathan Lewis

        Comment by Jonathan Lewis — October 31, 2022 @ 12:54 pm GMT Oct 31,2022 | Reply

      • Hello Jonathan,

        Thank you for reply.

        Shouldn’t we expect in a real three session deadlock to see all three sessions in the deadlock graph in the ORA-00060 trace file, while the excerpt you show in the post above contains only two sessions in the deadlock graph?

        As mentioned previously, using the script from Christian Antognini I can generate three session deadlock graphs as well as two session deadlock graphs as follows:

        Deadlock graph:
        
                                                  ------------Blocker(s)-----------  ------------Waiter(s)------------
        Resource Name                             process session holds waits serial  process session holds waits serial
        TX-0055000A-000085A8-00000000-00000000         89     911     X         9189     102      18           S  48727
        TX-00630018-00007185-00000000-00000000        102      18     X        48727     124     742           S  65486
        TX-004F001F-00007C50-00000000-00000000        124     742     X        65486      89     911           S   9189
        
        --
        
        Deadlock graph:
                                                  ------------Blocker(s)-----------  ------------Waiter(s)------------
        Resource Name                             process session holds waits serial  process session holds waits serial
        TX-0057000F-00003849-00000000-00000000         85     195     X        42254     102      14           S  17598
        TX-00470021-0000463F-00000000-00000000        102      14     X        17598      85     195           S  42254
        

        The second case may appear when I omit the last step in the script from Christian (where I believe there is no ‘real’ deadlock, but the blocking could be resolved if the first session got out of the way).

        kind regards
        Martin

        Comment by Martin Maletinsky — October 31, 2022 @ 5:49 pm GMT Oct 31,2022 | Reply

  4. Martin,

    I haven’t looked at Chris’ script, so I won’t comment directly. What I will say about my model is that when the two-session deadlock graph appears it is a real deadlock.

    Stripped to a minimum the need for a very careful use of words is based on the fact that a session will cycle through the ITL slots waiting for each one in turn. This means there is scope for a 3-way deadlock to reveal (temporarily) a 2-way deadlock as all three sessions switch ITLs at slightly different moments.

    It appears that Oracle doesn’t always report a deadlock while there is scope for sessions to move on to a different ITL so, after the 2-way deadlock is seen and one session has rolled back its last update, the other two sessions can cycle in and out of a deadlock (part of the time they will be waiting for each other’s ITL slot, part of the time they will be in a simple wait for the ITL belonging to the session that has reported the deadlock if that session doesn’t do anything in response to the deadlock exception).

    Regards
    Jonathan Lewis

    Comment by Jonathan Lewis — October 31, 2022 @ 6:45 pm GMT Oct 31,2022 | Reply

    • Hi Jonathan

      Maybe, if I understand your reasoning correctly, we both have a slightly different perception what a “real deadlock” would be.

      I am thinking of a situation where three sessions S1, S2 and S3 modify rows in two blocks A and B with two ITL slots (and no room for further expansion). I’ll denote the ITL slots with A.1, A.2, B.1 and B.2 correspondingly.

      If I understand your comment correctly, you refer to the following situation

      Within a transaction T1, S1 has modified rows in both blocks and is using ITL entries A.1 and B.1
      Within a transaction T2, S2 has modified rows in block A and is therefore using ITL entry A.2 (as it is the only one which remains available)
      Within a transaction T3, S3 has modified rows in block B and is therefore using ITL entry B.2 (as it is the only one which remains available)

      (Still within transaction T2) session S2 wants to modify a row in block B and needs to wait as no ITL entry is available in that block -> S2 is blocked, waiting in a round-robin fashion on T1/S1 (ITL entry B.1) and T3/S3 (ITL entry B.2)

      (Still within transaction T3) session S3 wants to modify a row in block A and needs to wait as no ITL entry is available in that block -> S3 is blocked, waiting in a round-robin fashion on T1/S1 (ITL entry A.1) and T2/S2 (ITL entry A.2)

      While S2 and S3 cycle through the ITL entries the following situation may occur at one point in time:
      S2 waits for T3/S3 (ITL entry B.2) and at the same time S3 waits for T2/S2 (ITL entry A.2)

      If I understand you correctly, you would call this situation a “real deadlock” as the two sessions block each other mutually?

      My understanding of this situation is that this situation is not a “real deadlock” because if S2 and S3 would continue cycling through the ITL slots, there is the possibility that both could progress in their work once S1 terminates transaction T1.
      Here I see a conceptual difference from what I would call a “real” 2 session deadlock (e.g. when two sessions want to modify the same two rows, and each of the sessions starts modifying one of those rows and than finds itself blocked on the other row). The difference I see is that in this second case, there is no chance that any of the two sessions could progress in the future, unless one of the two transactions is terminated (i.e. by an ORA-00060).

      Please let me know which other scenario you were thinking of in case I misunderstood the reasoning in your previous post.

      Also if you have a test case / script to show case your model, I would greatly appreciate if you could share it.

      thank you
      kind regards
      Martin

      Comment by Martin Maletinsky — November 1, 2022 @ 5:58 pm GMT Nov 1,2022 | Reply

  5. Martin,

    I said I was using 3 blocks in my demo to produce a “real” deadlock; and it really is a deadlock because none of the 3 sessions I use can go forward and one of them has to raise ORA-00060 before any further change can take place.

    I also said in the previous comment that: “it’s possible that sometimes it is a “real” deadlock and other times it’s a see-saw (as with my comments in the note) that can resolve if the third transaction gets out of the way.”, which is exactly what happens when the initial deadlock is broken, and is the phenomenon that I am not labelling as a “real” deadlock.

    I said I’d try to write up the model and results in a few days.

    Regards
    Jonathan Lewis

    Comment by Jonathan Lewis — November 2, 2022 @ 1:21 pm GMT Nov 2,2022 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: