Oracle Scratchpad

April 8, 2011

Deadlock

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 5:57 pm BST Apr 8,2011

Here’s an example of a slightly less common data deadlock graph (dumped from 11gR2, in this case):

[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00010006-00002ade        16      34     X             12      50           S
TX-00050029-000037ab        12      50     X             16      34           S
session 34: DID 0001-0010-00000021	session 50: DID 0001-000C-00000024
session 50: DID 0001-000C-00000024	session 34: DID 0001-0010-00000021
Rows waited on:
Session 50: no row
Session 34: no row
Information on the OTHER waiting sessions:
Session 50:
  pid=12 serial=71 audsid=1560855 user: 52/TEST_USER
  O/S info: user: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
            program: sqlplus.exe
  application name: SQL*Plus, hash value=3669949024
  Current SQL Statement:
  update t1 set n3 = 99 where id = 100
End of information on OTHER waiting sessions.
Current SQL statement for this session:
update t1 set n3 = 99 where id = 200

The anomaly is that the waiters are both waiting on S (share) mode locks for a TX enqueue.

It’s fairly well known that Share (and Share sub exclusive, SSX) lock waits for TM locks are almost a guarantee of a missing “foreign key index”; and it’s also fairly well known that Share lock waits for TX locks can be due to bitmap collisions, issues with ITL (interested transaction list) waits, various RI (referential integrity) collisions including simultaneous inserts of the same primary key.

A cause for TX/4 waits that is less well known or overlooked (because a feature is less-well used) is simple data collisions on IOTs (index organized tables). In the example above t1 is an IOT with a primary key of id. Session 34 and 50 have both tried to update the rows with ids 100 and 200 – in the opposite order. If this were a normal heap table the deadlock graph would be showing waits for eXclusive TX locks, because it’s an IOT (and therefore similar in some respects to a primary key wait) we see waits for Share TX locks.

8 Comments »

  1. TM-enqueus for foreign key locking seems to be a well known fact. What I personally would like to know, what other reasons can cause TM-Locks.
    A few times already I had issues with them, not involving foreign keys.

    Sligtly more on topic, did anyone know, that a single user connected session can cause a deadlock?

    on 10.2.0.3 and 11.1.0.7 verified:

    DROP TABLE TESTSZ;
    CREATE TABLE TESTSZ
    ( I NUMBER,
      K VARCHAR2(50));
    INSERT INTO TESTSZ
    SELECT MOD(ROWNUM,1000), LPAD('X',40,'Y') FROM DUAL CONNECT BY ROWNUM < 1000000; 
    COMMIT;
    LOCK TABLE TESTSZ IN EXCLUSIVE MODE;
    INSERT INTO TESTSZ X
    WITH X1 AS (SELECT /*+ no_PARALLEL(T) MATERIALIZE*/ 2000, K FROM TESTSZ T
           WHERE I=1)
    SELECT * FROM X1;
    rollback;
    LOCK TABLE TESTSZ IN EXCLUSIVE MODE;
    set timing on
    INSERT INTO TESTSZ X
    WITH X1 AS (SELECT /*+ PARALLEL(T,4) MATERIALIZE*/ 2000, K FROM TESTSZ T
           WHERE I=1)
    SELECT * FROM X1;
    rollback; 
    

    Result of the last insert statement:

    INSERT INTO TESTSZ X
    *
    FEHLER in Zeile 1:
    ORA-12801: error signaled in parallel query server P000
    ORA-00060: deadlock detected while waiting for resource
    

    Before anyone complains there are more than one session involved,
    yes, but the importance is only one "user" connected session.

    Moving back to TM locks, in the case above with the factored subquery
    in combination with parallel query, the deadlock occurs while the sessions
    are trying to get TM-locks.

    SID	PROGRAM				LAST_CALL_ET	BLOCKING_SESSION	EVENT			SECONDS_IN_WAIT	
    116	sqlplus.exe			48					PX Deq: Parse Reply	3	WAITING	
    117	oracle@cyber-ora10 (P005)	48					PX Deq: Execution Msg	3	WAITING	
    128	oracle@cyber-ora10 (P004)	48					PX Deq: Execution Msg	3	WAITING	
    115	oracle@cyber-ora10 (P003)	48					PX Deq: Execution Msg	3	WAITING	
    129	oracle@cyber-ora10 (P002)	48		116			enq: TM - contention	48	WAITING		
    120	oracle@cyber-ora10 (P001)	48		116			enq: TM - contention	48	WAITING		
    136	oracle@cyber-ora10 (P000)	48		116			enq: TM - contention	48	WAITING	
    	
    

    To finish, Oracle support aggreed, that this is not supposed to happen. The workaround is to use the inline hint.

    Comment by Stefan — April 9, 2011 @ 9:06 am BST Apr 9,2011 | Reply

    • Stefan,

      Thanks for that, and sorry about the delay in follow.

      The deadlock message looks as if it’s not the right one – the wait when I copied your test case was about 60 seconds (a little longer) which suggests it’s actually an “assumed distributed deadlock” (which is consistent with the whole parallel DML operation). Commenting out your “lock table in exclusive mode” obviously avoided the deadlock.

      As far as I know, once you’ve covered the usual FK sinners, it’s only oddities of table locking for parallel, distributed, or autonomous transactions that can cause TM deadlocks.

      Comment by Jonathan Lewis — April 20, 2011 @ 2:21 pm BST Apr 20,2011 | Reply

      • Thanks for the answer.
        It’s funny, how I did not look any further, once I found a perceived consistent explanation.
        But I should have noticed, in a very simple test, standard deadlock detection already signals after 3 seconds.

        But now, out of curiosity I changed on my test system the setting of distributed_lock_timeout parameter and retested after instance bounce.

        On 10.2.0.4, parameter change had no effect:

        SQL> select version from v$instance;
        
        VERSION
        ----------
        10.2.0.4.0
        
        SQL> select name, value from v$parameter where name='distributed_lock_timeout';
        
        NAME                      VALUE
        ------------------------- -----
        distributed_lock_timeout  180
        
        
        SQL> LOCK TABLE TESTSZ IN EXCLUSIVE MODE;
        
        Tabelle(n) wurde(n) gesperrt.
        
        SQL> set timing on
        SQL> INSERT INTO TESTSZ X
          2   WITH X1 AS
          3    (SELECT /*+ PARALLEL(T,4) MATERIALIZE*/ 2000, K FROM TESTSZ T WHERE I=1)
          4   SELECT * FROM X1;
        INSERT INTO TESTSZ X
        *
        FEHLER in Zeile 1:
        ORA-12801: error signaled in parallel query server P000
        ORA-00060: deadlock detected while waiting for resource
        
        Abgelaufen: 00:01:04.66
        

        So there must be some more to this, than I currently undertstand. Bu whatever it is, I don’t think it is that terribly important, so I leave it as it is.

        Comment by Stefan — April 22, 2011 @ 12:08 pm BST Apr 22,2011 | Reply

        • “But whatever it is, I don’t think it is that terribly important, so I leave it as it is.”

          Good strategic choice – there are plenty of things which are interesting oddities, but once we have a good enough idea of what’s happening it’s often time to take a quick note and move on. I often find that after collecting “partial solutions” over a period of time the moment comes when three of four pieces of information happen to fit together and give me some insight into a bigger design issue.

          In passing I ran the test one more time with wait state tracing enabled (and a change of distributed lock timeout) and every single PX slave dumped the following trace (with different process and session ids for the waiter, of course) after waiting 20 times on a 3-second TM/3 enqueue:

          DEADLOCK DETECTED ( ORA-00060 )
          [Transaction Deadlock]
          The following deadlock is not an ORACLE error. It is a
          deadlock due to user error in the design of an application
          or from issuing incorrect ad-hoc SQL. The following
          information may aid in determining the deadlock:
          Deadlock graph:
                                 ---------Blocker(s)--------  ---------Waiter(s)---------
          Resource Name          process session holds waits  process session holds waits
          TM-000131f8-00000000        12      50     X             18      30          SX
          session 50: DID 0001-000C-0000001B	session 30: DID 0001-000C-0000001B
          Rows waited on:
          Session 30: no row
          Information on the OTHER waiting sessions:
          End of information on OTHER waiting sessions.
          Current SQL statement for this session:
          INSERT INTO TESTSZ X
          WITH X1 AS (SELECT /*+ PARALLEL(T,4) materialize */ 2000, K FROM TESTSZ T
          WHERE I=1)
          SELECT * FROM X1
          
          

          Comment by Jonathan Lewis — April 23, 2011 @ 8:31 am BST Apr 23,2011

  2. [...] Deadlocks are such an important topic that they routinely attract attention of world’s best professional and Jonathan Lewis is no exception. [...]

    Pingback by Log Buffer #216, A Carnival of the Vanities for DBAs | The Pythian Blog — April 15, 2011 @ 11:12 am BST Apr 15,2011 | Reply

  3. [...] time now, I have been thinking about writing some blog items, but never did. A recent blog entry on http://jonathanlewis.wordpress.com/2011/04/08/deadlock prompted me to give it a [...]

    Pingback by TM enqueues and deadlock with one user session « perfexpert2 — April 16, 2011 @ 5:37 pm BST Apr 16,2011 | Reply

  4. Hi Jonathan,

    the same deadlock tree appears if the reason is ITL contention on heap table:

    Deadlock graph:
                           ---------Blocker(s)--------  ---------Waiter(s)---------
    Resource Name          process session holds waits  process session holds waits
    TX-000a000f-00001880        83     152     X             85     253           S
    TX-00120017-000018cb        85     253     X             82     111           S
    TX-00210009-00000cde        82     111     X             84     209           S
    TX-001f001a-00000cde        84     209     X             83     152           S
    
    session 152: DID 0001-0053-00005149     session 253: DID 0001-0055-000046C4
    session 253: DID 0001-0055-000046C4     session 111: DID 0001-0052-0001929D
    session 111: DID 0001-0052-0001929D     session 209: DID 0001-0054-000049EF
    session 209: DID 0001-0054-000049EF     session 152: DID 0001-0053-00005149
    
    Rows waited on:
      Session 152: no row
      Session 253: no row
      Session 111: no row
      Session 209: no row
    
    
     Session Wait History:
            elapsed time of 0.000029 sec since current wait
         0: waited for 'enq: TX - allocate ITL entry'
    ...
    

    Best regards,
    Martin

    Comment by Martin Decker — May 11, 2011 @ 9:14 am BST May 11,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 4,003 other followers