Oracle Scratchpad

September 12, 2014

Unusual Deadlock

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 1:08 pm BST Sep 12,2014

Prompted by a question on OTN, I came up with a strategy for producing an ORA-00060 deadlock that didn’t produce a deadlock graph (because there isn’t one) and didn’t get reported in the alert log (at least, not when tested on 11.2.0.4). It’s a situation that shouldn’t arise in a production system because it depends on doing the sorts of things that you shouldn’t do in a production system; but if you’re trying to do some maintenance or upgrades while keeping the system live it could possibly happen.

Here’s the starting code:

rem
rem     Script:         libcache_deadlock.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sept 2014
rem
rem     Last tested
rem             11.2.0.4
rem

drop procedure p2;
drop procedure p1;

drop table t1 purge;
create table t1 (n1 number);

insert into t1 values(1);
insert into t1 values(3);

create or replace procedure p1
as
begin
        update t1 set n1 = 2 where n1 = 1;
        dbms_lock.sleep(10);            -- time for session 2 to update the other row and issue its drop
        update t1 set n1 = 4 where n1 = 3;
end;
/

create or replace procedure p2
as

        procedure q
        is
                pragma autonomous_transaction; -- lets the drop commit without committing the outer update
        begin
                execute immediate 'drop procedure p1';
        end;

begin

        update t1 set n1 = 4 where n1 = 3;
        q;

end;
/

Of course you’re asking for trouble if you start doing DDL as part of your production code; and you’re asking for trouble if you start playing around with autonomous transactions; and if you use one to do the other you’re almost guaranteed to hit a problem eventually. All it takes in this case is a simple sequence of actions followed by a short wait of about 10 seconds (sketched below):

  • Session 1: execute p1
  • Session 2: wait a couple of seconds, then execute p2
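
In outline this is what happens (a sketch of the timeline; the lock/pin detail is my reading of it, confirmed by the chain signature and trace information further down):

-- Session 1
execute p1
        -- p1 updates the row with n1 = 1 (taking a TX lock) then sleeps for 10 seconds;
        -- while p1 is executing, the session holds a shared library cache pin on P1

-- Session 2, a couple of seconds later
execute p2
        -- p2 updates the row with n1 = 3 (taking its own TX lock), then the autonomous
        -- transaction tries to drop p1; the drop needs an exclusive library cache pin
        -- on P1, so session 2 waits on 'library cache pin' behind session 1.
        -- When session 1 wakes up it tries to update the row with n1 = 3 and waits on
        -- 'enq: TX - row lock contention' behind session 2: and that's the deadlock.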

I got the following result from session 2:


BEGIN p2; END;

*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "TEST_USER.P2", line 8
ORA-06512: at "TEST_USER.P2", line 14
ORA-06512: at line 1

While this doesn’t show up in the alert log, I did get a trace file dumped for the session; in fact I got a trace file from both processes. The trace from the process that reported the deadlock started like this:


DEADLOCK DETECTED

  Performing diagnostic dump and signaling ORA-00060

  Complete deadlock information is located in the trace file of process (pid: 8, osid: 17861, DIA0)
    trace file: /u01/app/oracle/diag/rdbms/test/test/trace/test_dia0_17861.trc

-------------------------------------------------------------------------------

*** 2014-09-09 12:44:13.427
-------------------------------------------------------------------------------
HUNG PROCESS DIAGNOSTIC DUMP BEGIN:

    dump requested by process (pid: 8, osid: 17861, DIA0)
    trace file: /u01/app/oracle/diag/rdbms/test/test/trace/test_dia0_17861.trc
-------------------------------------------------------------------------------
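
If you don’t want to hunt through the trace directory for the files, each session can report the location of its own trace file through v$diag_info (available from 11g onwards), and the dump above tells you the name of the diag (DIA0) trace file that holds the complete picture. A minimal sketch:

select  name, value
from    v$diag_info
where   name in ('Diag Trace', 'Default Trace File')
;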

The other process didn’t mention a deadlock; its trace file started at the “HUNG PROCESS DIAGNOSTIC” line because it had been triggered by the diag process. Of course, the diag process also dumped a trace file summarising the situation, and this started with the following important note:


*** 2014-09-09 12:44:13.426
-------------------------------------------------------------------------------

DEADLOCK DETECTED (id=0xf678cfe3)

Chain Signature: 'enq: TX - row lock contention'<='library cache pin' (cycle)
Chain Signature Hash: 0x39f9868d

The rest of the trace file told me what the other two processes had been doing when the ORA-00060 was initiated, but the point I want to pick up here is that we have a very brief summary in the “Chain Signature” that tells us we’ve had a collision between some DML (enq: TX) and some sort of cursor-like problem (library cache pin) and not a simple data cross-over.
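
If you catch the instance while the sessions are still waiting you can see the same chain (and the same signature) in v$wait_chains, the view that the diag process populates. A minimal sketch:

select
        chain_id, chain_is_cycle, chain_signature,
        sid, blocker_sid, wait_event_text, in_wait_secs
from
        v$wait_chains
order by
        chain_id, sid
;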

If you’ve looked at the original OTN posting you’ll see that the Chain Signature in that case is “Chain Signature: ‘library cache lock'<=’library cache pin’ (cycle)”, which indicates a collision restricted entirely to the library cache: the lock suggests someone is using a package/cursor while the pin suggests that another session is trying to destroy/(re)compile it – and they’re each trying to do it to each other’s package! (In the past when I’ve seen a deadlock of this type it’s been reported as ORA-04020 rather than ORA-00060.)
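
As a footnote to that: if you want to see who is holding, or waiting for, library cache locks and pins you can query dba_kgllock (a view created by catblock.sql, so it may not exist in your database) joined to v$session. A minimal sketch listing the sessions currently requesting a KGL lock or pin:

select
        s.sid, s.event, w.kgllktype, w.kgllkmod, w.kgllkreq
from
        dba_kgllock w,
        v$session   s
where
        w.kgllkuse = s.saddr
and     w.kgllkreq > 0
;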

I haven’t yet worked out the exact detail of how the OP got into their specific deadlock but (especially in view of their closing comments) I don’t think it’s something I need to spend more time on.

Update (April 2020)

Prompted by a tweet I made about this note, Nenad Noveljic supplied a link to another variation on the theme with a deadlock between a library cache lock and a row cache lock.

It’s quite hard to avoid thinking about gaps in a note if you visit it some time after it was originally written. In this case a possible explanation occurred to me of why the ORA-00060 (data deadlock) is reported when I would have expected an ORA-04020 (library cache / row cache deadlock). Clearly there is a deadlock, so one or the other has to be reported, but how does Oracle choose which one?

The answer is (possibly) this: the session executing p1 gets to the point where it tries to update a row in t1 that the other session has updated, so its response is “I can’t update the row – who’s locking it?”. The first session (executing p1) can then see that the second session (executing p2) has been waiting, and waiting for more than the typical 3 seconds. Maybe the first session doesn’t really care what the second session is waiting for; it just knows that the second session has been waiting long enough for the first session to trigger a deadlock signal, so it sends a message to the diag process saying “Hey, diag, I’m in a deadlock waiting for data”, and that’s why we see the ORA-00060. (Maybe.)
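
If you want to watch this moment from the outside, a third session can see the mutual wait in v$session just before the error is signalled. A minimal sketch (whether blocking_session is populated for the ‘library cache pin’ wait may depend on version):

select  sid, serial#, event, seconds_in_wait, blocking_session
from    v$session
where   event in ('enq: TX - row lock contention', 'library cache pin')
;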

 

2 Comments »

  1. Here’s the deadlock information:

        Oracle session identified by:
        {
                    instance: 1 (xxxxx_site1.xxxxx)
                       os id: 18215
                  process id: 74, oracle@xxxxxxxx (TNS V1-V3)
                  session id: 223
            session serial #: 5922
                 module name: 0 (sqlplus@xxxxxxxx (TNS V1-V3))
        }
        is waiting for 'library cache pin' with wait info:
        {
                          p1: 'handle address'=0x89708ec8
                          p2: 'pin address'=0x985cc650
                          p3: '100*mode+namespace'=0x10cb700010003
                time in wait: 7.677183 sec
               timeout after: 14 min 52 sec
                     wait id: 24
                    blocking: 1 session
              current sql_id: 3541508764
                 current sql: 
                wait history:
                  * time between current wait and wait #1: 0.001272 sec
                  1.       event: 'PGA memory operation'
                     time waited: 0.000013 sec
                         wait id: 23               p1: ''=0x10000
                                                   p2: ''=0x1
                  * time between wait #1 and #2: 0.005500 sec
                  2.       event: 'PGA memory operation'
                     time waited: 0.000012 sec
                         wait id: 22               p1: ''=0x20000
                                                   p2: ''=0x1
                  * time between wait #2 and #3: 0.001196 sec
                  3.       event: 'PGA memory operation'
                     time waited: 0.000010 sec
                         wait id: 21               p1: ''=0x30000
                                                   p2: ''=0x1
        }
        and is blocked by
     => Oracle session identified by:
        {
                    instance: 1 (xxxxx_site1.xxxxx)
                       os id: 17414
                  process id: 73, oracle@xxxxxxxx (TNS V1-V3)
                  session id: 220
            session serial #: 13750
                 module name: 0 (sqlplus@xxxxxxxx (TNS V1-V3))
        }
        which is waiting for 'enq: TX - row lock contention' with wait info:
        {
                          p1: 'name|mode'=0x54580006
                          p2: 'usn<<16 | slot'=0x20017
                          p3: 'sequence'=0x260
                time in wait: 0.583306 sec
               timeout after: never
                     wait id: 153
                    blocking: 1 session
              current sql_id: 3484937314
                 current sql: UPDATE T1 SET N1 = 4 WHERE N1 = 3
                wait history:
                  * time between current wait and wait #1: 0.000607 sec
                  1.       event: 'PL/SQL lock timer'
                     time waited: 10.001058 sec
                         wait id: 152              p1: 'duration'=0x0
                  * time between wait #1 and #2: 0.000464 sec
                  2.       event: 'db file sequential read'
                     time waited: 0.002698 sec
                         wait id: 151              p1: 'file#'=0x1
                                                   p2: 'block#'=0x642f
                                                   p3: 'blocks'=0x1
                  * time between wait #2 and #3: 0.003230 sec
                  3.       event: 'PGA memory operation'
                     time waited: 0.000020 sec
                         wait id: 150              p1: ''=0x30000
                                                   p2: ''=0x1
        }
        and is blocked by the session at the start of the chain.
    
    

    I think Oracle treats it as a deadlock, because two sessions are waiting on each other. Had only one session waited it would have ended with ORA-04020.

    The problem is that drop procedure probably requested and obtained the *shared* object library cache lock and proceeded to the update, which shouldn’t have happened.

    Had the session requested an *exclusive* lock this could have never happened. Possibly, this could have resulted in ORA-04020, if the process was unwilling to wait.

    I think I could prove this with the debugger. It’s very similar, though, to the case in the article you referenced in the update.

    Comment by Nenad Noveljic — April 28, 2020 @ 3:55 pm BST Apr 28,2020 | Reply

  2. […] An unusual deadlock (Sept 2014): between a library cache lock and a normal DML (TX) lock. (Includes a link to an example of a deadlock between a rowcache lock and a library cache lock.) […]

    Pingback by Lock Catalogue | Oracle Scratchpad — February 19, 2022 @ 10:34 am GMT Feb 19,2022 | Reply



Comments and related questions are welcome.
