Oracle Scratchpad

August 29, 2011

Deadlock

Filed under: deadlocks,Indexing,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 5:25 pm BST Aug 29,2011

Here’s a deadlock graph the appeared on Oracle-L and OTN a couple of days ago.

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-001a0002-0002a0fe       196     197     X            166    1835           S
TM-0000c800-00000000       166    1835    SX            196     197    SX   SSX

It’s a little unusual because instead of the common TX mode 6 (eXclusive) crossover we have one TX and one TM lock, the TX wait is for mode 4 (S) and the TM wait is for a conversion from 3 (SX) to 5 (SSX).

The modes and types give us some clues about what’s going on: TX/4 is typically about indexes involved in referential integrity (though there are a couple of more exotic reasons such as wait for ITLs, Freelists or tablespace status change); conversion of a TM lock from mode 3 to mode 5 is only possible (as far as I know) in the context of missing foreign key indexes when you delete a parent row.

Here’s a simple data set to help demonstrate the type of thing that could have caused this deadlock:

drop table child;
drop table parent;

create table parent (
	id		number(4),
	name		varchar2(10),
	constraint par_pk
		primary key (id)
)
;

create table child(
	id_p	number(4),
	id	number(4),
	name	varchar2(10),
	constraint chi_pk
		primary key (id, id_p),
	constraint chi_fk_par
		foreign key(id_p)
		references parent
		on delete cascade
)
;

insert into parent values (1,'Smith');
insert into parent values (2,'Jones');

insert into child values(1, 1, 'Simon');
insert into child values(2, 1, 'Janet');

commit;

Note that I have define the primary key on the child the “wrong way round”, so that the foreign key doesn’t have a supporting index. Note also that the foreign key constraint is defined as ‘on delete cascade’ – this isn’t a necessity, but it means I won’t have to delete child rows explicitly in my demo.

Now we take the following steps:

Session 1: delete from parent where id = 1;

This will delete the child row – temporarily taking a mode 4 (S) lock on the child table – then delete the parent row. Both tables will end up locked in mode 3.

Session 2: insert into child values (1,2,'Sally');

This will lock the parent table in mode 2, lock the child table in mode 3, then wait with a TX mode 4 for session 1 to commit or rollback. If session 1 commits it will raise Oracle error: “ORA-02291: integrity constraint (TEST_USER.CHI_FK_PAR) violated – parent key not found”; if session 1 rolls back the insert will succeed.

Session 1: delete from parent where id = 2;

This will attempt to lock the child table in mode 4, find that there it already has the child locked in mode three (which is incompatible with mode 4) and therefore attempt to convert to mode 5 (SSX). This will make it queue, waiting for session 2 to commit.
Three seconds later session 2 (the first to start waiting) will timeout and report a deadlock with the follow deadlock graph:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TM-00015818-00000000        14     371    SX             17     368    SX   SSX
TX-0009000e-000054ae        17     368     X             14     371           S
session 371: DID 0001-000E-00000018	session 368: DID 0001-0011-00000005
session 368: DID 0001-0011-00000005	session 371: DID 0001-000E-00000018
Rows waited on:
Session 368: no row
Session 371: no row
Session 368:
  pid=17 serial=66 audsid=2251285 user: 52/TEST_USER
  O/S info: user: HP-LAPTOPV1\jonathan, term: HP-LAPTOPV1, ospid: 2300:3528, machine: WORKGROUP_JL\HP-LAPTOPV1
            program: sqlplus.exe
  application name: SQL*Plus, hash value=3669949024
  Current SQL Statement:
  delete from parent where id = 2
End of information on OTHER waiting sessions.
Current SQL statement for this session:
insert into child values(1,2,'new')

You’ll notice that there are no rows waited for – session 1 isn’t waiting for a row it’s waiting for a table and session 2 isn’t waiting for a table row it’s waiting for an index entry.

Footnote: There are several variations on the theme of one session inserting child rows when the other session has deleted (or inserted) the parent. The uncommitted parent change is an easy cause of the TX/4; the delete with unindexed foreign key is a necessary cause of the SX -> SSX.

8 Comments »

  1. > … session 1 (the first to start waiting) …
    Isn’t session 2 the first to start waiting?
    And is this deterministic – the first session to start waiting will be the one detecting the deadlock? As far as I know (from somewhere), every session timing out on an enqueue wait will check for a deadlock before going to sleep again. Now, in almost all cases the first session to start waiting will be the first to time-out, but there’s a difference between being runnable and actually running, so it seems possible that the second session to start waiting might end up being the one detecting the deadlock.
    Not that this matters for the topic of the post, I’m just curious.

    Comment by Flado — August 30, 2011 @ 10:31 am BST Aug 30,2011 | Reply

    • Flado, correct – session 2 is the first to start waiting, and it is the one that will raise the deadlock. (Now corrected.)

      I would like to think that the “1″ was just a case of missing the key while typing, and not noticing the error.
      Technically I suppose it is possible for the two waiters to start running in the wrong order – but since they go off the run queue in order they will presumable go back on the run queue in order: but on a multi-CPU machine is there one run queue that services all CPUs or one runqueue per CPU ? In the latter case if process one goes onto the run queue of CPU A and process two goes onto the run queue of CPU B then I guess process two could get to the top of its run queue before process one.

      Comment by Jonathan Lewis — September 1, 2011 @ 4:16 pm BST Sep 1,2011 | Reply

  2. It is the session(session 2) doing the insert that will time out and will report the deadlock error

    
    mhouri.world> insert into child values (1,2,'Sally');
    insert into child values (1,2,'Sally')
                *
    ERROR at line 1:
    ORA-00060: deadlock detected while waiting for resource
    

    Best regards

    Comment by Houri Mohamed — August 31, 2011 @ 8:20 am BST Aug 31,2011 | Reply

  3. [...] From Jonathan Lewis’ Oracle Scratchpad blog. [...]

    Pingback by Deadlock – All Things Oracle — September 8, 2011 @ 5:10 pm BST Sep 8,2011 | Reply

  4. What would be a reason for ITL waits on an index segment?. Especially if the index is on a child table on a FK constraint column.
    This ITL waits showingup while concurrent inserts into the child table.

    Also can FK indexes can be reverse key? Would that have any negative impact (TM locks )

    Thanks
    Balaji.

    Comment by Balaji — October 16, 2011 @ 5:00 pm BST Oct 16,2011 | Reply

    • Balaji,

      If all you were doing was inserting rows into a child table with an enabled foreign key that was protected by a suitable index, I wouldn’t have expected to see ITL waits. As far as I know if there are no free ITL entries in a leaf block the process that wants one will simply intitiate a leaf block split. I could probably come up with a couple of scenarios to check if necessary – e.g. two sessions wanting to insert for the same parent key simultaneously when there are no free ITLs (though I might expect a buffer busy wait or “enq: TX – index contention”); or two leaf blocks under the same branch block having to split simuiltanously and causing contention for the branch ITL entry (although again I think I’d expect “Enq: TX – index contention”).

      I haven’t tested FKs with reverse key indexes – I can think on no reason why they would cause any problems. It’s easy to test though, and you could apply the test case to different versions of Oracle because it’s often the case that a detail that ought to work simply doesn’t work on the first release. (e.g. you couldn’t do index fast full scans on bitmap indexes in the first release where index fast full scans appeared.)

      Comment by Jonathan Lewis — November 13, 2011 @ 12:36 pm BST Nov 13,2011 | Reply

      • I’ve just had an exchange of email with Tom Kyte about a thread on AskTom that quoted the previous comment, specifically the bit:


        “As far as I know if there are no free ITL entries in a leaf block the process that wants one will simply intitiate a leaf block split.”

        When I wrote this I was writing in the context of the question, of course, which was all about inserting rows, not about updating or deleting rows; so when Tom emailed me about it my immediate reply was that it didn’t apply to deletes or updates (and rememmber that an update is implemented at the index level as a delete followed by an insert).

        However, as I sat down to write a short note to add to this blog, I suddenly realised that I wasn’t sure that I had told Tom the truth – so I did a quick search of my working directory for any scripts with the words “index split delete”, and found a script called (appropriately enough) ind_split_del.sql that I had written a few years ago to demonstrate a couple of variations of how a block split could occur during a delete that wasn’t able to find or create an ITL entry in the leaf block. (I may write this up some day – it’s another of the edge cases where an index can become larger than it should because of concurrent activity.)

        If you’re wondering how Oracle can split a leaf block when it can’t find an ITL to change the block, by the way, you need to know that Oracle reserves the first ITL entry in a leaf block for “service” transactions – viz: block splits. (See this blog note for a couple of comments about service transactions and service ITLs as they relate to branch blocks)

        This does leave unanswered the question of how you can get ITL waits on an index – but I haven’t tested many variations of deletions, or mixing inserts and deletes; and it’s always possible that the waits appear if two deletes start on the same leaf block at the same time and both need to split the block. There may be cases were the waits relate to branch block contention if two sessions try to split two different leaf blocks under the same branch at the same time. Given that we know what it means to wait for an ITL, it’s relatively easy to come up with scenarios where they may occur – unfortunately it takes time (and sometimes a lot of patience) to create and run tests for those scenarios.

        Comment by Jonathan Lewis — November 28, 2011 @ 5:26 pm BST Nov 28,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,528 other followers