Deadlock Detection

February 22, 2013

Deadlock Detection

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 6:19 pm GMT Feb 22,2013

By some strange coincidence, the “London Bus” effect perhaps, there have been three posts on the OTN database forum in the last couple of days relating to deadlocks; and they have prompted me to indulge in a little rant about the myth of Oracle and deadlock detection; it’s the one that goes:

“Oracle detects and resolves deadlocks automatically.”

Oracle may detect deadlocks automatically, but it doesn’t resolve them, it simply reports them (by raising error ORA-00060 and rolling back one statement) then leaves the deadlocked sessions stuck until the session that received the report resolves the problem or an external agent resolves the problem.

Consider the following example (which, I have to admit, I wrote without access to a live instance):

Session 1
delete from t1 where id between 1 and 1000000;
1000000 rows deleted

                                                   Session 2
                                                   update jobs set status = 'STARTING' where id = 99;
                                                   1 row updated

Session 1
update jobs set status = 'FINISHED' where id = 99;
-- session 1 is now waiting on session 2

                                                   Session 2
                                                   delete from t1 where id between 1 and 1000000;
                                                   -- session 2 is now waiting on session 1

Session 1 (some time within the next 3 seconds)
ORA-00060: deadlock detected
(with rollback of the most recent statement - i.e. the update)

On receiving the ORA-00060 error most of the applications that I’ve seen would then crash session 1, resulting in an implicit rollback of the million row delete (the worst case I’ve seen in s production system: session 2 ran for 10 seconds and resulted in session 1 crashing and rolling back for 3.5 hours); some applications would report “Unexpected Oracle error” to the front-end and retry the most recent action (the update that rolled back), at which point Session 2 would receive an ORA-00060 error and the two sessions would see-saw back and forth every 3 seconds until someone noticed what was going on. [Update: In March 2014 I published this example of a flaw in some Oracle internal code that behaves in exactly this way.]

In what way has the deadlock been “resolved” ?

In this specific case I think I’d probably want a supervisor (person or program) to log and kill session 2 and allow session 1 to retry its second update – and then I’d want to find out why this sequence of events had happened at all.

Deadlocks are NOT resolved automatically by Oracle, they are merely reported so that the client code can decide how to resolve them.

Update (March 2024)

A new feature in Oracle 23c gives us an option for “proper” deadlock resolution. It is now possible to set three levels of priority for transactions and limits on the amount of time that transactions running at higher levels are prepared to wait for transactions running at lower levels before sniping them. For more details see this blog note by FatDBA.

Further Update (March 2024)

It’s too easy to fall into the trap of “narrow thinking”. If I say “deadlock” most people are likely to think of TM or TX enqueues colliding and transactions being deadlocked. There are, of course, all sorts of other Oracle activities involving enqueues which means all sorts of deadlocks other than transaction deadlocks (for example this one between a row cache lock and a library cache lock). So my statement about “Oracle detects but doesn’t resolve deadlocks” demonstrates the fallacy of generalising from the particular. This was brought to my attention following a recent (March 2024) tweet where I referenced this blog note – and got a reply pointing out that Hang Manager (renamed Blocker Resolver in 23c) was able to resolve deadlocks in RAC systems and had been able to in many versions of Oracle.

Comments (13)

13 Comments »

I had to use serious googlefu to track down:

“The “London bus” effect is well-known to many IPKat readers: you wait patiently in line for a bus. After peering into a bus-free horizon for what feels like an eternity, you discern a distant red speck, crawling at an almost inconceivably slow pace towards you. At last the distant speck comes close enough for you to see not one bus, but two or even three, travelling together in what appears to be a convoy.”

Comment by Mark Brady — February 22, 2013 @ 6:37 pm GMT Feb 22,2013 | Reply
Hmm… I don’t see why the two session would see-saw back and fore.

When session 1 gets an implicit rollback of the million row update, session 2 would immediately be able to do its delete operation and exclusively lock the rows in t1. If session 1 would retry the delete, it would have to wait until session 2 commits or issues a rollback. That would be a wait, but not a deadlock state.

If you meant that session 1 would ignore the rollback of its delete statement and rerun the update statement, the application would be severely broken regarding its transaction management, but it still would not result in another deadlock, since session 1 would have to wait for session 2 to commit, since session 2 still has the row(s) locked in table jobs.

In this sense Oracle does resolve the deadlock situation, but of course I agree, that it won’t resolve the reason for running into a deadlock for us. :-)

Comment by Marcus Mönnig — February 23, 2013 @ 10:30 am GMT Feb 23,2013 | Reply
- Marcus,
  
  I’ve added one critical detail to the description – the session that gets the ORA-00060 rolls back the one statement that it was waiting on (there’s an internal savepoint created before each DML statement).
  
  It’s only if the client code fails to handle the ORA-00060 and the crashes itself that the rollback of the entire transaction would occur. (Of course, some application might actually detect ORA-00060 – either as an expected or unexpected error – and issue an explicit rollback of the transaction.) Session 2 is waiting on the transaction lock – so session 1 has to issue a commit; or rollback; or crash so that pmon can pick up the accident and rollback the transaction.
  
  The see-saw happens if the session receiving the ORA-00060 detects it and responds to it by retrying the single statement.
  
  Session 1 retries the update and starts to wait on session 2. Session 2 gets ORA-00060 and rolls back its delete statement (which doesn’t release the transaction lock that it is holding and that session 1 is waiting on). In this case, of course, session 2 didn’t actually get started on deleting data because it was waiting for session 1.
  
  Session2 retries its delete and starts to wait on session 1. Session 1 gets an ORA-00060 and rolls back its update statement (which doesn’t release the transaction lock that is is holding and that session 2 is waiting on).
  
  Session 1 retries …
  
  Looking at this example, by the way, the error in the front-end code is that session 1 didn’t update the correct JOB row before it got started (or maybe it accidentally issued a commit on that update). If it had done so session 2 would have been waiting safely and would not have got into the deadlock. Possibly, of course, the code to set the JOB to “Starting” should have checked that the record was in a valid state before doing so – and that might pre-empt any problems of Session 1 updating correctly, but committing when it shouldn’t have.
  
  Big jobs need to be protected from accidents, because you can lose a lot of resources and time if a big job rolls back and has to be restarted.
  
  Comment by Jonathan Lewis — February 23, 2013 @ 10:51 am GMT Feb 23,2013 | Reply
  - Jonathan,
    
    thanks for the additional explanation. I’m glad I commented, because I never understood the “Oracle rolls back the _last statement_” statement in the Oracle documentation or ML notes.
    
    I’ve never _seen_ this actually happen. What I’ve always seen was a (usually implicit) rollback from the session after it received an ORA-60 on Oracle’s rollback of the last statement. From the client technologies I have to deal with, a retry of the last statement that was rolled back by Oracle would in my opinion need some very strange coding, like, e.g. in PL/SQL, a catch all (or at least ORA-60) exception handler around each statement with a goto label back to the statement.
    
    Comment by Marcus Mönnig — February 23, 2013 @ 11:47 am GMT Feb 23,2013 | Reply
    - Retrying the last DML that was rolled back, in event of ORA-0060 is taken care of by the database and not the application I think. Because, as you say, that would require some weird coding.
      
      Comment by stevesmith — November 30, 2016 @ 5:07 am GMT Nov 30,2016
As you say – buses. I’ve just been dealing with some deadlock issues and along comes your post. I’d always assumed that Oracle would fail the last session making a lock request i.e. the one that completed the cycle of locks that is a deadlock – session 2 in your example. But a quick test by myself verifies what you have stated – it is actually the other session that gets an error back on its SQL statement (session 1), and the session that caused the deadlock (session 2) carries on waiting for the lock it wants. I’ll have to remember that for future reference.

John

Comment by John Brady — March 8, 2013 @ 10:38 am GMT Mar 8,2013 | Reply
Not all Deadlocks are created the same…

I’ve blogged about deadlocks in Oracle at least once before. I said then that although the following message in deadlock trace files is usually true, it isn’t always. The following deadlock is not an Oracle error. Deadlocks of this type can b…

Trackback by Doug's Oracle Blog — March 11, 2013 @ 2:04 am GMT Mar 11,2013 | Reply
Excellent explanation! Just wanted to point out something that probably makes it a bit unclear:

“At this point most of the applications that I’ve seen would crash session 1, resulting in an implicit rollback of the million row update”

The *update* that will get rolled back is of only 1 row. The million row *delete* will stay untouched at this point.

Comment by Todor Botev — April 28, 2013 @ 12:59 pm BST Apr 28,2013 | Reply
- Todor,
  
  Thanks for highlighting my error – I’ve made a couple of changes to the note to clarify things.
  
  The update to the one row will rollback automatically with the ORA-00060 – but as I pointed out, many applications will crash the session because of the ORA-00060, and that crash is the thing that also rolls back the 1,000,000 delete. (Technically, of course, it’s probably smon that will handle the rollback, not the session itself if the session crashes – although some applications may decide to issue an explicit rollback before terminating.)
  
  Comment by Jonathan Lewis — June 9, 2013 @ 12:04 pm BST Jun 9,2013 | Reply
[…] but I might be incorrectly remembering that rule. Oracle Database really did not completely resolve the deadlock, nor did it “kill” session 3 (as too many books and blog article state would happen) […]

Pingback by Oracle Enqueue Rules Leading to Deadlock Changes Again in 12.1? | Charles Hooper's Oracle Notes — August 1, 2013 @ 9:27 pm BST Aug 1,2013 | Reply
[…] I’ve said it before, and I keep repeating it when people say “Oracle resolves deadlocks automatically”: Oracle does NOT resolve deadlocks automatically – one of the sessions will rollback its last DML statement to clear the deadlock, but the other session will (almost invariably) still be waiting. It’s up to the application to do something sensible to resolve the deadlock after it receives the ORA-00060 error. […]

Pingback by Shrink Space | Oracle Scratchpad — March 13, 2014 @ 7:08 am GMT Mar 13,2014 | Reply
[…] https://jonathanlewis.wordpress.com/2013/02/22/deadlock-detection/ […]

Pingback by Deadlock parser – parsing lmd0 trace files | recurrent null — April 19, 2014 @ 5:40 pm BST Apr 19,2014 | Reply
[…] Deadlock Detection (Feb 2013): Oracle does not resolve deadlocks, it merely detects them and makes it possible for the application to resolve them. […]

Pingback by Lock Catalogue | Oracle Scratchpad — February 19, 2022 @ 10:34 am GMT Feb 19,2022 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome. Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Oracle Scratchpad

February 22, 2013