Oracle Scratchpad

February 22, 2013

Deadlock Detection

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 6:19 pm GMT Feb 22,2013

By some strange coincidence, the “London Bus” effect perhaps, there have been three posts on the OTN database forum in the last couple of days relating to deadlocks; and they have prompted me to indulge in a little rant about the myth of Oracle and deadlock detection; it’s the one that goes:

“Oracle detects and resolves deadlocks automatically.”

Oracle may detect deadlocks automatically, but it doesn’t resolve them, it simply reports them (by raising error ORA-00060 and rolling back one statement) then leaves the deadlocked sessions stuck until the session that received the report resolves the problem or an external agent resolves the problem.

Consider the following example (which, I have to admit, I wrote without access to a live instance):

Session 1
delete from t1 where id between 1 and 1000000;
1000000 rows deleted

                                                   Session 2
                                                   update jobs set status = 'STARTING' where id = 99;
                                                   1 row updated

Session 1
update jobs set status = 'FINISHED' where id = 99;
-- session 1 is now waiting on session 2

                                                   Session 2
                                                   delete from t1 where id between 1 and 1000000
                                                   -- session 2 is now waiting on session 1

Session 1 (some time within the next 3 seconds)
ORA-00060: deadlock detected
(with rollback of the most recent statement - i.e. the update)

On receiving the ORA-00060 error most of the applications that I’ve seen would then crash session 1, resulting in an implicit rollback of the million row delete (worst case I’ve seen: session 2 ran for 10 seconds and resulted in session 1 crashing and rolling back for 3.5 hours); some applications would log “Unexpected Oracle error” to the front-end and retry the most recent action (the update that rolled back) at which point Session 2 would receive an ORA-00060 error and the two sessions would see-saw back and fore every 3 seconds until someone noticed what was going on.

In what way has the deadlock been “resolved” ?

In this specific case I think I’d probably want a supervisor (person or program) to log and kill session 2 and allow session 1 to retry its second update – and then I’d want to find out why this sequence of events had happened at all.

Deadlocks are NOT resolved automatically by Oracle, they are merely reported so that the client code can decide how to resolve them.

11 Comments »

  1. I had to use serious googlefu to track down:

    “The “London bus” effect is well-known to many IPKat readers: you wait patiently in line for a bus. After peering into a bus-free horizon for what feels like an eternity, you discern a distant red speck, crawling at an almost inconceivably slow pace towards you. At last the distant speck comes close enough for you to see not one bus, but two or even three, travelling together in what appears to be a convoy.”

    Comment by Mark Brady — February 22, 2013 @ 6:37 pm GMT Feb 22,2013 | Reply

  2. Hmm… I don’t see why the two session would see-saw back and fore.

    When session 1 gets an implicit rollback of the million row update, session 2 would immediately be able to do its delete operation and exclusively lock the rows in t1. If session 1 would retry the delete, it would have to wait until session 2 commits or issues a rollback. That would be a wait, but not a deadlock state.

    If you meant that session 1 would ignore the rollback of its delete statement and rerun the update statement, the application would be severely broken regarding its transaction management, but it still would not result in another deadlock, since session 1 would have to wait for session 2 to commit, since session 2 still has the row(s) locked in table jobs.

    In this sense Oracle does resolve the deadlock situation, but of course I agree, that it won’t resolve the reason for running into a deadlock for us. :-)

    Comment by Marcus Mönnig — February 23, 2013 @ 10:30 am GMT Feb 23,2013 | Reply

    • Marcus,

      I’ve added one critical detail to the description – the session that gets the ORA-00060 rolls back the one statement that it was waiting on (there’s an internal savepoint created before each DML statement).

      It’s only if the client code fails to handle the ORA-00060 and the crashes itself that the rollback of the entire transaction would occur. (Of course, some application might actually detect ORA-00060 – either as an expected or unexpected error – and issue an explicit rollback of the transaction.) Session 2 is waiting on the transaction lock – so session 1 has to issue a commit; or rollback; or crash so that pmon can pick up the accident and rollback the transaction.

      The see-saw happens if the session receiving the ORA-00060 detects it and responds to it by retrying the single statement.

      Session 1 retries the update and starts to wait on session 2. Session 2 gets ORA-00060 and rolls back its delete statement (which doesn’t release the transaction lock that it is holding and that session 1 is waiting on). In this case, of course, session 2 didn’t actually get started on deleting data because it was waiting for session 1.

      Session2 retries its delete and starts to wait on session 1. Session 1 gets an ORA-00060 and rolls back its update statement (which doesn’t release the transaction lock that is is holding and that session 2 is waiting on).

      Session 1 retries …

      Looking at this example, by the way, the error in the front-end code is that session 1 didn’t update the correct JOB row before it got started (or maybe it accidentally issued a commit on that update). If it had done so session 2 would have been waiting safely and would not have got into the deadlock. Possibly, of course, the code to set the JOB to “Starting” should have checked that the record was in a valid state before doing so – and that might pre-empt any problems of Session 1 updating correctly, but committing when it shouldn’t have.

      Big jobs need to be protected from accidents, because you can lose a lot of resources and time if a big job rolls back and has to be restarted.

      Comment by Jonathan Lewis — February 23, 2013 @ 10:51 am GMT Feb 23,2013 | Reply

      • Jonathan,

        thanks for the additional explanation. I’m glad I commented, because I never understood the “Oracle rolls back the _last statement_” statement in the Oracle documentation or ML notes.

        I’ve never _seen_ this actually happen. What I’ve always seen was a (usually implicit) rollback from the session after it received an ORA-60 on Oracle’s rollback of the last statement. From the client technologies I have to deal with, a retry of the last statement that was rolled back by Oracle would in my opinion need some very strange coding, like, e.g. in PL/SQL, a catch all (or at least ORA-60) exception handler around each statement with a goto label back to the statement.

        Comment by Marcus Mönnig — February 23, 2013 @ 11:47 am GMT Feb 23,2013 | Reply

  3. As you say – buses. I’ve just been dealing with some deadlock issues and along comes your post. I’d always assumed that Oracle would fail the last session making a lock request i.e. the one that completed the cycle of locks that is a deadlock – session 2 in your example. But a quick test by myself verifies what you have stated – it is actually the other session that gets an error back on its SQL statement (session 1), and the session that caused the deadlock (session 2) carries on waiting for the lock it wants. I’ll have to remember that for future reference.

    John

    Comment by John Brady — March 8, 2013 @ 10:38 am GMT Mar 8,2013 | Reply

  4. Not all Deadlocks are created the same…

    I’ve blogged about deadlocks in Oracle at least once before. I said then that although the following message in deadlock trace files is usually true, it isn’t always. The following deadlock is not an Oracle error. Deadlocks of  this type can b…

    Trackback by Doug's Oracle Blog — March 11, 2013 @ 2:04 am GMT Mar 11,2013 | Reply

  5. Excellent explanation! Just wanted to point out something that probably makes it a bit unclear:

    “At this point most of the applications that I’ve seen would crash session 1, resulting in an implicit rollback of the million row update”

    The *update* that will get rolled back is of only 1 row. The million row *delete* will stay untouched at this point.

    Comment by Todor Botev — April 28, 2013 @ 12:59 pm GMT Apr 28,2013 | Reply

    • Todor,

      Thanks for highlighting my error – I’ve made a couple of changes to the note to clarify things.

      The update to the one row will rollback automatically with the ORA-00060 – but as I pointed out, many applications will crash the session because of the ORA-00060, and that crash is the thing that also rolls back the 1,000,000 delete. (Technically, of course, it’s probably smon that will handle the rollback, not the session itself if the session crashes – although some applications may decide to issue an explicit rollback before terminating.)

      Comment by Jonathan Lewis — June 9, 2013 @ 12:04 pm GMT Jun 9,2013 | Reply

  6. […] but I might be incorrectly remembering that rule.  Oracle Database really did not completely resolve the deadlock, nor did it “kill” session 3 (as too many books and blog article state would happen) […]

    Pingback by Oracle Enqueue Rules Leading to Deadlock Changes Again in 12.1? | Charles Hooper's Oracle Notes — August 1, 2013 @ 9:27 pm GMT Aug 1,2013 | Reply

  7. […] I’ve said it before, and I keep repeating it when people say “Oracle resolves deadlocks automatically”: Oracle does NOT resolve deadlocks automatically – one of the sessions will rollback its last DML statement to clear the deadlock, but the other session will (almost invariably) still be waiting. It’s up to the application to do something sensible to resolve the deadlock after it receives the ORA-00060 error. […]

    Pingback by Shrink Space | Oracle Scratchpad — March 13, 2014 @ 7:08 am GMT Mar 13,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,308 other followers