By some strange coincidence, the “London Bus” effect perhaps, there have been three posts on the OTN database forum in the last couple of days relating to deadlocks; and they have prompted me to indulge in a little rant about the myth of Oracle and deadlock detection; it’s the one that goes:
“Oracle detects and resolves deadlocks automatically.”
Oracle may detect deadlocks automatically, but it doesn’t resolve them, it simply reports them (by raising error ORA-00060 and rolling back one statement) then leaves the deadlocked sessions stuck until the session that received the report resolves the problem or an external agent resolves the problem.
Consider the following example (which, I have to admit, I wrote without access to a live instance):
Session 1
delete from t1 where id between 1 and 1000000;
1000000 rows deleted
Session 2
update jobs set status = 'STARTING' where id = 99;
1 row updated
Session 1
update jobs set status = 'FINISHED' where id = 99;
-- session 1 is now waiting on session 2
Session 2
delete from t1 where id between 1 and 1000000
-- session 2 is now waiting on session 1
Session 1 (some time within the next 3 seconds)
ORA-00060: deadlock detected
At this point most of the applications that I’ve seen would crash session 1, resulting in an implicit rollback of the million row update (worst case I’ve seen: session 2 ran for 10 seconds and resulted in session 1 crashing and rolling back for 3.5 hours); some applications would log “Unexpected Oracle error” to the front-end and retry the most recent action (at which point Session 2 would receive an ORA-00060 error and the two sessions would see-saw back and fore every 3 seconds until someone noticed what was going on).
In what way has the deadlock been “resolved” ?
In this specific case I think I’d probably want a supervisor (person or program) to log and kill session 2 and allow session 1 to retry its second update – and then I’d want to find out why this sequence of events had happened at all.
Deadlocks are NOT resolved automatically by Oracle, they are merely reported so that the client code can decide how to resolve them.

I had to use serious googlefu to track down:
“The “London bus” effect is well-known to many IPKat readers: you wait patiently in line for a bus. After peering into a bus-free horizon for what feels like an eternity, you discern a distant red speck, crawling at an almost inconceivably slow pace towards you. At last the distant speck comes close enough for you to see not one bus, but two or even three, travelling together in what appears to be a convoy.”
Comment by Mark Brady — February 22, 2013 @ 6:37 pm UTC Feb 22,2013 |
Hmm… I don’t see why the two session would see-saw back and fore.
When session 1 gets an implicit rollback of the million row update, session 2 would immediately be able to do its delete operation and exclusively lock the rows in t1. If session 1 would retry the delete, it would have to wait until session 2 commits or issues a rollback. That would be a wait, but not a deadlock state.
If you meant that session 1 would ignore the rollback of its delete statement and rerun the update statement, the application would be severely broken regarding its transaction management, but it still would not result in another deadlock, since session 1 would have to wait for session 2 to commit, since session 2 still has the row(s) locked in table jobs.
In this sense Oracle does resolve the deadlock situation, but of course I agree, that it won’t resolve the reason for running into a deadlock for us. :-)
Comment by Marcus Mönnig — February 23, 2013 @ 10:30 am UTC Feb 23,2013 |
Marcus,
I’ve added one critical detail to the description – the session that gets the ORA-00060 rolls back the one statement that it was waiting on (there’s an internal savepoint created before each DML statement).
It’s only if the client code fails to handle the ORA-00060 and the crashes itself that the rollback of the entire transaction would occur. (Of course, some application might actually detect ORA-00060 – either as an expected or unexpected error – and issue an explicit rollback of the transaction.) Session 2 is waiting on the transaction lock – so session 1 has to issue a commit; or rollback; or crash so that pmon can pick up the accident and rollback the transaction.
The see-saw happens if the session receiving the ORA-00060 detects it and responds to it by retrying the single statement.
Session 1 retries the update and starts to wait on session 2. Session 2 gets ORA-00060 and rolls back its delete statement (which doesn’t release the transaction lock that it is holding and that session 1 is waiting on). In this case, of course, session 2 didn’t actually get started on deleting data because it was waiting for session 1.
Session2 retries its delete and starts to wait on session 1. Session 1 gets an ORA-00060 and rolls back its update statement (which doesn’t release the transaction lock that is is holding and that session 2 is waiting on).
Session 1 retries …
Looking at this example, by the way, the error in the front-end code is that session 1 didn’t update the correct JOB row before it got started (or maybe it accidentally issued a commit on that update). If it had done so session 2 would have been waiting safely and would not have got into the deadlock. Possibly, of course, the code to set the JOB to “Starting” should have checked that the record was in a valid state before doing so – and that might pre-empt any problems of Session 1 updating correctly, but committing when it shouldn’t have.
Big jobs need to be protected from accidents, because you can lose a lot of resources and time if a big job rolls back and has to be restarted.
Comment by Jonathan Lewis — February 23, 2013 @ 10:51 am UTC Feb 23,2013 |
Jonathan,
thanks for the additional explanation. I’m glad I commented, because I never understood the “Oracle rolls back the _last statement_” statement in the Oracle documentation or ML notes.
I’ve never _seen_ this actually happen. What I’ve always seen was a (usually implicit) rollback from the session after it received an ORA-60 on Oracle’s rollback of the last statement. From the client technologies I have to deal with, a retry of the last statement that was rolled back by Oracle would in my opinion need some very strange coding, like, e.g. in PL/SQL, a catch all (or at least ORA-60) exception handler around each statement with a goto label back to the statement.
Comment by Marcus Mönnig — February 23, 2013 @ 11:47 am UTC Feb 23,2013 |
As you say – buses. I’ve just been dealing with some deadlock issues and along comes your post. I’d always assumed that Oracle would fail the last session making a lock request i.e. the one that completed the cycle of locks that is a deadlock – session 2 in your example. But a quick test by myself verifies what you have stated – it is actually the other session that gets an error back on its SQL statement (session 1), and the session that caused the deadlock (session 2) carries on waiting for the lock it wants. I’ll have to remember that for future reference.
John
Comment by John Brady — March 8, 2013 @ 10:38 am UTC Mar 8,2013 |
Not all Deadlocks are created the same…
I’ve blogged about deadlocks in Oracle at least once before. I said then that although the following message in deadlock trace files is usually true, it isn’t always. The following deadlock is not an Oracle error. Deadlocks of this type can b…
Trackback by Doug's Oracle Blog — March 11, 2013 @ 2:04 am UTC Mar 11,2013 |
Excellent explanation! Just wanted to point out something that probably makes it a bit unclear:
“At this point most of the applications that I’ve seen would crash session 1, resulting in an implicit rollback of the million row update”
The *update* that will get rolled back is of only 1 row. The million row *delete* will stay untouched at this point.
Comment by Todor Botev — April 28, 2013 @ 12:59 pm UTC Apr 28,2013 |