Here’s an example of a nasty accident that can be seen in a slightly unusual output from v$lock (on 11.2.0.4).
Rather conveniently it demonstrates a comment I’ve often made about deadlocks not being “resolved” automatically, with the result that incorrectly written applications can end up with two sessions taking it in turns (see-sawing) to block each other.
select sid, type, id1, id2, lmode, request, ctime, block from V$lock where sid in ( select sid from V$session where username = 'TEST_USER' ) order by sid, type desc ; SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK ---------- -- ---------- ---------- ---------- ---------- ---------- ---------- 145 TX 589833 7518 6 0 489 0 TM 92275 0 2 6 489 1 SK 7 25165955 6 0 489 0 AE 100 0 4 0 582 0 148 TX 524308 7383 6 0 490 0 TM 92275 0 2 6 490 1 SK 7 25173379 6 0 490 0 AE 100 0 4 0 998 0
You’ll notice I’ve got two sessions holding a TM lock on the same table (object 92275) in mode 2 (sub-share) and waiting for an exclusive lock on the same table. Both sessions are recording the fact that they are blocking something else. You’ll have trust me when I say there are no other user sessions on the system at this point, and none of the background sessions is doing anything with that table.
The clue to what’s happening is the SK lock – it’s the “segment shrink” lock. I had two sessions start an “alter index I_n shrink space” (two different indexes on the same table) at the same time. The problem is that “shrink space” without the “compact” tries to drop the highwater mark on the index’ allocated space after completing the shrink phase – and there’s a defect in the internal code that tries to get the required exclusive lock on the underlying table: it doesn’t seem to allow for all the possible ways you can fail to get the lock. If you look at v$session_wait_history for either of these sessions, you’ll see something like the following:
SID SEQ# EVENT# EVENT P1 P2 P3 WAIT_TIME WAIT_TIME_MICRO TIME_SINCE_LAST_WAIT_MICRO ----- ----- ---------- ------------------------ ---------- ---------- ---------- ---------- --------------- -------------------------- 145 1 798 Wait for shrink lock 0 0 0 10 99954 70137 2 235 enq: TM - contention 1414332422 92275 0 600 6002022 101 3 798 Wait for shrink lock 0 0 0 10 100723 69335 4 235 enq: TM - contention 1414332422 92275 0 600 6001589 81 5 798 Wait for shrink lock 0 0 0 10 100500 69542 6 235 enq: TM - contention 1414332422 92275 0 600 6002352 86 7 798 Wait for shrink lock 0 0 0 10 100618 69145 8 235 enq: TM - contention 1414332422 92275 0 600 6001545 144 9 798 Wait for shrink lock 0 0 0 10 100996 69445 10 235 enq: TM - contention 1414332422 92275 0 600 6002744 310
The attempt to acquire the TM enqueue (lock) times out every three seconds – and I think the session then releases and re-acquires the SK lock before trying to re-acquire the TM lock – and it’s never going to get it.
The alert log, by the way, looked like this:
Wed Mar 12 12:53:27 2014 ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc. Wed Mar 12 12:53:30 2014 ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc. Wed Mar 12 12:53:40 2014 ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc. Wed Mar 12 12:53:43 2014 ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc. ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
I’ve said it before, and I keep repeating it when people say “Oracle resolves deadlocks automatically”: Oracle does NOT resolve deadlocks automatically – one of the sessions will rollback its last DML statement to clear the deadlock, but the other session will (almost invariably) still be waiting. It’s up to the application to do something sensible to resolve the deadlock after it receives the ORA-00060 error.
Don’t ask about the trace files – but they had both reached 400MB by the time I finished this note.
Strategy Note
If you are going to shrink objects, it’s probably best to do it in two steps: “shrink space compact“, followed by “shrink space”.
If you’re going to try to use multiple sessions to shrink several indexes as quickly as possible, make sure there’s no way that two sessions can try to “shrink space” on the indexes on the same table at the same time.
Footnote:
This article was prompted by the recent appearance of bug 18388128 on MoS, which describes a slightly difference scenario and also shows an example of the deadlock graph you might see (endlessly repeated).
Update (Dec 2019)
I’ve just been prompted to revisit this note. Following up the bug number I see that it’s reported as fixed in 12.2, with various backports for 11.2.0.3 (where it was first discovered) and 11.2.0.4. Strangely I couldn’t spot a patch for 12.1, though.
Good find, JL!
And (agreeing) regarding “Oracle resolves deadlocks automatically,” it would be far better for folks trying to convey what Oracle does about deadlocks to start such claims with “Oracle breaks deadlocks automatically.”
Then perhaps it would be easier to get people to take seriously exactly the state of the youngest lock holder’s session, resolution of which has become the application’s province after Oracle has intervened to break the deadlock.
Here’s hoping the bit of this application (Oracle RDBMS) that needs to avoid or handle that deadlock is well contained enough that it can be fixed and back ported quickly and reliably!
Comment by rsiz — March 13, 2014 @ 10:59 am GMT Mar 13,2014 |
[…] 3 seconds until someone noticed what was going on. [Update Jan 2016 – I should have linked to this example of an Oracle internal error that does exactly this a long time […]
Pingback by Deadlock Detection | Oracle Scratchpad — January 22, 2016 @ 11:59 am GMT Jan 22,2016 |
[…] Shrink Space bug (Mar 2014): a bug, fixed in 12.2, demonstrating that Oracle does not “resolve” deadlocks, it just makes it possible for the application to resolve the deadlock. […]
Pingback by Lock Catalogue | Oracle Scratchpad — February 19, 2022 @ 10:34 am GMT Feb 19,2022 |