Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 11.2.0.2 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:
-- created a table t1 to work with, no data needed, column definitions don't matter. Session 1: lock table t1 in row exclusive mode; -- mode 3, lock acquired Session 2: lock table t1 in row exclusive mode; -- mode 3, lock acquired Session 3: lock table t1 in exclusive mode; -- mode 6, goes into wait for session 1 Session 1 (again): lock table t1 in share row exclusive mode; -- mode 5, should move to converter queue, wait for session 2
The mode 3 to mode 5 conversion is what happens if you have a foreign key referential integrity constraint without a covering index and run code that does something like “delete child rows for parent X, delete parent X” (which also happens under the covers if your foreign key is declared as “on delete cascade”).
Notice the “should” on the lock conversion line – this is exactly what happens in 10.2.0.3 and 11.2.0.3; but on the system I was using session 1 got an immediate deadlock (ORA-00060) error – in the absence of a deadlock scenario !
My problem is this – the instance that gives the deadlock error is 64-bit 11.2.0.2 running RAC on Linux (OEL 5); the instances that don’t give the error are 32-bit, non-RAC, running on Windows XP Pro. SO is the anomaly due to some difference in:
- Windows vs. Linux
- 32 bit vs. 64 bit
- RAC vs. non-RAC
- Specific version of Oracle
I’m inclined to think it’s a version dependent bug, but it’s possible that it’s a necessary side effect of RAC. So if you’ve got 11.2.0.3 RAC, or 11.2.0.2 non-RAC, or the terminal releases of 10g and 11.1 whether RAC or non-RAC, I’d like to hear from you which versions – if any – produce the same deadlock and which don’t. (And the scientific method being what it is, anyone with 11.2.0.2 on RAC on Linux might like to confirm – or contradict – my result.)
Update 8:30 p.m. GMT
The results in so far tend to support the idea that this may be a RAC-related issue. Justin Mitchell’s result is particularly interesting, and I confirmed on my 11.2.0.2 RAC system that if the two starting sessions are on different nodes then I don’t get the deadlock behaviour.
Users of RAC will probably be aware of the fact that v$lock.block commonly sets itself to 2 (potential blocker) on RAC the moment you acquire the lock; and this is what happens to both the “row exclusive” locks. When we request the exclusive lock, both TM locks change from block=2 to block=1 if they are on the same node; if they are on different nodes then only one of them changes.
Unfortunately Oleksandr didn’t see a deadlock when he tried RAC 11.2.0.3 – and since he didn’t make any comment about using multiple nodes we will have to wait to see if he can repeat the test and tell us whether his results are consistent with Justin’s, or whether they suggest that there are further considerations in effect.
I was unable to reproduce this behaviour neither on 11.2.0.2 non-RAC, nor 11.2.0.3 RAC (both 64-bit Linux).
Comment by Oleksandr alesinskyy — March 20, 2013 @ 1:06 pm GMT Mar 20,2013 |
Oleksandr,
Thanks for that note. When you ran the 11.2.0.3 RAC test were the first two sessions running on the same node or different nodes ? The results from Justin Mitchell below indicate that it might make a difference.
Comment by Jonathan Lewis — March 20, 2013 @ 8:29 pm GMT Mar 20,2013 |
Hi Jonathan,
two node rac 11.2.0.3.5 (latest PSU) OEL 6.2 64bit and session 1 gets
Not sure how to format code so only conclusion showed .
Regards
GregG
Comment by goryszewskig — March 20, 2013 @ 1:36 pm GMT Mar 20,2013 |
I was able to get the same deadlock scenario reproduced in a 11.2.0.3 on a 3 node OEL RAC as well. However, the placement of the sessions seems to make a difference as well:
session1@node1
session2@node1
session3@node1
= Deadlock scenario as above
session1@node1
session2@node1
session3@node2
= Deadlock scenario as above
session1@node1
session2@node2
session3@node2
= No deadlock, session 1 waits on session 2
I’ll try to get some additional testing in on some other systems a bit later.
Comment by Justin Mitchell — March 20, 2013 @ 1:51 pm GMT Mar 20,2013 |
Testing on a 11.2.0.2 2-node RAC system yielded the same results as my earlier 11.2.0.3 test, with deadlocks when session 1 and session 2 are on the same node, and waits when session 1 and 2 are on different nodes.
Comment by Justin Mitchell — March 21, 2013 @ 1:25 am GMT Mar 21,2013 |
I was unable to reproduce this on 11.2.0.2.2 non-RAC on HP-UX Itanium.
Comment by David Leinbach — March 20, 2013 @ 1:57 pm GMT Mar 20,2013 |
I have trialled your experiment on Oracle 11.2.0.1 non RAC, 64bit OEL 5.5 and can confirm that no deadlock was observed, session 1 is waiting on session 2.
Comment by Tony Sleight — March 20, 2013 @ 3:43 pm GMT Mar 20,2013 |
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 – 64bit Production
on OEL 5 ( Linux 2.6.18-164.el5 #1 SMP )
non-RAC
no deadlock:
Comment by Wolfgang — March 20, 2013 @ 4:17 pm GMT Mar 20,2013 |
On 11.2.0.2 non-RAC 64-bit Linux I get a deadlock.
Same outcome with sessions on the same nodes on 11.2.0.3. RAC, also on 64-bit Linux
Comment by Dom Brooks — March 20, 2013 @ 8:35 pm GMT Mar 20,2013 |
No deadlock observed on a NON RAC 10.2.0.4.0 database
Comment by Houri — March 21, 2013 @ 2:00 pm GMT Mar 21,2013 |
I’m not Oleksandr Alesinskyy, but we are also running RAC 11.2.0.3 on OEL5.6 (2 node)
Deadlock if all is done on same RAC node
No deadlock if session 2 is started on the other RAC node.
Comment by Wessel van Norel — March 21, 2013 @ 5:00 pm GMT Mar 21,2013 |
ORA-00060: deadlock detected while waiting for resource
Two node RAC running on SunOS 5.10 Generic_144488-17 sun4v sparc SUNW,T5440.
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
PL/SQL Release 11.2.0.3.0 – Production
CORE 11.2.0.3.0 Production
TNS for Solaris: Version 11.2.0.3.0 – Production
NLSRTL Version 11.2.0.3.0 – Production
Comment by nakinov — April 5, 2013 @ 10:03 am BST Apr 5,2013 |
I am sorry Jonathan, I forgot to write that the test I did was all on same instance. However I did two other test by mixing the instances and the results are bellow.
Test#1 – All on same instance
— ORA-00060: deadlock detected while waiting for resource
Instance#1
lock table t1 in row exclusive mode;
Instance#1
lock table t1 in row exclusive mode;
Instance#1
lock table t1 in exclusive mode;
Instance#1
lock table t1 in share row exclusive mode;
Test#2 – Mixing instances
— No ORA-600
Instance#1
lock table t1 in row exclusive mode;
Instance#2
lock table t1 in row exclusive mode;
Instance#1
lock table t1 in exclusive mode;
Instance#1
lock table t1 in share row exclusive mode;
Test#3 – Mixing instances
— ORA-00060: deadlock detected while waiting for resource
Instance#1
lock table t1 in row exclusive mode;
Instance#2
lock table t1 in row exclusive mode;
Instance#2
lock table t1 in exclusive mode;
Instance#1
lock table t1 in share row exclusive mode;
Comment by nakinov — April 5, 2013 @ 10:21 am BST Apr 5,2013 |