Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 220.127.116.11 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:
-- created a table t1 to work with, no data needed, column definitions don't matter. Session 1: lock table t1 in row exclusive mode; -- mode 3, lock acquired Session 2: lock table t1 in row exclusive mode; -- mode 3, lock acquired Session 3: lock table t1 in exclusive mode; -- mode 6, goes into wait for session 1 Session 1 (again): lock table t1 in share row exclusive mode; -- mode 5, should move to converter queue, wait for session 2
The mode 3 to mode 5 conversion is what happens if you have a foreign key referential integrity constraint without a covering index and run code that does something like “delete child rows for parent X, delete parent X” (which also happens under the covers if your foreign key is declared as “on delete cascade”).
Notice the “should” on the lock conversion line – this is exactly what happens in 10.2.0.3 and 18.104.22.168; but on the system I was using session 1 got an immediate deadlock (ORA-00060) error – in the absence of a deadlock scenario !
My problem is this – the instance that gives the deadlock error is 64-bit 22.214.171.124 running RAC on Linux (OEL 5); the instances that don’t give the error are 32-bit, non-RAC, running on Windows XP Pro. SO is the anomaly due to some difference in:
- Windows vs. Linux
- 32 bit vs. 64 bit
- RAC vs. non-RAC
- Specific version of Oracle
I’m inclined to think it’s a version dependent bug, but it’s possible that it’s a necessary side effect of RAC. So if you’ve got 126.96.36.199 RAC, or 188.8.131.52 non-RAC, or the terminal releases of 10g and 11.1 whether RAC or non-RAC, I’d like to hear from you which versions – if any – produce the same deadlock and which don’t. (And the scientific method being what it is, anyone with 184.108.40.206 on RAC on Linux might like to confirm – or contradict – my result.)
Update 8:30 p.m. GMT
The results in so far tend to support the idea that this may be a RAC-related issue. Justin Mitchell’s result is particularly interesting, and I confirmed on my 220.127.116.11 RAC system that if the two starting sessions are on different nodes then I don’t get the deadlock behaviour.
Users of RAC will probably be aware of the fact that v$lock.block commonly sets itself to 2 (potential blocker) on RAC the moment you acquire the lock; and this is what happens to both the “row exclusive” locks. When we request the exclusive lock, both TM locks change from block=2 to block=1 if they are on the same node; if they are on different nodes then only one of them changes.
Unfortunately Oleksandr didn’t see a deadlock when he tried RAC 18.104.22.168 – and since he didn’t make any comment about using multiple nodes we will have to wait to see if he can repeat the test and tell us whether his results are consistent with Justin’s, or whether they suggest that there are further considerations in effect.