Oracle Scratchpad

May 2, 2013

v$lock

Filed under: Locks,Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 7:03 pm UTC May 2,2013

The problem of slow queries on v$lock just came up again on the OTN database forum, so I thought I’d better push out a post that’s been hanging around on my blog for the last few months. This is actually mentioned in MOS in note 1328789.1: “Query Against v$lock Run from OEM Performs Slowly” which points out that it is basically a problem of bad statistics and all you have to do is collect the stats.

However, because the view is a messy union and join of several dynamic performance views sitting on top of a load of x$ structures, the advisory from MOS is to call the procedure  dbms_stats.gather_fixed_objects_stats. This is not a nice thing to do on a busy production system, especially if it has a large number of users and a large shared pool – and doing it at a representative run-time is important if you’re going to do it at all.

There is an alternative. Here’s the basic execution plan (on my 11.2.0.2) for “select * from v$lock”:

-----------------------------------------------------
| Id  | Operation              | Name       | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT       |            |     1 |
|   1 |  HASH JOIN             |            |     1 |
|   2 |   VIEW                 | GV$_LOCK   |    10 |
|   3 |    UNION-ALL           |            |       |
|   4 |     FILTER             |            |       |
|   5 |      VIEW              | GV$_LOCK1  |     2 |
|   6 |       UNION-ALL        |            |       |
|   7 |        FIXED TABLE FULL| X$KDNSSF   |     1 |
|   8 |        FIXED TABLE FULL| X$KSQEQ    |     1 |
|   9 |     FIXED TABLE FULL   | X$KTADM    |     1 |
|  10 |     FIXED TABLE FULL   | X$KTATRFIL |     1 |
|  11 |     FIXED TABLE FULL   | X$KTATRFSL |     1 |
|  12 |     FIXED TABLE FULL   | X$KTATL    |     1 |
|  13 |     FIXED TABLE FULL   | X$KTSTUSC  |     1 |
|  14 |     FIXED TABLE FULL   | X$KTSTUSS  |     1 |
|  15 |     FIXED TABLE FULL   | X$KTSTUSG  |     1 |
|  16 |     FIXED TABLE FULL   | X$KTCXB    |     1 |
|  17 |   MERGE JOIN CARTESIAN |            |   100 |
|  18 |    FIXED TABLE FULL    | X$KSUSE    |     1 |
|  19 |    BUFFER SORT         |            |   100 |
|  20 |     FIXED TABLE FULL   | X$KSQRS    |   100 |
-----------------------------------------------------

Note, particularly, the Cartesian merge join at line 17, which assumes there will be one row from v$session (x$ksuse) joined to 100 rows from v$resource (x$ksqrs – the structure used to represent any resources that you want to lock). The big UNION ALL is then all the different types of locks (enqueues) that you might attach to a resource. The estimates relating to these two structures are the most significant problem – v$session always has FAR more than one row in it, and v$resource isn’t small: in my little system the Cartesian join produced about 325,000 rows, and that was just after starting up the database.

Having identified a couple of critical tables, I decided to see what would happen if I just collected stats on these two objects rather than doing the whole system, and the following little piece of pl/sql did what I wanted:

begin
        dbms_stats.gather_table_stats('SYS','x$ksuse',method_opt=>'for all columns size 1');
        dbms_stats.gather_table_stats('SYS','x$ksqrs',method_opt=>'for all columns size 1');
end;
/

The effect of the stats was to change the plan to the following which, while it might not be the absolute best in all cases, is certainly better than the previous one (it’s possible that you may also find that it helps to collect stats on x$ksqeq (the “generic enqueues” structure) which is likely to be the next most significant in terms of number of rows):

------------------------------------------------------
| Id  | Operation               | Name       | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT        |            |   133 |
|   1 |  HASH JOIN              |            |   133 |
|   2 |   HASH JOIN             |            |    10 |
|   3 |    VIEW                 | GV$_LOCK   |    10 |
|   4 |     UNION-ALL           |            |       |
|   5 |      FILTER             |            |       |
|   6 |       VIEW              | GV$_LOCK1  |     2 |
|   7 |        UNION-ALL        |            |       |
|   8 |         FIXED TABLE FULL| X$KDNSSF   |     1 |
|   9 |         FIXED TABLE FULL| X$KSQEQ    |     1 |
|  10 |      FIXED TABLE FULL   | X$KTADM    |     1 |
|  11 |      FIXED TABLE FULL   | X$KTATRFIL |     1 |
|  12 |      FIXED TABLE FULL   | X$KTATRFSL |     1 |
|  13 |      FIXED TABLE FULL   | X$KTATL    |     1 |
|  14 |      FIXED TABLE FULL   | X$KTSTUSC  |     1 |
|  15 |      FIXED TABLE FULL   | X$KTSTUSS  |     1 |
|  16 |      FIXED TABLE FULL   | X$KTSTUSG  |     1 |
|  17 |      FIXED TABLE FULL   | X$KTCXB    |     1 |
|  18 |    FIXED TABLE FULL     | X$KSUSE    |   252 |
|  19 |   FIXED TABLE FULL      | X$KSQRS    |  1328 |
------------------------------------------------------

Thanks to Timur Akhmadeev who recently published a note pointing out that you could collect stats on individual X$ tables. Do make sure you test this on your specific version of Oracle, though, and don’t use the production system as your first test case.

April 13, 2013

Deadlocks

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 11:36 am UTC Apr 13,2013

Here’s a deadlock graph that might cause a little confusion:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
 Resource Name          process session holds waits  process session holds waits
 TX-001a0015-00014787        34      90     X             32       3           S
 TX-00190008-0000601b        32       3     X             34      90           S

 session 90: DID 0001-0022-00000327      session 3: DID 0001-0020-000009E9
 session 3: DID 0001-0020-000009E9       session 90: DID 0001-0022-00000327

 Rows waited on:
  Session 90: obj - rowid = 00030CE5 - AAAwzlAAGAABDiuAAA
  (dictionary objn - 199909, file - 6, block - 276654, slot - 0)
  Session 3: obj - rowid = 00030CE5 - AAAwzlAAGAABp8gAAA
  (dictionary objn - 199909, file - 6, block - 433952, slot - 0)

Both sessions are holding X and waiting S on a TX lock. There are several well-known reasons why you might see a share (mode 4) lock on a transaction table slot: collisions on bitmap indexes, unique indexes, index organized tables, or referential integrity checks are the commonest “external” examples, and problems with interested transaction lists (ITLs) or freelists are the common “internal” ones – so the presence of the waits in share mode shouldn’t, of themselves, a source of confusion.

The confusion is in the reported rowids. If you try to interpret them as real rowids you may be unlucky and discover that they seem to be related to the SQL reported for the deadlocked sessions when really the information they hold is garbage. (Just to avoid confusion, let me make it clear that there are cases where the rowids reported definitely WILL be garbage; on the other hand, there may be some scenarios where the rowids are relevant – although I haven’t done any exhaustive check to see if there really are such scenarios.)

When I see a deadlock graph on transaction locks and the waits are for S mode I tend to assume that the information about the rows waited on is probably misleading; when the slot number for the rowid is zero this increases my confidence that the rowid is rubbish. (Zero is a legal value for a rowid slot, of course, so a zero doesn’t prove that the rowid is rubbish, it’s just a coincidence that allows me to continue following a hypothesis.)

The problem is that Oracle doesn’t waste resources tidying up after itself, and in the case of deadlock graphs this laziness shows up in the rowids reported. The trace is simply reporting whatever happens to be in the row_wait_obj#, row_wait_file#, row_wait_block# and row_wait_row# columns of v$session; and if the waiting process hasn’t updated these columns with current row information you could be looking at the details of the last row (or block) that the session waited for. Here’s the description of a test to demonstrate the behaviour:

create table t1 (n1 number, n2 number);
insert into t1 values(1,1);
create unique index t1_i1 on t1(n1);
create unique index t1_i2 on t1(n2);

session 1:	insert into t1 values(2,11);
session 2:	insert into t1 values(3,21);
session 1:	insert into t1 values(4,21);
session 2:	insert into t1 values(5,11);

With this table, and sequence of events following it, session 1 raised a deadlock error, and dumped the following trace:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-000a0021-0000c4aa        16      93     X             13      90           S
TX-00020020-0000fdb9        13      90     X             16      93           S
session 93: DID 0001-0010-00000057	session 90: DID 0001-000D-000000F2
session 90: DID 0001-000D-000000F2	session 93: DID 0001-0010-00000057
Rows waited on:
Session 90: obj - rowid = 00000009 - AAAAAJAABAAAQJcAAA
  (dictionary objn - 9, file - 1, block - 66140, slot - 0)
Session 93: obj - rowid = 0002E7DC - AAAufaAAFAAAAAJAAA
  (dictionary objn - 190428, file - 5, block - 9, slot - 0)

Object 9 is the I_FILE#_BLOCK# index in the data dictionary – and session 90 is definitely not doing anything with that object in this transaction.
Object 190428 is another table in the test schema, but session 93 didn’t access it in this transaction, and the block referenced is the segment header block, not a block that could hold a row.

In fact, just before I started the sequence of inserts I ran this query from a third session (connected as sys) with the following results:

select
	sid,
	row_wait_obj#, row_wait_file#, row_wait_block#, row_wait_row#
from
	v$session
where
	sid in (90,93)
;

       SID ROW_WAIT_OBJ# ROW_WAIT_FILE# ROW_WAIT_BLOCK# ROW_WAIT_ROW#
---------- ------------- -------------- --------------- -------------
        90             9              1           66140             0
        93        190428              5               9             0

The “rows” reported for the deadlock simply echoed the values that were already in the row_wait columns before the test started. (The sessions were waiting on “SQL*Net message from client” at the time.)

Footnote:

Session 90 created the tables and indexes, that’s probably why it happened to have an outstanding reference to the i_file#_block# index.
Session 93 had just run a script to drop all the objects in the schema, which may explain why it happened to have an outstanding reference to a segment header block
If the row_wait_obj# had been set to -1 for either session then the deadlock graph would have reported “No row” for that session.

March 20, 2013

Lock Bug

Filed under: Bugs,deadlocks,Locks,Oracle — Jonathan Lewis @ 12:08 pm UTC Mar 20,2013

Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 11.2.0.2 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:


-- created a table t1 to work with, no data needed, column definitions don't matter.

Session 1:
lock table t1 in row exclusive mode;        -- mode 3, lock acquired

Session 2:
lock table t1 in row exclusive mode;        -- mode 3, lock acquired

Session 3:
lock table t1 in exclusive mode;            -- mode 6, goes into wait for session 1

Session 1 (again):
lock table t1 in share row exclusive mode;  -- mode 5, should move to converter queue, wait for session 2

The mode 3 to mode 5 conversion is what happens if you have a foreign key referential integrity constraint without a covering index and run code that does something like “delete child rows for parent X, delete parent X” (which also happens under the covers if your foreign key is declared as “on delete cascade”).

Notice the “should” on the lock conversion line – this is exactly what happens in 10.2.0.3 and 11.2.0.3; but on the system I was using session 1 got an immediate deadlock (ORA-00060) error – in the absence of a deadlock scenario !

My problem is this – the instance that gives the deadlock error is 64-bit 11.2.0.2 running RAC on Linux (OEL 5); the instances that don’t give the error are 32-bit, non-RAC, running on Windows XP Pro. SO is the anomaly due to some difference in:

  • Windows vs. Linux
  • 32 bit vs. 64 bit
  • RAC vs. non-RAC
  • Specific version of Oracle

I’m inclined to think it’s a version dependent bug, but it’s possible that it’s a necessary side effect of RAC. So if you’ve got 11.2.0.3 RAC, or 11.2.0.2 non-RAC, or the terminal releases of 10g and 11.1 whether RAC or non-RAC, I’d like to hear from you which versions – if any – produce the same deadlock and which don’t. (And the scientific method being what it is, anyone with 11.2.0.2 on RAC on Linux might like to confirm – or contradict – my result.)

Update 8:30 p.m. GMT

The results in so far tend to support the idea that this may be a RAC-related issue. Justin Mitchell’s result is particularly interesting, and I confirmed on my 11.2.0.2 RAC system that if the two starting sessions are on different nodes then I don’t get the deadlock behaviour.

Users of RAC will probably be aware of the fact that v$lock.block commonly sets itself to 2 (potential blocker) on RAC the moment you acquire the lock; and this is what happens to both the “row exclusive” locks. When we request the exclusive lock, both TM locks change from block=2 to block=1 if they are on the same node; if they are on different nodes then only one of them changes.

Unfortunately Oleksandr didn’t see a deadlock when he tried RAC 11.2.0.3 – and since he didn’t make any comment about using multiple nodes we will have to wait to see if he can repeat the test and tell us whether his results are consistent with Justin’s, or whether they suggest that there are further considerations in effect.

February 22, 2013

Deadlock Detection

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 6:19 pm UTC Feb 22,2013

By some strange coincidence, the “London Bus” effect perhaps, there have been three posts on the OTN database forum in the last couple of days relating to deadlocks; and they have prompted me to indulge in a little rant about the myth of Oracle and deadlock detection; it’s the one that goes:

“Oracle detects and resolves deadlocks automatically.”

Oracle may detect deadlocks automatically, but it doesn’t resolve them, it simply reports them (by raising error ORA-00060 and rolling back one statement) then leaves the deadlocked sessions stuck until the session that received the report resolves the problem or an external agent resolves the problem.

Consider the following example (which, I have to admit, I wrote without access to a live instance):

Session 1
delete from t1 where id between 1 and 1000000;
1000000 rows deleted

                                                   Session 2
                                                   update jobs set status = 'STARTING' where id = 99;
                                                   1 row updated

Session 1
update jobs set status = 'FINISHED' where id = 99;
-- session 1 is now waiting on session 2

                                                   Session 2
                                                   delete from t1 where id between 1 and 1000000
                                                   -- session 2 is now waiting on session 1

Session 1 (some time within the next 3 seconds)
ORA-00060: deadlock detected

At this point most of the applications that I’ve seen would crash session 1, resulting in an implicit rollback of the million row update (worst case I’ve seen: session 2 ran for 10 seconds and resulted in session 1 crashing and rolling back for 3.5 hours); some applications would log “Unexpected Oracle error” to the front-end and retry the most recent action (at which point Session 2 would receive an ORA-00060 error and the two sessions would see-saw back and fore every 3 seconds until someone noticed what was going on).

In what way has the deadlock been “resolved” ?

In this specific case I think I’d probably want a supervisor (person or program) to log and kill session 2 and allow session 1 to retry its second update – and then I’d want to find out why this sequence of events had happened at all.

Deadlocks are NOT resolved automatically by Oracle, they are merely reported so that the client code can decide how to resolve them.

September 9, 2011

Row Lock Waits

Filed under: Indexing,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 6:19 pm UTC Sep 9,2011

Here’s one I keep forgetting – and spending 15 minutes trying to think of the answer before getting to the “deja vu” point again. I’ve finally decided that I’ve got to write the answer down because that will save me about 14 minutes the next time I forget.

Q. In a Statspack or AWR report there is a section titles “Segments by Row Lock Waits”. Why could an index be subject to a Row Lock Wait ?

A. Try inserting into a table from two different sessions (without committing) two rows with the same primary key. The second insert will wait on event enq: TX – row lock contention, and show up in v$lock with a lock request for a TX lock in mode 4. When you issue a commit or rollback on the first session, and the second statement errors or completes (depending on whether you commit or rollback the first session) it will increase the value for row lock waits in v$segstat (and v$segment_statistics) for the index by 1.

There are variations on the theme, of course, but the key feature is uniqueness with one session waiting for another session to commit or rollback on a conflicting value. This includes cases of foreign key constraint checking such as inserting a child for a parent that has been deleted but not committed (and there’s an interesting anomaly with that scenario which – in 10g, at least – reports more row lock waits on the parent PK than you might expect.)

August 29, 2011

Deadlock

Filed under: deadlocks,Indexing,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 5:25 pm UTC Aug 29,2011

Here’s a deadlock graph the appeared on Oracle-L and OTN a couple of days ago.

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-001a0002-0002a0fe       196     197     X            166    1835           S
TM-0000c800-00000000       166    1835    SX            196     197    SX   SSX

It’s a little unusual because instead of the common TX mode 6 (eXclusive) crossover we have one TX and one TM lock, the TX wait is for mode 4 (S) and the TM wait is for a conversion from 3 (SX) to 5 (SSX).

(more…)

April 8, 2011

Deadlock

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 5:57 pm UTC Apr 8,2011

Here’s an example of a slightly less common data deadlock graph (dumped from 11gR2, in this case):
(more…)

June 21, 2010

Lock Modes

Filed under: Infrastructure,Locks — Jonathan Lewis @ 7:05 pm UTC Jun 21,2010

I usually think about locks (and the meaning of the lmode and request columns in v$lock) in terms of numbers, and never seem to remember which number goes with which name – apart from mode 6 = exclusive. So I’ve finally put up a little list somewhere I’ll always be able to find it if I need it.
(more…)

February 15, 2010

Lock Horror

Filed under: Infrastructure,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 10:34 pm UTC Feb 15,2010

Here’s an extract from a  simple script I’ve run from time to time over the last 10 years to create a little testbed to check up on “foreign key locking”. I haven’t run it for some time (my notes say the last version I ran it on until very recently was 10.1.0.2). All it does is create a pair of tables with a foreign key constraint between them and an index protecting that constraint from the foreign key locking problem:

(more…)

April 19, 2009

Locking SQL

Filed under: Locks,Oracle,Troubleshooting — Jonathan Lewis @ 6:11 pm UTC Apr 19,2009

A question that comes up fairly regularly on OTN Database Forum and the comp.databases.oracle.server newsgroup is one that asks some variant of:

“When a session has some rows locked in a table, how can I find the statement that locked those rows ?”

“I have several sessions waiting for a TX lock, and have the SID of the blocking session, how do I find the blocking SQL ?”

(more…)

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 1,391 other followers