Oracle Scratchpad

March 11, 2015

Flashback Logging

Filed under: Flashback,Infrastructure,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 3:21 pm BST Mar 11,2015

One of the waits that is specific to ASSM (automatic segment space management) is the “enq: FB – contention” wait. You find that the “FB” enqueue has the following description and wait information when you query v$lock_type and v$event_name:


SQL> execute print_table('select * from v$lock_type where type = ''FB''')
TYPE                          : FB
NAME                          : Format Block
ID1_TAG                       : tablespace #
ID2_TAG                       : dba
IS_USER                       : NO
DESCRIPTION                   : Ensures that only one process can format data blocks in auto segment space managed tablespaces

SQL> execute print_table('select * from v$event_name where name like ''enq: FB%''')
EVENT#                        : 806
EVENT_ID                      : 1238611814
NAME                          : enq: FB - contention
PARAMETER1                    : name|mode
PARAMETER2                    : tablespace #
PARAMETER3                    : dba
WAIT_CLASS_ID                 : 1893977003
WAIT_CLASS#                   : 0
WAIT_CLASS                    : Other

This tells us that a process will acquire the lock when it wants to format a batch of blocks in a segment in a tablespace using ASSM – and prior experience tells us that this is a batch of 16 consecutive blocks in the current extent of the segment. When we see a wait for an FB enqueue we can assume that two sessions have simultaneously tried to format the same new batch of blocks and one of them is waiting for the other to complete the format. In some ways this wait can be viewed (like the “read by other session” wait) in a positive light – if the second session weren’t waiting for the first session to complete the block format it would have to do the formatting itself, so the wait means a reduced response time for that end-user. On the other hand the set of 16 blocks picked by a session depends on its process id, so the second session might have picked a different set of 16 blocks to format, which means that in the elapsed time of one format call the segment could have had 32 blocks formatted – this wouldn’t have improved the end-user’s response time, but it would mean that more time would pass before another session had to spend time formatting blocks. Basically, in a highly concurrent system, there’s not a lot you can do about FB waits (unless, of course, you do some clever partitioning of the hot objects).

There is actually one set of circumstances where you can have some control of how much time is spent on the wait, but before I mention it I’d like to point out a couple more details about the event itself. First, the parameter3/id2_tag is a little misleading: you can work out which blocks are being formatted (if you really need to), but the “dba” is NOT a data block address (which you might think if you look at the name and a few values). There is a special case when the FB enqueue is being held while you format the blocks in the 64KB extents that you get from system allocated extents, and there’s probably a special case (which I haven’t bothered to examine) if you create a tablespace with uniform extents that aren’t a multiple of 16 blocks, but in the general case the “dba” consists of two parts – a base “data block address” and a single (hex) digit offset identifying which batch of 16 blocks will be formatted.

For example: a value of 0x01800242 means start at data block address 0x01800240, count forward 2 * 16 blocks, then format 16 blocks from that point onwards. Since the last digit can only range from 0x0 to 0xf this means the first 7 (hex) digits of a “dba” can only reference 16 batches of 16 blocks, i.e. 256 blocks. It’s no coincidence (I assume) that a single bitmap space management block can only cover 256 blocks in a segment – the FB enqueue is tied very closely to the bitmap block.
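If you ever need to decode one of these values, here’s a minimal sketch using the dbms_utility address functions – the example value is the one from the text above, and the decoding assumes the two-part (base address plus batch offset) interpretation I’ve just described:

-- Decode the FB enqueue "dba" 0x01800242: zero the last hex digit
-- to get the base data block address, use that digit as the batch number.
select
        mod(to_number('01800242','xxxxxxxx'),16)                 batch#,
        dbms_utility.data_block_address_file(
                trunc(to_number('01800242','xxxxxxxx')/16)*16
        )                                                        file#,
        dbms_utility.data_block_address_block(
                trunc(to_number('01800242','xxxxxxxx')/16)*16
        )                                                        base_block#,
        dbms_utility.data_block_address_block(
                trunc(to_number('01800242','xxxxxxxx')/16)*16
        ) + 16 * mod(to_number('01800242','xxxxxxxx'),16)        first_block_to_format
from
        dual
;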

So now it’s time to ask why this discussion of the FB enqueue appears in an article titled “Flashback Logging”. Enable the 10704 trace at level 10, along with the 10046 trace at level 8, and you’ll see. Remember that Oracle may have to log the old version of a block before modifying it, and if it’s a block that’s being reused it may contribute to “physical reads for flashback new”.
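If you want to reproduce the test, both traces use the standard event syntax – something like this at the session level (10704 is the enqueue trace, 10046 the extended SQL trace):

alter session set events '10704 trace name context forever, level 10';
alter session set events '10046 trace name context forever, level 8';

Here’s the resulting trace of a “format block” event: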


*** 2015-03-10 12:50:35.496
ksucti: init session DID from txn DID:
ksqgtl:
        ksqlkdid: 0001-0023-00000014

*** 2015-03-10 12:50:35.496
*** ksudidTrace: ksqgtl
        ktcmydid(): 0001-0023-00000014
        ksusesdi:   0000-0000-00000000
        ksusetxn:   0001-0023-00000014
ksqgtl: RETURNS 0
WAIT #140627501114184: nam='db file sequential read' ela= 4217 file#=6 block#=736 blocks=1 obj#=192544 tim=1425991835501051
WAIT #140627501114184: nam='db file sequential read' ela= 674 file#=6 block#=737 blocks=1 obj#=192544 tim=1425991835501761
WAIT #140627501114184: nam='db file sequential read' ela= 486 file#=6 block#=738 blocks=1 obj#=192544 tim=1425991835502278
WAIT #140627501114184: nam='db file sequential read' ela= 522 file#=6 block#=739 blocks=1 obj#=192544 tim=1425991835502831
WAIT #140627501114184: nam='db file sequential read' ela= 460 file#=6 block#=740 blocks=1 obj#=192544 tim=1425991835503326
WAIT #140627501114184: nam='db file sequential read' ela= 1148 file#=6 block#=741 blocks=1 obj#=192544 tim=1425991835504506
WAIT #140627501114184: nam='db file sequential read' ela= 443 file#=6 block#=742 blocks=1 obj#=192544 tim=1425991835504990
WAIT #140627501114184: nam='db file sequential read' ela= 455 file#=6 block#=743 blocks=1 obj#=192544 tim=1425991835505477
WAIT #140627501114184: nam='db file sequential read' ela= 449 file#=6 block#=744 blocks=1 obj#=192544 tim=1425991835505985
WAIT #140627501114184: nam='db file sequential read' ela= 591 file#=6 block#=745 blocks=1 obj#=192544 tim=1425991835506615
WAIT #140627501114184: nam='db file sequential read' ela= 449 file#=6 block#=746 blocks=1 obj#=192544 tim=1425991835507157
WAIT #140627501114184: nam='db file sequential read' ela= 489 file#=6 block#=747 blocks=1 obj#=192544 tim=1425991835507684
WAIT #140627501114184: nam='db file sequential read' ela= 375 file#=6 block#=748 blocks=1 obj#=192544 tim=1425991835508101
WAIT #140627501114184: nam='db file sequential read' ela= 463 file#=6 block#=749 blocks=1 obj#=192544 tim=1425991835508619
WAIT #140627501114184: nam='db file sequential read' ela= 685 file#=6 block#=750 blocks=1 obj#=192544 tim=1425991835509400
WAIT #140627501114184: nam='db file sequential read' ela= 407 file#=6 block#=751 blocks=1 obj#=192544 tim=1425991835509841

*** 2015-03-10 12:50:35.509
ksqrcl: FB,16,18002c2
ksqrcl: returns 0

Note: we acquire the lock (ksqgtl), read 16 blocks by “db file sequential read”, write them to the flashback log (buffer), format them in memory, then release the lock (ksqrcl). That lock can be held for quite a long time – in this case 13 milliseconds. Fortunately all the single block reads after the first have been accelerated by O/S prefetching, but your timings may vary.

The higher the level of concurrent activity the more likely it is that processes will collide trying to format the same 16 blocks (the lock is exclusive, so the second process will request and wait, then find that the blocks are already formatted when it finally gets the lock). This brings me to the special case where waits for the FB enqueue might have a noticeable impact … if you’re running parallel DML and Oracle decides to use “High Water Mark Brokering”, which means the parallel slaves are inserting data into a single segment instead of each using its own private segment and leaving the query co-ordinator to clean up round the edges afterwards. I think this is most likely to happen if you have a tablespace using fairly large extents and Oracle thinks you’re going to process a relatively small amount of data (e.g. small indexes on large tables) – the trade-off is between collisions between processes and wasted space from the private segments.

October 22, 2014

Deadlocks

Filed under: deadlocks,Locks,Oracle — Jonathan Lewis @ 6:34 pm BST Oct 22,2014

A recent question on the OTN forum asked about narrowing down the cause of deadlocks, and this prompted me to set up a little example. Here’s a deadlock graph of a not-quite-standard type:


Deadlock graph:
                                          ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name                             process session holds waits  process session holds waits
TX-00040001-000008EC-00000000-00000000         50     249     X             48       9           X
TX-000A001F-000008BC-00000000-00000000         48       9     X             50     249           S


My session (the one that dumped the trace file) is 249, and I was blocked by session 9. The slight anomaly, of course, is that I was waiting on a TX lock in mode 4 (Share) rather than the more common mode 6 (eXclusive).

There are plenty of notes on the web these days to tell you that this wait relates in some way to a unique index (or some associated referential integrity) or an ITL wait. (Inevitably there are a couple of other less frequently occurring and less well documented reasons, such as waits for tablespaces to change state, but I’m going to ignore those for now.) The question is, how do I tell whether this example is related to uniqueness (indexing) or ITLs ? For recent versions of Oracle the answer is in the rest of the trace file, which now holds the recent wait history for the session that dumped the trace file.

Reading down my trace file, past the line which says “Information for THIS session”, I eventually get to this:


    Current Wait Stack:
     0: waiting for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
        wait_id=80 seq_num=81 snap_id=1
 

So it didn’t take me long to find out I had an ITL problem (which should be a pretty rare occurrence in newer versions of Oracle); but there’s more:

...

    There is at least one session blocking this session.
      Dumping 1 direct blocker(s):
        inst: 1, sid: 9, ser: 40192
      Dumping final blocker:
        inst: 1, sid: 9, ser: 40192
    There are 2 sessions blocked by this session.
    Dumping one waiter:
      inst: 1, sid: 357, ser: 7531
      wait event: 'enq: TX - allocate ITL entry'

...

    Session Wait History:
        elapsed time of 0.000035 sec since current wait
     0: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0x5000c, sequence=0xa39
        wait_id=79 seq_num=80 snap_id=1
        wait times: snap=5.002987 sec, exc=5.002987 sec, total=5.002987 sec
        wait times: max=5.000000 sec
        wait counts: calls=2 os=2
        occurred after 0.000047 sec of elapsed time
     1: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
        wait_id=78 seq_num=79 snap_id=1
        wait times: snap=1 min 4 sec, exc=1 min 4 sec, total=1 min 4 sec
        wait times: max=1 min 4 sec
        wait counts: calls=22 os=22
        occurred after 0.000032 sec of elapsed time

...
     8: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0x5000c, sequence=0xa39
        wait_id=71 seq_num=72 snap_id=1
        wait times: snap=5.001902 sec, exc=5.001902 sec, total=5.001902 sec
        wait times: max=5.000000 sec
        wait counts: calls=2 os=2
        occurred after 0.000042 sec of elapsed time
     9: waited for 'enq: TX - allocate ITL entry'
        name|mode=0x54580004, usn<<16 | slot=0xa001f, sequence=0x8bc
        wait_id=70 seq_num=71 snap_id=1
        wait times: snap=4.005342 sec, exc=4.005342 sec, total=4.005342 sec
        wait times: max=4.000000 sec
        wait counts: calls=2 os=2
        occurred after 0.000031 sec of elapsed time

...

    Sampled Session History of session 249 serial 3931
    ---------------------------------------------------

    The history is displayed in reverse chronological order.

    sample interval: 1 sec, max history 120 sec
    ---------------------------------------------------
      [9 samples,                                          11:14:50 - 11:14:58]
        waited for 'enq: TX - allocate ITL entry', seq_num: 81
          p1: 'name|mode'=0x54580004
          p2: 'usn<<16 | slot'=0xa001f
          p3: 'sequence'=0x8bc
          time_waited: >= 8 sec (still in wait)
      [5 samples,                                          11:14:45 - 11:14:49]
        waited for 'enq: TX - allocate ITL entry', seq_num: 80
          p1: 'name|mode'=0x54580004
          p2: 'usn<<16 | slot'=0x5000c
          p3: 'sequence'=0xa39
          time_waited: 5.002987 sec (sample interval: 4 sec)
...

The little report that follows the initial wait state shows that the situation was a little messy – session 9 was my first and last blocker, but there was another session tangled up in the chain of waits, session 357.

Following this there’s a set of entries from my v$session_wait_history – and if you look carefully at the slot and sequence that appears on the second line of each wait you’ll notice that my waits have been alternating between TWO other sessions/transactions before I finally crashed.

Finally there’s a set of entries for my session extracted from v$active_session_history. (Question: I’m only allowed to query v$active_session_history if I’ve licensed the Diagnostic Pack – so should I shut my eyes when I get to this part of the trace file? ;) This breakdown also shows my session alternating between waits on the two different blockers, giving me a pretty good post-event breakdown of what was going on around the time of the deadlock.
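Footnote: if you want to see ITL waits for yourself, here’s a minimal sketch (hypothetical table, three concurrent sessions) of the sort of setup that can produce them:

create table t1 (id number, v1 varchar2(10)) pctfree 0 initrans 1;

insert into t1
select  rownum, rpad('x',10,'x')
from    dual
connect by level <= 1000
;

commit;

-- Session 1: update t1 set v1 = 'a' where id = 1;      (no commit)
-- Session 2: update t1 set v1 = 'b' where id = 2;      (no commit)
-- Session 3: update t1 set v1 = 'c' where id = 3;

With pctfree set to zero there’s no free space in the block for the ITL list to grow into, so (provided the three rows are in the same block) the third session is likely to end up waiting on “enq: TX – allocate ITL entry”.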

September 24, 2014

Lock Time

Filed under: AWR,Locks,Oracle,Statspack — Jonathan Lewis @ 12:42 pm BST Sep 24,2014

Here’s a little detail I was forced to re-learn yesterday; it’s one of those things where it’s easy to say “yes, obviously” AFTER you’ve had it explained, so I’m going to start by posing it as a question. Here are two samples of PL/SQL that use locking to handle a simple synchronisation mechanism; one uses a table as an object that can be locked, the other uses Oracle’s dbms_lock package. I’ve posted the code for each fragment, and a sample of what you see in v$lock if two sessions execute the code one after the other:

Table locking – the second session to run this code will wait for the first session to commit or rollback:


begin
        lock table t1 in exclusive mode;
end;
/

ADDR             KADDR                   SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
00007FF409E57BF8 00007FF409E57C58         15 TM     157778          0          0          6         65          0
00007FF409E57BF8 00007FF409E57C58        125 TM     157778          0          6          0         91          1

Using dbms_lock.

variable m_handle       varchar2(255);

declare
        n1              number;
begin

        dbms_lock.allocate_unique(
                lockname        => 'Synchronize',
                lockhandle      => :m_handle
        );

        dbms_output.put_line(:m_handle);

        n1 := dbms_lock.request(
                lockhandle              => :m_handle,
                lockmode                => dbms_lock.x_mode,
                timeout                 => dbms_lock.maxwait,
                release_on_commit       => true         -- the default is false !!
        );

        dbms_output.put_line(n1);

end;
/

ADDR             KADDR                   SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
000000008ED8F738 000000008ED8F790         15 UL 1073742427          0          0          6         42          0
000000008ED902B0 000000008ED90308        125 UL 1073742427          0          6          0        103          1

The big question is this – although the two code fragments produce the same effects in terms of lock waits and the reports from v$lock, what’s the big difference in the way that they are reported in the AWR report?

The high-level difference appears in the Time Model stats. Here are two extracts showing the difference:

Using dbms_lock.


Statistic Name                                       Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time                                 65.4         99.9
PL/SQL execution elapsed time                            63.8         97.4

-> Captured SQL account for    2.8% of Total DB Time (s):              65
-> Captured PL/SQL account for   99.4% of Total DB Time (s):              65

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
            63.7              1         63.72   97.3     .0     .0 10u1qbw4a27sp
Module: SQL*Plus
declare n1 number; begin dbms_lock.allocate_unique
( lockname => 'Synchronize', lockhandle
 => :m_handle ); dbms_output.put_line(:m_handle); n
1 := dbms_lock.request( lockhandle => :m_handle,

Table locking method:

Time Model Statistics

Statistic Name                                       Time (s) % of DB Time
------------------------------------------ ------------------ ------------
sql execute elapsed time                                 95.5         99.9
DB CPU                                                    0.9           .9
parse time elapsed                                        0.1           .1
hard parse elapsed time                                   0.1           .1
PL/SQL execution elapsed time                             0.1           .1

SQL ordered by Elapsed Time

-> Captured SQL account for   99.6% of Total DB Time (s):              96
-> Captured PL/SQL account for   98.7% of Total DB Time (s):              96

        Elapsed                  Elapsed Time
        Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO    SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
            93.9              1         93.88   98.3     .0     .0 8apkdghttmndx
Module: SQL*Plus
begin lock table t1 in exclusive mode; end;

            93.9              1         93.88   98.3     .0     .0 29fwr53agvbc0
Module: SQL*Plus
LOCK TABLE T1 IN EXCLUSIVE MODE

The time spent waiting for the table lock is reported purely as SQL execution time in the Time Model stats; but the time spent waiting for the user-defined lock is reported as SQL execution time AND as PL/SQL execution time. I had forgotten this yesterday so, as I skipped through the various headline figures of an hourly snapshot, I was amazed to see the Time Model stats reporting 33,000 seconds of PL/SQL and 66,000 seconds of SQL – how on earth do you manage to do that much PL/SQL on any Oracle system? (To add to the embarrassment, it had only been a few moments earlier that I’d gone through the Top 5 Timed Events and said something like: “you can ignore all time spent on ‘enq: UL – contention’, it’s probably just synchronisation code”.)

In this case the SQL ordered by Elapsed Time gives you a strong visual clue about what’s going on – but it won’t always be that obvious.

Bottom Line:

PL/SQL execution time includes the time spent waiting for UL locks, so don’t forget that you may need to subtract the wait time for ‘enq: UL – contention’ from the PL/SQL time before you start to worry about how much actual work you’re doing in PL/SQL.
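As a quick sanity check you can compare the two cumulative figures directly – a rough sketch (instance-wide, since startup; neither view needs the Diagnostic Pack):

select  value/1e6 plsql_seconds
from    v$sys_time_model
where   stat_name = 'PL/SQL execution elapsed time'
;

select  time_waited_micro/1e6 ul_wait_seconds
from    v$system_event
where   event = 'enq: UL - contention'
;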


July 13, 2014

Deferrable RI – 2

Filed under: Infrastructure,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 7:46 pm BST Jul 13,2014

A question came up on Oracle-L recently about possible locking anomalies with deferrable referential integrity constraints.

An update by primary key is taking a long time; the update sets several columns, one of which is the child end of a referential integrity constraint. A check on v$active_session_history shows lots of waits for “enq: TX – row lock contention” in mode 4 (share), and many of these waits also identify the current object as the index that has been created to avoid the “foreign key locking” problem on this constraint (though many of the waits show the current_obj# as -1). A possibly significant feature of the issue is that the foreign key constraint is defined as “deferrable initially deferred”. The question is, could such a constraint result in TX/4 waits?

My initial thought was that if the constraint was deferrable this was unlikely; there would have to be other features coming into play.

Of course, when the foreign key is NOT deferrable it’s easy to set up cases where a TX/4 appears: for example you insert a new parent value without issuing a commit, then I insert a new matching child; at that point my session will wait for your session to commit or rollback. If you commit my insert succeeds; if you rollback my session raises an error (ORA-02291: integrity constraint (schema_name.constraint_name) violated – parent key not found). But if the foreign key is deferred the non-existence (or potential existence) of the parent shouldn’t matter until commit time. If the constraint is deferrable, though, the first guess would be that you could get away with things like this so long as you fixed up the data in time for the commit.

I was wrong. Here’s a little example:


create table parent (
	id	number(4),
	name	varchar2(10),
	constraint par_pk primary key (id)
)
;

create table child(
	id_p	number(4)
		constraint chi_fk_par
		references parent
		deferrable initially deferred,
	id	number(4),
	name	varchar2(10),
	constraint chi_pk primary key (id_p, id)
)
;

insert into parent values (1,'Smith');
insert into parent values (2,'Jones');

insert into child values(1,1,'Simon');
insert into child values(1,2,'Sally');

insert into child values(2,1,'Jack');
insert into child values(2,2,'Jill');

commit;

begin
	dbms_stats.gather_table_stats(user,'parent');
	dbms_stats.gather_table_stats(user,'child');
end;
/

pause Press return

update child set id_p = 3 where id_p = 2 and id = 2;

If you don’t do anything in another session between the pause and the update then the update will succeed – but fail on a subsequent commit unless you insert parent 3 before committing. But if you take advantage of the pause to use another session to insert parent 3 first, the update will then hang waiting for the parent insert to commit or rollback – and what happens next may surprise you. Basically the deferrability doesn’t protect you from the side effects of conflicting transactions.
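For the avoidance of doubt, the second-session step during the pause is just this (the name is an arbitrary choice):

-- second session, while the first is paused:
insert into parent values (3,'Brown');          -- no commit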

The variations on what can happen next (insert the parent elsewhere, commit or rollback) are interesting and left as an exercise.

I was slightly surprised to find that I had had a conversation about this sort of thing some time ago, triggered by a comment to an earlier post. If you want to read a more thorough investigation of the things that can happen and how deferrable RI works then there’s a good article at this URL.


March 13, 2014

Shrink Space

Filed under: Bugs,deadlocks,fragmentation,Index Rebuilds,Indexing,Locks,Oracle — Jonathan Lewis @ 7:08 am BST Mar 13,2014

Here’s a lovely effect looking at v$lock (on 11.2.0.4):

select
        sid, type, id1, id2, lmode, request, ctime, block
from
        V$lock
where   sid in (
                select  sid
                from    V$session
                where   username = 'TEST_USER'
        )
order by
        sid, type desc
;

       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
       145 TX     589833       7518          6          0        489          0
           TM      92275          0          2          6        489          1
           SK          7   25165955          6          0        489          0
           AE        100          0          4          0        582          0

       148 TX     524308       7383          6          0        490          0
           TM      92275          0          2          6        490          1
           SK          7   25173379          6          0        490          0
           AE        100          0          4          0        998          0

You’ll notice I’ve got two sessions holding a TM lock on the same table (object 92275) in mode 2 (sub-share) and waiting for an exclusive lock (mode 6) on the same table. Both sessions are recording the fact that they are blocking something else. You’ll have to trust me when I say there are no other user sessions on the system at this point, and none of the background sessions is doing anything with that table.

The clue to what’s happening is the SK lock – it’s the “segment shrink” lock. I had two sessions start an “alter index I_n shrink space” (two different indexes on the same table) at the same time. The problem is that “shrink space” without the “compact” option tries to drop the highwater mark on the index’s allocated space after completing the shrink phase – and there’s a defect in the internal code that tries to get the required exclusive lock on the underlying table: it doesn’t seem to allow for all the possible ways you can fail to get the lock. If you look at v$session_wait_history for either of these sessions, you’ll see something like the following:


  SID  SEQ#     EVENT# EVENT                            P1         P2         P3  WAIT_TIME WAIT_TIME_MICRO TIME_SINCE_LAST_WAIT_MICRO
----- ----- ---------- ------------------------ ---------- ---------- ---------- ---------- --------------- --------------------------
  145     1        798 Wait for shrink lock              0          0          0         10           99954                      70137
          2        235 enq: TM - contention     1414332422      92275          0        600         6002022                        101
          3        798 Wait for shrink lock              0          0          0         10          100723                      69335
          4        235 enq: TM - contention     1414332422      92275          0        600         6001589                         81
          5        798 Wait for shrink lock              0          0          0         10          100500                      69542
          6        235 enq: TM - contention     1414332422      92275          0        600         6002352                         86
          7        798 Wait for shrink lock              0          0          0         10          100618                      69145
          8        235 enq: TM - contention     1414332422      92275          0        600         6001545                        144
          9        798 Wait for shrink lock              0          0          0         10          100996                      69445
         10        235 enq: TM - contention     1414332422      92275          0        600         6002744                        310

The attempt to acquire the TM enqueue (lock) times out every three seconds – and I think the session then releases and re-acquires the SK lock before trying to re-acquire the TM lock – and it’s never going to get it.

The alert log, by the way, looked like this:


Wed Mar 12 12:53:27 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
Wed Mar 12 12:53:30 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
Wed Mar 12 12:53:40 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
Wed Mar 12 12:53:43 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.

I’ve said it before, and I keep repeating it when people say “Oracle resolves deadlocks automatically”: Oracle does NOT resolve deadlocks automatically – one of the sessions will roll back its last DML statement to clear the deadlock, but the other session will (almost invariably) still be waiting. It’s up to the application to do something sensible to resolve the deadlock after it receives the ORA-00060 error.

Don’t ask about the trace files – but they had both reached 400MB by the time I finished this note.

Strategy Note

If you are going to shrink objects, it’s probably best to do it in two steps: “shrink space compact”, followed by “shrink space”.
If you’re going to try to use multiple sessions to shrink several indexes as quickly as possible, make sure there’s no way that two sessions can try to “shrink space” on indexes on the same table at the same time; in SQL terms the two-step approach looks like the sketch below.
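A sketch of the two steps (hypothetical index name):

alter index t1_i1 shrink space compact;         -- does the bulk of the work
alter index t1_i1 shrink space;                 -- brief final step that drops the highwater mark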

Footnote:

This article was prompted by the recent appearance of bug 18388128 on MoS.

May 2, 2013

v$lock

Filed under: Locks,Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 7:03 pm BST May 2,2013

The problem of slow queries on v$lock just came up again on the OTN database forum, so I thought I’d better push out a post that’s been hanging around on my blog for the last few months. This is actually mentioned in MOS in note 1328789.1: “Query Against v$lock Run from OEM Performs Slowly” which points out that it is basically a problem of bad statistics and all you have to do is collect the stats.
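In this case “the stats” means the fixed object stats, so the fix is a sketch along these lines (run from a suitably privileged account):

begin
        dbms_stats.gather_fixed_objects_stats;
end;
/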

(more…)

April 13, 2013

Deadlocks

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 11:36 am BST Apr 13,2013

Here’s a deadlock graph that might cause a little confusion:

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
 Resource Name          process session holds waits  process session holds waits
 TX-001a0015-00014787        34      90     X             32       3           S
 TX-00190008-0000601b        32       3     X             34      90           S

 session 90: DID 0001-0022-00000327      session 3: DID 0001-0020-000009E9
 session 3: DID 0001-0020-000009E9       session 90: DID 0001-0022-00000327

 Rows waited on:
  Session 90: obj - rowid = 00030CE5 - AAAwzlAAGAABDiuAAA
  (dictionary objn - 199909, file - 6, block - 276654, slot - 0)
  Session 3: obj - rowid = 00030CE5 - AAAwzlAAGAABp8gAAA
  (dictionary objn - 199909, file - 6, block - 433952, slot - 0)

(more…)

March 20, 2013

Lock Bug

Filed under: Bugs,deadlocks,Locks,Oracle — Jonathan Lewis @ 12:08 pm BST Mar 20,2013

Here’s an oddity that I ran into a little while ago while trying to prepare a sample trace file showing a particular locking pattern; it was something that I’d done before, but trace files can change with different versions of Oracle so I decided to use a copy of 11.2.0.2 that happened to be handy at the time to check if anything had changed since the previous (11gR1) release. I never managed to finish the test; here are the steps I got through:
(more…)

February 22, 2013

Deadlock Detection

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 6:19 pm BST Feb 22,2013

By some strange coincidence, the “London Bus” effect perhaps, there have been three posts on the OTN database forum in the last couple of days relating to deadlocks; and they have prompted me to indulge in a little rant about the myth of Oracle and deadlock detection; it’s the one that goes:

“Oracle detects and resolves deadlocks automatically.”

Oracle may detect deadlocks automatically, but it doesn’t resolve them; it simply reports them (by raising error ORA-00060 and rolling back one statement) then leaves the deadlocked sessions stuck until the session that received the report resolves the problem or an external agent resolves the problem.

Consider the following example (which, I have to admit, I wrote without access to a live instance):

(more…)

September 9, 2011

Row Lock Waits

Filed under: Indexing,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 6:19 pm BST Sep 9,2011

Here’s one I keep forgetting – and spending 15 minutes trying to think of the answer before getting to the “deja vu” point again. I’ve finally decided that I’ve got to write the answer down because that will save me about 14 minutes the next time I forget.

Q. In a Statspack or AWR report there is a section titled “Segments by Row Lock Waits”. Why could an index be subject to a Row Lock Wait ?

A. Try inserting into a table from two different sessions (without committing) two rows with the same primary key. The second insert will wait on event enq: TX – row lock contention, and show up in v$lock with a lock request for a TX lock in mode 4. When you issue a commit or rollback in the first session the second statement will error or complete (respectively), and the value for row lock waits in v$segstat (and v$segment_statistics) for the index will go up by 1.
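As a minimal two-session sketch (hypothetical table):

create table t1 (id number constraint t1_pk primary key);

-- Session 1:
insert into t1 values (1);      -- no commit

-- Session 2: waits on "enq: TX - row lock contention", requesting TX mode 4
insert into t1 values (1);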

There are variations on the theme, of course, but the key feature is uniqueness with one session waiting for another session to commit or rollback on a conflicting value. This includes cases of foreign key constraint checking such as inserting a child for a parent that has been deleted but not committed (and there’s an interesting anomaly with that scenario which – in 10g, at least – reports more row lock waits on the parent PK than you might expect.)

August 29, 2011

Deadlock

Filed under: deadlocks,Indexing,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 5:25 pm BST Aug 29,2011

Here’s a deadlock graph that appeared on Oracle-L and OTN a couple of days ago.

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-001a0002-0002a0fe       196     197     X            166    1835           S
TM-0000c800-00000000       166    1835    SX            196     197    SX   SSX

It’s a little unusual because instead of the common TX mode 6 (eXclusive) crossover we have one TX and one TM lock, the TX wait is for mode 4 (S) and the TM wait is for a conversion from 3 (SX) to 5 (SSX).

(more…)

April 8, 2011

Deadlock

Filed under: deadlocks,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 5:57 pm BST Apr 8,2011

Here’s an example of a slightly less common data deadlock graph (dumped from 11gR2, in this case):
(more…)

June 21, 2010

Lock Modes

Filed under: Infrastructure,Locks — Jonathan Lewis @ 7:05 pm BST Jun 21,2010

I usually think about locks (and the meaning of the lmode and request columns in v$lock) in terms of numbers, and never seem to remember which number goes with which name – apart from mode 6 = exclusive. So I’ve finally put up a little list somewhere I’ll always be able to find it if I need it.
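For reference, the usual numbering is: 0 = none, 1 = null (N), 2 = row share / sub-share (SS), 3 = row exclusive / sub-exclusive (SX), 4 = share (S), 5 = share row exclusive (SSX), 6 = exclusive (X).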
(more…)

February 15, 2010

Lock Horror

Filed under: Infrastructure,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 10:34 pm BST Feb 15,2010

Here’s an extract from a simple script I’ve run from time to time over the last 10 years to create a little testbed to check up on “foreign key locking”. I haven’t run it for some time (my notes say the last version I ran it on until very recently was 10.1.0.2). All it does is create a pair of tables with a foreign key constraint between them and an index protecting that constraint from the foreign key locking problem:
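The shape of the testbed is the familiar one – a sketch (the names are arbitrary):

create table parent (id number constraint par_pk primary key);

create table child (
        id_p    number constraint chi_fk_par references parent,
        id      number
);

create index chi_fk_idx on child(id_p);         -- the index that avoids the FK locking problem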

(more…)

April 19, 2009

Locking SQL

Filed under: Locks,Oracle,Troubleshooting — Jonathan Lewis @ 6:11 pm BST Apr 19,2009

A question that comes up fairly regularly on OTN Database Forum and the comp.databases.oracle.server newsgroup is one that asks some variant of:

“When a session has some rows locked in a table, how can I find the statement that locked those rows ?”

“I have several sessions waiting for a TX lock, and have the SID of the blocking session, how do I find the blocking SQL ?”
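As a starting point for the second question, a sketch (remember that a blocking session is often idle, so its current sql_id may be null – which is exactly why the question keeps coming up):

select  sid, blocking_session, sql_id, prev_sql_id, event
from    v$session
where   blocking_session is not null
;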

(more…)
