Oracle Scratchpad

November 14, 2014

Shrink Tablespace

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 6:16 pm GMT Nov 14,2014

If you start moving objects around to try to reclaim space in a tablespace there are all sorts of little traps that make it harder than you might hope to get the maximum benefit with the minimum effort. I’ve written a couple of notes in the past about how to proceed and, more recently, about one of the difficulties involved. This is just a brief note about a couple of ideas to make life a little easier.

  • Check that you’ve emptied the recyclebin before you start
  • Before you try moving/rebuilding an object check that the total free space “below” that object is greater than the size of the object, or you’ll find that parts of the object move “up” the tablespace (see the sketch after this list).
  • Before moving a table, mark its indexes as unusable. If you do this then (in recent versions of Oracle) the default behaviour is for the index space to be freed as the segment vanishes, and you may find that the extents of the table can move further “down” the tablespace.  (If you’ve kept tables and indexes in different tablespaces this is irrelevant, of course.)
  • When you move an object think a little carefully about whether specifying a minimum initial extent size would help or hinder the move.
  • Don’t assume that moving the “highest” object first is the best strategy – work out where you expect the final tablespace HWM to be and you may find that it makes more sense to move other objects that are above that point first.
  • Moving objects with several small (64KB) extents first may allow you to free up larger (1MB, 8MB) gaps that can be used by other larger objects.
  • Creating a new tablespace and moving objects to it from the old tablespace “top down” may be the quickest way to proceed.  Work towards dropping the old tablespace HWM regularly.
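
As a simple illustration of the second point, a query along these lines would report the free space “below” an object before you move it. It’s only a sketch – it assumes a single-file tablespace, and the owner, table and tablespace names are placeholders:

select
        sum(blocks)     free_blocks_below
from
        dba_free_space
where
        tablespace_name = 'MY_TS'
and     block_id < (
                select  min(block_id)
                from    dba_extents
                where   owner        = 'MY_USER'
                and     segment_name = 'MY_TABLE'
        )
;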

 

November 7, 2014

Quiz night

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 6:37 pm GMT Nov 7,2014

An email from Yves Colin (who’ll be presenting on the Tuesday of UKOUG Tech14) prompted me to dig out a little script I wrote some years ago and re-run an old test, leading to this simple question:  what’s the largest size array insert that Oracle will handle ?

If you’re tempted to answer, watch out – it’s not exactly a trick question, but there is a bit of a catch.

Answer:

There is an internal limit of 255 on the size of array that Oracle can insert as a unit. I leave it as an exercise to the reader to decide whether or not this makes any significant difference to performance, since row size, the number of indexes maintained, and disk I/O requirements may make the effects of the limit virtually undetectable. To demonstrate the limit all we need do is insert a few hundred rows into a table and do a block dump, optionally followed by a dump of the redo log.


create table t1 (n1 number not null) segment creation immediate;

alter system switch logfile;

insert into t1 select rownum from all_objects where rownum <= 1000;
commit;

execute dump_log

start rowid_count t1

/*
   FILE_NO   BLOCK_NO ROWS_IN_BLOCK
---------- ---------- -------------
         5        180           660
         5        184           340

2 rows selected.


ROWS_IN_BLOCK     BLOCKS
------------- ----------
          340          1
          660          1

2 rows selected.
*/

alter system flush buffer_cache;

-- SQL> alter system dump datafile 5 block 180;
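
Purely for illustration – dump_log and rowid_count are my own scripts, described below – a hypothetical query doing the same sort of per-block aggregation as rowid_count.sql could use the dbms_rowid package like this:

select
        dbms_rowid.rowid_relative_fno(rowid)    file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no,
        count(*)                                rows_in_block
from
        t1
group by
        dbms_rowid.rowid_relative_fno(rowid),
        dbms_rowid.rowid_block_number(rowid)
order by
        file_no, block_no
;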

The dump_log procedure is simply a pl/sql wrapper for a call to ‘alter system dump logfile {current log}’; the script rowid_count.sql extracts the file and block numbers from rowids in the given table and aggregates them in different ways.  The reason for running the script is to find a table block with a lot of rows in it; the block I dumped actually held the first 660 rows of the insert. Here’s a tiny extract from the block dump (with one little comment added):


tab 0, row 0, @0x1904			-- 6,400 dec
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 02
tab 0, row 1, @0x190a
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 03
tab 0, row 2, @0x1910
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1
col  0: [ 2]  c1 04
tab 0, row 3, @0x1916
tl: 6 fb: --H-FL-- lb: 0x1  cc: 1

This shows the first four rows inserted, and you can see that the offset to the first row is 6,400 bytes, and then each subsequent row is 6 bytes further down the block (when the numbers get larger the row lengths will increase to 7 bytes). The positioning of these rows is, at first sight, a little odd – you might have guessed that they would either start at offset “zero” and work down to offset “8K”, or start at “8K” and work backwards up to zero – why is the first row 3/4 of the way down the block ?

Rather than show you more row dumps, I’ll give you a carefully selected extract from the row directory:


0x12:pri[0]     offs=0x1904
0x14:pri[1]     offs=0x190a

...

0x20c:pri[253]  offs=0x1f8a
0x20e:pri[254]  offs=0x1f91		End of first 255 rows
0x210:pri[255]  offs=0x120e		Start of second 255 rows
0x212:pri[256]  offs=0x1215

...

0x40a:pri[508]  offs=0x18f6
0x40c:pri[509]  offs=0x18fd		End of second 255 rows
0x40e:pri[510]  offs=0xdf5		Start of last 150 rows
0x410:pri[511]  offs=0xdfc

...

0x536:pri[658]  offs=0x1200		End of last 150 rows
0x538:pri[659]  offs=0x1207

The first 255 rows inserted are stacked at the bottom of the block at offsets 0x1904 to 0x1f91.
The second 255 rows inserted are stacked above them at offsets 0x120e to 0x18fd (note 0x18fd + 6 = 0x1903)
The last 150 rows inserted are stacked above them at offsets 0xdf5 to 0x1207 (note 0x1207 + 6 = 0x120d)

No matter how large your attempted array insert, the maximum number of rows (or index entries) Oracle can insert into a block in a single internal array operation is 255.

Further corroboration comes from the redo log dump – here’s a tiny bit of a single change vector (i.e. a single atomic change to a single Oracle block) from the redo generated while this insert was going on:


CHANGE #18 CON_ID:0 TYP:0 CLS:1 AFN:5 DBA:0x014000c7 OBJ:95876 SCN:0x0000.007528dd SEQ:1 OP:11.11 ENC:0 RBL:0 FLG:0x0000
KTB Redo
op: 0x01  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F  xid:  0x0007.007.000009f2    uba: 0x01000bda.02a9.14
KDO Op code: QMI row dependencies Disabled
  xtype: XA flags: 0x00000000  bdba: 0x014000c7  hdba: 0x01400083
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 lock: 1 nrow: 255
slot[0]: 0
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 02
slot[1]: 1
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 03
slot[2]: 2
tl: 6 fb: --H-FL-- lb: 0x0  cc: 1
col  0: [ 2]  c1 04

Although the CON_ID in line 1 tells you this dump is from 12c, the same limit holds across all (modern) versions of Oracle. The operation (OP 11.11 – in line 1) is “Insert Multiple Rows”, and at line 9 you can see: “nrow: 255”. I haven’t included the rest of the change vector, but all it does is show the remaining 252 rows.

October 1, 2014

Shrink Tablespace

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 2:55 pm GMT Oct 1,2014

In a comment on my previous post on shrinking tablespaces Jason Bucata and Karsten Spang both reported problems with small objects that didn’t move to the start of the tablespace. This behaviour is inevitable with dictionary managed tablespaces (regardless of the size of the object), but I don’t think it’s likely to happen with locally managed tablespaces if they’ve been defined with uniform extent sizes. Jason’s comment made me realise, though, that I’d overlooked a feature of system allocated tablespaces that made it much harder to move objects towards the start of file. I’ve created a little demo to illustrate the point.

I created a new tablespace as locally managed, ASSM, and auto-allocate, then created a few tables of various sizes. The following minimal SQL query reports the resulting extents in block_id order, adding in a “boundary_1m” column which subtracts 128 blocks (1MB) from the block_id, then divides by 128 and truncates, to show which “User Megabyte” in the file the extent starts in.  (Older versions of Oracle typically have an 8 block space management header; recent versions expanded this from 64KB to 1MB – possibly as a little performance aid to Exadata.)


select
        segment_name, block_id, blocks, trunc((block_id - 128)/128) boundary_1M
from
        dba_extents
where
        owner = 'TEST_USER'
order by
        block_id
;

SEGMENT_NAME               BLOCK_ID     BLOCKS BOUNDARY_1M
------------------------ ---------- ---------- -----------
T1                              128       1024           0
T1                             1152       1024           8
T2                             2176       1024          16
T2                             3200       1024          24
T3                             4224          8          32
T4                             4232          8          32
T5                             4352        128          33

As you can see t3 and t4 are small tables – 1 extent of 64KB each – and t5, which I created after t4, starts on the next 1MB boundary. This is a feature of auto-allocate: not only are extents (nearly) fixed to a small number of possible extent sizes, the larger extents are restricted to starting on 1MB boundaries and the 64KB extents are used preferentially to fill in “odd-sized” holes. To show the impact of this I’m going to drop table t1 (at the start of file) to make some space.


SEGMENT_NAME               BLOCK_ID     BLOCKS BOUNDARY_1M
------------------------ ---------- ---------- -----------
T2                             2176       1024          16
T2                             3200       1024          24
T3                             4224          8          32
T4                             4232          8          32
T5                             4352        128          33

Now I’ll move table t3 – hoping that it will move to the start of file and use up some of the space left by t1. However there’s a 1MB area (at boundary 32) which is partially used, so t3 moves into that space rather than creating a new “partly used” megabyte.


SEGMENT_NAME               BLOCK_ID     BLOCKS BOUNDARY_1M
------------------------ ---------- ---------- -----------
T2                             2176       1024          16
T2                             3200       1024          24
T4                             4232          8          32
T3                             4240          8          32
T5                             4352        128          33

It’s a little messy trying to clear up the tiny fragments and make them do what you want. In this case you could, for example, create a dummy table with storage(initial 64K next 64K minextents 14) to use up all the space in the partly used megabyte, then move t3 – which should go to the start of file – then move table t4 – which should go into the first partly-used MB (i.e. start of file) rather than taking up the hole left by t3.
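
In SQL the sequence might look something like this – a sketch only: the dummy table and tablespace names are illustrative, and the number of 64KB extents you need depends on the gaps you’re actually trying to plug:

create table dummy_fill (n1 number)
        segment creation immediate
        tablespace test_ts
        storage (initial 64K next 64K minextents 14)
;

alter table t3 move;            -- should now go to the start of file
alter table t4 move;            -- should go into the first partly-used MB
drop table dummy_fill purge;    -- release the space used to plug the gap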

Even for a trivial example it’s messy – imagine how difficult it can get to cycle through building and dropping suitable dummy tables and move objects in the right order when you’ve got objects with several small extents scattered through the file, and objects with a mixture of small extents and large extents.

September 21, 2014

Partitioned Clusters

Filed under: 12c,clusters,Infrastructure,Oracle — Jonathan Lewis @ 7:28 pm GMT Sep 21,2014

In case you hadn’t noticed it, partitioning has finally reached clusters in 12c – specifically 12.1.0.2. They’re limited to hash clusters with range partitioning, but it may be enough to encourage more people to use the technology. Here’s a simple example of the syntax:


create cluster pt_hash_cluster (
        id              number(8,0),
        d_date          date,
        small_vc        varchar2(8),
        padding         varchar2(100)
)
-- single table
hashkeys 10000
hash is id
size 700
partition by range (d_date) (
        partition p2011Jan values less than (to_date('01-Feb-2011','dd-mon-yyyy')),
        partition p2011Feb values less than (to_date('01-Mar-2011','dd-mon-yyyy')),
        partition p2011Mar values less than (to_date('01-Apr-2011','dd-mon-yyyy'))
)
;

I’ve been waiting for them to appear ever since 11.2.0.1 and the TPC-C benchmark that Oracle did with them – they’ve been a long time coming (check the partition dates – that gives you some idea of when I wrote this example).

Just to add choice (a.k.a. confusion) 12.1.0.2 has also introduced attribute clustering so you can cluster data in single tables without creating clusters – but only while doing direct path loads or table moves. The performance intent is similar, though the technology and circumstances of use are different.
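
For comparison, attribute clustering needs no cluster object at all. Here’s a minimal sketch of the syntax – the table is illustrative only, simply echoing the column list of the cluster above:

create table t_attr (
        id              number(8,0),
        d_date          date,
        small_vc        varchar2(8),
        padding         varchar2(100)
)
clustering by linear order (d_date)
;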

September 19, 2014

Shrink Tablespace

Filed under: fragmentation,Infrastructure,Oracle — Jonathan Lewis @ 12:10 pm GMT Sep 19,2014

A recent question on the OTN database forum raised the topic of returning free space in a tablespace to the operating system by rebuilding objects to fill the gaps near the start of files and leave the empty space at the ends of files so that the files could be resized downwards.

This isn’t a process that you’re likely to need frequently, but I have written a couple of notes about it, including a sample query to produce a map of the free and used space in a tablespace. While reading the thread, though, it crossed my mind that recent versions of Oracle introduced a feature that can reduce the amount of work needed to get the job done, so I thought I’d demonstrate the point here.

When you move a table its indexes become unusable and have to be rebuilt; but when an index becomes unusable, the more recent versions of Oracle will drop the segment. Here’s a key point – if the index becomes unusable because the table has been moved the segment is dropped only AFTER the move has completed. Pause a minute for thought and you realise that the smart thing to do before you move a table is to make its indexes unusable so that they release their space BEFORE you move the table. (This strategy is only relevant if you’ve mixed tables and indexes in the same tablespace and if you’re planning to do all your rebuilds into the same tablespace rather than moving everything into a new tablespace.)

Here are some outputs demonstrating the effect in a 12.1.0.2 database. I have created (and loaded) two tables in a tablespace of 1MB uniform extents, 8KB block size; then I’ve created indexes on the two tables. Running my ts_hwm.sql script I get the following results for that tablespace:


FILE_ID    BLOCK_ID   END_BLOCK OWNER      SEGMENT_NAME    SEGMENT_TYPE
------- ----------- ----------- ---------- --------------- ------------------
      5         128         255 TEST_USER  T1              TABLE
                256         383 TEST_USER  T2              TABLE
                384         511 TEST_USER  T1_I1           INDEX
                512         639 TEST_USER  T2_I1           INDEX
                640      65,535 free       free

Notice that it was a nice new tablespace, so I can see the two tables followed by the two indexes at the start of the tablespace. If I now move table t1 and re-run the script this is what happens:


alter table t1 move;

FILE_ID    BLOCK_ID   END_BLOCK OWNER      SEGMENT_NAME    SEGMENT_TYPE
------- ----------- ----------- ---------- --------------- ------------------
      5         128         255 free       free
                256         383 TEST_USER  T2              TABLE
                384         511 free       free
                512         639 TEST_USER  T2_I1           INDEX
                640         767 TEST_USER  T1              TABLE
                768      65,535 free       free

Table t1 is now situated past the previous tablespace highwater mark and I have two gaps in the tablespace where t1 and the index t1_i1 used to be.

Repeat the experiment from scratch (drop the tables, purge, etc. to empty the tablespace) but this time mark the index unusable before moving the table and this is what happens:


FILE_ID    BLOCK_ID   END_BLOCK OWNER      SEGMENT_NAME    SEGMENT_TYPE
------- ----------- ----------- ---------- --------------- ------------------
      5         128         255 free       free
                256         383 TEST_USER  T2              TABLE
                384         511 TEST_USER  T1              TABLE
                512         639 TEST_USER  T2_I1           INDEX
                640      65,535 free       free

Table t1 has moved into the space vacated by index t1_i1, so the tablespace highwater mark has not moved up.
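
In outline, the only difference in the second experiment was one extra statement ahead of the move – something like this, using the object names from the demo:

alter index t1_i1 unusable;     -- the index segment is dropped immediately
alter table t1 move;            -- the table can now reuse the freed space
alter index t1_i1 rebuild;      -- recreate the index once the table has moved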

If you do feel the need to reclaim space from a tablespace by rebuilding objects, you can find that it’s quite hard to decide the order in which the objects should be moved/rebuilt to minimise the work you (or rather, Oracle) have to do; if you remember that any table you move will release its index space anyway, and insert a step to mark those indexes unusable before you move the table, you may find it’s much easier to work out a good order for moving the tables.

Footnote: I appreciate that some readers may already take advantage of the necessity of rebuilding indexes by dropping indexes before moving tables – but I think it’s a nice feature that we can now make them unusable and get the same benefit without introducing a risk of error when using a script to recreate an index we’ve dropped.

 

September 8, 2014

ASSM Truncate.

Filed under: ASSM,Infrastructure,Oracle,Troubleshooting — Jonathan Lewis @ 11:34 am GMT Sep 8,2014

Here’s one that started off with a tweet from Kevin Closson, heading towards a finish that shows some interesting effects when you truncate large objects that are using ASSM. To demonstrate the problem I’ve set up a tablespace using system allocation of extents and automatic segment space management (ASSM).  It’s the ASSM that causes the problem, but it requires a mixture of circumstances to create a little surprise.


create
	tablespace test_8k_auto_assm
	datafile	-- OMF
	SIZE 1030M
	autoextend off
	blocksize 8k
	extent management local
	autoallocate
	segment space management auto
;

create table t1 (v1 varchar2(100)) pctfree 99 tablespace test_8k_auto_assm storage(initial 1G);

insert into t1 select user from dual;
commit;

alter system flush buffer_cache;

truncate table t1;

I’ve created a table with an initial definition of 1GB, which means that (in a clean tablespace) the autoallocate option will jump straight to extents of 64MB, with 256 table blocks mapped per bitmap block for a total of 32 bitmap blocks in each 64MB extent. Since I’m running this on 11.2.0.4 and haven’t included “segment creation immediate” in the definition I won’t actually see any extents until I insert the first row.

So here’s the big question – when I truncate this table (using the given command) how much work will Oracle have to do ?

Exchanging notes over twitter (140 chars at a time) and working from a model of the initial state, it took a little time to understand what was (probably) happening and then produce this silly example – but here’s the output from a snapshot of v$session_event for the session across the truncate:


Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
local write wait                                    490           0          83.26        .170          13
enq: RO - fast object reuse                           2           0         104.90      52.451         105
db file sequential read                              47           0           0.05        .001           0
db file parallel read                                 8           0           0.90        .112           0
SQL*Net message to client                            10           0           0.00        .000           0
SQL*Net message from client                          10           0           0.67        .067         153
events in waitclass Other                             2           0           0.04        .018         109

The statistic I want to highlight is the number recorded against “local write wait”: truncating a table of one row, we wait for 490 blocks to be written! We also have 8 “db file parallel read” waits which, according to a 10046 trace file, were reading hundreds of blocks. (I think the most significant time in this test – the RO enqueue wait – may have been waiting for the database writer to complete the work needed for an object checkpoint, but I’m not sure of that.)

The blocks written were the space management bitmap blocks for the extent(s) that remained after the truncate – even the ones that referenced extents above the high water mark for the table. Since we had set the table’s initial storage to 1GB, we had a lot of bitmap blocks. At 32 per extent and 16 extents (64MB * 16 = 1GB) we might actually expect something closer to 512 blocks, but in fact Oracle had formatted the last extent with only 8 space management blocks, and the first extent had an extra 2 to cater for the level 2 bitmap block and segment header block, giving: 32 * 15 + 8 + 2 = 490.

As you may have seen above, the impact on the test that Kevin was doing was quite dramatic – he had set the initial storage to 128GB (lots of bitmap blocks), partitioned the table (more bitmap blocks) and was running RAC (so the reads were running into waits for global cache grants).

I had assumed that this type of behaviour happened only with the “reuse storage” option of the truncate command, and I hadn’t noticed before that it also appeared even if you didn’t reuse storage – but that’s probably because the effect applies only to the bit you keep, which may typically mean a relatively small first extent. It’s possible, then, that in most cases this is an effect that isn’t going to be particularly visible in production systems – but if it is, can you work around it ? Fortunately another tweeter asked the question: “What happens if you ‘drop all storage’ ?”
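
In my test case that just meant changing the truncate to:

truncate table t1 drop all storage;

Here’s the result of adding that clause: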


Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
enq: RO - fast object reuse                           1           0           0.08        .079           0
log file sync                                         1           0           0.03        .031           0
db file sequential read                              51           0           0.06        .001           0
SQL*Net message to client                            10           0           0.00        .000           0
SQL*Net message from client                          10           0           0.56        .056         123
events in waitclass Other                             3           0           0.87        .289         186


Looking good – if you don’t keep any extents you don’t need to make sure that their bitmaps are clean. (The “db file sequential read” waits are almost all about the data dictionary, following on from my “flush buffer cache”).

Footnote 1: the same effect appears in 12.1.0.2
Footnote 2: it’s interesting to note that the RO enqueue wait time seems to parallel the local write wait time: perhaps a hint that there’s some double counting going on. (To be investigated, one day).

August 19, 2014

LOB Length

Filed under: Infrastructure,LOBs,Oracle,Performance — Jonathan Lewis @ 6:06 pm GMT Aug 19,2014

It’s funny how you can make little savings in work all over the place in Oracle if you’re prepared to look a little closely at what’s going on. Here’s a quirky little example with LOBs and function calls that might just have some greater relevance in other situations. Here’s a little data set, and two queries that I might run against it:


create table tbl(
	c1      clob
)
lob (c1) store as c_lob(
	disable storage in row
	nocache nologging
)
;

begin
	for i in 1..128 loop
		insert into tbl values(rpad('x',4000));
		commit;
	end loop;
end;
/

commit;

-- collect stats, prime dictionary cache and library cache
-- run both queries twice and check stats on second run

select
	round(sum(ceil(len/8100)) * 8/1024,0)    used_mb
from
	(
	select
		/*+ no_merge */
		dbms_lob.getlength(c1) len
	from
		tbl
	)
where
	len > 3960
;

select
	round(sum(ceil(len/8100)) * 8/1024,0)    used_mb
from
	(
	select
		rownum rn, dbms_lob.getlength(c1) len
	from
		tbl
	)
where
	len > 3960
;

The question that you might ask yourselves when you see these queries is: will they do similar amounts of work ? Of course, I wouldn’t be asking the question if the answer were yes. Despite the no_merge() hint, which you might think would have the same effect as the rownum approach, Oracle seems to execute the call to dbms_lob.getlength() twice for each row in the first query, but only once per row for the second query. Here are the stats (from autotrace) on the second run of the two queries:


Statistics (for no_merge)
----------------------------------------------------------
         40  recursive calls
          0  db block gets
        271  consistent gets
          0  physical reads
          0  redo size
        541  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Statistics (for rownum)
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        131  consistent gets
          0  physical reads
          0  redo size
        541  bytes sent via SQL*Net to client
        544  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

As you can see, the consistent gets for the no_merge() approach is roughly double that for the rownum approach – and since we have 128 rows/LOBs in the table that looks suspiciously like 2 gets vs. 1 get per LOB depending on the approach – which suggests two calls to the function. This is further corroborated by the execution plans, and especially by the predicate sections (how often have I said “always check the predicates”) which show that the predicate has been pushed inside the view that’s been hinted to be non-mergeable, but it hasn’t been pushed inside the view that uses the rownum instantiation trick:


Execution Plan for no_merge()
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE     |      |     1 |    13 |            |          |
|   2 |   VIEW              |      |     6 |    78 |     2   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| TBL  |     6 |   522 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("DBMS_LOB"."GETLENGTH"("C1")>3960)

Execution Plan for rownum
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE      |      |     1 |    13 |            |          |
|*  2 |   VIEW               |      |   128 |  1664 |     2   (0)| 00:00:01 |
|   3 |    COUNT             |      |       |       |            |          |
|   4 |     TABLE ACCESS FULL| TBL  |   128 | 11136 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("LEN">3960)

My first thought on seeing this difference was to apply the /*+ no_push_pred */ hint to block predicate pushing – but then I remembered that the hint is about join predicate pushing and this is a simple filter push. A quick search of the hidden parameters, though, revealed this:

_optimizer_filter_pushdown : enable/disable filter predicate pushdown

Setting this parameter to false – either through a call to ‘alter session’ or through an /*+ opt_param('_optimizer_filter_pushdown','false') */ hint – allowed the no_merge approach to produce the same plan and resource usage as the rownum approach. Of course, for a production system, I’d probably use the rownum approach rather than mess around with hidden parameters.
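
For reference, the session-level version looks like this – it’s a hidden parameter, so treat it as a diagnostic experiment rather than a production fix:

alter session set "_optimizer_filter_pushdown" = false;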

Footnote:

I don’t know why the code with the no_merge() approach reported 40 recursive calls (on its first execution with autotrace). A couple of variations on the experiment suggested that it had something to do with the number of rows (or consequential buffer visits) that survived the predicate call – for a sufficiently small number of rows the recursive call count happened to drop to zero; but the phenomenon needs further investigation.

July 13, 2014

Deferrable RI – 2

Filed under: Infrastructure,Locks,Oracle,Troubleshooting — Jonathan Lewis @ 7:46 pm GMT Jul 13,2014

A question came up on Oracle-L recently about possible locking anomalies with deferrable referential integrity constraints.

An update by primary key is taking a long time; the update sets several columns, one of which is the child end of a referential integrity constraint. A check of v$active_session_history shows lots of waits for “enq: TX – row lock contention” in mode 4 (share), and many of these waits also identify the current object as the index that has been created to avoid the “foreign key locking” problem on this constraint (though many of the waits show the current_obj# as -1). A possible key feature of the issue is that the foreign key constraint is defined as “deferrable initially deferred”. The question is: could such a constraint result in TX/4 waits ?

My initial thought was that if the constraint was deferrable it was unlikely; there would have to be other features coming into play.

Of course, when the foreign key is NOT deferrable it’s easy to set up cases where a TX/4 appears: for example, you insert a new parent value without issuing a commit, then I insert a new matching child; at that point my session will wait for your session to commit or rollback. If you commit, my insert succeeds; if you rollback, my session raises an error (ORA-02291: integrity constraint (schema_name.constraint_name) violated – parent key not found). But if the foreign key is deferred the non-existence (or potential existence, or not) of the parent shouldn’t matter. If the constraint is deferrable, though, the first guess would be that you could get away with things like this so long as you fixed up the data in time for the commit.

I was wrong. Here’s a little example:


create table parent (
	id	number(4),
	name	varchar2(10),
	constraint par_pk primary key (id)
)
;

create table child(
	id_p	number(4)
		constraint chi_fk_par
		references parent
		deferrable initially deferred,
	id	number(4),
	name	varchar2(10),
	constraint chi_pk primary key (id_p, id)
)
;

insert into parent values (1,'Smith');
insert into parent values (2,'Jones');

insert into child values(1,1,'Simon');
insert into child values(1,2,'Sally');

insert into child values(2,1,'Jack');
insert into child values(2,2,'Jill');

commit;

begin
	dbms_stats.gather_table_stats(user,'parent');
	dbms_stats.gather_table_stats(user,'child');
end;
/

pause Press return

update child set id_p = 3 where id_p = 2 and id = 2;

If you don’t do anything after the pause and before the update then the update will succeed – but fail on a subsequent commit unless you insert parent 3 before committing. But if you take advantage of the pause to use another session to insert parent 3 first, the update will then hang waiting for the parent insert to commit or rollback – and what happens next may surprise you. Basically the deferrability doesn’t protect you from the side effects of conflicting transactions.
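
The second session’s contribution during the pause is just the following (the name is illustrative – the point is that there’s no commit):

-- second session, while the first is paused
insert into parent values (3,'Brown');
-- no commit: the first session's update now hangs on "enq: TX - row lock contention" in mode 4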

The variations on what can happen next (insert the parent elsewhere, commit or rollback) are interesting and left as an exercise.

I was slightly surprised to find that I had had a conversation about this sort of thing some time ago, triggered by a comment to an earlier post. If you want to read a more thorough investigation of the things that can happen and how deferrable RI works then there’s a good article at this URL.

 

June 17, 2014

Cluster Nulls

Filed under: clusters,Indexing,Infrastructure,NULL,Oracle — Jonathan Lewis @ 7:39 am GMT Jun 17,2014

Yesterday’s posting was a reminder that bitmap indexes, unlike B-tree indexes in Oracle, do store entries where every column in the index is null. The same is true for cluster indexes – which are implemented as basic B-tree indexes. Here’s a test case I wrote to demonstrate the point.

drop table tc1;
drop cluster c including tables;

purge recyclebin;

create cluster c(val number);
create index c_idx on cluster c;
create table tc1 (val number, n1 number, padding varchar2(100)) cluster c(val);

insert into tc1
select
	decode(rownum,1,to_number(null),rownum), rownum, rpad('x',100)
from
	all_objects
where
	rownum <= 100
;

insert into tc1 select * from tc1;
insert into tc1 select * from tc1;
insert into tc1 select * from tc1;
insert into tc1 select * from tc1;
insert into tc1 select * from tc1;
commit;

analyze cluster c compute statistics;
execute dbms_stats.gather_table_stats(user,'tc1');

set autotrace traceonly explain

select * from tc1 where val = 2;
select * from tc1 where val is null;

set autotrace off

Here are the two execution plans from the above queries:


------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |    32 |  3424 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS CLUSTER| TC1   |    32 |  3424 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN  | C_IDX |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("VAL"=2)

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    32 |  3424 |    14   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| TC1  |    32 |  3424 |    14   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("VAL" IS NULL)

Spot the problem: the second query doesn’t use the index. So, despite the fact that I said that fully null index entries are stored in cluster indexes, you might (as the first obvious question) wonder whether or not I was right. So here’s a piece of the symbolic dump of the index.


kdxconro 100                -- Ed: 100 entries (rows) in the leaf block

row#0[8012] flag: -----, lock: 0, data:(8):  02 00 02 0b 00 00 01 00
col 0; len 2; (2):  c1 03

row#1[7999] flag: -----, lock: 0, data:(8):  02 00 02 0c 00 00 01 00
col 0; len 2; (2):  c1 04

...

row#98[6738] flag: -----, lock: 2, data:(8):  02 00 02 6d 00 00 01 00
col 0; len 2; (2):  c2 02

row#99[8025] flag: -----, lock: 0, data:(8):  02 00 02 0a 00 00 01 00
col 0; NULL

The NULL entry is right there – sorted as the high value – just as it is in a bitmap index. But the optimizer won’t use it, even if you hint it.

Of course, Oracle uses it internally when inserting rows (how else, otherwise, would it rapidly find the heap block needed for the next NULL insertion) – but it won’t use it to retrieve the data; it always uses a full table (cluster) scan. That’s really a little annoying if you’ve made the mistake of allowing nulls into your cluster.

There is a workaround, of course – in this case (depending on version) you could create a virtual column with an index, or a function-based index, that (for example) uses the nvl2() function to convert nulls to a known value and all non-null entries to null; this would give you an index on just the original nulls, which would be as small as possible and also have a very good clustering_factor. You’d have to change the code from “column is null” to a predicate that matched the index, though. For example:


create index tc1_null on tc1(nvl2(val,null,0)) compress;

execute dbms_stats.gather_table_stats(user,'tc1', method_opt=>'for all hidden columns size 1');

select * from tc1 where nvl2(val,null,0) = 0;

----------------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |    32 |  3456 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TC1      |    32 |  3456 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | TC1_NULL |    32 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(NVL2("VAL",NULL,0)=0)

I’ve included the compress keyword (which implicitly means “all columns” for a non-unique index) in the definition since you only need 4 repetitions of the single-byte entry that is the internal representation of zero before you start to win on the storage trade-off – but since it’s likely to be a small index anyway that’s not particularly important. (Reminder: although Oracle collects index stats on index creation, you still need to collect column stats on the hidden column underlying the function-based columns on the index to give the optimizer full information.)

June 16, 2014

Bitmap Nulls

Filed under: bitmaps,Indexing,Infrastructure,NULL,Oracle — Jonathan Lewis @ 9:08 am GMT Jun 16,2014

It’s fairly well known that in Oracle B-tree indexes on heap tables don’t hold entries where the indexed columns are all null, but that bitmap indexes will hold such entries, and execution plans for predicates like “column is null” can use bitmap indexes. Here’s a little test case to demonstrate the point (I ran this last on 12.1.0.1):


create table t1 (val number, n1 number, padding varchar2(100));

insert into t1
select
	decode(rownum,1,to_number(null),rownum), rownum, rpad('x',100)
from
	all_objects
where
	rownum <= 1000
;

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;

commit;

create bitmap index t1_b1 on t1(val);

execute dbms_stats.gather_table_stats(user,'t1');

set autotrace traceonly explain

select * from t1 where val is null;

set autotrace off

The execution plan for this query, in the system I happened to be using, looked like this:


Execution Plan
----------------------------------------------------------
Plan hash value: 1201576309

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |    32 |  3488 |     8   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    32 |  3488 |     8   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE        | T1_B1 |       |       |            |          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("VAL" IS NULL)

Note that the predicate section shows us that we used the “column is null” predicate as an access predicate into the index.

Of course, this is a silly little example – I’ve only published it to make a point and to act as a reference case if you ever need to support a claim. Normally we don’t expect a single bitmap index to be a useful way to access a table, we tend to use combinations of bitmap indexes to combine a number of predicates so that we can minimise the trips to a table as efficiently as possible. (And we certainly DON’T create a bitmap index on an OLTP system because it lets us access NULLs by index — OLTP and bitmaps don’t get on well together.)

If you do a symbolic block dump, by the way, you’ll find that the NULL is represented by the special “length byte” of 0xFF with no following data.

April 29, 2014

Bigfiles

Filed under: Bugs,Infrastructure,Oracle — Jonathan Lewis @ 2:12 pm GMT Apr 29,2014

It’s always the combinations that catch you out.

Bigfile tablespaces have their uses – especially in big systems

Materialized views have their uses – especially in big systems

There’s absolutely no reason why the two technologies should interfere with each other … until you find a bug !

Running an example, stripped to the bare minimum, and doing a couple of things that I personally don’t like doing, on 11.2.0.4:

drop materialized view t1_mv;
drop table t1;

create table t1 (
	id	number constraint t1_pk primary key,
	n1	number,
	n2	number
)
tablespace bigfile_ts
;

create materialized view log on t1
with
	rowid		-- ugh!
;

create materialized view t1_mv
refresh fast on demand
with rowid		-- ugh!
enable query rewrite
as
select
	id, n1
from
	t1
where
	n2 is not null
;

-- Another ugh coming up !
insert into t1 values(1,null,null);
update t1 set n1 = 2, n2 = 1 where id = 1;
commit;

begin
        dbms_mview.refresh(
                list           => 'T1_MV',
                method         => 'F',
                atomic_refresh  => true
        );
end;
/

Things I don’t like:

  • Doing anything that is strongly dependent on rowid – I’d rather do my materialized view stuff by primary key … but, as Mick Jagger once told us: “You can’t always get what you want”.
  • Processes that insert an “empty” row and then update it – it’s very inefficient, generates excessive undo and redo, and often leads to row migration.

Most significantly I don’t like operations that have worked for ages suddenly crashing when someone adds a new piece of code to the system.

If you run the fragment above, the refresh will fail with the following string of errors:


ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-00001: unique constraint (TEST_USER.I_SNAP$_T1_MV) violated
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2809
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 3025
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2994
ORA-06512: at line 2

When you try to find out what the unique constraint is, it’s based on a hidden column (M_ROW$$) of type rowid that has been added to the materialized view to emulate a primary key; and, for some reason, if your base table is in a bigfile tablespace and you insert a row and then update it before you’ve run a refresh then you can’t do a fast refresh again until you clean up the mess (e.g. with a complete refresh).
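
The clean-up, for reference, can be as simple as a complete refresh – method ‘C’ in the same API call:

begin
        dbms_mview.refresh(
                list    => 'T1_MV',
                method  => 'C'
        );
end;
/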

Note – if you complete all your inserts, then refresh, you can update a row as many times as you like without causing subsequent refresh problems – it looks like it’s just “insert it, update it, refresh” (or “insert it, delete it, refresh”) that has a problem because the only way to hit the issue is to get a correct rowid (from an update or delete) and an incorrect rowid (from an insert) for the same row in the log at the same time.

Workaround: if you move the base table to a smallfile tablespace you might just find you get what you need.

This shows up on MoS as bug 17603987, fixed in 12.2.

If you want to dig in a little bit more, you can start by comparing the contents of t1 and mlog$_t1 (the materialized view log table). Critically:

SQL> select rowid from t1;

ROWID
------------------
AAAXN2AAAAAAAJEAAA

1 row selected.

SQL> select m_row$$, dmltype$$ from mlog$_t1;

M_ROW$$                          D
-------------------------------- -
AAAXN2AQAAAAAJEAAA               I
AAAXN2AAAAAAAJEAAA               U

2 rows selected.

SQL> select rowidtochar(m_row$$) m_row$$, dmltype$$ from mlog$_t1;

M_ROW$$                          D
-------------------------------- -
AAAXN2AAAAAAAJEAAA               I
AAAXN2AAAAAAAJEAAA               U

2 rows selected.

If you compare the rowid from t1 with the m_row$$ column (which should hold the rowids of rows from t1) from mlog$_t1 you’ll notice that the insert and the update have been given different values for the source row. Somehow the logging code for the insert has generated the wrong rowid value; however, when you apply a rowidtochar() to the wrong value the return value is the right value – and the merge statement that transfers modified data from the source table into the materialized view uses the rowidtochar() function to join the materialized view log back to the source table, with the result that the same rowid can be inserted twice – except the attempt results in a “duplicate key” error.

 

March 20, 2014

RLS bug

Filed under: Bugs,Infrastructure,Oracle — Jonathan Lewis @ 1:21 pm GMT Mar 20,2014

RLS – row level security, aka VPD (virtual private database) or FGAC (fine grained access control) has a critical bug in 11g. The bug is unpublished, but gets mentioned in various other documents, so can be identified as: Bug: 7828323 “SYS_CONTEXTS RETURNS WRONG VALUE WITH SHARED_CONTEXT_SENSITIVE”

The title tells you nearly everything you need to know – if you’ve declared a security policy as context_sensitive or shared_context_sensitive then a change to the context ought to result in the associated predicate function being called to generate a new security predicate the next time the policy becomes relevant. Thanks to bug 7828323 this doesn’t always happen – so queries can return the wrong set of results.

There are some patches for older versions (11.1.0.7 and 11.2.0.2 as far as I’ve checked), but if you don’t have, or can’t get, a patch the “workaround” is to change any relevant policies to dynamic; unfortunately the consequence of this is that the predicate function will then be called for every execution of any statement against any objects protected by that policy.
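
Changing a policy’s type means dropping and re-creating it – a minimal sketch using the standard dbms_rls calls, with all the names being illustrative:

begin
        dbms_rls.drop_policy(
                object_schema   => 'APP_OWNER',
                object_name     => 'SENSITIVE_TABLE',
                policy_name     => 'SECURITY_POLICY'
        );
        dbms_rls.add_policy(
                object_schema   => 'APP_OWNER',
                object_name     => 'SENSITIVE_TABLE',
                policy_name     => 'SECURITY_POLICY',
                function_schema => 'APP_OWNER',
                policy_function => 'SECURITY_PREDICATE_FN',
                policy_type     => dbms_rls.dynamic
        );
end;
/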

Depending on how your application has been written and how many queries are likely to invoke security policies this could easily increase your CPU usage by several percent (and if it’s a badly written application maybe a lot more).

Footnote:

It has occurred to me to wonder what happens if you use a (normal) pl/sql function in a select list, and the function executes a statement against a table, and the table is protected by a context_sensitive security policy – and you decide to use the pl/sql result cache on the function. How long is an item supposed to stay in the result cache, and if it’s longer than a single execution of a statement will the result cache be invalidated if your context changes in a way that invalidates the current security predicate ? I have no time to check or test at present, but I’d be very cautious about putting RLS predicate functions into the result cache until I’ve played around with that idea for a bit.

March 17, 2014

Temporary Segments

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 6:25 pm GMT Mar 17,2014

A question came up on oracle-l recently about identifying which temporary segment in v$tempseg_usage is associated with which global temporary table. Here’s my answer:

v$tempseg_usage is a synonym for v$sort_usage – which is the view that I still tend to think of first.

v$sort_usage sits on top of x$ktsso – and in x$ktsso the column ktssoobjn is the object number for the definition of the global temporary table (ditto for any indexes on the table). [Addendum: I didn’t point it out in the reply, but if you have indexes on the GTT then v$lock will show a TM lock using the table’s object_id, and TO locks using that same object id for the table and each of its indexes.]

If an internal GTT has appeared because of subquery factoring this is a little broken, but if you look in the execution plan (v$sql_plan / display_cursor) for the query you will find lines like:

|  63 |    HASH JOIN                          |                            |
|  64 |     VIEW                              |                            |
|  65 |      TABLE ACCESS FULL                | SYS_TEMP_0FD9D667C_74A306D |
|  66 |     VIEW                              |                            |
|  67 |      TABLE ACCESS FULL                | SYS_TEMP_0FD9D667B_74A306D |

We will see corresponding TO locks in v$lock (note the negative value):

ADDR             KADDR                   SID TY        ID1        ID2
---------------- ---------------- ---------- -- ---------- ----------
000000008ED8EC68 000000008ED8ECC0        143 TO  -40016261          1
000000008ED8F540 000000008ED8F598        143 TO  -40016260          1

And we note large values for ktssoobjn in x$ktsso:

  KTSSOBNO  KTSSOEXTS  KTSSOBLKS  KTSSORFNO  KTSSOOBJD  KTSSOOBJN KTSSOTSNUM KTSSOSQLID
---------- ---------- ---------- ---------- ---------- ---------- ---------- -------------
     49792          1        128          1    4244096 4254951035          3 anb4mscz7wz71
     49152          1        128          1    4243456 4254951036          3 anb4mscz7wz71

Note the middle of the SYS_TEMP name:

0x0FD9D667C = 4254951036

and then spot the arithmetic:

4254951036 + 40016260 = 4294967296 = 0x100000000 = power(2,32)

v$lock.id1 is a 32-bit wrap of x$ktsso.ktssoobjn, so add it to power(2,32) and you can look for it (for the relevant session) in x$ktsso.
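
So, given the id1 from the v$lock output above, a lookup like this should find the matching entry – a sketch only, and remember that x$ views need a SYS connection:

select  ktssoobjn, ktssosqlid
from    x$ktsso
where   ktssoobjn = power(2,32) + (-40016261)          -- v$lock.id1 from the example
;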

For things like sorts we can check v$sql_workarea_active against v$sort_usage / v$tempseg_usage (though you might stick with x$ktsso to be consistent) since both hold the relative file and block number for the segment.

Follow-up

Shortly after posting a reply to the original question I received an email from Stefan Koehler pointing out that in 11.2.0.2 the object x$ktsso was enhanced to include the SQL_ID of the statement that caused a temporary object to come into existence. (If you check the definition of v$sort_usage/v$tempseg_usage you’ll see that the SQL Id that it’s reporting is the prev_sql_id from V$session).  Stefan also gave me a link to a note that he had written on the topic.

March 14, 2014

12c Temporary

Filed under: 12c,Infrastructure,Oracle,undo — Jonathan Lewis @ 6:41 pm GMT Mar 14,2014

Just one of those little snippets to cover something new and remind of something old. A single session can now have three (or more) temporary tablespaces in use at the same time for different reasons.

  • In 12c you can set parameter temp_undo_enabled to true, at which point the undo relating to global temporary tables (GTTs) will be written into the database default temporary tablespace, which means it won’t generate redo. As an interesting side effect this also means that you can do DML on temporary objects in a physical standby database. Currently the undo segment shows as type “UNDEFINED” in v$sort_usage. A detail to watch out for, though – it looks as if each session may get its own temporary undo segment – so be careful about specifying the extent size and tempfile size for the database default temporary tablespace.
  • In 11g you can specify a tablespace (though not a tablespace group) in the ‘create global temporary table’ statement, which means you keep activity about GTTs separated from the I/O resulting from sorts and hash joins etc. spilling to disc; in fact you could specify a different temporary tablespace for every GTT if you wanted to – and I could imagine a case for having a couple of different temporary tablespaces to accommodate GTTs with very different usage characteristics. (Unfortunately you still can’t specify a tablespace in the dbms_lob.createtemporary() function.)  If you don’t specify a tablespace for a GTT it will go into the default temporary tablespace of the user who is using it (not the database default, and not the default for the definer of the GTT). If you create indexes on a GTT they will automatically go into the same tablespace as the table. (There’s a small sketch of the syntax after this list.)
  • Finally, of course, there’s the default temporary tablespace for the user and this is where GTTs will go if they don’t have a tablespace specified, and where all the scratch data (sorts, hash tables, factored subqueries et. al.) will go.
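
Putting the first two points together, here’s a minimal sketch – the GTT and tablespace names are illustrative, and temp_gtt has to be an existing temporary tablespace:

alter session set temp_undo_enabled = true;

create global temporary table gtt1 (
        id      number,
        v1      varchar2(10)
)
on commit preserve rows
tablespace temp_gtt
;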

This combination means, of course, that you could manage to do a single “insert as select” writing a GTT to one temporary tablespace, with its undo going to a second temporary tablespace, and the spill from a sort or hash join in the select going to a third. The flexibility probably won’t make much difference to performance (for most people), but it’s possible that it will make it easier to monitor where the work is coming from if you’re ever in the position where your single temporary tablespace is subject to a lot of I/O.

Footnote:

In the past I’ve advised DBAs to set up a small number of tablespaces (or tablespace groups) so that they can allocate different classes of users – typically grouped by business function – to different temporary tablespaces. The ability to allocate GTTs to temporary tablespaces allows a further degree of refinement to this strategy.

March 13, 2014

Shrink Space

Filed under: Bugs,deadlocks,fragmentation,Index Rebuilds,Indexing,Locks,Oracle — Jonathan Lewis @ 7:08 am GMT Mar 13,2014

Here’s a lovely effect looking at v$lock (on 11.2.0.4)

select
        sid, type, id1, id2, lmode, request, ctime, block
from
        V$lock
where   sid in (
                select  sid
                from    V$session
                where   username = 'TEST_USER'
        )
order by
        sid, type desc
;

       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
       145 TX     589833       7518          6          0        489          0
           TM      92275          0          2          6        489          1
           SK          7   25165955          6          0        489          0
           AE        100          0          4          0        582          0

       148 TX     524308       7383          6          0        490          0
           TM      92275          0          2          6        490          1
           SK          7   25173379          6          0        490          0
           AE        100          0          4          0        998          0

You’ll notice I’ve got two sessions holding a TM lock on the same table (object 92275) in mode 2 (sub-share) and waiting for an exclusive lock on the same table. Both sessions are recording the fact that they are blocking something else. You’ll have to trust me when I say there are no other user sessions on the system at this point, and none of the background sessions is doing anything with that table.

The clue to what’s happening is the SK lock – it’s the “segment shrink” lock. I had two sessions start an “alter index I_n shrink space” (two different indexes on the same table) at the same time. The problem is that “shrink space” without the “compact” tries to drop the highwater mark on the index’s allocated space after completing the shrink phase – and there’s a defect in the internal code that tries to get the required exclusive lock on the underlying table: it doesn’t seem to allow for all the possible ways you can fail to get the lock. If you look at v$session_wait_history for either of these sessions, you’ll see something like the following:


  SID  SEQ#     EVENT# EVENT                            P1         P2         P3  WAIT_TIME WAIT_TIME_MICRO TIME_SINCE_LAST_WAIT_MICRO
----- ----- ---------- ------------------------ ---------- ---------- ---------- ---------- --------------- --------------------------
  145     1        798 Wait for shrink lock              0          0          0         10           99954                      70137
          2        235 enq: TM - contention     1414332422      92275          0        600         6002022                        101
          3        798 Wait for shrink lock              0          0          0         10          100723                      69335
          4        235 enq: TM - contention     1414332422      92275          0        600         6001589                         81
          5        798 Wait for shrink lock              0          0          0         10          100500                      69542
          6        235 enq: TM - contention     1414332422      92275          0        600         6002352                         86
          7        798 Wait for shrink lock              0          0          0         10          100618                      69145
          8        235 enq: TM - contention     1414332422      92275          0        600         6001545                        144
          9        798 Wait for shrink lock              0          0          0         10          100996                      69445
         10        235 enq: TM - contention     1414332422      92275          0        600         6002744                        310

The attempt to acquire the TM enqueue (lock) times out every three seconds – and I think the session then releases and re-acquires the SK lock before trying to re-acquire the TM lock – and it’s never going to get it.

The alert log, by the way, looked like this:


Wed Mar 12 12:53:27 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
Wed Mar 12 12:53:30 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
Wed Mar 12 12:53:40 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
Wed Mar 12 12:53:43 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.

I’ve said it before, and I keep repeating it when people say “Oracle resolves deadlocks automatically”: Oracle does NOT resolve deadlocks automatically – one of the sessions will rollback its last DML statement to clear the deadlock, but the other session will (almost invariably) still be waiting. It’s up to the application to do something sensible to resolve the deadlock after it receives the ORA-00060 error.

Don’t ask about the trace files – but they had both reached 400MB by the time I finished this note.

Strategy Note

If you are going to shrink objects, it’s probably best to do it in two steps: “shrink space compact”, followed by “shrink space”.
If you’re going to try to use multiple sessions to shrink several indexes as quickly as possible, make sure there’s no way that two sessions can try to “shrink space” on indexes on the same table at the same time.
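
In SQL, for each index (the name is illustrative):

alter index i_n shrink space compact;   -- do the bulk of the work without touching the highwater mark
alter index i_n shrink space;           -- a quick final step that drops the highwater mark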

Footnote:

This article was prompted by the recent appearance of bug 18388128 on MoS.
