Oracle Scratchpad

April 23, 2014

NL History

Filed under: Execution plans,Oracle — Jonathan Lewis @ 6:43 pm BST Apr 23,2014

Even the simplest things change – here’s a brief history of nested loop joins, starting from 8i, based on the following query (with some hints):

select
	t2.n1, t1.n2
from
	t2,t1
where
	t2.n2 = 45
and	t2.n1 = t1.n1
;

There’s an index to support the join from t2 to t1, and I’ve forced an (unsuitable) index scan for the predicate on t2.

Basic plan for 8i (8.1.7.4)

As reported by $ORACLE_HOME/rdbms/admin/utlxpls.sql.
Note the absence of a Predicate Information section.

Plan Table
--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |   225 |    3K|   3038 |       |       |
|  NESTED LOOPS             |          |   225 |    3K|   3038 |       |       |
|   TABLE ACCESS BY INDEX RO|T2        |    15 |  120 |   3008 |       |       |
|    INDEX FULL SCAN        |T2_I1     |    15 |      |      8 |       |       |
|   TABLE ACCESS BY INDEX RO|T1        |     3K|   23K|      2 |       |       |
|    INDEX RANGE SCAN       |T1_I1     |     3K|      |      1 |       |       |
--------------------------------------------------------------------------------

Basic plan for 9i (9.2.0.8)

As reported by a call to a home-grown version of dbms_xplan.display_cursor() with statistics_level set to all.

Note the “prefetch” shape of the body of the plan but the inconsistency in the numbers reported for Rows, Bytes, and Cost seem to be reporting the “traditional” 8i values transposed to match the new arrangement of the operations. There’s also a little oddity in the A-rows column in line 2 which looks as if it is the sum of its children plus 1 when the size of the rowsource is (presumably) the 225 rowids used to access the table.

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                     |  Name       | Rows  | Bytes | Cost  | Starts  | A-Rows  | Buffers |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |   225 |  3600 |  3038 |         |         |         |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1          |    15 |   120 |     2 |     1   |    225  |   3061  |
|   2 |   NESTED LOOPS                |             |   225 |  3600 |  3038 |     1   |    241  |   3051  |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2          |    15 |   120 |  3008 |     1   |     15  |   3017  |
|   4 |     INDEX FULL SCAN           | T2_I1       |  3000 |       |     8 |     1   |   3000  |     17  |
|*  5 |    INDEX RANGE SCAN           | T1_I1       |    15 |       |     1 |    15   |    225  |     34  |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

Basic plan for 10g (10.2.0.5)

As reported by a call to dbms_xplan.display_cursor() with statistics_level set to all.

No change from 9i.

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1    |      1 |     15 |    225 |00:00:00.03 |    3061 |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    241 |00:00:00.03 |    3051 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.03 |    3017 |
|   4 |     INDEX FULL SCAN           | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      34 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

Basic plan for 11g (11.2.0.4)

As reported by a call to dbms_xplan.display_cursor() with statisics_level set to all

Note how the nested loop has now turned into two NESTED LOOP operations – potentially opening the way for a complete decoupling of index access and table access. This has an interesting effect on the number of starts of the table access by rowid for t1, of course. The number of buffer gets for this operation looks surprisingly low (given that it started 225 times) but can be explained by the pattern of the data distribution – and cross-checked by looking at the “buffer is pinned count” statistic which accounts for most of the table visits.


-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    225 |00:00:00.01 |    3048 |
|   1 |  NESTED LOOPS                 |       |      1 |    225 |    225 |00:00:00.01 |    3048 |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    225 |00:00:00.01 |    3038 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.01 |    3013 |
|   4 |     INDEX FULL SCAN           | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      13 |
|*  5 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T1    |    225 |     15 |    225 |00:00:00.01 |      10 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

There is, however, a second possible plan for 11g. The one above is the “NLJ Batching” plan, but I could have hinted the “NLJ prefetch” strategy, which takes us back to the 9i execution plan (with a very small variation in buffer visits).

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1    |      1 |     15 |    225 |00:00:00.01 |    3052 |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    241 |00:00:00.01 |    3042 |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.01 |    3017 |
|   4 |     INDEX FULL SCAN           | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

Base plan for 12c (12.1.0.1)

As reported by a call to dbms_xplan.display_cursor() with statistics_level set to all.
Note that the table access to t2 in line 3 is described as “batched” (a feature that can be disabled by the /*+ no_batch_table_access_by_rowid(alias) */  hint) otherwise the plan matches the 11g plan.


---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      1 |        |    225 |00:00:00.01 |    3052 |
|   1 |  NESTED LOOPS                         |       |      1 |        |    225 |00:00:00.01 |    3052 |
|   2 |   NESTED LOOPS                        |       |      1 |    225 |    225 |00:00:00.01 |    3042 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |     15 |     15 |00:00:00.01 |    3017 |
|   4 |     INDEX FULL SCAN                   | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN                   | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
|   6 |   TABLE ACCESS BY INDEX ROWID         | T1    |    225 |     15 |    225 |00:00:00.01 |      10 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

Of course 12c also has the “prefetch” version of the plan available; and again “batched” access appears – for both tables in this case – and again the feature can be disabled individually by hints addressed at the tables:


---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      0 |        |      0 |00:00:00.01 |       0 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED  | T1    |      1 |     15 |    225 |00:00:00.01 |    3052 |
|   2 |   NESTED LOOPS                        |       |      1 |    225 |    225 |00:00:00.01 |    3042 |
|*  3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |     15 |     15 |00:00:00.01 |    3017 |
|   4 |     INDEX FULL SCAN                   | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |      17 |
|*  5 |    INDEX RANGE SCAN                   | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   5 - access("T2"."N1"="T1"."N1")

In these examples the difference in work done by the different variations and versions is negligible, but there may be cases where the pattern of data distribution may change the pattern of logical I/Os and buffer pins – which may affect the physical I/O. In this light it’s interesting to note the hint /*+ cluster_by_rowid(alias) */ that was introduced in 11.2.0.4 but disappeared by 12c changing the 11g plan as follows:


----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      0 |        |      0 |00:00:00.01 |       0 |       |       |          |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T1    |      1 |     15 |    225 |00:00:00.01 |     134 |       |       |          |
|   2 |   NESTED LOOPS                |       |      1 |    225 |    241 |00:00:00.01 |     124 |       |       |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T2    |      1 |     15 |     15 |00:00:00.01 |      99 |       |       |          |
|   4 |     SORT CLUSTER BY ROWID     |       |      1 |   3000 |   3000 |00:00:00.01 |       8 |   142K|   142K|  126K (0)|
|   5 |      INDEX FULL SCAN          | T2_I1 |      1 |   3000 |   3000 |00:00:00.01 |       8 |       |       |          |
|*  6 |    INDEX RANGE SCAN           | T1_I1 |     15 |     15 |    225 |00:00:00.01 |      25 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T2"."N2"=45)
   6 - access("T2"."N1"="T1"."N1")

Note the effect appearing at line 4 – and the extraordinary effect this has on the buffer visits (so significant that I did a follow-up check on v$mystat to see if the figures were consistent). This type of rowid sorting is, of course, an important fix for an Exadata issue I described some time ago, and I had assumed that the “batched” concept in the 12c plan was in some way enabling it – although the 12c rowsource execution stats don’t seem to bear that idea out.

Footnote:

You may also want to read the following note on the 12c batched rowid.

April 19, 2014

Coincindences

Filed under: Bugs,Oracle — Jonathan Lewis @ 9:22 am BST Apr 19,2014

I had another of those odd timing events today that make me think that Larry Ellison has access to a time machine. I found (yet another example of a) bug that had been reported on MoS just a few days before it appeared on an instance I was running. How is it possible that someone keeps doing things that I’m doing, but just a few days before I do them !

For no good reason I happened to browse through a load of background trace files on an 11.2.0.4 instance and found the following in an “m000″ file:

*** SERVICE NAME:(SYS$BACKGROUND) 2014-04-19 08:55:20.617
*** MODULE NAME:(MMON_SLAVE) 2014-04-19 08:55:20.617
*** ACTION NAME:(Auto-Purge Slave Action) 2014-04-19 08:55:20.617

*** KEWROCISTMTEXEC - encountered error: (ORA-06525: Length Mismatch for CHAR or RAW data
ORA-06512: at "SYS.DBMS_STATS", line 29022
ORA-06512: at line 1
)
  *** SQLSTR: total-len=93, dump-len=93,
      STR={begin dbms_stats.copy_table_stats('SYS', :bind1, :bind2, :bind3, flags=>1, force=>TRUE); end;}

Before trying to track down what had gone wrong I did a quick check on MoS, searching for “copy_table_stats” and “29022″ and found bug 17079301 – fixed in 12.2, and 12.1.0.2, with a patch for 12.1.0.1 (and some back-ports for 11.2.0.4). The description of the bug in the note was basically: “it happens”.

I may get around to looking more closely at what’s gone wrong but as an initial though I’m guessing that, even though the action name is “auto-purge slave action”, this may be something to do with adding a partition to some of the AWR history tables and rolling stats forward – so at some point I’ll probably start by checking for partitions with missing stats in the SYS schema.

The bug note, by the way, was published (last updated, on second thoughts) on 14th April 2014 – just 5 days before I first happened to spot the occurrence of the bug.

April 18, 2014

Bitmap loading

Filed under: bitmaps,Indexing,Oracle — Jonathan Lewis @ 12:43 pm BST Apr 18,2014

Everyone “knows” that bitmap indexes are a disaster (compared to B-tree indexes) when it comes to DML. But at an event I spoke at recently someone made the point that they had observed that their data loading operations were faster when the table being loaded had bitmap indexes on it than when it had the equivalent B-tree indexes in place.

There’s a good reason why this can be the case.  No prizes for working out what it is – and I’ll supply an answer in a couple of days time.  (Hint – it may also be the reason why Oracle doesn’t use bitmap indexes to avoid the “foreign key locking” problem).

Answer

As Martin (comment 3) points out, there’s a lot of interesting information in the statistics once you start doing the experiment. So here’s some demonstration code, first we create a table with one of two possible indexes:


create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	mod(rownum,1000)	btree_col,
	mod(rownum,1000)	bitmap_col,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create        index t1_btree on t1(btree_col) nologging;
-- create bitmap index t1_bitmap on t1(bitmap_col) nologging;

You’ll note that the two columns I’m going to build indexes on hold the same data in the same order – and it’s an order with maximum scatter because of the mod() function I’ve used to create it. It’s also very repetitive data, having 1000 distinct values over 1,000,0000 rows. With the data and (one of) the indexes in place I’m going to insert another 10,000 rows:

execute snap_my_stats.start_snap

insert /* append */ into t1
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	1e6 + rownum		id,
	mod(rownum,1000)	btree_col,
	mod(rownum,1000)	bitmap_col,
	rpad('x',100)		padding
from
	generator
;

execute snap_my_stats.end_snap

You’ll note that I’ve got an incomplete append hint in the code – I’ve tested the mechanism about eight different ways, and left the append in as a convenience, but the results I want to talk about (first) are with the hint disabled so that the insert is a standard insert. The snap_my_stats calls are my standard mechanism to capture deltas of my session statistics (v$mystat) – one day I’ll probably get around to using Tanel’s snapper routine everywhere – and here are some of the key results produced in the two tests:


11.2.0.4 with btree
===================
Name                                                                     Value
----                                                                     -----
session logical reads                                                   31,403
DB time                                                                     64
db block gets                                                           31,195
consistent gets                                                            208
db block changes                                                        21,511
redo entries                                                            10,873
redo size                                                            3,591,820
undo change vector size                                                897,608
sorts (memory)                                                               2
sorts (rows)                                                                 1

11.2.0.4 with bitmap
====================
Name                                                                     Value
----                                                                     -----
session logical reads                                                   13,204
DB time                                                                     42
db block gets                                                            8,001
consistent gets                                                          5,203
db block changes                                                         5,911
redo entries                                                             2,880
redo size                                                            4,955,896
undo change vector size                                              3,269,932
sorts (memory)                                                               3
sorts (rows)                                                            10,001

As Martin has pointed out, there are a number of statistics that show large differences between the B-tree and bitmap approaches, but the one he didn’t mention was the key: sorts (rows). What is this telling us, and why could it matter so much ? If the B-tree index exists when the insert takes place Oracle locates the correct place for the new index entry as each row is inserted which is why you end up with so many redo entries, block gets and block changes; if the bitmap index exists, Oracle postpones index maintenance until the table insert is complete, but accumulates the keys and rowids as it goes then sorts them to optimize the rowid to bitmap conversion and walks the index in order updating each modified key just once.

The performance consequences of the two different strategies depends on the number of indexes affected, the number of rows modified, the typical number of rows per key value, and the ordering of the new data as it arrives; but it’s possible that the most significant impact could come from ordering.  As each row arrives, the relevant B-tree indexes are modified – but if you’re unlucky, or have too many indexes on the table, then each index maintenance operation could result in a random disk I/O to read the necessary block (how many times have you seen complaints like: “we’re only inserting 2M rows but it’s taking 45 minutes and we’re always waiting on db file sequential reads”). If Oracle sorts the index entries before doing the updates it minimises the random I/O because it need only update each index leaf block once and doesn’t run the risk of re-reading many leaf blocks many times for a big insert.

Further Observations

The delayed maintenance for bitmap indexes (probably) explains why they aren’t used to avoid the foreign key locking problem.  On a large insert, the table data will be arriving, the b-tree indexes will be maintained in real time, but a new child row of some parent won’t appear in the bitmap index until the entire insert is complete – so another session could delete the parent of a row that exists, is not yet committed, but is not yet visible. Try working out a generic strategy to deal with that type of problem.

It’s worth noting, of course, that when you add the /*+ append */ hint to the insert then Oracle uses exactly the same optimization strategy for B-trees as it does for bitmaps – i.e. postpone the index maintenance, remember all the keys and rowids, then sort and bulk insert them.  And when you’ve remembered that, you may also remember that the hint is (has to be) ignored if there are any enabled foreign key constraints on the table. The argument for why the hint has to be ignored and why bitmap indexes don’t avoid the locking problem is (probably) the same argument.

You may also recall, by the way, that when you have B-tree indexes on a table you can choose the optimal update or delete strategy by selecting a tablescan or index range scan as the execution path.  If you update or delete through an index range scan the same “delayed maintenance” trick is used to optimize the index updates … except for any indexes being used to support foreign key constraints, and they are maintained row by row.

In passing, while checking the results for this note I re-ran some tests that I had originally done in 2006 and added one more test that I hadn’t considered at the time; as a result I can also point out that index will see delayed maintenance if you drive the update or delete with an index() hint, but not if you drive it with an index_desc() hint.

 

April 4, 2014

NVL() change

Filed under: CBO,Oracle,Troubleshooting,Upgrades — Jonathan Lewis @ 6:10 pm BST Apr 4,2014

One of the problems of functions is that the optimizer generally doesn’t have any idea on how a predicate based on function(col) might affect the cardinality. However,  the optimizer group are constantly refining the algorithms to cover an increasing number of special cases more accurately. This is a good thing, of course – but it does mean that you might be unlucky on an upgrade where a better cardinality estimate leads to a less efficient execution plan. Consider for example the simple query (where d1 is column of type date):

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate

Now, there are many cases in many versions of Oracle, where the optimizer will appear to calculate the cardinality of

nvl(columnX,{constant}) operator {constant}

as if it were:

columnX is null or columnX operator {constant}

Unfortunately this doesn’t seem to be one of them – until you get to 11.2.something. Here’s a little demonstration code:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	case
		when rownum > 100 then null else sysdate - rownum
	end	d1
from
	generator	v1,
	generator	v2
where
	rownum <= 50000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

set autotrace traceonly explain

prompt	query with NVL

select	*
from	t1
where	nvl(d1,to_date('01-01-1900','dd-mm-yyyy')) < sysdate
;

prompt	query with OR clause

select	*
from	t1
where	d1 is null or d1 < sysdate
;

If you run this code in 11.1.0.7 you get the following – with numeric variations for cost (which I’m interested not in at the moment):


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  2500 |  2500 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  2500 |  2500 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

Take note particularly of the difference in the estimated cardinality for the tablescans.

When you upgrade to 11.2.0.4 (possibly earlier – though there are some nvl() related patches that appeared only in 11.2.0.4), you get this:


query with NVL
==============
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 50000 | 50000 |    18  (39)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 50000 | 50000 |    18  (39)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NVL("D1",TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))<SYSDATE@!)

query with OR clause
====================
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 49900 | 49900 |    13  (16)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 49900 | 49900 |    13  (16)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("D1" IS NULL OR "D1"<SYSDATE@!)

As you can see the estimate for the “NVL()” example is now correct – which means vastly different from the estimate in 11.1.0.7 which seemed to be using the standard “5% for range-based predicate on function(col)”.

It’s interesting to note that a (relatively) small error has crept in to the “OR” example – interestingly the size of the error is exactly the number of rows where d1 is not null (which looks like enough of a coincidence to be a bug – but maybe there’s a good rationale for it)

Conclusion

Yet again, a simple upgrade has the capacity to make a dramatic change to a cardinality estimate – which could mean a significant change to an execution plan and major change in performance. If you’ve read this note, though, you may recognise the pattern that is the driving feature behind the problem.

Footnote:

If you have access to versions 11.2.0.1 through 11.2.0.3 and find that the test data produces different cardinalities please publish the results in the comments – it would be nice to know exactly when this change appears.  (You need only show the body of the execution plans  with labels, not the entire output of the script).

 

April 3, 2014

Cache anomaly

Filed under: Bugs,Oracle,Performance — Jonathan Lewis @ 1:27 pm BST Apr 3,2014

Just a quick heads-up for anyone who likes to play around with the Keep and Recycle caches.

In 11g Oracle introduced the option for serial direct path reads for tablescans on tables that was sufficiently large – which meant more than the small_table_threshold – provided the table wasn’t already sufficient well cached.  (The rules mean that the choice of mechanism can appear to be a little random in the production environment for tables that are near the threshold size – but if you try testing by doing “alter system flush buffer_cache” you find that you always get direct path reads in testing.)

I’ve just discovered a little oddity about this, though.  I have a table of about 50MB which is comfortably over the threshold for direct path reads. But if I create a KEEP cache (db_keep_cache_size) that is a little larger than the table and then assign the table to the KEEP cache (alter table xxx storage(buffer_pool keep)) then 11.2.0.4 stops doing direct path reads, and caches the table.

Now this doesn’t seem unreasonable – if you’ve assigned an object to the KEEP cache you probably want it (or once wanted it) to be kept in cache as much as possible; so using the KEEP cache if it’s defined and specified makes sense. The reason I mention this as an oddity, though, is that it doesn’t reproduce in 11.1.0.7.

I think I saw a bug note about this combination a few months ago- I was looking for something else at the time and, almost inevitably, I can’t find it when I want it – but I don’t remember whether it was the 11.1 or 11.2 behaviour that was deemed to be correct.

 Update

See comments 1 and 2 below.  I’ve written about this previously, and the caching bechaviour is the correct behaviour. The patch is in 11.2.0.2 and backports are available for 11.1.0.7 and 11.2.0.1. The patch ensures that the table will be read into the cache if it is smaller than the db_keep_cache_size.  (Although we might want to check – see Tanel’s notes – whether this is based on the high water mark recorded in the segment header or on the optimizer stats for the table; and I suppose it might be worth checking that the same feature applies to index fast full scans). From the MoS document:

With the bug fix applied, any object in the KEEP buffer pool, whose size is less than DB_KEEP_CACHE_SIZE, is considered as a small or medium sized object. This will cache the read blocks and avoid subsequent direct read for these objects.

 

 

April 2, 2014

Easy – Oops.

Filed under: Bugs,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 7:47 pm BST Apr 2,2014

A question came up on OTN today asking for suggestions on how to enforce uniqueness on a pair of columns only when the second column was not null. There’s an easy and obvious solution – but I decided to clone the OP’s example and check that I’d typed my definition up before posting it; and the result came as a bit of a surprise. Here’s a demo script (not using the OP’s table):


create table t1  
(  
	col1	int not null,
	col2	varchar2(1)
);  

create unique index t1_i1 on t1( 
--	case col2 when null then cast(null as int) else col1 end,
--	case when col2 is null then cast(null as int) else col1 end,
	case when col2 is not null then col1 end,
	col2
)
;

insert into t1 values(1,null);
insert into t1 values(1,null);
insert into t1 values(1,'x');
insert into t1 values(1,'y');
insert into t1 values(1,'y');

commit;

column ind1_is   format a5
column ind1_when format 9999

set null N/A

select
	case when col2 is null then cast (null as int) else col1 end	ind1_is,
	case col2 when null then cast (null as int)  else col1 end	ind1_when
from 
	t1
;

The strategy is simple, you create a unique function-based index with two columns; the first column of the index id defined to show the first column of the table if the second column of the table is not null, the second column of the index is simply the second column of the table. So if the second column of the table is null, both columns in the index are null and there is no entry in the index; but if the second column of the table is not null then the index copies both columns from the table and a uniqueness test applies.

Based on the requirement and definition you would expect the first 4 of my insert statements to succeed and the last one to fail. The index will then have two entries, corresponding to my 3rd and 4th insertions.

I’ve actually shown three ways to use the case statement to produce the first column of the index. The last version is the cleanest, but the first option is the one I first thought of – it’s virtually a literal translation the original requirement. The trouble is, with my first definition the index acquired an entry it should not have got, and the second insert raised a “duplicate key” error; the error didn’t appear when I switched the syntax of the case statement to the second version.

That’s why the closing query of the demo is there – when you run it the two values reported should be the same as each other for all four rows in the table – but they’re not. This is what I got on 11.2.0.4:


IND1_IS IND1_WHEN
------- ---------
N/A             1
N/A             1
      1         1
      1         1


I’m glad I did a quick test before suggesting my original answer.

Anyone who has other versions of Oracle available is welcome to repeat the test and report back which versions they finding working correctly (or not).

Update

It’s not a bug (see note 2 below from Jason Bucata), it’s expected behaviour.

 

Tweaking

Filed under: Function based indexes,Indexing,Oracle — Jonathan Lewis @ 6:24 pm BST Apr 2,2014

The following question came up on OTN recently:

Which one gives better performance? Could please explain.

1) nvl( my_column, ‘N’) <> ‘Y’

2) nvl( my_column, ‘N’) = ‘N’

It’s a question that can lead to a good 20 minute discussion – if you were in some sort of development environment and had a fairly free hand to do whatever you wanted.

The most direct answer is that you could expect the performance to be the same whichever option you chose – but the results might be different, of course, unless you had a constraint on my_column that ensured that it would hold only null, ‘Y’, or ‘N’.  (Reminder:  the constraint check (my_column in (‘Y’,’N’) will allow nulls in the column).

On the other hand, if you create a function-based index on nvl(my_column,’N’) then the second option would give the optimizer the opportunity to use the index – which could make it the more efficient option if a sufficiently small quantity of the data from a large table matched the predicate. Of course in this case you would need a histogram on the hidden column definition supporting the index so that the optimizer could detect the data skew.

But if a function-based index is a step in the right direction it’s worth thinking about how to pick the best function-based index.


create index t1_i1 on t1(case my_column when 'Y' then null else 'N' end);

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1');

select * from t1 where case my_column when 'Y' then null else 'N' end = 'N';

Provided you can change the queries that use the original predicate, you can create a function-based index that is the smallest possible index to assist your query, and you won’t need a histogram to allow the optimizer to get the correct cardinality since there will be just one distinct value in the index (and the underlying column).

It’s possible to go a little further – what if the typical query that started this discussion was really something like:


select  *
from    t1
where   nvl(my_column, 'N') = 'N'
and     other_column = {some value}

If the nvl() predicate always returned a small fraction of the data we might engineer an index to handle the requirement of the entire where clause:


create index t1_i2 on t1(case my_column when 'Y' then null else other_column end);
execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all hidden columns size 254');

In this case you might need a histogram on the hidden column definition supporting the index, which is why I’ve changed my method_opt. We’ve constructed an index that is the smallest possible index that will satisfy the query as precisely as possible while giving the optimizer the best chance of producing accurate cardinality estimates. [Update: until the appearance of Kirill's comment (#1) I forgot to point out that once again you have to change the original SQL from using the nvl() expression to using a case expression that matches the index.]

Footnote

It would be easy to fall into the trap of playing around with this type of tweaking every time you hit a performance problem. Bear in mind, though, that even indexes which are as small and accurately targeted as possible can introduce various types of overhead (contention, cost of gathering stats, instability of histogram collection); so always try to think through the potential costs as well as the benefits of such an approach - is it really worth the effort.

It’s also worth noting that things don’t always work the way you expect.

Finally, of course, you might choose to create “proper” virtual columns in 11g so that you can refer to them by name, rather than handling them indirectly as you have to when they are “undocumented” hidden columns generated by function-based indexes.

 

March 28, 2014

Juggernaut

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 8:12 am BST Mar 28,2014

One of the problems of “knowing” so much about Oracle is that the more you know the more you have to check on each new release of the software. An incoming ping on my posting “Lock Horror” reminded me that I was writing about 11.2.0.1, and the terminal release is 11.2.0.4, and the whole thing may have changed in 12.1.0.1 – so I ought to re-run some tests to make sure that the articel is up to date if it’s likely to be read a few times in the next few days.

Unfortunately, although I often add a URL to scripts I’ve used to confirm results published in the blog, I don’t usually include a script name in my blog postings  to remind me where to go if I want to re-run the tests. So how do I find the right script(s) ? Typically I list all the likely scripts and compare dates with the date on the blog; so here’s what I got for “lock”.


SQL> host ls -ltr *lock*.sql | grep -v block
-rwxr-xr-x 1 jonathan dba 1569 Jun 28  2002 c_bitlock.sql
-rwxr-xr-x 1 jonathan dba 1303 Oct  5  2002 ddl_deadlock.sql
-rwxr-xr-x 1 jonathan dba 1875 Oct  7  2002 ddl_deadlock_2.sql
-rwxr-xr-x 1 jonathan dba 1654 Aug  6  2003 hw_lock.sql
-rwxr-xr-x 1 jonathan dba 2626 Sep 17  2004 lock_oddity.sql
-rwxr-xr-x 1 jonathan dba 1804 Sep 17  2004 lock_speed.sql
-rwxr-xr-x 1 jonathan dba 3194 May  8  2006 space_locks.sql
-rwxr-xr-x 1 jonathan dba 4337 Jan  3  2008 tm_deadlock.sql
-rwxr-xr-x 1 jonathan dba 1149 Jan  3  2008 show_lock.sql
-rwxr-xr-x 1 jonathan dba 2068 Apr 21  2008 hw_lock_2.sql
-rwxr-xr-x 1 jonathan dba 1482 Feb  5  2010 tt_lock.sql
-rwxr-xr-x 1 jonathan dba 1692 Feb 16  2010 to_lock.sql
-rwxr-xr-x 1 jonathan dba 3308 Jun  1  2010 skip_locked.sql
-rwxr-xr-x 1 jonathan dba 2203 Nov  2  2010 deadlock_statement.sql
-rwxr-xr-x 1 jonathan dba 2883 Nov  3  2010 merge_locking.sql
-rwxr-xr-x 1 jonathan dba 1785 Dec 14  2010 sync_lock.sql
-rwxr-xr-x 1 jonathan dba  984 Apr 23  2011 para_dml_deadlock.sql
-rwxr-xr-x 1 jonathan dba 4305 Jun  4  2011 locking_fifo.sql
-rwxr-xr-x 1 jonathan dba 5970 Jun  5  2011 locking_fifo_2.sql
-rwxr-xr-x 1 jonathan dba  917 Jun 30  2011 ul_deadlock.sql
-rwxr-xr-x 1 jonathan dba  936 Jul  8  2011 funny_deadlock.sql
-rwxr-xr-x 1 jonathan dba  741 Sep  8  2011 row_lock_wait_index.sql
-rwxr-xr-x 1 jonathan dba 2590 Nov 30  2012 fk_lock_stress.sql
-rwxr-xr-x 1 jonathan dba 4561 Feb  6  2013 dbms_lock.sql
-rwxr-xr-x 1 jonathan dba 1198 Apr  6  2013 libcache_locks.sql
-rwxr-xr-x 1 jonathan dba 5636 Nov 27 19:40 ash_deadlocks.sql
-rwxr-xr-x 1 jonathan dba  379 Mar 27 19:17 fk_constraint_locks.sql

Nothing leaps out as an obvious candidate, though “funny_deadlock.sql” catches my eye for future reference; maybe I should look for “foreign key”.

SQL> host ls -ltr *fk*.sql | grep -v fkr
-rwxr-xr-x 1 jonathan dba  2140 Jun 16  2005 fk_check.sql
-rwxr-xr-x 1 jonathan dba  2897 Jun 16  2005 fk_order.sql
-rwxr-xr-x 1 jonathan dba   650 Oct 26  2007 pk_fk_null.sql
-rwxr-xr-x 1 jonathan dba  5444 Nov  4  2007 c_fk.sql
-rwxr-xr-x 1 jonathan dba  1568 Dec  5  2008 null_fk.sql
-rwxr-xr-x 1 jonathan dba  2171 Mar  2  2009 fk_anomaly_2.sql
-rwxr-xr-x 1 jonathan dba  3922 Mar  2  2009 fk_anomaly.sql
-rwxr-xr-x 1 jonathan dba  5512 Oct 15  2009 fk_check_2.sql
-rwxr-xr-x 1 jonathan dba  1249 Feb 15  2010 c_pk_fk_2.sql
-rwxr-xr-x 1 jonathan dba  1638 Feb 16  2010 c_pk_fk_3.sql
-rwxr-xr-x 1 jonathan dba  5121 Jun  1  2012 c_pt_fk_2.sql
-rwxr-xr-x 1 jonathan dba  4030 Jun  5  2012 c_pt_fk_3.sql
-rwxr-xr-x 1 jonathan dba  2062 Jun  5  2012 c_pt_fk_3a.sql
-rwxr-xr-x 1 jonathan dba  2618 Sep 23  2012 c_pk_fk_02.sql
-rwxr-xr-x 1 jonathan dba  1196 Oct 19  2012 deferrable_fk.sql
-rwxr-xr-x 1 jonathan dba  2590 Nov 30  2012 fk_lock_stress.sql
-rwxr-xr-x 1 jonathan dba  4759 Sep  1  2013 fk_bitmap.sql
-rwxr-xr-x 1 jonathan dba  1730 Sep 30 07:51 virtual_fk.sql
-rwxr-xr-x 1 jonathan dba  3261 Dec 22 09:41 pk_fk_gets.sql
-rwxr-xr-x 1 jonathan dba  8896 Dec 31 13:19 fk_delete_gets.sql
-rwxr-xr-x 1 jonathan dba 10071 Dec 31 14:52 fk_delete_gets_2.sql
-rwxr-xr-x 1 jonathan dba  4225 Jan 14 11:15 c_pk_fk.sql
-rwxr-xr-x 1 jonathan dba  2674 Jan 14 13:42 append_fk.sql
-rwxr-xr-x 1 jonathan dba  1707 Feb 10 12:34 write_cons_fk.sql
-rwxr-xr-x 1 jonathan dba  9677 Feb 24 17:23 c_pt_fk.sql
-rwxr-xr-x 1 jonathan dba   379 Mar 27 19:17 fk_constraint_locks.sql

(The “-fkr” is to eliminate scripts about “first K rows optimisation”). With a little luck, the dates are about right, c_pk_fk_2.sql and c_pk_fk_3.sql will be relevant. So keep an eye on “Lock Horror” for an update in the next few days.

You’ll notice that some of the scripts have a very old datestamp on them - that’s an indication of how hard it is to keep up; when I re-run a script on a new version of Oracle I invariably add a “Last Tested:” version to the header, and a couple of notes about changes.  A couple of my scripts date back to June 2001 – but that is, at least, the right century, and some people are still using Oracle 7.

Footnote

It should be obvious that I can’t test everything on every new release – but it’s amazing how often on a client site I can recognize a symptom and pick at script that I’ve used in the past to construct the problem – and that’s when a quick bit of re-testing helps me find a solution or workaround (or Oracle bug note).

 

March 26, 2014

Diagnostics

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 2:13 pm BST Mar 26,2014

Here’s a little test you might want to try. Examine the following script, and decide what sort of symptoms you would see in the AWR report.


create global temporary table gtt1(n1 number);

execute dbms_workload_repository.create_snapshot;

insert into gtt1 values(1);
truncate table gtt1;

-- repeat insert/truncate for a total of 100 cycles

execute dbms_workload_repository.create_snapshot;

-- generate an AWR report across the interval.

I don’t need anyone to tell me their results – but if your predictions and the actual results match then you can give yourself a pat on the head.
You might also like to enable SQL trace for all the inserts/truncate to see if that shows you anything interesting.

This is one of the simpler scripts of the 3,500 I have on my laptop that help me interpret the symptoms I see in client systems.

March 24, 2014

Min/Max

Filed under: Function based indexes,Indexing,Oracle — Jonathan Lewis @ 11:11 pm BST Mar 24,2014

One of my most-repeated observations about trouble-shooting Oracle is that things break when you start combining features. Here’s an example that demonstrates the point.

It’s possible to create “descending” indexes – or indexes with descending columns, as I prefer to call them, and there’s a special “min/max range scan” optimizer operation for a particular kind of index usage – demonstrated in the following code fragment (running under 11.2.0.4, and reporting the rowsource execution statistics):


create table t1(
	a number not null,
	b number not null,
	c number not null,
	padding varchar2(100)
);

insert into t1
select
	mod(object_id +   1245,1001),
	mod(object_id +   4545,1111),
	mod(object_id + 774545,  13),
	rpad('x',100,'x')
from
	all_objects
where
	rownum<=10000
;

commit;

create index t1_i1 on t1(b, c, a);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

alter session set statistics_level = all;

select
	max(a)
from	t1
where	b=1
and	c=1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|   2 |   FIRST ROW                  |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T1_I1 |      1 |      1 |      1 |00:00:00.01 |       2 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("B"=1 AND "C"=1)

Note how the optimizer is aware that it can find a path aiming for one specific index entry (FIRST ROW), using the (min/max) option on the index.

So what happens when we change the index:


drop index t1_i1;
create index t1_i1 on t1(b, c, a desc);

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

select
	max(a)
from	t1
where	b=1
and	c=1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

-------------------------------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE   |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN| T1_I1 |      1 |      1 |      1 |00:00:00.01 |       2 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("B"=1 AND "C"=1)

We’ve changed the index so that the final column is descending and although the optimizer is smart enough to determine that the query can still be satisfied without visiting the table, it can no longer use the min/max optimization, instead it does a range scan the section of the index matching the where clause, using the normal aggregate operation to find max(a).

In this tiny example the difference in the work load is barely perceptible – but there will be cases where the change in plan will make a difference in performance. As ever, when taking advantage of a feature that looks useful you have to try to imagine all the possible cases for the feature that might appear in your application and test them to see whether they introduce an unexpected (and possibly unacceptable) overhead.

Footnote:

There is a workaround in this case – not that I would suggest using it in a production system. If you remember that descending columns are implemented through a function-based index using the sys_op_descend() function, you can write code like this:

select
	utl_raw.cast_to_number(hextoraw(sys_op_undescend(MIN(sys_op_descend(a)))))	a
from
	t1
where
	b = 1
and	c = 1
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE              |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|   2 |   FIRST ROW                  |       |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T1_I1 |      1 |      1 |      1 |00:00:00.01 |       2 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("B"=1 AND "C"=1)

These results came from an instance of 11.2.0.4, but the limitation is still present in 12.1.0.1

March 20, 2014

RLS bug

Filed under: Bugs,Infrastructure,Oracle — Jonathan Lewis @ 1:21 pm BST Mar 20,2014

RLS – row level security, aka VPD (virtual private database) or FGAC (fine grained access control) has a critical bug in 11g. The bug is unpublished, but gets mentioned in various other documents, so can be identified as: Bug: 7828323 “SYS_CONTEXTS RETURNS WRONG VALUE WITH SHARED_CONTEXT_SENSITIVE”

The title tells you nearly everything you need to know – if you’ve declared a security policy as context_sensitive or shared_context_sensitive then a change to the context ought to result in the associated predicate function being called to generate a new security predicate the next time the policy becomes relevant. Thanks to bug 7828323 this doesn’t always happen – so queries can return the wrong set of results.

There are some patches for older versions (11.1.0.7 and 11.2.0.2 as far as I’ve checked), but if you don’t have, or can’t get, a patch the “workaround” is to change any relevant policies to dynamic; unfortunately the consequence of this is that the predicate function will then be called for every execution of any statement against any objects protected by that policy.

Depending on how your application has been written and how many queries are likely to invoke security policies this could easily increase your CPU usage by several percent (and if it’s a badly written application maybe a lot more).

Footnote:

It has occurred to me to wonder what happens if you have use a (normal) pl/sql function in a select list, and the function executes a statement against a table, and the table is protected by a context_sensitive security policy – and you decide to use the pl/sql result cache on the function. How long is an item supposed to stay in the result cache, and if it’s longer than a single execution of a statement will the result cache be invalidated if your context changes in a way that invalidates the current security predicate ? No time to check or test at present, though, but I’d be very cautious about putting RLS predicate functions into the result cache until I’ve played around with that idea for a bit.

March 17, 2014

Temporary Segments

Filed under: Infrastructure,Oracle — Jonathan Lewis @ 6:25 pm BST Mar 17,2014

A question came up on oracle-l recently about identifying which temporary segment in v$tempseg_usage is associated with which global temporary table. Here’s my answer:

v$tempseg_usage is a synonym for v$sort_usage – which is the view that I still tend to think of first.

v$sort_usage sits on top of x$ktsso – and in x$ktsso the column ktssoobjn is the object number for the definition of the global temporary table (ditto for any indexes on the table). [Addendum: I didn't point it out in the reply, but if you have indexes on the GTT the v$lock will show a TM lock for the table object_id, and TO locks that use the same object id for the TO locks representing the table and its indexes.]

If an internal GTT has appeared because of subquery factoring this is a little broken, but if you look in the execution plan (v$sql_plan / display_cursor) for the query you will find lines like:

|  63 |    HASH JOIN                          |                            |
|  64 |     VIEW                              |                            |
|  65 |      TABLE ACCESS FULL                | SYS_TEMP_0FD9D667C_74A306D |
|  66 |     VIEW                              |                            |
|  67 |      TABLE ACCESS FULL                | SYS_TEMP_0FD9D667B_74A306D |

We will see corresponding TO locks in v$lock (note the negative value):

ADDR             KADDR                   SID TY        ID1        ID2
---------------- ---------------- ---------- -- ---------- ----------
000000008ED8EC68 000000008ED8ECC0        143 TO  -40016261          1
000000008ED8F540 000000008ED8F598        143 TO  -40016260          1

And we note large values for ktssoobjn in x$ktsso:

  KTSSOBNO  KTSSOEXTS  KTSSOBLKS  KTSSORFNO  KTSSOOBJD  KTSSOOBJN KTSSOTSNUM KTSSOSQLID
---------- ---------- ---------- ---------- ---------- ---------- ---------- -------------
     49792          1        128          1    4244096 4254951035          3 anb4mscz7wz71
     49152          1        128          1    4243456 4254951036          3 anb4mscz7wz71

Note the middle of the SYS_TEMP name:

0x0FD9D667C = 4254951036

and then spot the arithmetic:

4254951036 + 40016260 = 4294967296 = 0x100000000 = power(2,32)

v$lock.id1 is a 16 bit wrap of x$ktsso.ktssoobjn, so add it to power(2,32) and you can look for it (for the relevant session) in x$ktsso.

For things like sorts we can check v$sql_workarea_active against v$sort_usage / v$tempseg_usage (though you might stick with x$ktsso to be consistent) since both hold the relative file and block number for the segment.

Follow-up

Shortly after posting a reply to the original question I received an email from Stefan Koehler pointing out that in 11.2.0.2 the object x$ktsso was enhanced to include the SQL_ID of the statement that caused a temporary object to come into existence. (If you check the definition of v$sort_usage/v$tempseg_usage you’ll see that the SQL Id that it’s reporting is the prev_sql_id from V$session).  Stefan also gave me a link to a note that he had written on the topic.

March 14, 2014

12c Temporary

Filed under: 12c,Infrastructure,Oracle,undo — Jonathan Lewis @ 6:41 pm BST Mar 14,2014

Just one of those little snippets to cover something new and remind of something old. A single session can now have three (or more) temporary tablespaces in use at the same time for different reasons.

  • In 12c you can set parameter temp_undo_enabled to true, at which point the undo relating to global temporary tables (GTTs) will be written into the database default temporary tablespace, which means it won’t generate redo. As an interesting side effect this also means that you can do DML on temporary objects in a physical standby database. Currently the undo segment shows as type “UNDEFINED” in v$sort_usage. A detail to watch out for, though – it looks as if each session may get its own temporary undo segment – so be careful about specifying the extent size and tempfile size for the database default temporary tablespace.
  • In 11g you can specify a tablespace (though not a tablespace group) in the ‘create global temporary table’ statement, which means you keep activity about GTTs separated from the I/O resulting from sorts and hash joins etc. spilling to disc; in fact you could specify a different temporary tablespace for every GTT if you wanted to – and I could imagine a case for having a couple of different temporary tablespaces to accommodate GTTs with very different usage characteristics. (Unfortunately you still can’t specify a tablespace in the dbms_lob.create_temporary() function).  If you don’t specify a tablespace for a GTT it will go into the default temporary tablespace of the user who is using it (not the database default, and not the default for the definer of the GTT). If you create indexes on a GTT they will automatically go into the same tablespace as the table.
  • Finally, of course, there’s the default temporary tablespace for the user and this is where GTTs will go if they don’t have a tablespace specified, and where all the scratch data (sorts, hash tables, factored subqueries et. al.) will go.

This combination means, of course, that you could manage to do a single “insert as select” writing a GTT to one temporary tablespace, with its undo going to a second temporary tablespace, and the spill from a sort or hash join in the select going to a third. The flexibility probably won’t make much difference to performance (for most people), but it’s possible that it will make it easier to monitor where the work is coming from if you’r ever in the position where your single temporary tablespace is subject to a lot of I/O.

Footnote:

In the past I’ve advised DBAs to set up a small number of tablespaces (or tablespace groups) so that they can allocate different classes of users – typically grouped by business function – to different temporary tablespaces. The ability to allocate GTTs to temporary tablespaces allows a further degree of refinement to this strategy.

March 13, 2014

Shrink Space

Filed under: Bugs,deadlocks,fragmentation,Index Rebuilds,Indexing,Locks,Oracle — Jonathan Lewis @ 7:08 am BST Mar 13,2014

Here’s a lovely effect looking at v$lock (on 11.2.0.4)

select
        sid, type, id1, id2, lmode, request, ctime, block
from
        V$lock
where   sid in (
                select  sid
                from    V$session
                where   username = 'TEST_USER'
        )
order by
        sid, type desc
;

       SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------- -- ---------- ---------- ---------- ---------- ---------- ----------
       145 TX     589833       7518          6          0        489          0
           TM      92275          0          2          6        489          1
           SK          7   25165955          6          0        489          0
           AE        100          0          4          0        582          0

       148 TX     524308       7383          6          0        490          0
           TM      92275          0          2          6        490          1
           SK          7   25173379          6          0        490          0
           AE        100          0          4          0        998          0

You’ll notice I’ve got two sessions holding a TM lock on the same table (object 92275) in mode 2 (sub-share) and waiting for an exclusive lock on the same table. Both sessions are recording the fact that they are blocking something else. You’ll have trust me when I say there are no other user sessions on the system at this point, and none of the background sessions is doing anything with that table.

The clue to what’s happening is the SK lock – it’s the “segment shrink” lock. I had two sessions start an “alter index I_n shrink space” (two different indexes on the same table) at the same time. The problem is that “shrink space” without the “compact” tries to drop the highwater mark on the index’ allocated space after completing the shrink phase – and there’s a defect in the internal code that tries to get the required exclusive lock on the underlying table: it doesn’t seem to allow for all the possible ways you can fail to get the lock. If you look at v$session_wait_history for either of these sessions, you’ll see something like the following:


  SID  SEQ#     EVENT# EVENT                            P1         P2         P3  WAIT_TIME WAIT_TIME_MICRO TIME_SINCE_LAST_WAIT_MICRO
----- ----- ---------- ------------------------ ---------- ---------- ---------- ---------- --------------- --------------------------
  145     1        798 Wait for shrink lock              0          0          0         10           99954                      70137
          2        235 enq: TM - contention     1414332422      92275          0        600         6002022                        101
          3        798 Wait for shrink lock              0          0          0         10          100723                      69335
          4        235 enq: TM - contention     1414332422      92275          0        600         6001589                         81
          5        798 Wait for shrink lock              0          0          0         10          100500                      69542
          6        235 enq: TM - contention     1414332422      92275          0        600         6002352                         86
          7        798 Wait for shrink lock              0          0          0         10          100618                      69145
          8        235 enq: TM - contention     1414332422      92275          0        600         6001545                        144
          9        798 Wait for shrink lock              0          0          0         10          100996                      69445
         10        235 enq: TM - contention     1414332422      92275          0        600         6002744                        310

The attempt to acquire the TM enqueue (lock) times out every three seconds – and I think the session then releases and re-acquires the SK lock before trying to re-acquire the TM lock – and it’s never going to get it.

The alert log, by the way, looked like this:


Wed Mar 12 12:53:27 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
Wed Mar 12 12:53:30 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
Wed Mar 12 12:53:40 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.
Wed Mar 12 12:53:43 2014
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_1832.trc.
ORA-00060: Deadlock detected. More info in file /u01/app/oracle/diag/rdbms/test/test/trace/test_ora_2242.trc.

I’ve said it before, and I keep repeating it when people say “Oracle resolves deadlocks automatically”: Oracle does NOT resolve deadlocks automatically – one of the sessions will rollback its last DML statement to clear the deadlock, but the other session will (almost invariably) still be waiting. It’s up to the application to do something sensible to resolve the deadlock after it receives the ORA-00060 error.

Don’t ask about the trace files – but they had both reached 400MB by the time I finished this note.

Strategy Note

If you are going to shrink objects, it’s probably best to do it in two steps: “shrink space compact, followed by “shrink space”.
If you’re going to try to use multiple sessions to shrink several indexes as quickly as possible, make sure there’s no way that two sessions can try to “shrink space” on the indexes on the same table at the same time.

Footnote:

This article was prompted by the recent appearance of bug 18388128 on MoS.

March 11, 2014

Parallel Plans

Filed under: Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 7:39 am BST Mar 11,2014

This is the directory for a short series I wrote discussing how to interpret parallel execution plans in newer versions of Oracle.

Temporary footnote: I’ll be repeating November’s 90 minute webinar on parallel execution plans on 15th April. I’ll also be doing a full-day webinar on trouble-shooting (spread over two days) on 16th/17th April.

For other aspects of parallel execution, here are links to several articles by Randolf Geist, published on his own blog or on Oracle’s Technet:

 

 

 

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,529 other followers