Oracle Scratchpad

June 17, 2015

Reverse Key

Filed under: Indexing,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 1:11 pm BST Jun 17,2015

A question came up on the OTN database forum recently asking if you could have a partitioned index on a non-partitioned table.

(Aside: I’m not sure whether it would be quicker to read the manuals or try the experiment – either would probably be quicker than posing the question to the forum. As so often happens in these RTFM questions the OP didn’t bother to acknowledge any of the responses)

The answer to the question is yes – you can create a globally partitioned index, though if it uses range partitioning you have to specify a maxvalue partition. The interesting thing about the question, though is that several people tried to guess why it had been asked and then made suggestions based on the most likely guess (and wouldn’t it have been nice to see some response from the OP ). The common guess was that there was a performance problem with the high-value block of a sequence-based (or time-based) index – a frequent source of “buffer busy waits” events and other nasty side effects.

Unfortunately too many people suggesting reverse key as a solution to this “right-hand” problem. If you’re licensed for partitioning it’s almost certain that a better option would simple be to use global hash partitioning (with 2^N for some N) partitions. Using reverse keys can result in a bigger performance problem than the one you’re trying to avoid – you may end up turning a little time spent on buffer busy waits into a large amount of time spent on db file sequential reads. To demonstrate the issue I’ve created a sample script – and adjusted my buffer cache down to the appropriate scale:

rem     Script:         c_reverse_ind3.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2010
rem     Purpose:

create table t1(
	id	not null
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		rownum <= 1e4
	1e7 + rownum	id
	generator	v1,
	generator	v2
	rownum <= 1e7 -- > comment to avoid WordPress format issue

		ownname		 => user,
		tabname		 => 'T1'

alter table t1 add constraint t1_pk primary key(id) 
using index 

alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';

	for i in 20000001..20010000 loop
		insert into t1 values(i);
	end loop;

I’ve created a table with 10,000,000 rows using a sequential value as the primary key, then inserted “the next” 10,000 rows into the table in order. The index occupied about 22,000 blocks, so to make my demonstration show you the type of effect you could get from a busy production system with more tables and many indexes I ran my test with the buffer cache limited to 6,000 blocks – a fair fraction of the total index size. Here’s a small section of the trace file from the test running on an elderly machine:

WAIT #43: nam='db file sequential read' ela= 13238 file#=6 block#=12653 blocks=1 obj#=63623 tim=3271125590
WAIT #43: nam='db file sequential read' ela=  7360 file#=6 block#=12749 blocks=1 obj#=63623 tim=3271133150
WAIT #43: nam='db file sequential read' ela=  5793 file#=6 block#=12844 blocks=1 obj#=63623 tim=3271139110
WAIT #43: nam='db file sequential read' ela=  5672 file#=6 block#=12940 blocks=1 obj#=63623 tim=3271145028
WAIT #43: nam='db file sequential read' ela= 15748 file#=5 block#=13037 blocks=1 obj#=63623 tim=3271160998
WAIT #43: nam='db file sequential read' ela=  8080 file#=5 block#=13133 blocks=1 obj#=63623 tim=3271169314
WAIT #43: nam='db file sequential read' ela=  8706 file#=5 block#=13228 blocks=1 obj#=63623 tim=3271178240
WAIT #43: nam='db file sequential read' ela=  7919 file#=5 block#=13325 blocks=1 obj#=63623 tim=3271186372
WAIT #43: nam='db file sequential read' ela= 15553 file#=6 block#=13549 blocks=1 obj#=63623 tim=3271202115
WAIT #43: nam='db file sequential read' ela=  7044 file#=6 block#=13644 blocks=1 obj#=63623 tim=3271209420
WAIT #43: nam='db file sequential read' ela=  6062 file#=6 block#=13741 blocks=1 obj#=63623 tim=3271215648
WAIT #43: nam='db file sequential read' ela=  6067 file#=6 block#=13837 blocks=1 obj#=63623 tim=3271221887
WAIT #43: nam='db file sequential read' ela= 11516 file#=5 block#=13932 blocks=1 obj#=63623 tim=3271234852
WAIT #43: nam='db file sequential read' ela=  9295 file#=5 block#=14028 blocks=1 obj#=63623 tim=3271244368
WAIT #43: nam='db file sequential read' ela=  9466 file#=5 block#=14125 blocks=1 obj#=63623 tim=3271254002
WAIT #43: nam='db file sequential read' ela=  7704 file#=5 block#=14221 blocks=1 obj#=63623 tim=3271261991
WAIT #43: nam='db file sequential read' ela= 16319 file#=6 block#=14444 blocks=1 obj#=63623 tim=3271278492
WAIT #43: nam='db file sequential read' ela=  7416 file#=6 block#=14541 blocks=1 obj#=63623 tim=3271286129
WAIT #43: nam='db file sequential read' ela=  5748 file#=6 block#=14637 blocks=1 obj#=63623 tim=3271292163
WAIT #43: nam='db file sequential read' ela=  7131 file#=6 block#=14732 blocks=1 obj#=63623 tim=3271299489
WAIT #43: nam='db file sequential read' ela= 16126 file#=5 block#=14829 blocks=1 obj#=63623 tim=3271315883
WAIT #43: nam='db file sequential read' ela=  7746 file#=5 block#=14925 blocks=1 obj#=63623 tim=3271323845
WAIT #43: nam='db file sequential read' ela=  9208 file#=5 block#=15020 blocks=1 obj#=63623 tim=3271333239
WAIT #43: nam='db file sequential read' ela=  7708 file#=5 block#=15116 blocks=1 obj#=63623 tim=3271341141
WAIT #43: nam='db file sequential read' ela= 15484 file#=6 block#=15341 blocks=1 obj#=63623 tim=3271356807
WAIT #43: nam='db file sequential read' ela=  5488 file#=6 block#=15437 blocks=1 obj#=63623 tim=3271362623
WAIT #43: nam='db file sequential read' ela= 10447 file#=6 block#=15532 blocks=1 obj#=63623 tim=3271373342
WAIT #43: nam='db file sequential read' ela= 12565 file#=6 block#=15629 blocks=1 obj#=63623 tim=3271386741
WAIT #43: nam='db file sequential read' ela= 17168 file#=5 block#=15725 blocks=1 obj#=63623 tim=3271404135
WAIT #43: nam='db file sequential read' ela=  7542 file#=5 block#=15820 blocks=1 obj#=63623 tim=3271411882
WAIT #43: nam='db file sequential read' ela=  9400 file#=5 block#=15917 blocks=1 obj#=63623 tim=3271421514
WAIT #43: nam='db file sequential read' ela=  7804 file#=5 block#=16013 blocks=1 obj#=63623 tim=3271429519
WAIT #43: nam='db file sequential read' ela= 14470 file#=6 block#=16237 blocks=1 obj#=63623 tim=3271444168
WAIT #43: nam='db file sequential read' ela=  5788 file#=6 block#=16333 blocks=1 obj#=63623 tim=3271450154
WAIT #43: nam='db file sequential read' ela=  9630 file#=6 block#=16429 blocks=1 obj#=63623 tim=3271460008
WAIT #43: nam='db file sequential read' ela= 10910 file#=6 block#=16525 blocks=1 obj#=63623 tim=3271471174
WAIT #43: nam='db file sequential read' ela= 15683 file#=5 block#=16620 blocks=1 obj#=63623 tim=3271487065
WAIT #43: nam='db file sequential read' ela=  8094 file#=5 block#=16717 blocks=1 obj#=63623 tim=3271495454
WAIT #43: nam='db file sequential read' ela=  6670 file#=5 block#=16813 blocks=1 obj#=63623 tim=3271502293
WAIT #43: nam='db file sequential read' ela=  7852 file#=5 block#=16908 blocks=1 obj#=63623 tim=3271510360
WAIT #43: nam='db file sequential read' ela= 10500 file#=6 block#=17133 blocks=1 obj#=63623 tim=3271521039
WAIT #43: nam='db file sequential read' ela= 11038 file#=6 block#=17229 blocks=1 obj#=63623 tim=3271532275
WAIT #43: nam='db file sequential read' ela= 12432 file#=6 block#=17325 blocks=1 obj#=63623 tim=3271544974
WAIT #43: nam='db file sequential read' ela=  7784 file#=6 block#=17421 blocks=1 obj#=63623 tim=3271553331
WAIT #43: nam='db file sequential read' ela=  7774 file#=5 block#=17517 blocks=1 obj#=63623 tim=3271561346
WAIT #43: nam='db file sequential read' ela=  6583 file#=5 block#=17613 blocks=1 obj#=63623 tim=3271568146
WAIT #43: nam='db file sequential read' ela=  7901 file#=5 block#=17708 blocks=1 obj#=63623 tim=3271576231
WAIT #43: nam='db file sequential read' ela=  6667 file#=5 block#=17805 blocks=1 obj#=63623 tim=3271583259
WAIT #43: nam='db file sequential read' ela=  9427 file#=6 block#=18029 blocks=1 obj#=63623 tim=3271592988
WAIT #43: nam='db file sequential read' ela= 52334 file#=6 block#=18125 blocks=1 obj#=63623 tim=3271646055
WAIT #43: nam='db file sequential read' ela= 50512 file#=6 block#=18221 blocks=1 obj#=63623 tim=3271697284
WAIT #43: nam='db file sequential read' ela= 10095 file#=6 block#=18317 blocks=1 obj#=63623 tim=3271708095

Check the block number for this list of single block reads – we’re jumping through the index about 100 blocks at a time to read the next block where an index entry has to go. The jumps are the expected (and designed) effect of reverse key indexes: the fact that the jumps turn into physical disc reads is the (possibly unexpected) side effect. Reversing an index makes adjacent values look very different (by reversing the bytes) and go to different index leaf blocks: the purpose of the exercise is to scatter concurrent similar inserts across multiple blocks, but if you scatter the index entries you need to buffer a lot more of the index to keep the most recently used values in memory. Reversing the index may eliminate buffer busy waits, but it may increase time lost of db file sequential reads dramatically.

Here’s a short list of interesting statistics from this test – this time running on on a machine with SSDs) comparing the effects of reversing the index with those of not reversing the index – normal index first:

Normal index
CPU used by this session               83
DB time                                97
db block gets                      40,732
physical reads                         51
db block changes                   40,657
redo entries                       20,174
redo size                       5,091,436
undo change vector size         1,649,648

Repeat with reverse key index
CPU used by this session              115
DB time                               121
db block gets                      40,504
physical reads                     10,006
db block changes                   40,295
redo entries                       19,973
redo size                       4,974,820
undo change vector size         1,639,232

Because of the SSDs there’s little difference in timing between the two sets of data and, in fact, all the other measures of work done are very similar except for the physical reads count – and that’s the one that’s going to matter. In my test the increase in the number of physical reads is probably the cause of the extra CPU time thanks to both the LRU manipulation and the interaction with the operating system.

If you want to check the effect of index reversal you can take advantage of the sys_op_lbid() function to sample a little of your data – in my case I’ve queried the last 10,000 rows (values) in the table:

	count (distinct sys_op_lbid( &m_ind_id ,'L',t1.rowid)) as leaf_blocks
	id between 2e7 + 1 and 2e7 + 1e4

The &m_ind_id substitution variable is the object_id of the index t1_i1.

In my case, with an index of 22,300 leaf blocks, my 10,000 consecutive values were scattered over 9,923 leaf blocks. If I want access to “recent data” to be as efficient as possible I need to keep that many blocks of the index cached, compared to (absolute) worst case for my data 100 leaf blocks. When you reverse key an index you have to think about how much bigger you have to make your buffer cache to keep the performance constant.


For a completely different (documented) issue with reverse key indexes, see this note on how they can be ignored in distributed queries.


Sequences and reverse key indexes generally don’t work nicely together, whatever it may say in the manuals or on MoS.

July 2, 2014


Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 5:09 pm BST Jul 2,2014

Catching up (still) from the Trivadis CBO days, here’s a little detail which had never crossed my mind before.

where   (col1, col2) < (const1, const2)

This isn’t a legal construct in Oracle SQL, even though it’s legal in other dialects of SQL. The logic is simple (allowing for the usual anomaly with NULL): the predicate should evaluate to true if (col1 < const1), or if (col1 = const1 and col2 < const2). The thought that popped into my mind when Markus Winand showed a slide with this predicate on it – and then pointed out that equality was the only option that Oracle allowed for multi-column operators – was that, despite not enabling the syntax, Oracle does implement the mechanism.

If you’re struggling to think where, it’s in multi-column range partitioning: (value1, value2) belongs in the partition with high value (k1, k2) if (value1 < k1) or if (value1 = k1 and value2 < k2).

December 26, 2013

Current row

Filed under: Oracle,Partitioning — Jonathan Lewis @ 9:52 pm BST Dec 26,2013

Here’s a question that I’ve had on my todo (and draft posts) list for a few years – so I’m presenting it as a task for anyone who can demonstrate the answer.

If you’ve got a pl/sql cursor open and you’re using the “update current of” syntax, what happens if you update the same row twice but the row comes from a partitioned table and moves to a new partition on the first update ?

If you have a demo that you want to include in the comments then start with “sourcecode” and end with “/sourcecode” – in square brackets, without the quotation marks – to get a fixed font format and space preservation. If you have a good demonstration or reference article that you can link to, a simple URL will do nicely.#

December 13, 2013

Bitmap Question

Filed under: Indexing,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 6:09 pm BST Dec 13,2013

This question came up on the OTN database forum a couple of months ago: “Why doesn’t Oracle allow you to create globally partitioned bitmap indexes?” The obvius answer is “It just doesn’t, okay.” But it can be quite interesting to think of reasons why a particular mechanism might not have been implemented – sometimes the answer can give you an insight into how a feature has been implemented, or it might suggest cases where a feature might not work very well, it might give you some ideas on how to work around a particular limitation, and sometimes it might just help to pass the time on a short flight.


November 5, 2013


Filed under: Oracle,Parallel Execution,Partitioning — Jonathan Lewis @ 6:14 am BST Nov 5,2013

There an interesting example of a deadlock on the OTN database forum:

[Transaction Deadlock]

Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
PS-00000001-00000011        92     423     S             33     128     S     X
BF-2ed08c01-00000000        33     128     S             92     423     S     X


October 17, 2013

Virtual date partitions

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 7:40 pm BST Oct 17,2013

I posted this question on twitter earlier on today (It was a thought that crossed my mind during a (terrible) presentation on partitioning that I had to sit through a few weeks ago – no matter how bad a presentation is I always seem to be able to get a couple of interesting questions out of it):

Quiz: if you have a table with a date column date_col and create a virtual column defined as trunc(date_col,’W’) and partition on it – will a query on date_col result in partition elimination?


October 11, 2013

Testing Partitions

Filed under: Oracle,Partitioning — Jonathan Lewis @ 7:47 am BST Oct 11,2013

A quick check-list on testing new partitioning features:

  • Step 1 – test with a small amount of data in place, just to see if it works at all
  • Step 2 – test with a reasonably large amount of data in place so that you can spot any unexpected time being lost
  • Step 3 – test with a few uncommitted transactions from other sessions so that you can spot locking problems easily
  • Step 4 – test with SQL tracing enabled so that you can check the trace files for any recursive SQL threats
  • Step 5 – (optional, and harder to recognise) test with event 10704 set so that you can check the trace files for locking threats

Whatever else you do, though, make sure that you always have some data in every object because Oracle has some special optimisations for dealing with partitions that are known to be empty so, in the absence of data, you may be testing something that will never happen in production.


September 11, 2013

MV Refresh

Filed under: compression,Infrastructure,Materialized view,Oracle,Partitioning,Performance — Jonathan Lewis @ 8:47 pm BST Sep 11,2013

I have a fairly strong preference for choosing simple solutions over complex solutions, and using Oracle-supplied packages over writing custom code – provided the difference in cost (whether that’s in human effort, run-time resources or licence fees) is acceptable. Sometimes, though, the gap between simplicity and cost is so extreme that a hand-crafted solution is clearly the better choice. Here’s an idea prompted by a recent visit to a site that makes use of materialized views and also happens to be licensed for the partitioning option.


June 25, 2013


Filed under: 12c,Infrastructure,Oracle,Partitioning,redo — Jonathan Lewis @ 11:43 pm BST Jun 25,2013

The news is out that 12c is now available for download (Code, Docs and Tutorials). There are plenty of nice little bits in it, and several important additions or enhancements to the optimizer, but there’s one feature that might prove to be very popular:

SQL> alter table p1 move partition solo ONLINE;

Table altered.


October 2, 2012

Partitioning 12c

Filed under: 12c,Oracle,Partitioning — Jonathan Lewis @ 8:11 pm BST Oct 2,2012

Most useful presentation of OOW so far, from Hermann Baer of Oracle on improvements in partitioning features in 12c – and there are lots of useful ones, including:

Online move of a partition – so easy to compress a partition when it has reached its final “read-only” state

multiple partition maintenance in one operation – e.g. merge 3 monthly partitions into one quarterly partition, or split one partition into 24 (think about “how do I partition a non-partitioned table”, and 12c has just made it easier and quicker – exchange it into an empty partitioned table, then do one massive split).

partial indexing – define which partitions of a partitioned table should be included in the indexes you create on the table – and the optimizer also knows that different partitions need different plans (an enhancement of “table expansion”.

interval-reference partitionining – you can create an interval partitioned table, and use ref-partitioning to create child (and further decendent) tables, and their partitions will automatically be generated, truncated, and dropped as the parent is extended, truncated or dropped (needs enabled foreign key constraints).

Lots more details – and lots of stress-testing to be done – but I’m off to hear “the optimizer lady” talk about hints.


Oracle has a “safe harbour” slide at the start of all presentations about future developments pointing out that the information presented is an indication of direction, but not guaranteed to make it into production.

July 13, 2012

Recursive SQL

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 4:41 pm BST Jul 13,2012

A question came up on the OTN database forum a few days ago asking why a very simple query against a hash partitioned table was doing so much physical I/O to get a very small amount of data from a single partition:

I have table hash-partitioned on an indexed column “id” which is non-unique and part of my primary key. Inside each partition, rows with the same id are located close to each other which is done by dbms_redefinition reorg using orderby_cols. The intention is to reduce the amount of physical reads since there are no queries that don’t filter on the id column.

What I’m seeing is a lot of physical reads though. The first partition has roughly 80K rows, an average row length of 347, block size of 8K and compression … resulting in 821 blocks. And when (after flushing buffered cache and shared pool) submit a query that filters on “id” only and results is 106 selected rows, I see roughly 1400 physical reads.

The poster has clearly applied some careful thought to minimising the work required – so what’s gone wrong ? To help us analyse the problem he posted the results from running a query through autotrace:


June 5, 2012

Truncate Partition

Filed under: Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 5:11 pm BST Jun 5,2012

A recent posting on the OTN database forum (which I didn’t answer correctly on my first attempt) raised the problem of truncating partitions when there are referential integrity constraints in place. Let’s start with a demonstration of the problem:

September 30, 2011

Table rebuilds

Filed under: Oracle,Partitioning — Jonathan Lewis @ 11:06 am BST Sep 30,2011

Here’s a question from the OTN forum that I’ve seen fairly frequently in the last couple of years:

I have a table abc(c_date date, c_id varchr2(20), c_name varchar2);
The table is already range partitoned on “c_date” column and I have lot of data in that table.
Now my task is to create the hash partions under range; hash partiton needs 16 partitons.
I dont want to create another table with range and hash partitons and migrate the data. The amount of data I have is 3 terabytes.


July 6, 2011


Filed under: Indexing,Oracle,Partitioning,Tuning — Jonathan Lewis @ 5:39 pm BST Jul 6,2011

A question about partitioning came up on OTN a few days ago – but the really interesting part of the thread was about the choice of indexes, and how the choice of partitioning, combined with the data patterns, could make a big difference to the choice of indexes. I think the comments I made about indexes are worth seeing, so I’ve linked to the thread.

July 1, 2011

Partitioned Bitmaps

Filed under: Index Joins,Indexing,Oracle,Partitioning — Jonathan Lewis @ 5:19 pm BST Jul 1,2011

The following question appeared in a comment to an earlier posting on multi-column bitmap indexes and the inability of Oracle to create a bitmap index join when (to the human eye) the strategy was an obvious choice.

I have a query which is using 2 indexes both are bitmap indexes (sizes are 37 and 24 Mbs) and table size is 17gb. While i ran the following query which can very well get the index itself, it takes around 6-8 minutes and using pga around 3 gb.

could you please explain me why ?


« Previous PageNext Page »

Powered by