Oracle Scratchpad

February 28, 2011

Prefixed

Filed under: Indexing,Partitioning — Jonathan Lewis @ 6:12 pm BST Feb 28,2011

From time to time the question about whether local indexes on partitioned tables should be prefixed or non-prefixed appears on the Oracle forums and mailing lists.

It’s possible that I’m wrong – although no-one has come up with a counter-example to the statement I keep repeating – but the whole prefixed/non-prefixed thing for local indexes dates back to a limitation in the optimizer somewhere in the 8.0 time line where Oracle couldn’t do partition elimination on indexes properly but the overhead of the error it made could be dramatically reduced (in most cases) by sticking the partition key at the start of the index.

The guideline for local indexes are the same as they would be for a non-partitioned index on a non-partitioned heap table – the partitioning column(s) are just columns in the table that might be worth including in the index, and the order of the index columns is dictated by the important queries that have to access the table.

For further comments, there’s a note I wrote (which I’ve just been reminded of) on the OTN database forum that adds a little detail to this argument.

4 Comments »

  1. Hi Jonathan

    The only difference these days that I can come up with is the manner in which the index is used if the partitioning columns aren’t specified. In the case of a prefixed index, the CBO has no choice but to use an index skip scan operation (if possible) on each partition which is generally costed much greater than with a “normal” range scan on each partition which is performed on an equivalent non-prefixed index. So a non-prefixed index is much more likely to be used when the partitioning column is unknown in the query, even though all partitions need to be visited in both cases, especially as the number of distinct partitioning values in each partition increases.

    If the partitioning column(s) is specified, then a prefixed index becomes more efficient (providing there’s more than 1 partitioning value per partition), especially as the number of distinct partitioning values in each partition increases.

    So it’s just the everyday case of an index being more selective if more columns are specified.

    I have heard recommended to “always” (hate that word) use a prefixed local index because if the partitoning column is known, great it’ll be efficient and if it’s not know, then so what, a non-prefixed index would have to visit every partition anyways. But it can do so without the need for a index skip scan which will be more efficient, even if there’s just the one partitioning value per partition.

    It’s just the everyday case of an index not being as useful if the leading column is unknown.

    So in the end I agree, it really all just comes back to what columns and in what order within the index makes sense to the application, rather than it being a prefixed/nonprefixed index.

    Comment by Richard Foote — March 1, 2011 @ 12:29 pm BST Mar 1,2011 | Reply

  2. I thought it would be interesting to look up what I said about prefixed indexes in “Practical Oracle 8i”. This is it (with a couple of minor edits):


    Consider the following incomplete table definition and the two similar queries that follow it; you will note that the table is partitioned by date, and each partition represents one month’s worth of data:

    	create table sales(
    		date_part		date,
    		store_id		number(8),
    		...
    	)
    	partition by range (date_part) (
    		partition p01 values less than (to_date('20000201','YYYYMMDD')),
    		partition p02 values less than (to_date('20000301','YYYYMMDD')),
    		partition p03 values less than (to_date('20000401','YYYYMMDD')),
    		partition p04 values less than (to_date('20000501','YYYYMMDD')),
    		...
    	);
    	
    	select	sa.date_part, sa.store_id, sum(sa.sales_value)
    	from
    		sales		sa,
    	where 
    		sa.date_part between to_date('07-Apr-2000','dd-mon-yyyy')
    			     and     to_date('13-Apr-2000','dd-mon-yyyy')
    	group by
    		sa.date_part, sa.store_id
    	;
    	
    	
    	select	sa.date_part, sa.store_id, sum(sa.sales_value)
    	from
    		sales			sa,
    		financial_weeks		wk
    	where 
    		wk.financial_week_id = 200002
    	and	sa.date_part between wk.date_start and wk.date_end
    	group by
    		sa.date_part, sa.store_id
    	;
    


    Although you and I can look at these two queries and decide that they are ‘obviously’ equivalent to each other all that Oracle can see, logically, is that the first query specifies a literal date range that it can check against the partition definitions at parse time to discover that all the relevant data is in partition P04, whereas the second query requires access to a set of unknown dates which are derived from another table at run time. The first query will execute very efficiently against just partition P04, the second query will have to try to join every relevant row from the FINANCIAL_WEEKS table against every partitions in the SALES table.

    Now if you happen to have created an index on SALES(date_part,store_id) this excessive joining might take advantage of the index and appear to work quite efficiently. This is because of a mechanism which I have seen described as ‘index probing’, a technical term which means that Oracle simply tries to do the join through the index, and because the index is highly appropriate discovers very efficiently that there is no relevant data. The critical feature of the index is that it has to start with the partitioning column(s) – in the jargon a ‘prefixed’ index – so that when Oracle tries to join a row from FINANCIAL_WEEK to an unsuitable partition, it will quickly traverse one of the extreme edges of the index to discover in just two or three logical I/Os that the content of the index is either too low, or too high for the incoming date.

    In some circumstances putting the partitioning columns at the leading edge of most of your indexes will be sufficient to get reasonable performance – however it is a solution I particularly dislike because it is inherently sloppy (on the part of the database engine), inefficient, and potentially a huge waste of space. Imagine a 400 million row table partitioned into daily partitions by date. In order to allow ‘index probing’ to work we have to add the date at the front of the indexes on that table; but in any one partition the date value is identical across all the rows; it is a column we would not consider as part of an index if we were dealing with a table rather than a single partition. Nevertheless the manuals advise us that we should waste 3.2GB in each index (400M rows x 8 bytes – but less if we remember to compress the index) by adding this worthless column to the index so that the run-time engine can decide very quickly that it is trying to select rows from an unsuitable partition. It does seem that there is room for improvement in this aspect of partition elimination.

    At the time I was writing about 8.1.5, and by the time 8.1.7 was out (possibly even 8.1.6) the “inherently sloppy” detail had been fixed, and “prefixed” should have become a redundant concept for local indexes. (So my comment about the problem being in 8.0 wasn’t quite accurate enough.)

    Comment by Jonathan Lewis — March 4, 2011 @ 1:03 pm BST Mar 4,2011 | Reply

  3. Hi Jonathan,

    Maybe this is the wrong topic for me to post a question related to partitioning. For one of my clients, in a data warehouse, I have been trying to help with the performance optimization where the bottleneck predominantly is IO as the NetApp filer where the DB sits has reduced write throughout capacity. Though we are trying to alleviate this by relocating some of the volumes to a different filer, I noticed that the single largest number of writes happens on an index tablespace – a couple of orders of magnitude higher than the rest of the tablespaces (even the ones containing the tables). This tablespace contains the locally partitioned indexes for a table. The data is loaded via external tables using direct path append hints directly into the partition. The table has 1 B-Tree index (I_CDR2) and 1 bitmap index (I_CDR)

    The bitmap index is on an ACCOUNT_NUMBER column with a cardinality of 43116 for a sample partition containing rows.

    set lines 200
    col OBJECT_NAME format a30
    col subobject_name format a30
    col TABLESPACE_NAME format a30
    col STATISTIC_NAME format a30
    select OBJECT_NAME,SUBOBJECT_NAME,TABLESPACE_NAME,STATISTIC_NAME,VALUE
    from v$segment_statistics
    where OBJECT_NAME in ('I_CDR','I_CDR2','CDRS')
    and STATISTIC_NAME in ('physical writes','db block changes')
    and subobject_name='CDR_2011_02_25_24';
    
    
    
    
    OBJECT_NAME                    SUBOBJECT_NAME                 TABLESPACE_NAME                STATISTIC_NAME                      VALUE
    ------------------------------ ------------------------------ ------------------------------ ------------------------------ ----------
    I_CDR                          CDR_2011_02_25_24              CDRIDX01                       physical writes                     16205
    I_CDR                          CDR_2011_02_25_24              CDRIDX01                       physical writes direct                  0
    I_CDR2                         CDR_2011_02_25_24              CDRIDX01                       physical writes                    131623
    I_CDR2                         CDR_2011_02_25_24              CDRIDX01                       physical writes direct                  0
    CDRS                           CDR_2011_02_25_24              CDR18                          physical writes                     35451
    CDRS                           CDR_2011_02_25_24              CDR18                          physical writes direct              33891
    
    
    
    
    OBJECT_NAME                    SUBOBJECT_NAME                 TABLESPACE_NAME                STATISTIC_NAME                      VALUE
    ------------------------------ ------------------------------ ------------------------------ ------------------------------ ----------
    I_CDR                          CDR_2011_02_25_24              CDRIDX01                       db block changes                   468688
    I_CDR                          CDR_2011_02_25_24              CDRIDX01                       physical writes                     16205
    I_CDR2                         CDR_2011_02_25_24              CDRIDX01                       db block changes                   234864
    I_CDR2                         CDR_2011_02_25_24              CDRIDX01                       physical writes                    131623
    CDRS                           CDR_2011_02_25_24              CDR18                          db block changes                     9744
    CDRS                           CDR_2011_02_25_24              CDR18                          physical writes                     35451
    
    

    I am unable to figure out why the bitmap index shows such a high value for the “DB Block Changes” statistic while the actual physical writes is only a fraction of that. There are upto 15 concurrent streams of inserts using the below SQL going on at any point of time

    insert /*+ append parallel(cdrs, 35) */ into cdrs ( CARRIER, ACCOUNT_NUMBER, SERVICE_ID, 
    ...
    REASONCHANGE, OTS, DROPPED ) 
    select /*+ parallel(cdrs_ext, 35) */ CARRIER, ACCOUNT_NUMBER, SERVICE_ID, 
    ...
    REASONCHANGE, OTS, DROPPED from cdrs_ext 
    
    

    Could you offer any insight into this and also any recommendations on whether converting the bitmap index to a b-tree makes more sense and would alleviate the enormous amount of writes in the index tablespace?

    I am guessing that the discrepancy between the db block changes and the physical writes means that temporary segments are being constructed and later merged into the physical bitmap index structure

    Comment by suryarao — April 17, 2011 @ 7:24 pm BST Apr 17,2011 | Reply

    • Suryarao,

      Your question may be about parallelism, bitmaps, index maintenance, or the meaning of “db block changes”, but your question is clearly not related to the topic – beyond the fact that you mention locally partitioned indexes.

      Point 1 – you say you have up to 15 concurrent processes doing direct path inserts (apparently on the same table) with a parallel 35 hint … which means potentially 525 concurrent processes somehow affecting the same object.

      Point 2 – you have a bitmap index on a table, and have up to 15 concurrent processes inserting data into that table: bitmap indexes and concurrency do not work well together, and it is a little surprising that you are not seeing numerous deadlocks

      Point 3 – When you insert data into a table (especially when using the append mechanism) the data tends to be packed into a few blocks; however indexes (other then sequential meaningless key indexes) are often subject to random located maintenance operations across their entire width. This means you often see more I/O on indexes than on tables when doing data warehouse loading.

      Point 4 – the database writer tends to write every three seconds. In the course of a three second pl/sql loop I could probably update a single datablock tens of thousands of times before it got written once.

      I have no idea what the net result might be of your combination of concurrency, parallelism, append hints and bitmap indexes – but I’d trace a session for a while to see where it spent its time (including read time), and I’d take a snapshot of its activity (v$sesstat) to find out what it was doing (e.g. redo size) before I did anything else.

      A couple of posts that might be of interest: “Tuning Updates” and “Bitmap Updates”

      Comment by Jonathan Lewis — April 24, 2011 @ 9:11 am BST Apr 24,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,161 other followers