Over-indexing

January 10, 2013

Over-indexing

Filed under: Indexing,Oracle — Jonathan Lewis @ 6:43 pm GMT Jan 10,2013

This is the text of an article I published in the UKOUG magazine a few years ago, but it caught my eye while I was browsing through my knowledge base recently, and it’s still relevant. I haven’t altered the original apart from adding a couple of very brief comments in brackets [Ed: like this].

Over-indexing

One of the strengths of a relational database is that you should be able to throw any reasonable query (and even some unreasonable queries) at it and it will be able to return the right answer without being told how to navigate through the data.

There’s no guarantee, though, that you’ll get the answer quickly unless you’ve given the database some help by turning your logical model into a sensible physical implementation. Part of the physical implementation will be the choice of indexes – and this article reviews one of the commonest indexing issues that I see in OLTP systems

Costs and Benefits.

Why do we create indexes? There are three main reasons as far as Oracle is concerned. We need some indexes to help Oracle enforce uniqueness; we need some indexes for performance reasons to supply a high-precision access path to important data; and we may need some indexes to help Oracle enforce referential integrity constraints.

Theoretically, of course, we don’t need any indexes to ensure data correctness, so you could say that the only reason for having indexes is to improve performance. For example, you could enforce primary key constraints simply by locking tables on every insert, update or delete and then doing a tablescan to make sure that there is no conflicting data – but having an index “for” the primary key allows you to check just one block in the index without locking the table, so you get better performance and better concurrency. (In passing, an index that covers a primary key or unique constraint need not be a unique index, and the index definition need not be an exact match for the constraint – provided it starts with the relevant columns.)

So indexes can be a massive aid to performance and concurrency. But indexes have a price – you have to maintain them, usually in real-time, and that can cause problems. Roughly every two weeks [Ed: not quite so frequently in 2012], someone writes into the Oracle Forums with a question like:

“I have a batch process that inserts 1 million rows of data into a table every night. But it’s very slow and I see lots of waits for ‘db file sequential read’ while it’s going on. Why is this happening?”

If you insert 1 million rows into a table you might have to create and (eventually) write something in the region of 25,000 table blocks (assuming a row size of about 200 bytes … 40 rows per block). If you have one index on that table then, in principle, you might have to do a random read to find an index leaf block for every single row you insert – and you may have to write the updated block out very soon afterwards to make space for the next random read. If the table is large and you have a couple of dozen indexes this “in principle” threat very soon becomes a real one.

There is a very important difference between (heap) tables and their indexes – a row can go into a table wherever the next empty space happens to be; a row has to go into an index in exactly the right place, and finding that place can be expensive. Things get worse if you are doing updates – you can update a row in the table “in situ”, but if you change the value of a column that appears in an index you have to find the index entry using the old value, delete it, and then insert an index entry that uses the new value in the right place.

So don’t create indexes unless they really pay for their existence when you come to query the data – and when you do create an index make sure you maximise the payback.

The Commonest Error.

There are a number of subtle ways in which you can create too many indexes but there is one error that is easy to spot – the foreign key index.

When you create a referential integrity constraint (foreign key) you do not need to create an index to support it and (unlike the unique constraint and primary key constraint) Oracle will not create such an index automatically. However, if you are going to update the primary key or unique key at the “parent” end of the referential integrity constraint and a suitable index does not exist Oracle will lock the “child” table in a fairly aggressive way to check for matching committed and uncommitted child rows. Consequently many people (and some application generators) automatically create a matching index for every foreign key they create.

Rule 1: don’t create “foreign key” indexes unless they are really necessary, or unless they are sufficiently useful that you would have created them anyway even if there hadn’t been a referential integrity constraint to protect.

If you do have a foreign key constraint that needs an index, though, remember that the index does not have to be an exact match for the constraint – the critical feature is that it should start with the same columns (not necessarily in exactly the same order) as the constraint definition. This means that you can always “add value” to a foreign key index that might otherwise be just a technical overhead.

Rule 2: consider adding columns to foreign key indexes to make them useful to high precision queries.

Moreover, when you think about the parent/child relationship remember that you often see one parent row with several child rows – think order/order_lines, or currency_code/payments. In cases like this there may be an opportunity for saving quite a lot of space (and reducing the impact on the buffer cache) by compressing the index on some, or all, of the foreign key columns.

Case Study.

Here’s a list of indexes (with a little camouflage) from a system I was looking at recently [Ed: some time in 2008]. It’s typical of the over-indexing problems that I see fairly frequently. I got this list by running a query similar to:

select
        index_name, column_name
from
        user_ind_columns
where
        table_name = 'XXXXX'
order by
        index_name, column_position
;

You could enhance this by query to use dba_ind_columns, of course, and add all sorts of extras by joining to dba_tab_columns, dba_indexes, and dba_ind_expressions to pick up details about mandatory columns, column types, uniqueness, index types, and functions involved in function-based indexes. In the case of my example, I used a slightly more sophisticated query to show the expressions used in function-based indexes – and here’s the list of indexes I found on one table:

ap_fun_fk_i ( fun_id )
ap_grp_fk_i ( grp_id )
ap_grp_fun_id_i( grp_id, fun_id )
ap_org_ap_i ( org_id, ap_id )
ap_org_fk_i ( org_id )
ap_per_ap_i ( per_id, ap_id )
ap_per_fk_i ( per_id )
ap_pk ( ap_id )
ap_ud_i ( trunc(update_date) )

This system had a convenient naming convention, PK in the name indicates the primary key, and FK indicates foreign keys. You can see immediately that we seem to have several redundant indexes as far as protecting foreign keys is concerned.

AP_GRP_FK_I is made redundant by AP_GRP_FUN_ID_I,
AP_ORG_FK_I is made redundant by AP_ORG_AP_I,
AP_PER_FK_I is made redundant by AP_PER_AP_I.

On top of this, we might decide that since there are only a few organizations (org_id) we could benefit by compressing the index on (org_id, ap_id) on its first column. We might also compress the index on functions (fun_id) as well – although we might go one step further and decide that we will never delete rows from the functions table or update its key values, do decide to drop the index altogether.

Following the thought about compression, we could also consider compressing the index on (trunc(update_date)) – a fairly large amount of data gets updated each day, so the value is quite repetitive, similarly (grp_id, fun_id) is also fairly repetitive, so we might compress both columns.

This leaves us with:

ap_grp_fun_id_i ( grp_id, fun_id ) compress 2
ap_org_ap_i ( org_id, ap_id )  compress 1
ap_per_ap_i ( per_id, ap_id )
ap_pk ( ap_id )
ap_ud_i ( trunc(update_date) )  compress 1

The reduction in the number of indexes is likely to reduce the undo and redo by about 40% as we insert (single row) data into this table, as well as reducing the possible I/O demand by the same sort of percentage. At the same time, the effect of the compression could (in this case) eliminate a further 10% to 15% of I/O related to this table because of the reduction in size of the indexes.

Further considerations.

The only remaining issue to consider is whether there is anything about the nature of our processing that suggests the need for housekeeping on these indexes – and the index on trunc(update_date) is an ideal candidate for consideration.

As we update data, we are going to delete entries from blocks in the left hand end of the index and insert them at the right hand end of the index. Depending on the rate and pattern of updates it is possible that a large number of blocks at the left hand end of the index will become close to empty – this could have a significant impact on the effectiveness of the buffer cache and might encourage us to use the coalesce command or shrink space compact command on the index every few days. (If this sounds familiar, I have written about this particular index in the past.)

Having thought about that, you might then wonder why we have an index on trunc(update_date) at all. A query that tries to find all rows updated on a certain date, or range of dates, may have to jump around all over the table to acquire a lot of data. Who wants to do this, and how often? It’s possible that the index exists for a particular report that runs just once per day – so maybe there is an argument for dropping this index as well.

Conclusion:

It’s very easy to create more indexes than you really need, and indexes can be very expensive to maintain. Sometimes you can spot “obviously” redundant indexes simply by inspecting index names and column names. A few reasonable guesses may allow you to drop some indexes and make others more efficient with a minimum of risk and effort.

Comments (17)

17 Comments »

Jonathan,
Great article. Many data architects / DBAs / developers do not give enough thought to the indexing strategy that is employed until an application is deployed and database performance becomes an issue. Then, they start adding indexes on an ad-hoc basis to solve some particular problem.

I recently came across a database that was over-indexed. You could tell based on naming conventions, which indexes were added after the fact, and usually by what tool (Enterprise Manager – SQL Tuning Advisor). The problem I ran into, was that to make any change (remove / consolidate) indexes was a project on its own. (ie. Set-up a QA database, characterise the performance of the application, fix the indexes, re-characterise the performance to identify negative / positive effects ) Of course you need to get everyone to buy into doing this – Business Analysts, Developers, QA, Product Managers – and until the pain of poor application performance is significant enough, the status quo wins.

Hopefully, as more people read your work, and the works of others in the Oracle community, about how Oracle works, the quality of deployed applications and the databases they run against will get better.

Philip

Comment by Philip Papadopoulos (@pmpapadopoulos) — January 10, 2013 @ 7:02 pm GMT Jan 10,2013 | Reply
- But that is not how it always is?
  Fixing something that was not properly done initially is double (or triple) the work,time and risk of doing things right from the beginning.
  
  Comment by Pablo — January 11, 2013 @ 9:15 am GMT Jan 11,2013 | Reply
Oh man! Can you please send that to the j2ee “expert” con-sultants who developed our POS system?
Let me see:

1- Table TXN_SEARCH, 51 columns

2- Number of indexes on TXN_SEARCH: 50

They: “Nuno, can you check the database? Updating our searches is slow, must be something in your setup! Can you move to RAC?

Me: (after finding the above, slowly reaching for the BIG mattock handle…) Are you people completely insane?

Comment by Noons — January 11, 2013 @ 5:47 am GMT Jan 11,2013 | Reply
Two things I see a fair amount of – firstly is a reluctance or an ignorance to use composite indexes except for referential integrity (if any); secondly that redundancy when a composite index has the same leading column or columns as another index.

Index compression is probably one of the the most under utilised features in the database, I know I’m guilty of just not considering it enough. I recently added it to a couple of composite indexes with 5,6 even 7 highly repetitive columns (on a wider table). Actually I’m about to do some tests on making the table concerned a partitioned IOT (back again to underutilised features) with some key compression. But just the space saving on such a repetitive key composite index is great and the performance benefit is not bad either.
But when I was looking into key compression I canvassed a couple of opinions and the general response was that they never even tend to consider it either.

And talking of things I’m guilty of, another one is being too quick to bung a couple of extra columns on the end of an index because I know one or two queries will benefit. A quick win perhaps for those one or two queries, which can arbitrarily improve some performance metric when sql tuning had already achieved the benefit required, but at the expense of making the index bigger than it needs to be for everyone else.

Sometimes it can be tricky striking the right balance.

Comment by Dom Brooks — January 11, 2013 @ 11:03 am GMT Jan 11,2013 | Reply
- Dom,
  
  “And talking of things I’m guilty of, another one is being too quick to bung a couple of extra columns on the end of an index because I know one or two queries will benefit. A quick win perhaps for those one or two queries, which can arbitrarily improve some performance metric when sql tuning had already achieved the benefit required, but at the expense of making the index bigger than it needs to be for everyone else.”
  
  And, repeating Mohamed’s comment, the consequences of the change in clustering_factor can be even more of a problem. Wouldn’t is be nice if Oracle has the same “carried columns” concept that SQL Server has – i.e. the ability to add columns that don’t participate in the ordering of the index entries.
  
  Comment by Jonathan Lewis — January 12, 2013 @ 1:12 pm GMT Jan 12,2013 | Reply
- I can’t remember when index compression was dropped as a default. Oracle used to compress indexes in V5 and V6. I think it was in either 7 or 8 they dropped index compression as a default setting, but can’t place the actual release?
  
  Comment by Noons — January 12, 2013 @ 1:31 pm GMT Jan 12,2013 | Reply
  - Noons,
    
    The current structure for indexes was introduced in v6.
    
    Prior to that, v5 had a compression mechanism (when machines were small and very expensive) that reduced the size of index entries using a sort of “jpeg” like algorithm that recorded changes in the leading edge and trailing edge. According to a comment that Dave Ensor once made (possibly during a SIG presentation in the UK) the mechanism was so aggressive that sometimes you got false positives – i.e. you had to visit the table to find out the actual value that had been encoded in the index, and it wasn’t always one you wanted.
    
    The current form of index compression is nothing like th v5 form – it’s deduplication of the leading column set on a block by block basis. From memory I’d say it it appeared in v8, but I wouldn’t be surprised if someone told me it was v7.
    
    Comment by Jonathan Lewis — January 12, 2013 @ 6:46 pm GMT Jan 12,2013 | Reply
Considering over indexing at a design step is very safe then considering it once in PRODUCTION. Dropping an index from an existing application is not the same thing as if we have never created it.

What I want to emphasize is that it might exist a situation where two redundant indexes exist, t1(a,b) and t2(a) for example, and we decide to drop the t2(a) index because it is already covered by the index t1(a,b). However, we know that adding an extra column to an index might damage the clustering factor of that index. This is why t2(a) may have a better clustering factor than t1(a,b) and dropping it could make the CBO choosing another execution path that does not include an access to the remaining t1(a,b) index. This is why I always start looking at redundant index at design time.

Comment by Houri — January 11, 2013 @ 11:36 am GMT Jan 11,2013 | Reply
- Mohamed,
  
  You’re quite right to emphasise how hard it can be to be change the indexes in a production system with confidence that nothing will go wrong. (Thank goodness for the advent of invisible indexes – not a 100% solution, but a most helpful option.)
  
  Your point about the clustering_factor is paricularly significant – I may even have an article on the blog demonstrating Oracle deciding not to use an index on (cola,colb) after dropping the index on just (cola) [Ed Jan 2016: If I have, I can’t find it – but here’s a short article on the topic]. The change in clustering_factor can have a much more profound impact on the use of the index than the change in basic size, and it’s quite hard to predict how bad things can be as a side effect.
  
  Comment by Jonathan Lewis — January 12, 2013 @ 1:10 pm GMT Jan 12,2013 | Reply
“indexes can be very expensive to maintain”.

Expensive in terms of what (latches, elapsed time, buffer gets)? And how do you calculate how expensive an index is to maintain?

Comment by Guest — January 11, 2013 @ 3:09 pm GMT Jan 11,2013 | Reply
- Extra buffer visits, which also means extra latch activity, and an increased risk of random disk I/O
  Extra undo and redo (twice as much per index entry as for a table row on an update).
  
  Comment by Jonathan Lewis — January 12, 2013 @ 1:19 pm GMT Jan 12,2013 | Reply
[…] am very often warning: dropping redundant indexes in production is not 100% safe. I have instead always been advocating […]

Pingback by Redundant Indexes | Mohamed Houri’s Oracle Notes — March 24, 2014 @ 11:28 am GMT Mar 24,2014 | Reply
Hello Jonathan,

I have two questions. First, you said that if you are going to update the primary key or unique key at the “parent” end of the referential integrity constraint Oracle will lock the “child” table to check for committed and uncommitted child rows if a suitable index does not exist. So, it is understood that creating index on foreign keys are beneficial. After that you said don’t create “foreign key” indexes. I don’t understand what do you mean?

My second question is, also can you briefly explain what do you mean in rule 2.

Thanks

Comment by gs — July 2, 2015 @ 7:27 am BST Jul 2,2015 | Reply
- gs
  
  I didn’t say: ‘don’t create “foreign key” indexes’, I said: ‘Don’t create “foreign key” indexes unless they are really necessary’.
  If you never update or delete the key at the parent end of the referential integrity constraint, is a “foreign key” index necessary ? It isn’t needed to avoid the locking problem so you’d better have some other (good) reason for creating it.
  
  Rule 2: example:
  A “foreign key” index on currency_code on a transactions table is probably a waste of resources because (a) we’re not (I hope) going to change or try to delete a currency code in the parent table and (b) if we query for “currency_code = ‘XXX'” in the transactions table we’re probably going to find a huge amount of data and the optimizer would (probably) ignore the index and do a full tablescan.
  
  On the other hand (a) company standards might demand a “foreign key” index, and (b) we might know that there were fairly regular queries for things like: “currency_code = ‘XXX’ and transaction_date > trunc(sysdate) – 1′. In this case we might decide to get rid of the “foreign key” index of (currency_code) and replace it with (currency_code, transaction_date).
  
  Comment by Jonathan Lewis — July 2, 2015 @ 9:17 am BST Jul 2,2015 | Reply
[…] imagine it even if you’ve never seen it). After all, it’s only relatively recently that questions of the following type stopped appearing regularly on forums: “I have a batch process that inserts 1 million rows of […]

Pingback by Investigating Indexes – 2 – All Things Oracle — January 9, 2017 @ 9:54 am GMT Jan 9,2017 | Reply
[…] obvious response to this heading is to question the number of indexes on the table – because big tables with lots of indexes tend to give you lots of random I/O as […]

Pingback by ASSM argh! | Oracle Scratchpad — January 9, 2018 @ 6:42 pm GMT Jan 9,2018 | Reply
[…] Over-Indexing (Jan 2013): Thoughts on minimising the costs of index maintenance. […]

Pingback by Indexing Catalogue | Oracle Scratchpad — September 26, 2023 @ 11:40 am BST Sep 26,2023 | Reply