Quiz Night

June 10, 2011

Quiz Night

Filed under: Indexing,Oracle,Performance — Jonathan Lewis @ 6:16 pm BST Jun 10,2011

Here’s an interesting question from the OTN database forum:

Now I delete 90% rows and DO NOT rebuild table indexes nor coalesce indexes .
In that case does index range scans using queries perform faster (less i/o) ?

The thing that makes it interesting is the scope it gives you for imagining reasons why the performance won’t change, or might get better, or could get worse. So how about it – can you think of an argument for each of the three possibilities ?

Update – 15th June 2011

The first thing you need to do when you see a question like this is start asking for more information – as the first comment did.

How big is the table – percentage are generally useless unless you also know the scale, and it’s possible that pure size might make a difference.
What’s the pattern of data in the table – that probably matters
Is the table partitioned – and does the delete pattern vary with partition
What’s the pattern of data across the indexes, that could make a lot of difference
Are any of the indexes partitioned (whether or not the table is)
Are all the indexes plain B-tree indexes – or bitmaps, or context, or spatial; and function-based indexes, reverse key indexes …
Are you deleting the data in lots of tiny batches with commits, or one big delete
And yes, as asked in the first comment – are you going to collect statistics before running queries, because that could make a big difference.
Timing is quite important – are you thinking only in the very short term (delayed block cleanout effects), or in the longer, stabilised term.
Are you assuming the data you delete is data that was not previously returned in the queries – but would that always matter anyway
And why are you asking about range scans, not general purpose queries – that rather avoids the problem of how query plans can change as stats change.

Maybe we could make some assumptions – though that can have dangerous consequences. We might guess that the driving idea behind this question is that there’s a load of old data that isn’t being queried any more and if we simply delete it our queries will, at worst, still show the same performance. So I’m going to take the line that (a) we’re expecting any queries to return the same result and (b) we’re going to be inserting new data over time.

Just picking up a few of the comments that have appeared so far we can see how many details you might have to consider to think this question properly.

Comment 1 says: “how about collecting stats” – it seems likely that new stats would appear soon (90% deletion is more than a 10% change, so if auto stats collection hasn’t been disabled …). What if, for some indexes, 100% of the data has been deleted from 90% of the blocks, then the leaf_block count of an index will change by a factor of 10 (and that’s effectively a bug, by the way): a possible consequence is that Oracle will decide that an index fast full scan is appropriate in cases where it previously took a different path.

Comment 3 picks up the case that if you had an index based on columns where the deleted data had null values in the relevant columns, then the 90% table deletion wouldn’t affect that index at all. This may sound a little extreme – but as a minor variation on the theme, the idea of creating function-based indexes which hold values for “recent” data is quite well-known and very effective; the argument might be totally appropriate.

Comment 5 makes a fairly subtle and important point. If I used to have a range scan that examined 1,000 rows and returned 100 and the 90% of rows I deleted was exactly the 900 rows I didn’t return then there are two outcomes. If I had to visit the table 1,000 times to identify the 100 rows then the query will now be quicker; if I had been able to identify the 100 rows by examining 1,000 index entries then the query performance will be pretty much unchanged.

Comment 5 also has a rather nice thought on stopkeys (rownum <= N) – but it’s similar to Richard Foote’s min() example in comment 4 – I think it would only apply if the result were going to change.

Comment 9 makes an interesting point. If you delete a lot of data from a table you have no idea (until you look at the data patterns carefully, or until after you’ve collected the stats) of how the clustering_factor of the index will change. It will probably drop – though technically it could stay the same – but how far it drops compared to the change in the number of rows left in the table could make a big difference to the “selectivity * clustering_factor” bit of the index range scan calculation. Bear in mind that if your queries are returning the same result before and after the delete then your selectivities must have gone up by a factor of 10 because you’re returning the same volume of data from a total volume that has decreased by a factor of 10. (I’m assuming that the optimizer gets its stats right when I say this, of course).

Comment 6 brings up an important feature of bitmap indexes. When you delete a row from a table you delete the corresponding row from a b-tree index (eventually), but you update a bitmap index chunk, which means generating a new bitmap index chunk with one bit changed. So deleting 90% of the table data, however you did it, could result in a doubling of every bitmap index on the table. (Results will vary with version of Oracle and the strategy used to delete the data)

Several comments picked up the issue of “delayed block cleanout” that’s likely to have an impact for a while after the big delete. There might be other intermittent effects later on, depending where new data goes and how many leaf blocks were completely emptied by the delete – it is possible for some odd locking waits to appear as Oracle tries to find an empty block that can legally be moved for the split.

Personally, and ignoring the bitmap index threat, I think it’s safe to say that if your indexes are very well designed (which eliminates most systems I’ve seen), then most queries will probably perform almost as efficiently after the delete (and delayed block cleanout) as they did before – provided the optimizer still picks the same execution plan: but the moment you collect up to date statistics you may see some staggering changes (in either direction) for some queries. (And if you don’t collect up to date statistics at some point you will also seem some staggering changes as Oracle starts bring in “out of range” adjustments to costs.

Bottom line: If you’re going to delete 90% of the data in a large table then you’ve got to think carefully about how to minimise the side effects.

Comments (15)

15 Comments »

Can I at least gather new statistics after the delete? : )

Comment by Kamal — June 10, 2011 @ 8:47 pm BST Jun 10,2011 | Reply
Hi,

I suppose that:
Performance won’t change if deleted rows are out of the scope of index range scan, or if the statement does not have to go to the table (table access by rowid) after the index range scan
Performance will be worse for other sessions until I commit.
Performance will get better if it has to read less blocks from the table (depends on the clustering of the remaining rows).

Regards,
Franck.

Comment by pachot — June 10, 2011 @ 9:21 pm BST Jun 10,2011 | Reply
If the table only had 10 rows, and you delete nine, then I suspect you wouldn’t have many index range scans.
A table with a million rows, reduced to a hundred thousand. One big delete or 900,000 individual deletes ?
If I delete all the rows that have values for a particular set of indexed columns (ie leaving only rows with null values for all columns in the index) then index range scans on the empty index should be fast.
Conversely, if all the deleted rows had null values and weren’t indexed, then there would be no impact.

If I deleted but not committed, applying undo could give much worse performance.

Comment by Gary — June 10, 2011 @ 9:23 pm BST Jun 10,2011 | Reply
Lots of scope here.

Example of worse. You delete 90% of data that corresponds to all the “left most” index entries and you run a query looking for say the MIN of the value and the CBO still uses the index and so has to troll through numerous empty leaf blocks looking for the first non-deleted index entry.

Example of better. You delete 90% of data and you run a query that uses the index to now select 0 rows, where previously it had to visit the table with lots of table related I/Os to return a number of rows.

Example of the same. You delete 90% of data but you still query the 10% of data that is accessed by the 10% of the index that might not have been impacted by the deletes. Effectively, nothing changes for the specific query.

Overall answer to the OTN question, it depends :)

Comment by Richard Foote — June 11, 2011 @ 4:28 am BST Jun 11,2011 | Reply
- Hi Richard,
  Can you please explain your Example of Worse ? How the performnace will be worse compare to if all left index entries had data in them to find MIN and if optimizer decides to use Index ?
  
  thanks
  kc
  
  Comment by kc — June 15, 2011 @ 1:05 pm BST Jun 15,2011 | Reply
  - Hi KC
    
    Because when looking for the MIN of a value, Oracle can generally use the index via the index full scan min/max access method to find the first leaf block and hence find the minimum value. However, if this block contains nothing but empty index entries, it has to keep looking and so hops across to the second leaf block and so on and so on until it finds the first non-deleted value. If you delete 90% of index entries and they’re all on the “left hand” of the index, you might need to access 90% of the entire index, a single block read at a time to find the min value (maybe 1000s of LIOs) when previously it found it straight away. Hence, it’s an example of performance being much worse than previously.
    
    Comment by Richard Foote — June 16, 2011 @ 11:57 am BST Jun 16,2011 | Reply
It depends on how you query the data, or what is the data distribution of the range scan predicates.

1. does the result set of the range scan changed so much or almost no change.
2. does the range scan followed by a index filter or table filter.
3. does the range scan followed by a stop key by count.

a. If the predicates is nearly unique, the range scan performance will remain unchanged or will have a minor improvement(by eliminate the leaf block or table block scan).
b. If the result set is based on a stop key by count, and there are so many rows map this predicates, the scan performance may be will degrade.
c. If the result set will be significantly decreased, then the scan performance will be improved significantly.

Comment by jametong — June 11, 2011 @ 4:41 am BST Jun 11,2011 | Reply
Worse: If the delete happened in a single transaction, and assuming a large table, the first range scan after the commit will have to do some massive cleanouts and re-formats of the leaf blocks, thus generating some serious redo.
Same: Since the blevel hasn’t changed, short, equality-based range scans looking for still-existing keys will see no difference.
With bitmap indexes, it will all depend on how the deleted rows were distributed; for versions prior to 10g it will always be worse.

Comment by Flado — June 11, 2011 @ 7:34 am BST Jun 11,2011 | Reply
Suppose that the deleted table contains indexes that are based on monotonically increased sequences i.e. indexes based on a sequence; and the 90% delete concerns the left hand side of the index, and the deleted free space will not be re-used by subsequent inserts; we will then have an index that will become full of data only on its right hand side. If we apply the sys_op_lbid function to this index then we will point out that the number of keys per leaf block will be such that this index is smashed and not efficient and need to be coalesced (the same thing could be said about an index based on a date last_update_date for example).

Another example of worse is the High water mark of the table. FULL table scans will be of a high cost if nothing is done on the table and the deleted space keep empty and not inserted again.

But if we shrink the table to set its High Water Mark, we might damage the Clustering Factor of the indexes and hence decrease their desirability by the CBO.

Finally it’s all about the delete pattern and how the rows are scattered through the table indexes and the effect this delete will have on this data scattering

Best regards

Mohamed Houri

Comment by hourim — June 11, 2011 @ 8:37 am BST Jun 11,2011 | Reply
- Hi Jonathan,
  
  If I delete 90% of my rows, then performance of the queries is based on following scenarios –
  
  1. If query is doing range scan, then there will not be any impact. Only thing here is if after rebuild, blevel of the index reduces then I have to do one less IO.
  2. If query is doing Index Full/Fast Full scan, then there will be huge impact on the query.
  3. My full Table scan will impact. This is not related to index rebuild but I have to do table reorg as well.
  
  Thanks
  
  Nilesh Rajapurkar
  
  Comment by Nilesh Rajapurkar — June 11, 2011 @ 9:40 am BST Jun 11,2011 | Reply
This must depend on the size of the table and complexity of both table and indexes.

SAME: If the table is small, then would the delete not have put the modified blocks into the buffer cache? Plus the index would have been read during the DELETE and would also reside in the buffer cache. Therefore not performing any other modifications to the table or the index, the performance difference would be so dramatically small as to be un-noticeable (is this how we are measuring the performance?).

WORSE: If the table or index is large (maybe too large to fit in the buffer cache or gets aged out almost immediately), then we start talking about physical reads on the table or index and I can’t think that hard but I think the others have nailed it when they talk about traversals and differing index methods.

BETTER: The performance would be better if you have a partitioned index/table and you delete the records 90% of which happen to be in one partition. The range scan will have significantly less work to do as the other partition(s) would have been dropped (??).

Comment by Darryl Griffiths — June 14, 2011 @ 6:11 pm BST Jun 14,2011 | Reply
SAME:
If the retrieved data required the use of index blocks which were not affected by the delete, the performance will not change.

BETTER:
If the retrieved data required the use of index blocks that have been affected by the delete, the performance will be better because you do not have access to the corresponding blocks of the table.

Note that the index blocks will be taken to buffer cache anyway, even if empty.

WORSE:
After the cancellation of 90% of the data(and recalculation of the statistics), if the clustering factor is high (for high, I mean a number very close to the number of rows in the table), the optimizer might choose a table scan and then read up to HWM, which means reading all blocks, even those subjected to delete (if you’re not taking Reorg).

Regards,

Donatello Settembrino

Comment by Donatello Settembrino — June 15, 2011 @ 5:51 pm BST Jun 15,2011 | Reply
I wonder if it has a sense to compare performance when the result is different. Most of the answers are concerning different result because of the deleted rows. It would be interresting to think of worse/same/better reasons as well for the same query giving the same result…

Comment by pachot — June 15, 2011 @ 7:50 pm BST Jun 15,2011 | Reply
[…] out his Night Quiz blog entry for an excellent discussion on how things might be better, worse, unchanged or possibly […]

Pingback by A Few Random Notes « Richard Foote’s Oracle Blog — June 18, 2011 @ 12:42 pm BST Jun 18,2011 | Reply
[…] Will a massive delete improve performance […]

Pingback by Massive Delete | Oracle Scratchpad — January 31, 2021 @ 5:34 pm GMT Jan 31,2021 | Reply