As we saw in part 1 of Index Explosion, we can lose a lot of space in index leaf blocks because
- at even fairly low levels of concurrency Oracle can manage to introduce a large number of (redundant ?) ITL entries into a leaf block and …
- once a single leaf block has a large number of ITL entries the entire ITL is copied into both target blocks every time the leaf block splits – so one brief accident can have a significant long term effect.
But what’s happening, and what should we do about it ?
The first thing to notice is that the effect appears when three conditions appear simultaneously:
- the CPU is heavily loaded,
- several transactions hit a single leaf block in the index at the same time,
- the index leaf block has to split to handle those transactions
The example I created for part 1 made the effect highly visible – I ran far more concurrent processes than I had CPUs, I was using pl/sql loops to insert data continuously, and I was targetting a sequence-based index at the “high values” end. If your system isn’t heavily loaded, or if you don’t have any time-based or sequence-based indexes, you may never see this problem.
In fact, as with many other features of Oracle, you may be suffering from this problem to a slight degree without even noticing it. When I started investigating the issue and saw the leaf block dump that had obviously “gone bad” it reminded me that I had seen the problem before – about 15 years ago on a system that was probably Oracle 7.3 or earlier. Since then I’ve never seen a performance issue that led me back to an index with this problem.
So how do you deal with the problem ? Back in the days when I used to fix things without having time to work out why they had got broken and documenting why the fix was appropriate the answer was obvious: something was making the ITL grow very large when it should not need to grow – so set maxtrans on that index to a reasonable value that represented the worst case concurrency.
If you do this, you have to coalesce or rebuild the index just once to get rid of the extreme ITLs, and it’s just possible that setting maxtrans will introduce some unexpected contention at the moments when Oracle would otherwise have run away with the ITL – but maxtrans fixes the immediate problem.
It’s a long time since I applied that fix, but I don’t recall anyone complaining after I did it and it didn’t seem to have any negative side effects for that application on that version of Oracle, but we’ll talk about side effects and whether it’s the best (or right) thing to do in a future note. First let’s look at the impact of setting maxtrans to nine on my test case – again running the test on 10.2.0.3 using eight concurrent processes on a laptop with two CPUs running XP Pro.
Here are a few results to compare – I ran the test three times with and without setting maxtrans, and noted various statistics, times, and sizes (for various reasons I had 110,008 rows in this test run):
With maxtrans set: ------------------ Average DB Time for a process (seconds) 31.59 Average redo size for a process (bytes) 14,088,953 Average leaf blocks 409 Average leaf block len (bytes) 7,832 Index height (blevel + 1) 2 Without setting maxtrans ------------------------ Average DB Time for a process (seconds) 43.23 Average redo size for a process (bytes) 15,211,463 (15.3MB / 14.9MB) Average leaf blocks 743 (671 / 780) Average leaf block len (bytes) 4,176 (3992 / 4544) Index height (blevel + 1) 3
You’ll notice that I’ve included pairs of figures for the “no maxtrans“ test. One run was less of a disaster than the other two, giving you some indication of how you can get significant variation in results from a tiny variation in timing, or even a variation in the process ID that you happen to use when you connect to the instance.
As you can see: in my specific case, with my specific data set etc. the index behaved quite nicely when I set maxtrans to nine. Ideally I’d like to see a sequence-based index running at 100% packing – which happens at low levels of concurrency – but this one ran at about 67% packing, which is a little below average for a typical B-tree index. Without a limiting value for maxtrans, though, the index degenerated to something like 35% average utilisation in a block – the index was physically much larger than it needed to be.
Time, of course, is likely to be the most important consideration and in this test it happened that most of the time difference was due to log file parallel writes, with some time lost due to buffer busy waits. The 8% increase in the redo size was basically due to the extra redo generated during each leaf block split combined with an increase in the number of splits.
In this case, especially after a little warm-up time, the best solution to the problem is simply to recreate the index as a reverse-key index and spread the concurrency across more leaf blocks. It’s a unique sequence-based index and the only use for the index is to answer the query “select single_row where id = constant” so key reversal isn’t going to change the data retrieval speed (provided you allow extra space in the buffer cache to keep the index buffered.)
If your index is time-based you might not want to take this approach because you’re more likely to want to run queries that cover a time range – and Oracle won’t use a reverse-key index on the query “select many_rows where reversed_column between constant1 and constant2″ - so setting maxtrans may be your best bet.
Having said that, I can safely say that I have occasionally come across indexes that are exactly this type of time-based index with concurrent inserts – so you might wonder why I haven’t seen the space-wasting phenomenon more often. I’ll tell you why in a future article. At the same time, I’ll talk about strategies for minimising the potential for damage if you have to put up with an index like this, and I’ll also tell you how I set maxtrans on an index in 10g … which ignores maxtrans!