Shortly after I published the previous post, I received an email from an old client who reminded me of a problem he had once had with empty index blocks and the solution we had used (a coalesce), and who gave me an update on how things had changed in the latest releases of Oracle.
Strangely, I was sure I had written a note about his problem, but I couldn’t find anything about it until I did a search on the Oracle Forum and found this thread where someone seems to have suffered from the same sort of issue.
The problem is that when an index suffers a massive delete – which could have accumulated over time, of course – it can end up with a lot of empty leaf blocks which cause an unexpected problem in the future. My summary of the issue is in the thread at this posting.
After a massive delete, you may have a lot of empty leaf blocks in the index. These blocks are left in the b-tree structure AND attached to the freelist.
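(As a quick sanity check after a big delete you can validate the index and look at index_stats – a sketch, assuming a hypothetical index t1_i1, and remembering that validate structure locks the underlying table while it runs:

analyze index t1_i1 validate structure;

select lf_rows, del_lf_rows, lf_blks, br_blks, pct_used
from   index_stats;

A figure for lf_blks far larger than the remaining lf_rows could justify is a hint that the index is carrying a lot of empty, or nearly empty, leaf blocks.)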
On a subsequent insert, if a process needs to split a block, it takes the top block off the freelist and shares the data from the existing block between the old and new blocks, after which the two blocks have various pointers corrected.
However, there are cases where the splitting process will discover that the block it took from the freelist was also in a position in the index structure that made it an illegal choice. It only discovers this after doing all the work of splitting and relinking the blocks, and so has to roll back that work – which shows up as a ‘transaction rollback’ in v$sysstat – and try another block.
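(If you want to see where the empty leaf blocks sit in the tree, one way of looking at the structure is a treedump – a sketch, using the same hypothetical index t1_i1:

select object_id from user_objects where object_name = 'T1_I1';

-- then, using the object_id that comes back (say 51713):
alter session set events 'immediate trace name treedump level 51713';

The trace file shows one line per branch and leaf block, including the row count in each leaf block, so runs of empty leaf blocks stand out clearly.)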
If you have done a massive delete, you could find that one process has to work through a couple of hundred “not quite free enough” blocks before finding one it can use. This could (a) require lots of db file sequential reads to pick up each block in turn and (b) cause other transactions to wait on a TX/4 (the transaction enqueue in mode 4) for each of those blocks in turn.
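(If you suspect this is happening you could keep an eye on the rollback statistics – a sketch, hedged by the fact that the exact statistic names vary a little with version:

select name, value
from   v$sysstat
where  name in ('transaction rollbacks', 'user rollbacks');

A ‘transaction rollbacks’ figure that keeps climbing while ‘user rollbacks’ stays low suggests that Oracle, rather than your application, is doing the rolling back.)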
I have seen this happen on a couple of live systems, but I have not yet been able to emulate it on a test system because I can’t work out the rules that cause a leaf block to be rejected.
A workaround to the problem is to do an ‘alter index … coalesce’ after doing the massive delete. This cleans all the empty blocks and takes them out of the index structure. (You have to have committed the delete for this to work.)
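In outline – a sketch, again with the hypothetical table t1 and index t1_i1:

delete from t1 where ...;        -- the massive delete
commit;                          -- must commit before the coalesce can clean the blocks
alter index t1_i1 coalesce;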
I don’t think you can spot this from the statistics – one long catastrophe appears as a single event in the stats. I think it might also be recorded as a hidden statistic in v$segstat, the ‘ITL service wait’.
Since I posted to that thread, though, things have moved on. My old client has upgraded to 10.2.0.3 where the “bug” has been fixed, and a new statistic (“failed probes on index block reclamation”) has been created to count the occurrences of the event.
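So on the later release you can watch for the problem directly:

select name, value
from   v$sysstat
where  name = 'failed probes on index block reclamation';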
Things still aren’t perfect, unfortunately. The client has a small index which sometimes grows just a little bigger than it ought to and suffers a root block split to introduce a new index level. At present they rebuild this index on a weekly basis to make sure that it doesn’t (or is much less likely to) reach the critical size at which the root block splits. Unfortunately, when the root block split does occur, they still suffer a temporary blockage of their system. (Somewhere I’m sure I wrote a note suggesting that a rebuild like this, keeping a small index at its minimum height, could be a good idea because of a detail of the optimizer arithmetic – but I can’t find it! I’d have to update it if I could.)
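A simple check for whether the critical moment is approaching is the index height – a sketch, again using the hypothetical index t1_i1; bear in mind that blevel in user_indexes is only as current as the last stats collection, so for a live value you would have to fall back on a validate structure and index_stats.height:

select index_name, blevel, leaf_blocks
from   user_indexes
where  index_name = 'T1_I1';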
After reading the email, I had a little search around Metalink, and found a couple of interesting bugs in the same area of code, in particular bug 5764993.8:
During an index block split a session may wait for one second unnecessarily if it already holds the required block in S mode and there are other holders in S mode – if it requests the block in X mode and the other S holders then release their block pin(s), the requesting (X) process is not posted, so it does not realize it can proceed until the “buffer busy wait” times out.
Another bug number gave a little summary of what the ASH (V$active_session_history) output might look like when this happened:
 BLCKNG                                                                   TIME
SESSION SESSION EVENT                              P1        P2    P3   WAITED
------- ------- -------------------------- ---------- --------- ----- ------
    543     530 enq: TX - index contention 1415053316   2359308  2830 991957
    543     555 enq: TX - index contention 1415053316   2359308  2830 992447
    543     543 buffer busy waits                  12    880671     1 992274
Note how session 543 is both blocker and blocked in the bottom line. It is also the session that is blocking everyone else who is stuck on the ‘enq: TX – index contention’ wait.
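For reference, the output above looks like the result of a query something like this against v$active_session_history (the column names are real, the formatting is my guess):

select blocking_session, session_id, event, p1, p2, p3, time_waited
from   v$active_session_history
where  event in ('enq: TX - index contention', 'buffer busy waits')
order by
       sample_time;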
One of the things I really like about the blog environment is that other people can add value – this was a wonderful example of how the interaction can work well; so thanks, “old client”.