While checking my backlog of drafts (currently 75 articles in note form) I came across this one from August 2009 and was a little upset that I hadn’t finished it sooner – it’s a nice example of geek stuff that has the benefit of being useful.
From the Oracle newsgroup comp.databases.oracle.server, here’s an example of how to recreate a performance problem due to maintenance on ASSM bitmaps in 10.2.0.4.
Create a table in a tablespace with an 8KB block size, locally managed tablespace with uniform 1MB extents, and automatic segment space management (ASSM). Check the newsgroup thread if you want complete details on reproducing the test:
Session 1: Insert 100,000 rows of about 490 bytes into a table using a pl/sql loop and commit at end.
Session 1: Insert 1,000 rows into the table with single SQL inserts and no commits
Session 1: delete all data from the table with a single statement – but do not commit
Session 2: Insert 1,000 rows into the table with single SQL inserts and no commits – it’s very slow.
As one person on the thread pointed out – it looks as if Oracle is doing a full tablescan of the table, one block at a time showing “db file sequential read” waits all the way through the table. (If your db_cache_size is large enough you might not see this symptom).
I simplified the test – inserting just 100,000 of the rows (with the commit), then deleting them all (without the commit), then inserting one row from another session. Taking a snapshot of x$kcbsw and x$kcbwh, I got the following figures for the activity that took place inserting that one extra row (this was on Oracle 10.2.0.3):
--------------------------------- Buffer Cache - 23-Dec 11:50:36 Interval:- 0 seconds --------------------------------- Why0 Why1 Why2 Other Wait ---- ---- ---- ---------- 1,457 0 0 0 ktspfwh10: ktspscan_bmb 8 0 0 0 ktspswh12: ktspffc 1 0 0 0 ktsphwh39: ktspisc 7,061 0 0 0 ktspbwh1: ktspfsrch 1 0 0 0 ktuwh01: ktugus 7,060 0 0 0 ktuwh05: ktugct 1 0 0 0 ktuwh09: ktugfb 2 0 0 0 kdswh02: kdsgrp 2 0 0 0 kdiwh06: kdifbk 2 0 0 0 kdiwh07: kdifbk ---- ---- ---- ---------- 15,595 0 0 0 Total: 10 rows
The figures tell us how much work Oracle has to do to find a table block that could accept a new row. The idea is simple – Oracle checks the first “level 3″ bitmap block (which is actually part of the segment header block) to find a pointer to the current “level 2″ bitmap block; it checks this level 2 bitmap block to find a pointer to the current “level 1″ bitmap block; and finally it checks the “level 1″ bitmap block to find a pointer to a data block that shows some free space.
Unfortunately every block in our table is apparently empty – but that’s only going to be true once session 1 commits. In this version of Oracle the blocks are all visible as “x% free” in the level 1 bitmaps – but when Oracle visits each block (“ktspbwh1: ktspfsrch”) it checks the ITL entry, which points it to the transaction table slot in the related undo segment header block to Get the Commit Time for the transaction (“ktuwh05: ktugct”) and finds that the transaction is not committed so the space is not really free. So Oracle has to visit the next block shown as free in the bitmap.
In our “bulk delete / no commit” case, we end up visitng every (or nearly every) block in the entire table before we find a block we can actually use – and, given the nature of the ASSM bitmap implementation, the order of the block visits is the “natural” table order, so we see something that looks like a full tablescan operating through single blocks reads (apart, perhaps, from a few blocks that are still cached).
I can’t explain why we do 1,457 visits to bitmap blocks (“ktspfwh10: ktspscan_bmb”) in this version of Oracle, but perhaps it’s simply an indication that Oracle picks (say) five “free block” entries from the bitmap block each time it visits it and therefore has to visit each bitmap block about 12 times if it doesn’t find a data block with space that really is free in it search. This may be a compromise between two possible extremes in “bad” cases – having to hold it while it checking every single entry (causing long buffer busy waits), or having to get it again after every single failed check (potentially causing Cache Buffers Chains problems).
Note – these results will be hugely dependent on version of Oracle – in an earlier version of Oracle the bitmap blocks were not updated by the delete until some time after the commit – and this variation of delayed block cleanout produced other unpleasant anomalies; and, just to make like difficult in later versions of Oracle, the x$kcbsw / x$kcbwh objects are not populated properly in 11g.
Footnote: In case you hadn’t realised, ASSM is a mechanism aimed at OLTP systems with a reasonable degree of concurrency – so it’s not too surprising that you can find problems and weak spots if you hit it with processing which is biased towards the DW and batch processing end of the spectrum.
Update (April 2015)
The problem has been resolved in 126.96.36.199 – a thread that came up on the OTN database forum recently looked as if it might have been an example of the problem, and the owner of the problem originally described in the newsgroup joined in the discussion, read the note I had made linking back to this blog, and re-ran their test case in 188.8.131.52 and 184.108.40.206, reporting that the problem was still there in 220.127.116.11 but gone in 18.104.22.168.