Everyone “knows” that bitmap indexes are a disaster (compared to B-tree indexes) when it comes to DML. But at an event I spoke at recently someone made the point that they had observed that their data loading operations were faster when the table being loaded had bitmap indexes on it than when it had the equivalent B-tree indexes in place.
There’s a good reason why this can be the case. No prizes for working out what it is – and I’ll supply an answer in a couple of days time. (Hint – it may also be the reason why Oracle doesn’t use bitmap indexes to avoid the “foreign key locking” problem).
As Martin (comment 3) points out, there’s a lot of interesting information in the statistics once you start doing the experiment. So here’s some demonstration code, first we create a table with one of two possible indexes:
create table t1 nologging as with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 ) select rownum id, mod(rownum,1000) btree_col, mod(rownum,1000) bitmap_col, rpad('x',100) padding from generator v1, generator v2 where rownum <= 1e6 ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for all columns size 1' ); end; / create index t1_btree on t1(btree_col) nologging; -- create bitmap index t1_bitmap on t1(bitmap_col) nologging;
You’ll note that the two columns I’m going to build indexes on hold the same data in the same order – and it’s an order with maximum scatter because of the mod() function I’ve used to create it. It’s also very repetitive data, having 1000 distinct values over 1,000,0000 rows. With the data and (one of) the indexes in place I’m going to insert another 10,000 rows:
execute snap_my_stats.start_snap insert /* append */ into t1 with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 ) select 1e6 + rownum id, mod(rownum,1000) btree_col, mod(rownum,1000) bitmap_col, rpad('x',100) padding from generator ; execute snap_my_stats.end_snap
You’ll note that I’ve got an incomplete append hint in the code – I’ve tested the mechanism about eight different ways, and left the append in as a convenience, but the results I want to talk about (first) are with the hint disabled so that the insert is a standard insert. The snap_my_stats calls are my standard mechanism to capture deltas of my session statistics (v$mystat) – one day I’ll probably get around to using Tanel’s snapper routine everywhere – and here are some of the key results produced in the two tests:
18.104.22.168 with btree =================== Name Value ---- ----- session logical reads 31,403 DB time 64 db block gets 31,195 consistent gets 208 db block changes 21,511 redo entries 10,873 redo size 3,591,820 undo change vector size 897,608 sorts (memory) 2 sorts (rows) 1 22.214.171.124 with bitmap ==================== Name Value ---- ----- session logical reads 13,204 DB time 42 db block gets 8,001 consistent gets 5,203 db block changes 5,911 redo entries 2,880 redo size 4,955,896 undo change vector size 3,269,932 sorts (memory) 3 sorts (rows) 10,001
As Martin has pointed out, there are a number of statistics that show large differences between the B-tree and bitmap approaches, but the one he didn’t mention was the key: sorts (rows). What is this telling us, and why could it matter so much ? If the B-tree index exists when the insert takes place Oracle locates the correct place for the new index entry as each row is inserted which is why you end up with so many redo entries, block gets and block changes; if the bitmap index exists, Oracle postpones index maintenance until the table insert is complete, but accumulates the keys and rowids as it goes then sorts them to optimize the rowid to bitmap conversion and walks the index in order updating each modified key just once.
The performance consequences of the two different strategies depends on the number of indexes affected, the number of rows modified, the typical number of rows per key value, and the ordering of the new data as it arrives; but it’s possible that the most significant impact could come from ordering. As each row arrives, the relevant B-tree indexes are modified – but if you’re unlucky, or have too many indexes on the table, then each index maintenance operation could result in a random disk I/O to read the necessary block (how many times have you seen complaints like: “we’re only inserting 2M rows but it’s taking 45 minutes and we’re always waiting on db file sequential reads”). If Oracle sorts the index entries before doing the updates it minimises the random I/O because it need only update each index leaf block once and doesn’t run the risk of re-reading many leaf blocks many times for a big insert.
The delayed maintenance for bitmap indexes (probably) explains why they aren’t used to avoid the foreign key locking problem. On a large insert, the table data will be arriving, the b-tree indexes will be maintained in real time, but a new child row of some parent won’t appear in the bitmap index until the entire insert is complete – so another session could delete the parent of a row that exists, is not yet committed, but is not yet visible. Try working out a generic strategy to deal with that type of problem.
It’s worth noting, of course, that when you add the /*+ append */ hint to the insert then Oracle uses exactly the same optimization strategy for B-trees as it does for bitmaps – i.e. postpone the index maintenance, remember all the keys and rowids, then sort and bulk insert them. And when you’ve remembered that, you may also remember that the hint is (has to be) ignored if there are any enabled foreign key constraints on the table. The argument for why the hint has to be ignored and why bitmap indexes don’t avoid the locking problem is (probably) the same argument.
You may also recall, by the way, that when you have B-tree indexes on a table you can choose the optimal update or delete strategy by selecting a tablescan or index range scan as the execution path. If you update or delete through an index range scan the same “delayed maintenance” trick is used to optimize the index updates … except for any indexes being used to support foreign key constraints, and they are maintained row by row.
In passing, while checking the results for this note I re-ran some tests that I had originally done in 2006 and added one more test that I hadn’t considered at the time; as a result I can also point out that index will see delayed maintenance if you drive the update or delete with an index() hint, but not if you drive it with an index_desc() hint.