I came across a blog entry recently with the title Myths about bitmap indexes which set about debunking the following three claims:
Myth 1 – Bitmap indexes take up a lot more space than B-Tree indexes.
Myth 2 – Bitmap indexes are only suitable for data warehouses because of the nature of the data
Myth 3 – Bitmap indexes have slow performance on DML actions
I have a few problems with anyone who tries to debunk these “myths” because they are essentially correct – although I have to say I’ve not heard them stated in exactly the way they appear above.
Given that the statements have a lot of truth in them, anyone trying to argue that they are myths – hence encouraging people to do something wrong – is doing a disservice to the Oracle community.
So let’s take the statements one at a time:
Do bitmap indexes take up a lot more space than B-Tree indexes ? Funnily enough I’ve heard the opposite (and also reasonably accurate ) statement more often, that “bitmap indexes are usually tiny compared to the corresponding B-tree index” and found that this sometimes encourages people to use them when they shouldn’t.
But it is fairly common for bitmap indexes to grow dramatically when the underlying tables are subject to DML. A single bitmap index entry can be as large as 3,900 bytes (in an 8K block), and changing the value of the indexed column for just one row of the table makes Oracle clone two such entries – which can cause two leaf block (node) splits, and each leaf block split could cause a branch block (node) split. Even in the absence of block splitting the volume of new index content results in a lot of undo and a lot of redo being generated. Try the following little test (in 8i or 9i; it will need a little editing):
create table t1 as select rownum id, mod(rownum,2) bit_col from all_objects -- or your favourite row source where rownum <= 10000 ; create unique index t1_pk on t1(id); create bitmap index t1_bi on t1(bit_col); -- gather statistics on the table validate index t1_bi; select name, height, lf_blks, lf_rows, btree_space, used_space from index_stats ; begin -- update 250 rows for i in reverse 9500..10000 loop update t1 set bit_col = 0 where id = i and bit_col = 1; end loop; commit; end; / validate index t1_bi; select name, height, lf_blks, lf_rows, btree_space, used_space from index_stats ;
Oracle 126.96.36.199 seems to have a bug in the validate command, but in 8i and all other versions of 9i, you should see that the index has grown from one block to about 1.8M, generating about 6M of redo as it did so.
[Edited 21st Nov, see note 13]: To my mind, this “myth” hasn’t reached mythic proportions, and if the statement were changed to “bitmap indexes can grow to take up a lot more space than the corresponding B*tree indexes as the underlying table is updated“ then it wouldn’t even qualify as a candidate for mythological status, it would simply be a statement of fact.
Note – This demonstration is not intened to suggest that bitmap indexes always grow, it is merely showing the possible effects of a small number single row updates. Moreover, 10g has taken some steps to minimise the damage that this specific type of activity causes. The 10g change is designed to address the problems of batch-like processes that do lots of single row updates; even so, it doesn’t reduce the impact on undo and redo.
What about “myth” 2, the one about bitmap indexes only being suitable for data warehouses because of the nature of the data ? Again I haven’t heard it stated this way. Most commonly I’ve heard the statement put the other way around: “bitmap indexes are usually inappropriate for OLTP systems”, and it’s not because of the nature of the data, it’s “because of the nature of the activity” (as we saw above … but you can still do the wrong type of processing on data warehouses and get massive bitmap index problems).
The author of the piece provides an example to show that bitmap indexes can be effective on an OLTP system. The code fragment includes the execution plans for a query against a ‘usage flag’ column which holds only the values ‘J’ (yes) or ‘N’ (no). The plan with a bitmap index in place uses the index, the plan with a B-tree in place uses a full tablescan.
There are three flaws in the demonstration. First, many systems which query for a small number of rows with a certain flag value often change the value after finding the rows – and see above what happens when you change bitmap indexed columns.
Secondly, most of the work you do in getting data from a table usually comes from visiting the table, not from visiting the index. Finding a small number of entries in an index usually takes 2 or 3 logical I/O whether the index is a bitmap or a B-tree: so the fact that the path changes from an indexed access to a full tablescan as you switch from B-tree to bitmap is more likely to be about the optimizer’s arithmetic than it is to be about real resource usage. (And the switch could go either way).
Finally, if you check the cardinalities reported in the plan you will see that the optmizer has predicted 22 rows in the case of the bitmap index, but 13,415 in the case of the B-tree. Why the difference ? My best guess is that there were 26,430 rows in the table of which 22 were set to ‘J’ and the rest set to ‘N’, and the bitmap test case had a histogram on the column while the B-tree case did not.
Here’s some test data to demonstrate the point:
create table t1 as with generator as ( select --+ materialize rownum id from all_objects where rownum <= 1000 ) select sysdate date1, 'N' flag1, 'N' flag2 from generator v1, generator v2 where rownum <= 26830 ; update t1 set flag1 = 'Y', flag2 = 'Y' where rownum <= 22; create index t1_i1 on t1(flag1); create bitmap index t1_b1 on t1(flag2);
On a 188.8.131.52 database, with an 8KB block size, and not using ASSM (automatic segment space management) a query for flagN = ‘Y’ used the index in both cases when I had histograms in place, and did a tablescan in both cases when I did not have histograms.
So let’s look at the third “myth”: do bitmap indexes have slow performance on DML. Again it’s not a myth, it’s true. In fact, it’s only part of the truth. As we saw in my first example DML on a bitmap indexed column can cause an excessive amount of change in the index, generating a huge amount of undo and redo. This does make serial DML slower – though not necessarily perceptibly slower in small cases.
However, all the excess garbage in the index will slow down simple queries. At the end of my example there were 250 (nearly identical) index chunks for the zero value, and another 250 chunks for the ones – in both cases 249 of the chunks were marked for deletion. A query for the zeros may have to walk through all 250 chunks to find the one chunk that is the correct chunk – possibly doing a load of extra work to check for read-consistency as it goes. (The 10g optimisation is that it doesn’t keep the full trail of chunks).
But the big problem with DML and bitmap performance is usually about contention, not about pure resource usage. In my original example I had one bitmap index chunk covering all 10,000 rows in the table. What’s going to happen if I update one of those rows (on the indexed column) and you update another – only one of us can change the bitmap index chunk, so the other update will have to wait. Bitmap chunks in an 8K block can cover about 30,000 rows – virtually any concurrent DML is going to lead to locking issues, and make deadlocking very easy.
One final thought: in the code demonstrating his case the author demonstrates an example where the update with a bitmap index in place is actually quicker than the corresponding update with the B-tree in place. However, the B-tree test uses a full tablescan to find the data, the bitmap example uses the indexed access path – and I’ve written a little note about the different update costs you get using different access mechanisms. This explains the difference in redo that the author highlighted in this case – so again the issue isn’t really about whether the index is a bitmap or not.
In his conclusion the author of the article states that he sees no reason not to use bitmap indexes in an OLTP environment. I do hope he changes his mind and pulls the article.