Oracle Scratchpad

December 23, 2009

Btree / Bitmap

Filed under: Indexing,Infrastructure,Oracle,Tuning — Jonathan Lewis @ 8:21 pm GMT Dec 23,2009

In a recent ‘philosophy’ post I focused on the critical mental image that should be adopted when comparing B-tree and bitmap indexes. I was a little surprised, however, to discover that the idea I proposed needed further explanation. So here’s a note that expands on the original comment.

The point that caused confusion was simple: when designing B-tree indexes you tend to think in terms of high-precision access using a single index to identify a few rows, but when designing bitmap indexes you tend to assume that an individual bitmap index key is likely to identify a large number of rows but you aim to get precision by combining several bitmaps.

Putting this in terms of execution plans, compare the following single table plans that show typical of B-tree and bitmap access paths respectively:

---------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     5 |   230 |    8 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T1    |     5 |   230 |    8 |
| * 2 |   INDEX RANGE SCAN           | T1_N1 |     5 |       |    3 |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=6)

---------------------------------------------------------------------
| Id  | Operation                     | Name | Rows  | Bytes | Cost |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |     4 |   200 |   11 |
|   1 |  TABLE ACCESS BY INDEX ROWID  |   T1 |     4 |   200 |   11 |
|   2 |   BITMAP CONVERSION TO ROWIDS |      |       |       |      |
|   3 |    BITMAP AND                 |      |       |       |      |
| * 4 |     BITMAP INDEX SINGLE VALUE |   B3 |       |       |      |
| * 5 |     BITMAP INDEX SINGLE VALUE |   B2 |       |       |      |
| * 6 |     BITMAP INDEX SINGLE VALUE |   B1 |       |       |      |
---------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("BIT3"=10)
   5 - access("BIT2"=10)
   6 - access("BIT1"=10)

As you can see, both queries are supposed to access a few rows from the table, and both queries have a fairly low-cost. In the first case the optimizer has used a B-tree index that targeted the data very precisely (on average a single key value will identify five rows in this table); in the second case the optimizer has combined three bitmap indexes to identify four rows, and the cost of doing so is fairly small even though all three indexes were low-precision indexes (on average a single key value for any of the indexes would identify about 10,000 rows in this table).

Bitmap indexes tend to be very small, can be built relatively quickly, and can be combined very efficiently. We typically use them where the users can supply ad-hoc queries with predicates based on combination of several columns … and if we don’t know in advance which combinations will appear we create B-tree indexes on the high-precision columns and bitmap indexes on the low-precision columns.

I made the point in the original note that things were more subtle than the ideal put forward as the critical difference. Here’s an execution plan that would have appeared if I had created B-tree indexes instead of bitmap indexes in the second case above:

-------------------------------------------------------------------------
| Id  | Operation                         | Name | Rows  | Bytes | Cost |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |      |     4 |   200 |   60 |
|   1 |  TABLE ACCESS BY INDEX ROWID      |   T1 |     4 |   200 |   60 |
|   2 |   BITMAP CONVERSION TO ROWIDS     |      |       |       |      |
|   3 |    BITMAP AND                     |      |       |       |      |
|   4 |     BITMAP CONVERSION FROM ROWIDS |      |       |       |      |
| * 5 |      INDEX RANGE SCAN             |   I3 |       |       |   18 |
|   6 |     BITMAP CONVERSION FROM ROWIDS |      |       |       |      |
| * 7 |      INDEX RANGE SCAN             |   I2 |       |       |   18 |
|   8 |     BITMAP CONVERSION FROM ROWIDS |      |       |       |      |
| * 9 |      INDEX RANGE SCAN             |   I1 |       |       |   18 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("TREE3"=10)
   7 - access("TREE2"=10)
   9 - access("TREE1"=10)

In this case I’ve created compressed B-tree indexes (the treeN columns are copies of the bitN columns), and Oracle can use btree/bitmap conversion to good effect. The cost is higher, of course, because the indexes are bigger. But there’s a reason why we might prefer the larger, less efficient indexes. Bitmap indexes are efficient because they are very small, but that means they are very compact, which means they can introduce a lot of contention, in particular locking issues. It’s a classic case of the trade-off dictated by your application:

  • bitmap indexes are smaller and more efficient to use in many ways but their compactness can result in severe undo and redo overheads and  terrible lock contention if you expect to see frequent small concurrent data changes.
  • B-tree indexes are larger (which may result in more physical I/O) and less efficient to combine, and slower to build and rebuild for bulk loads, but they do avoid the locking problems, and don’t generate so much undo and redo on updates.

You may have to choose a “second best” (i.e. compromise) option because the option you would really like for queries introduces a problem for DML that your system can’t handle.

If you’re interested in comparing the sizing for this little demonstration, here are a few numbers about the indexes:

select
        index_name, blevel, num_rows, distinct_keys, leaf_blocks
from
        user_indexes
where
        table_name = 'T1'
order by
        index_name
;

INDEX_NAME  BLEVEL    NUM_ROWS   DISTINCT_KEYS LEAF_BLOCKS
---------- ---------- ---------- ------------- -----------
B1                  1        245            49         123
B2                  1        250            50         125
B3                  1        255            51         128
I1                  2     500000            49         769
I2                  2     500000            50         769
I3                  2     500000            51         769
T1_N1               2     500000        100000        1107

You can see that the bitmap indexes are significantly smaller than the B-tree indexes on the same data. In fact the data distribution in my example is one that results in the worst case (largest) possible bitmap indexes. The ordering of the data makes a much bigger impact on the size of a bitmap index than it does on a B-tree index. (If you’ve got my book Practical Oracle 8i, you may remember my demonstration of a table with 1 million rows and a column with 8 distinct values each appearing 125,000 times – depending on the order I created the data the bitmap index varied in size from 156KB to 1.12MB).

The variation in size of bitmap indexes can lead to some counter-intuitive conclusions. As another example of bypassing the basic philosophy of the btree/bitmap comparison, think about an OLTP table where a small number of related rows are inserted into the table simultaneously, generally don’t get deleted, and don’t see updates on critical columns.

Take for example an order_lines table – though I will point out shortly that it’s a bad example – where you insert an average of five order-lines for each order. The order lines all go in at once (with a well-written array insert), and the order number on those order-lines doesn’t change. Technically it would be perfectly feasible to create a bitmap index on the order_number column without paying any of the usual penalties of creating a bitmap index on an OLTP table.

Here are some figures from a model of such a table. It has 100,000 orders, five order lines per order where the five order lines for an order are physically adjacent in the table. Index C1 (C for clustering) shows the statistics for a bitmap index on the order_number column, index C2 shows the statistics for a B-tree index on the same column. The space saving could be a significant argument for using a bitmap index IF you don’t run into any of the classic bitmap problems.  (The difference become more pronounced the more “order_lines” you have per “order”.)

INDEX_NAME           BLEVEL     NUM_ROWS   DISTINCT_KEYS LEAF_BLOCKS
-------------------- ---------- ---------- ------------- -----------
C1                            1     100001        100001         369
C2                            2     500000        100001         921

And this is what the indexes looked like after inserting another 5,000 orders:

INDEX_NAME           BLEVEL     NUM_ROWS   DISTINCT_KEYS LEAF_BLOCKS
-------------------- ---------- ---------- ------------- -----------
C1                            1     105001        105001         389
C2                            2     525000        105001         962

This is an example of data pattern and usage where bitmap indexes could be efficient and non-threatening even in an OLTP system, with a high precision access path.

A word of warning for this particular example though: I did say that the order_lines table was actually a bad example, and I only used it to give a concrete image of the type of data pattern I was talking about. There are two reasons why you probably wouldn’t create the bitmap index in this case.

  • First, you would probably have a primary key index on the table that started with the order_number anyway, making the bitmap index redundant
  • Secondly, you would probably have a foreign key constraint relating the order_lines table to the orders table, and you might decide that you needed a supporting index on the order_lines table to avoid  locking issues. If you want such an index it has to be a B-tree index, again making the bitmap index in the example redundant.

Update Feb 2010: Richard Foote has recently published a lengthy blog item about bitmap indexes that contains some useful information and myth-busting.

5 Comments »

  1. […] 15-Compressed Btree indexes vs Bitmap indexes Jonathan Lewis-Btree/Bitmap […]

    Pingback by Blogroll Report – 18/12/2009-25/12/2009 « Coskan’s Approach to Oracle — January 6, 2010 @ 7:45 pm GMT Jan 6,2010 | Reply

  2. I’m trying to understand if the btree index will help in the case of column with only 2 distinct values in oltp database with lots of updates happening on the table.
    Also, how does btree index works on the columns with low number of distinct values.

    Thanks,
    Gary

    Comment by Gary — January 6, 2010 @ 10:35 pm GMT Jan 6,2010 | Reply

    • Gary,

      There are various ways you might have “only 2 distinct values” – so the question comes back to the basic one of indexing: “will the index eliminate more work than it introduces (at the point in time when it really matters) ?”

      There’s also the question of whether you have to stick with the current SQL, or whether you can enigineer the code and the indexes at the same time.

      For example:

        If your two values are ‘N’ and ‘Y’, and nearly all the data has value ‘Y’, then an index on that column could be very helpful for the query “select where col = ‘N'” – but only if you have a histogram on that column.

        On the other hand, if you can change the code and data so that it holds ‘N’ or null then you get a tiny index and don’t need the histogram.

        On the third hand – if you create the index as (decode(col,’N’.N’,null)) you don’t have to change the code that maintains the data, you get a tiny index, you don’t need the histogram, but you have to modify the code that queries the data so that the queries match the index definition.

      Comment by Jonathan Lewis — January 7, 2010 @ 8:00 am GMT Jan 7,2010 | Reply

  3. “Secondly, you would probably have a foreign key constraint relating the order_lines table to the orders table, and you might decide that you needed a supporting index on the order_lines table to avoid locking issues.”

    Of course this is generally quite reasonable. But in the given case – no deletes on the parent table, no updates of order_number – I think you shouldn’t be that much afraid of locking issues due to FK on order_number.

    Comment by Todor Botev — January 29, 2010 @ 8:15 am GMT Jan 29,2010 | Reply

  4. […] 23rd Dec 2009: I’ve written a follow-up article to this note since the point I was trying to make seemed to cause some […]

    Pingback by Philosophy – 8 | Oracle Scratchpad — January 20, 2020 @ 12:03 pm GMT Jan 20,2020 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.