Oracle Scratchpad

October 26, 2009

Philosophy – 8

Filed under: Philosophy — Jonathan Lewis @ 8:33 pm GMT Oct 26,2009

Btree indexes vs. Bitmap indexes – the critical difference

  • A single B-tree index allows you to access a small amount of data very precisely.
  • It is the combination of a subset of the available bitmap indexes that offers the same degree of precision.

You should not be comparing the effectiveness of a bitmap index with the effectiveness of a b-tree index.

(Inevitably it’s a little more subtle than this – you may create some low-precision b-tree indexes to avoid foreign key locking issues,the optimizer can combine b-tree indexes, and so on - but if you start from this basis you will have a rational view about how to use bitmap indexes).

Footnote: remember, also, that bitmap indexes introduce massive concurrency issues and other maintenance overheads – if you see them in an OLTP system it’s very likely that they’re causing problems.

Update 23rd Dec 2009: I’ve written a follow-up article to this note since the point I was trying to make seemed to cause some confusion.

[The Philosophy Series]

9 Comments »

  1. Jonathan,

    Could you please clarify what do you mean by:
    >It is the combination of a subset of the available bitmap indexes that offers the same degree of precision.
    I’ve read it several times – and still don’t get it.

    Comment by Timur Akhmadeev — October 26, 2009 @ 9:25 pm GMT Oct 26,2009 | Reply

  2. What I think he’s saying is that creating a bitmap index on something like “gender” (M/F) is not going to be too helpful if the data is split evenly between M and F. You still need to retrieve half the rows in the table. On the other hand, if you have additional columns with low cardinality e.g. “hair color”, “eye color”, etc. then it *might* make sense to build bitmap indexes on all three columns. Or not.

    At least that’s what I think he meant.

    Comment by Joseph Charpak — October 26, 2009 @ 10:14 pm GMT Oct 26,2009 | Reply

    • Joseph,

      thank you. Your version sounds reasonable, but it doesn’t fit to “generic” claim – there’s no a word about is this multi-column indexes specific only or not. My first (and the only one) guess was something like this:

      drop table t cascade constraints purge;
      create table t as 
      select mod(rownum, 1e4) x, lpad(rownum, 10, '0') pad from dba_objects, dba_objects 
       where rownum <= 1e6;
      create index t_indx on t(x);
      exec dbms_stats.gather_table_stats(user, 't', estimate_percent=>null,method_opt=>'for all columns size 1');
      select /*+ gather_plan_statistics */ * from t where x in (1, 2);
      @x
      
      drop index t_indx;
      create bitmap index t_indx2_01 on t(decode(x, 1, x));
      create bitmap index t_indx2_02 on t(decode(x, 2, x));
      exec dbms_stats.gather_table_stats(user, 't', estimate_percent=>null,method_opt=>'for all hidden columns size 1')
      select /*+ gather_plan_statistics */ * from t where decode(x, 1, x) = 1 or decode(x, 2, x) = 2;
      @x
      
      -------------------------------------------------------------------------------------------------
      | Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
      -------------------------------------------------------------------------------------------------
      |   1 |  INLIST ITERATOR             |        |      1 |        |    200 |00:00:00.01 |     207 |
      |   2 |   TABLE ACCESS BY INDEX ROWID| T      |      2 |    200 |    200 |00:00:00.01 |     207 |
      |*  3 |    INDEX RANGE SCAN          | T_INDX |      2 |    200 |    200 |00:00:00.01 |       7 |
      -------------------------------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
      
         3 - access(("X"=1 OR "X"=2))
      
      -----------------------------------------------------------------------------------------------------
      | Id  | Operation                    | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
      -----------------------------------------------------------------------------------------------------
      |   1 |  TABLE ACCESS BY INDEX ROWID | T          |      1 |   1000K|    200 |00:00:00.01 |     106 |
      |   2 |   BITMAP CONVERSION TO ROWIDS|            |      1 |        |    200 |00:00:00.01 |       4 |
      |   3 |    BITMAP OR                 |            |      1 |        |      1 |00:00:00.01 |       4 |
      |*  4 |     BITMAP INDEX SINGLE VALUE| T_INDX2_01 |      1 |        |      1 |00:00:00.01 |       2 |
      |*  5 |     BITMAP INDEX SINGLE VALUE| T_INDX2_02 |      1 |        |      1 |00:00:00.01 |       2 |
      -----------------------------------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
      
         4 - access("T"."SYS_NC00003$"=1)
         5 - access("T"."SYS_NC00004$"=2)
      

      Comment by Timur Akhmadeev — October 27, 2009 @ 9:18 am GMT Oct 27,2009 | Reply

  3. Hi jonathan,

    It is very important that you included a Footenote to Philosophy8. I believe that before disussing the critical difference between B-tree indexes and Bitmap Indexes, we have first to discuss about the situations where itmap indexes are dangerous to implement.

    In this context, I have one question,

    Are there situations in mutli-user OLTP applications where Bitmap indexes can be safely implemented so that one can start comparing their benefit versus B-tree indexes benefit?

    Mohamed Houri

    Comment by Mohamed Houri — October 27, 2009 @ 7:45 am GMT Oct 27,2009 | Reply

    • Mohamed,

      Here is a scenario, that I experienced recently, where (I thought) bitmap indexes might have been useful in “multi-user OLTP” application.
      This system is used by many users and they do perform many small DMLs every day (hence qualifies the “multi-user OLTP” definition). However, this system has a set of tables, which act as base tables for a search algorithm (that is implemented using PL/SQL). These base tables are refreshed daily/weekly by a batch process. Many users concurrently perform searches (using the developed search algorithm) against these base tables (but only SELECTS and no INSERTs/UPDATEs/DELETEs). The search algorithm involves many queries that access few rows but on different combinations of columns in WHERE clause.
      So, even though, the overall nature of this application is “multi-user OLTP”, this feature (search) in particular, acts like a “DSS” and can be benifited by having bitmap indexes on individual columns, instead of multiple composite B*Tree indexes, having same columns but in different order.

      Comment by Narendra — October 27, 2009 @ 11:03 am GMT Oct 27,2009 | Reply

  4. After 15 years OLTPying, I may testify that what ever good intentions and super relational data model you possess, security procedure and logon trigger implemented, the very moment where you introduce a bitmap index into your OLTP, somebody, somewhere in a near or remote future, will manage to lock a row covered by this bitmap and your production will hang. Seems a Universal law.

    Comment by Bernard Polarski — October 27, 2009 @ 10:10 am GMT Oct 27,2009 | Reply

  5. I’d like to just add into the mix that if you are thinking a bitmap index might be useful, but are put off by the fact you are multi-user-OLTP-ish, then think compressed b*tree indexes instead. For relatively low cardinality columns, a compressed b*tree index will be pretty small, just like a bitmap index will be. And you’ll pay a CPU price for the compression (or rather, the decompression required to read the darn’d thing). But you won’t pay a thumping great big row locking issue. It may be a better bet is all I’m saying.

    Comment by Howard Rogers — October 27, 2009 @ 12:01 pm GMT Oct 27,2009 | Reply

    • I agree; in fact, with the introduction of the btree/bitmap conversion mechanism a collection of single column b-tree indexes (which I once viewed as an indication of a design flaw) can be a very smart move to avoid creating a large number of multi-column indexes.

      (The transformation and strategy is only available for enterprise edition, of course, see this note).

      Comment by Jonathan Lewis — November 3, 2009 @ 11:24 pm GMT Nov 3,2009 | Reply

  6. I thought that the point I made in this note was fairly straightforward, but it has obviously caused some confusion – so I’ll be writing another blog entry in a few days to expand on the concept given here.

    Comment by Jonathan Lewis — November 3, 2009 @ 11:18 pm GMT Nov 3,2009 | Reply


RSS feed for comments on this post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,430 other followers