Bitmap Indexes

November 29, 2006

Bitmap Indexes

Filed under: Indexing,Infrastructure,Oracle — Jonathan Lewis @ 10:38 pm GMT Nov 29,2006

Coming a close second to the “rebuild indexes regularly” fairy tale is the “a column with a small number of distinct values is a good candidate for a bitmap index” legend.

I’ve had to explain the error in the legend three times in the last month, so I thought I’d just point to a series of articles I wrote some time ago that might save people from wasting too much effort the next time they start to play with bitmap indexes.

All four articles were written several years ago; so when you review them, remember the advice I gave in an earlier post.

Comments (19)

19 Comments »

Whenever I encounter someone who doesn’t really understand the proper application of bitmap indexes, I point them at that first article. In fact, I think I probably mentioned it to Lisa Dobson in preparation for her UKOUG presentation. It covers the principles extremely well, regardless of it’s age.

Comment by Doug Burns — November 29, 2006 @ 11:23 pm GMT Nov 29,2006 | Reply
I love all the three articles. I do point out them to my friends. I guess these are the only three articles which are discussing about bitmap indexes in depth.

Comment by CT — November 30, 2006 @ 1:48 am GMT Nov 30,2006 | Reply
I have to wonder if this is coincidence or if JL is cyberstalking me (yes, ME!) via all of my newbie posts on the dizwell forums. ;)

(I’m referring to http://www.dizwell.com/prod/node/423#comment-1169 )

Comment by Don Seiler — November 30, 2006 @ 3:58 pm GMT Nov 30,2006 | Reply
Don, I’m afraid you’ve just fallen into the the classic DBA error of confusing correlation and causality ;-) Just because I wrote about bitmap indexes when you were asking about bitmap indexes, that doesn’t mean I wrote about them because you asked about them.

For example, “I rebuilt this index and the query went faster” doesn’t mean that rebuildling indexes makes Oracle go faster; it doesn’t even mean that the specific query went faster because of the rebuild. It may mean that the automatic recalculation of statistics during the rebuild made the optimizer decide not to use the index.

Comment by Jonathan Lewis — November 30, 2006 @ 8:49 pm GMT Nov 30,2006 | Reply
Jonathan,

Thanks for the reply. Regarding my index problem, our statistics had been gathered as of early that morning. What had happened was that a new index was added to this table as a quick-fix for another long-running query. Once the system had gathered statistics on it (later that night), the CBO found it much more attractive than the old index. That is my interpretation anyway, as the timing of the new index and the query problems seemed to be more than coincidental.

What I did do was sit down with the lead dev and rebuild those indexes to handle their respective queries (I tried to solve all queries with one index but couldn’t see it working). I was able to even drop a third index that covered the first two columns of one of the rebuilt indexes.

And not to sound like a wide-eyed fanboi, I’m really enjoying your book so far, as well as the first bitmap index article.

Comment by Don Seiler — November 30, 2006 @ 9:01 pm GMT Nov 30,2006 | Reply
Don, Hmmm! Who’s cyber-stalking whom here ? 12 minutes after I comment on your comment, you’re commenting on my comment ;-)

I hadn’t got around to looking at your thread on Howard Rogers’ forum. But just to answer the point you raised there about column order: as a weak bias, columns that appear with range-based predicates (or are sometimes omitted from queries) should be moved towards the end of the index, columns that always appear with equality should be moved to the start of the index. The range (number of leaf blocks) of an index that you have to examine is dictated by the first gap, or range-based predicate.
(See chapter 4 in the book for more details)

Comment by Jonathan Lewis — November 30, 2006 @ 9:19 pm GMT Nov 30,2006 | Reply
Thanks! And I assure you that the timing of my response WAS coincidental and I just thought while taking a break from reading to see if you might have replied. I’ve since add the RSS feed for these comments to my google reader. :p

And thanks for that link, I’ll make sure to check up on the addenda and errata (gasp!) when I finish a chapter now.

Comment by Don Seiler — November 30, 2006 @ 10:04 pm GMT Nov 30,2006 | Reply
For the last couple of years I lived with “no need to rebuild” the index as a rule.
But in my latest project I come to experience an exception to this rule.
And the excpetion seems to have something to do with the “partition exchange”-operation.
After months of ETL loading and partition exchanging we found that out indexes were far too big and ineffective. The solution was to rebuild (once a month or so) them. Any Idea what happend and why?

Comment by Jonas Andersson — December 3, 2006 @ 7:49 pm GMT Dec 3,2006 | Reply
Jonas, you didn’t say whether these were local or global indexes, bitmap or b-tree, or whether you were exchanging new (hence empty) partitions with freshly load tables, or using the exchange as a way of adding data to existing partitions (copy old partition, append data, exchange…).

My best guess is that this refers to global indexes where there was pre-existing data in the partitions, and you were using the ‘maintain (global) indexes’ option, with the result that Oracle does a massive ‘direct path’ delete then insert on the indexes – which is exactly the type of operation where you are most likely to see waste space accumulating in the indexes.

Comment by Jonathan Lewis — December 3, 2006 @ 9:35 pm GMT Dec 3,2006 | Reply
@Doug,

“regardless of it’s age”

Remember JL’s other rules including, “If it’s more than 6 weeks old it is rubbish and complete and utterly wrong.” Or something like that, can’t remember precisely.

@Ct

“I do point out them to my friends”

Not sure whether to feel pity or jealousy.

Comment by Mark Brady — January 31, 2007 @ 6:13 pm GMT Jan 31,2007 | Reply
Mark, to jog your memory: https://jonathanlewis.wordpress.com/2006/11/13/trust/

At least, in this case, the article mentions a version a couple of times, supplies the script for a test table, shows how to do block dumps, and reproduces a couple of extracts from a block dump to support the comments. Thus allowing people to discover, and demonstrate conclusively, that in 10g the indexes behave differently when faced with a string of single-row updates – although the same cannot be said about the undo and redo costs.

Comment by Jonathan Lewis — February 1, 2007 @ 7:01 am GMT Feb 1,2007 | Reply
For those who are interested in what is behind the various implementations, wikipedia has a pretty good entry about the underlying bitmap indexing technologies. This article has more in-depth desciption of the current development in bitmap index research. It appeared in a book on data warehousing.

Comment by John Wu — November 22, 2007 @ 11:50 pm GMT Nov 22,2007 | Reply
Jonathan,

We would like to use “star_transformation_enabled = temp_disabled” for our data warehouse implementation but are concerned about the amount of memory that could be used during the star transformation. We would like to measure the amount of memory used during our pre-production performance testing and
estimate the memory that could be used for our production user volume. Is the memory used when temp tables are disabled included in the session’s PGA usage?

Thanks,
Margaret

Comment by Margaret Norman — April 9, 2009 @ 8:15 pm BST Apr 9,2009 | Reply
- Margaret,
  
  I believe the option for “temp_disabled” has nothing to do directly with memory usage; instead it changes the strategy used in the join back to the dimensions.
  
  If the dimension tables are large and your query is known to identify a relatively small fraction of the rows from a dimension then Oracle has the option of extracting those rows from the dimension and copying them into a global temporary table (which is specially defined by in-memory metadata). This means that the join back will probably do a hash join with a small GTT, rather than joining back to a large dimension.
  
  If you allow Oracle to use a GTT, then the memory usage ought to be blocks in the buffer cache, plus a PGA cost for buildling a hash table from the GTT.
  
  If you opt for temp_disable, the join back could be a nested loop (on primary key), a merge join (which could result in two sorts, one large fact sort and a smaller dimension sort), or a hash join (which would probably build a hash table from the dimension table). Every query could be difference, and it could even be that the data set from the fact table is much smaller than the dimension anyway – which means the extracted fact data might be the thing that Oracle uses to build the hash table.
  
  This doesn’t answer your question, but I hope it fills in some of the information around the edges.
  
  Comment by Jonathan Lewis — April 15, 2009 @ 10:39 am BST Apr 15,2009 | Reply
[…] Performance — Jonathan Lewis @ 5:33 pm UTC Oct 21,2009 It is fairly well-known that bitmap indexes are very dense structures that can behave badly if their underlying tables are subject to even […]

Pingback by Bitmap Updates « Oracle Scratchpad — October 21, 2009 @ 5:33 pm BST Oct 21,2009 | Reply
Hi Jonathan,

Thanks for your articles on BITMAP Indexes, they are very informational.

I have a question though,

example 1:
I am running simple distinct query on column which has bitmap index. table is non-partitioned. Theoretically, all distinct extries have an associated BITMAP so oracle has to just emit BITMAP headers. Then why is there a “HASH UNIQUE” step? Does this have anything to do with the fact that a BITMAP for a value is broken into multiple chunks.

example 2:
On same table, now I want to do “count(*)”. In this case also execution plan show same step “BITMAP INDEX FAST FULL SCAN”; I thought the operation is different in nature compared to previous distinct query. Earlier we were counting headers(example 1) so I was expecting to see some differentiation.

regards,

db version : DB 11Gr1, FX has ~4M rows; column c_id has 100 distinct values

select /*+ gather_plan_statistics */ distinct c_id from fx
Plan hash value: 693526871
—————————————————————————————————————————–
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
—————————————————————————————————————————–
| 1 | HASH UNIQUE | | 1 | 100 | 100 |00:00:00.01 | 731 | 1067K| 1067K| 1217K (0)|
| 2 | BITMAP INDEX FAST FULL SCAN| FX_BMP2 | 1 | 4163K| 1300 |00:00:00.01 | 731 | | | |
—————————————————————————————————————————–

example 2:
select /*+ gather_plan_statistics */ count(*) from fx
Plan hash value: 106145427
—————————————————————————————————
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
—————————————————————————————————
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.03 | 731 |
| 2 | BITMAP CONVERSION COUNT | | 1 | 4163K| 1300 |00:00:00.03 | 731 |
| 3 | BITMAP INDEX FAST FULL SCAN| FX_BMP2 | 1 | | 1300 |00:00:00.01 | 731 |
—————————————————————————————————

Comment by svdixit — November 20, 2009 @ 12:36 pm GMT Nov 20,2009 | Reply
- 1) I think your guess is correct – a single key value can cover so many rows that the index has to hold several entries for it; so you have to have the hash unique step to eliminate duplicates.
  
  2) The two plans are different – one of them counts key values, the other counts bits in the index entries (probably not really doing the conversion to rowids, but may it does). How different do you expect the plans to be ?
  
  Comment by Jonathan Lewis — November 25, 2009 @ 1:18 pm GMT Nov 25,2009 | Reply
[…] migration from Oracle as a replacement for bitmap indexes (without going into details, I provide a reference to Lewis's post for inquisitive minds). As a rule, bitmap indexes are used for fields that have few unique values, […]

Pingback by Indexes in PostgreSQL — 7 (GIN) : Postgres Professional — September 29, 2020 @ 7:02 am BST Sep 29,2020 | Reply
[…] from Oracle as a replacement for bitmap indexes (without going into details, I provide a reference to Lewis’s post for inquisitive minds). As a rule, bitmap indexes are used for fields that have few unique […]

Pingback by Indexes in PostgreSQL (GIN) – DBA Notes — October 21, 2021 @ 1:28 am BST Oct 21,2021 | Reply