Oracle Scratchpad

December 19, 2006

Mything in action

Filed under: Execution plans,Indexing,Infrastructure,Performance — Jonathan Lewis @ 8:03 pm GMT Dec 19,2006

I came across a blog entry recently with the title Myths about bitmap indexes which set about debunking the following three claims:

Myth 1 – Bitmap indexes take up a lot more space than B-Tree indexes.
Myth 2 – Bitmap indexes are only suitable for data warehouses because of the nature of the data
Myth 3 – Bitmap indexes have slow performance on DML actions

I have a few problems with anyone who tries to debunk these “myths” because they are essentially correct – although I have to say I’ve not heard them stated in exactly the way they appear above.

Given that the statements have a lot of truth in them, anyone trying to argue that they are myths – hence encouraging people to do something wrong – is doing a disservice to the Oracle community.

So let’s take the statements one at a time:

Do bitmap indexes take up a lot more space than B-Tree indexes ? Funnily enough I’ve heard the opposite (and also reasonably accurate ) statement more often, that “bitmap indexes are usually tiny compared to the corresponding B-tree index” and found that this sometimes encourages people to use them when they shouldn’t.

But it is fairly common for bitmap indexes to grow dramatically when the underlying tables are subject to DML. A single bitmap index entry can be as large as 3,900 bytes (in an 8K block), and changing the value of the indexed column for just one row of the table makes Oracle clone two such entries – which can cause two leaf block (node) splits, and each leaf block split could cause a branch block (node) split. Even in the absence of block splitting the volume of new index content results in a lot of undo and a lot of redo being generated. Try the following little test (in 8i or 9i; it will need a little editing):

create table t1 as
select
	rownum		id,
	mod(rownum,2)	bit_col
from
	all_objects		-- or your favourite row source
where
	rownum <= 10000
;                      

create unique index t1_pk on t1(id);
create bitmap index t1_bi on t1(bit_col);                      

-- gather statistics on the table                      

validate index t1_bi;                      

select
	name, height, lf_blks, lf_rows, btree_space, used_space
from
	index_stats
;                      

begin	-- update 250 rows
	for i in reverse 9500..10000 loop
		update t1
		set bit_col = 0
		where
			id = i
		and	bit_col = 1;
	end loop;
	commit;
end;
/                      

validate index t1_bi;                      

select
	name, height, lf_blks, lf_rows, btree_space, used_space
from
	index_stats
;

Oracle 9.2.0.8 seems to have a bug in the validate command, but in 8i and all other versions of 9i, you should see that the index has grown from one block to about 1.8M, generating about 6M of redo as it did so.

[Edited 21st Nov, see note 13]: To my mind, this “myth” hasn’t reached mythic proportions, and if the statement were changed to “bitmap indexes can grow to take up a lot more space than the  corresponding B*tree indexes as the underlying table is updated then it wouldn’t even qualify as a candidate for mythological status, it would simply be a statement of fact.

Note –  This demonstration is not intened to suggest that bitmap indexes always grow, it is merely showing the possible effects of a small number single row updates. Moreover, 10g has taken some steps to minimise the damage that this specific type of activity causes. The 10g change is designed to address the problems of batch-like processes that do lots of single row updates; even so, it doesn’t reduce the impact on undo and redo.

What about “myth” 2, the one about bitmap indexes only being suitable for data warehouses because of the nature of the data ? Again I haven’t heard it stated this way.  Most commonly I’ve heard the statement put the other way around: “bitmap indexes are usually inappropriate for OLTP systems”, and it’s not because of the nature of the data, it’s “because of the nature of the activity” (as we saw above …  but you can still do the wrong type of processing on data warehouses and get massive bitmap index problems).

The author of the piece provides an example to show that bitmap indexes can be effective on an OLTP system. The code fragment includes the execution plans for a query against a ‘usage flag’ column which holds only the values ‘J’ (yes) or ‘N’ (no). The plan with a bitmap index in place uses the index, the plan with a B-tree in place uses a full tablescan.

There are three flaws in the demonstration. First, many systems which query for a small number of rows with a certain flag value often change the value after finding the rows – and see above what happens when you change bitmap indexed columns.

Secondly, most of the work you do in getting data from a table usually comes from visiting the table, not from visiting the index. Finding a small number of entries in an index usually takes 2 or 3 logical I/O whether the index is a bitmap or a B-tree: so the fact that the path changes from an indexed access to a full tablescan as you switch from B-tree to bitmap is more likely to be about the optimizer’s arithmetic than it is to be about real resource usage. (And the switch could go either way).

Finally, if you check the cardinalities reported in the plan you will see that the optmizer has predicted 22 rows in the case of the bitmap index, but 13,415 in the case of the B-tree. Why the difference ? My best guess is that there were 26,430 rows in the table of which 22 were set to ‘J’ and the rest set to ‘N’, and the bitmap test case had a histogram on the column while the B-tree case did not.

Here’s some test data to demonstrate the point:

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 1000
)
select
	sysdate			date1,
	'N'			flag1,
	'N'			flag2
from
	generator	v1,
	generator	v2
where
	rownum <= 26830
;                

update t1 set
	flag1 = 'Y',
	flag2 = 'Y'
where
	rownum <= 22;                

create        index t1_i1 on t1(flag1);
create bitmap index t1_b1 on t1(flag2);                

On a 9.2.0.8 database, with an 8KB block size, and not using ASSM (automatic segment space management) a query for flagN = ‘Y’ used the index in both cases when I had histograms in place, and did a tablescan in both cases when I did not have histograms.

So let’s look at the third “myth”: do bitmap indexes have slow performance on DML. Again it’s not a myth, it’s true. In fact, it’s only part of the truth. As we saw in my first example DML on a bitmap indexed column can cause an excessive amount of change in the index, generating a huge amount of undo and redo. This does make serial DML slower – though not necessarily perceptibly slower in small cases.

However, all the excess garbage in the index will slow down simple queries. At the end of my example there were 250 (nearly identical) index chunks for the zero value, and another 250 chunks for the ones – in both cases 249 of the chunks were marked for deletion. A query for the zeros may have to walk through all 250 chunks to find the one chunk that is the correct chunk – possibly doing a load of extra work to check for read-consistency as it goes. (The 10g optimisation is that it doesn’t keep the full trail of chunks).

But the big problem with DML and bitmap performance is usually about contention, not about pure resource usage. In my original example I had one bitmap index chunk covering all 10,000 rows in the table. What’s going to happen if I update one of those rows (on the indexed column) and you update another – only one of us can change the bitmap index chunk, so the other update will have to wait.  Bitmap chunks in an 8K block can cover about 30,000 rows – virtually any concurrent DML is going to lead to locking issues, and make deadlocking very easy.

One final thought: in the code demonstrating his case the author demonstrates an example where the update with a bitmap index in place is actually quicker than the corresponding update with the B-tree in place. However, the B-tree test uses a full tablescan to find the data, the bitmap example uses the indexed access path – and I’ve written a little note about the different update costs you get using different access mechanisms. This explains the difference in redo that the author highlighted in this case – so again the issue isn’t really about whether the index is a bitmap or not.

In his conclusion the author of the article states that he sees no reason not to use bitmap indexes in an OLTP environment. I do hope he changes his mind and pulls the article.

21 Comments »

  1. “Mything in Action”?

    Oh, for shame …

    Comment by David Aldridge — December 20, 2006 @ 2:38 am GMT Dec 20,2006 | Reply

  2. Excellent!

    Comment by Mirjana — December 20, 2006 @ 7:18 am GMT Dec 20,2006 | Reply

  3. Can changing the 8K block size to a lower one resolve some contention?
    Or is better to have a bigger PCFREE on the index? (indeed I’m thinking in a “B-Tree way” even if the index is bitmap…uhm…)

    Comment by Antonio — December 20, 2006 @ 7:55 am GMT Dec 20,2006 | Reply

  4. Quote:

    “On a 9.2.0.8 database, with an 8KB block size, and not using ASSM a query for flagN = ‘Y’ used the index in both cases when I had histograms in place, and did a tablescan in both cases when I did not have histograms.”

    (How) does using ASSM change this?

    Thanks,

    Colin

    Comment by Colin 't Hart — December 20, 2006 @ 9:42 am GMT Dec 20,2006 | Reply

  5. Excellent ! i agree with you.

    Comment by Sachin — December 20, 2006 @ 12:16 pm GMT Dec 20,2006 | Reply

  6. Many thanks indeed Jonathan.
    How a little knowledge can indeed be a dangerous thing.

    Comment by SeanMacGC — December 20, 2006 @ 3:08 pm GMT Dec 20,2006 | Reply

  7. Colin, When more free space is needed, ASSM (automatic segment space management) formats 16 (consecutive) blocks from somewhere in the current extent – but the choice of 16 is somewhat randomised. This does make it harder to create exactly reproducible test cases, so I tend to avoid it for demonstrations of this type.

    In passing – the first time I ran this test it was on an early version of 9.2 with the bitmap index in an ASSM tablespace, the index segment grew to 650MB before crashing the session with an ‘out of space’ error.

    Antonio, using a 2K block size, you can still find each index entry covering several thousand consecutive rows – so it’s not likely to be a big help. However (on a completely different tack) if you are doing an occasional bulk update it is possible to survive with a bitmap index in place in which case I usually suggest setting PCTFREE to 67% as this helps to keep the index size stable (lots of empty space left in each block for the clones of the index entries) even though it makes the indexes larger then their optimum.

    Comment by Jonathan Lewis — December 20, 2006 @ 6:26 pm GMT Dec 20,2006 | Reply

  8. Jonathan, do you see much use of compressed b-tree indexes in your travels? I use them in DW for columns at the upper edge of bitmap index cardinality limits (say 1,000,000 rows and 300,000 distinct values), but I suppose there is an increased chance of contention in an OLTP system by virtue of packing more rows into fewer blocks.

    Comment by David Aldridge — December 20, 2006 @ 7:29 pm GMT Dec 20,2006 | Reply

  9. David, I rarely see any of the structural options (clusters, IOTs, etc.) being used to improve efficiency. But I often see cases where compressed indexes are an ‘obvious’ choice – albeit somewhere down the list in terms of significance. You are, of course, correct that there is a potential concurrency threat from packing more rows into a single block that should always be part of the risk assessment when considering the change. Apart from the time/sequence-based indexes, though, this is won’t be a show-stopper very often.

    Funnily enough I was on a site a couple of weeks ago where my first bit of advice was to build a particular index, and the second bit was to make sure it was compressed on the first column: with compression it used 1GB, without compression it used 2GB. 98% of the performance benefit came from the existence, an extra 2% from the compression.

    Comment by Jonathan Lewis — December 20, 2006 @ 8:57 pm GMT Dec 20,2006 | Reply

  10. Jonathan, thank you so much for these posts. One thing I’ve noticed lately is the plethora of guru’s who claim one thing over another. I appreciate you taking the time to write what, how, and provide examples and important information like the note on ASSM.

    On a side note, I take out of my day, everyday to read (or re-read) one of your posts. I consider it like “DBA Class”. I hope one day I can contribute to the Oracle community some small % of what I get here (I feel bad purely “consuming” information from you website and not adding anything.)

    Please keep up the good work.

    Comment by RobH — December 21, 2006 @ 6:30 pm GMT Dec 21,2006 | Reply

  11. Excellent article demonstrating the below:
    “bitmap indexes are usually inappropriate for OLTP systems”

    how about:
    “btree indexes in a Data warehouse?” – we are moving to Standard edition on our warehouse. Bitmap indexes are not supported with standard edition – so the plan is to convert it to btree and run some performance regression tests.

    Comment by vidya — January 3, 2007 @ 5:39 pm GMT Jan 3,2007 | Reply

  12. Vidya, at first glance it seems like a bad idea. The optimizer can do a “B-tree to bitmap conversion” in real time (though I haven’t checked if this is blocked in Standard Edition – can anyone confirm) so the precision of access into the table need not change when you change from bitmap to B-tree indexes. HOWEVER …

    … the size of a B-tree index is often vast compared to the size of the corresponding bitmap index, and this can have two side effects.

    First – the difference is size can make a dramatic difference to the amount of memory needed to buffer the more useful indexes – so a switch to B-tree indexes can result in far more physical I/O being needed. (Note: we are talking sizing factors of 10 to 100, not the odd 10% to 90% expansion that can occur as B-tree indexes degenerate and stablise under typical OLTP operations).

    Secondly – because the B-tree indexes are relatively huge, the optimizer is less likely to decide to use “just one more index” in its list of indexes to be combined (See chapter 8 of Cost Based Oracle – Fundamentals for more details); so the access paths into the tables are likely to be less precise, resulting in more random I/Os to the table which probably means more physical I/O, and slower response times.

    Comment by Jonathan Lewis — January 3, 2007 @ 6:59 pm GMT Jan 3,2007 | Reply

  13. Jonathan , thank you so much for taking the time to respond ; we are expecting a little slowdown in performance, would that be significant? we dont know at this point(hopefully we will know that soon from our regression tests) – the $ amount difference between Std and Enterprise Edition is significant for our systems here and we will need a very good reason if we move up to Enterprise.
    Thanks again.

    Comment by vidya — January 5, 2007 @ 4:14 pm GMT Jan 5,2007 | Reply

  14. Vidya, Unless you have a massive amount of memory available, I would expect a dramatic increase in the total I/O and a significant drop in performance.

    You could do some damage limitations by using index compression – any columns that were good candidates for bitmap indexes will probably compress quite well.

    Comment by Jonathan Lewis — January 5, 2007 @ 5:27 pm GMT Jan 5,2007 | Reply

  15. […] stories about bitmap index use (and other Oracle features, in fact).Some people (one of them being Jonathan Lewis) pointed out that I was not entirely correct (or thorough) in my statements about bitmap indexes. […]

    Pingback by AMIS Technology blog » Blog Archive » Myths on bitmap indexes — January 10, 2007 @ 11:47 am GMT Jan 10,2007 | Reply

  16. […] issue with a post regarding myths of Oracle’s bitmap indexes on the AMIS Technology blog. In this post on Oracle Scratchpad, Jonathan answers the AMIS blog’s points one–by–one and at some length, […]

    Pingback by Pythian Group Blog » Log Buffer #24: a Carnival of the Vanities for DBAs — March 10, 2007 @ 2:45 am GMT Mar 10,2007 | Reply

  17. […] what happens in 9.2.0.7 (script shamelessly copied from Jonathan Lewis’s blog: SQL> select * from v$version where banner like ‘Oracle%’;BANNER […]

    Pingback by Bitmap index size, or, Size does matter. « Klein Oracle denkraam — September 18, 2007 @ 3:26 pm BST Sep 18,2007 | Reply

  18. Jonathan,

    Your comment 13 seems to be saying that Bitmap Indexes are indeed smaller than B-Tree indexes, which is opposite of what you stated as Myth 1. In the main body of the blog, you seem to be arguing that Myth 1 is true. What gives?

    Comment by John Wu — November 21, 2007 @ 5:45 am GMT Nov 21,2007 | Reply

  19. John,

    Thanks for the comment.

    The statement proposed as Myth 1 says that bitmap indexes are bigger than B*tree indexes, and my first comment about this is that we usually hear exactly the opposite, i.e. that it’s the B*tree indexes that are usually the larger indexes – and that’s exactly what I said in comment 13.

    In the main body of the blog, I pointed out that bitmap indexes can grow dramatically if the underlying table is subject to certain types of DML – and then demonstrated this.

    Your comment, however, has drawn my attention to the fact that my closing statement on that myth was badly put ‘So the “myth” has been mis-quoted, and it’s not a myth.’ and I’ve rewritten it to avoid the ambiguity.

    Comment by Jonathan Lewis — November 21, 2007 @ 7:58 am GMT Nov 21,2007 | Reply

  20. Jonathan,

    Thanks for the explanation. I agree that there are cases where the bitmap index size grows unexpectedly. The moral of the story is that one has to use the bitmap index with more care than with the B-tree index.

    Another unexpected behavior of the bitmap index is that it spends very little time in I/O when answering a query. I have not seen many people talking about this, probably because people can not get into the system. To get to the guts of the system, we implemented our own version of the bitmap indexes. In theory, ORACLE’s bitmap index is a BBC compressed basic bitmap index. To make sure our implementation is up to par, we performed extensive tests and results show that our implementation generally use about the same amount of space and are about twice as fast in answering queries. Next, we measure the time spent in I/O as a fraction of the total time used to answer a query in our own system. The results are documented in this report at DOLAP 2002, and a summary is shown in this picture. Even on a ridiculously slow disk system with 2 MB/s throughput, BBC compressed indexes may use as little as 50% of total time on I/O. Note that this measurement is done with unmounting the file system after each query, therefore the I/O time is the longest possible. In a typical system, the file system is mounted already and disk system are much faster than 2MB/s. The I/O time would be even smaller fraction of the total query response time. The moral of this story, smaller bitmap indexes are not always faster!

    Comment by John Wu — November 21, 2007 @ 4:56 pm GMT Nov 21,2007 | Reply

  21. […] about time I wrote a sequel to Mything in Action – and funnily enough it’s also about bitmap indexes. It starts with a note on the OTN […]

    Pingback by Mything 2 « Oracle Scratchpad — June 24, 2011 @ 8:40 pm BST Jun 24,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.

%d bloggers like this: