Oracle Scratchpad

January 17, 2014

Bitmap question

Filed under: bitmaps,Indexing,Oracle — Jonathan Lewis @ 7:06 pm BST Jan 17,2014

If you know anything about bitmap indexes you probably know that a single entry in a bitmap index takes the form (key_value, starting rowid, ending rowid, BBC compressed bit string). So an entry covers a single value for a column over a range of rowids  in the table, and the string of bits for that (notional) range is reduce to a minimum by a compression mechanism that eliminate repeated zeros in multiples of 8.

So here’s a question – to which I don’t know the answer, although you may be surprised when you try to find it:

If you have a very large table and in one of its columns the first row and the last row (and no others) hold the value 0 (say) and you create a bitmap index on this column, what’s the largest number of rows you could have in the table before Oracle would HAVE to create two index entries in order to cover both rows ?

Follow-up question – once you start getting close to working out the answer, can you think of a way to provide an example without actually creating a table with that many rows in it ?

 

6 Comments »

  1. Hello Jonathan.
    Very interesting question:)
    Lets assume, that first ‘0’ has minimum ROWID in the table, and the second ‘0’ – maximum? and only one index entry in the block.
    Lets try to find out.
    Block size is 8192. In the dump I see that kdxlebksz is 8032. Probably we have 8032 bytes for index entry.
    The space, available for compressed bit string is:
    8036-1(flag byte)-1(lock byte)-1(column count)-5(3 columns length by 1 byte and 1 as 2 byte for bit tring)-2*6(ROWID’s)-2(row directory)=8014
    According to http://www.google.com/patents/US6205442
    bitmap should be devided into 2 groups:
    1. Short gap bit atom. It will be enough 1 Control byte to show it.
    2. Long gap bit atom. 1 control byte and 8012 bytes for the gap size. Only 7 bits of each byte used to for the length, so we have 56084 bits to show gap size.
    So the biggest number of LOGICAL rows between 2 ROWID’s, that can be saved in 8K block is 2^56084 plus 24 rows(from control byte Split). This number is unavailable so probably Oracle will never create 2 entries(in case that index builded and no DML operations after that).
    It will be interesting to get right answer
    P.S. The thing I didn’t get – how Oracle compute number of rows between ROWID’s. Probably Hakan FActor used for it, but what to do if ROWID’s from different datafiles?
    Thanks,
    Andrey.

    Comment by Andrey — January 23, 2014 @ 9:00 am BST Jan 23,2014 | Reply

  2. Sorry, some wrong calculation….
    I forgot 1 byte for the key value. So 8011 bytes, 56077bits. So the biggest number of gap BYTEs is (2^56077+24), and number of rows(1 bit is a row) is (2^56080+192)

    Comment by Andrey — January 23, 2014 @ 9:30 am BST Jan 23,2014 | Reply

    • Andrey,

      Thanks for the response – and especially for the link to the patent (which I shall resist reading for as long as possible).

      Two details – there is no “column count” on an index leaf block, so you can add one back; and there is a restriction that the index entry (for a bitmap index) can’t exceed roughly 50% of the space in the block -the largest bitmap string I’ve seen on “create bitmap index” is 3931 bytes. That still leads to such a big gap that it won’t be possible to prove – only disprove if Oracle has put in a small enough arbitrary boundary – and even testing with a very large hakan factor and one row per block that would be a lot of data.

      I hadn’t thought about any special handling for crossing files. The arithmetic certainly seems to assume the Hakan factor shows the number of rows per block – I set mine up to 2,000 on an 8KB block size to do some testing recently and the effects seemed to agree with that hypothesis. I had also assumed that Oracle basically ignored absolute block addresses and worked on extents and blocks to do the arithmetic, so the fact that you changed files as you walked through the extents would be irrelevant. I haven’t tested that thought, though.

      Comment by Jonathan Lewis — January 23, 2014 @ 5:32 pm BST Jan 23,2014 | Reply

      • Thanks for quick reply.
        Can you please share you knolwledge on calculation of gaps between rows in different files?
        For example
        1. File is same, Hakan factor is 100. It’s clear.
        Block Number Row Number
        12 0
        2312 656
        Gap between rows will be (BlockNumber2-BlockNumber1)*Hakan+(RowNumber2-RowNumber1) = (2312-12)*100+(656-0) = 230656 rows.

        2. Different files(real example)
        FileNo Block Number Row Number AbsoluteBlockAdress
        28 12 0 50331676
        29 2312 656 1107296285
        What to do with this? Shall we use AbsoluteBlockAdress?

        P.S. Please let me know if it is wrong place to ask questions.
        Thanks,
        Andrey.

        Comment by Andrey — January 24, 2014 @ 1:02 pm BST Jan 24,2014 | Reply

        • Andrey,

          Your question is a perfectly reasonable follow-up to the original posting – and it would be most rude of me to object after your contribution about the patent and doing the arithmetic.

          I hadn’t considered exactly what Oracle would do when a segment spreads across multiple data files, so I had to run up a little test to see what it looked like. I think your suggested about the block address is correct (although I think it may be the tablespace relative block address – I don’t have a database with an example of the absolute file number and relative file number being different at present, so I can’t check).

          I created a tablespace with two data files (with consecutive file numbers), and then created a table with just enough data to cross two extents (which was one from each tablespace because I’d set the extents to local uniform at 1MB). The “block gap count” recorded in the bitmap index between the last row in the first extent and the first row in the second extent was power(2,20) – 129, i.e. the size of the file behaving as if the second extent started power(2,20) blocks away from the start of the first extent. (I didn’t try it with a bigfile tablespace, but I would guess that we’d see the gap recorded as roughly power(2,30) blocks.

          Comment by Jonathan Lewis — January 24, 2014 @ 5:58 pm BST Jan 24,2014

        • Cool:)
          I did similar test several days ago but wasn’t able to decode “block gap count” correctly – I have just a small piece of experience:(
          I wonder you have enough time to do so much tests etc.
          Also I want to express thanks and respect to you for the work you do for Oracle community.

          Thanks,
          Andrey.

          Comment by Andrey — January 24, 2014 @ 6:25 pm BST Jan 24,2014


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,257 other followers