Oracle Scratchpad

January 28, 2016

Bitmap Efficiency

Filed under: bitmaps,Indexing,Infrastructure,Oracle — Jonathan Lewis @ 1:02 pm GMT Jan 28,2016

An interesting observation came up on the Oracle-L list server a few days ago that demonstrated how clever the Oracle software is at minimising run-time work, and how easy it is to think you know what an execution plan means when you haven’t actually thought through the details – and the details might make a difference to performance.

The original question was about a very large table with several bitmap indexes and an anomaly that appeared as a query changed its execution plan.  Here are the critical sections from the plans supplied. They had been pulled from memory with rowsource execution statistics enabled:

--------------------------------------------------------------------------------------------------------
|  Id |Operation                        | Name       | Starts | E-Rows | A-Rows |     A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
|   6 |    TABLE ACCESS BY INDEX ROWID  |       FACT |      1 |      1 |     24 |00:00:00.01 |      31 |
|   7 |     BITMAP CONVERSION TO ROWIDS |            |      1 |        |     24 |00:00:00.01 |       7 |
|   8 |      BITMAP AND                 |            |      1 |        |      1 |00:00:00.01 |       7 |
|*  9 |       BITMAP INDEX SINGLE VALUE |     FACT_0 |      1 |        |      1 |00:00:00.01 |       3 |
|* 10 |       BITMAP INDEX SINGLE VALUE |  FACT_DIM1 |      1 |        |      4 |00:00:00.01 |       4 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     9 - access("FACT"."C0"=243001)
    10 - access("FACT"."C1"="DIMENSION1"."ID")


-------------------------------------------------------------------------------------------------------
|  Id | Operation                      | Name       | Starts | E-Rows | A-Rows |     A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
|   7 |    BITMAP CONVERSION TO ROWIDS |            |      5 |        |      8 |00:00:00.01 |     119 |
|   8 |     BITMAP AND                 |            |      5 |        |      1 |00:00:00.01 |     119 |
|*  9 |      BITMAP INDEX SINGLE VALUE |  FACT_DIM1 |      5 |        |     20 |00:00:00.01 |      28 |
|* 10 |      BITMAP INDEX SINGLE VALUE |  FACT_DIM2 |      5 |        |    140 |00:00:00.01 |      78 |
|* 11 |      BITMAP INDEX SINGLE VALUE |     FACT_0 |      5 |        |      5 |00:00:00.01 |      13 |
|  12 |   TABLE ACCESS BY INDEX ROWID  |       FACT |      8 |      1 |      8 |00:00:00.01 |       8 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     9 - access("FACT"."C1"="DIMENSION1"."ID")
    10 - access("FACT"."C2"="DIMENSION2"."ID")
    11 - access("FACT"."C0"=243001)

The first plan shows the steps leading to a single access (Starts = 1) to the fact table after combining two bitmap indexes; the second plan shows the second child of a nested loop join where Oracle has combined three bitmaps indexes to access the fact table – operation 7 (and its descendants) execute 5 times in this case. I’ve included the related parts of the predicate section so that you can see that the predicates at operations 9 and 10 of the first plan are the same as the predicates at operations 9 and 11 of the second plan.

So here’s the question – if one access to index fact_dim1 requires 4 buffer visits, why does it take 28 buffer visits to do the same thing 5 times (and it is with the same value every time); conversely if one access to index fact_0 requires 3 buffer visits, why do 5 visits to do the same thing take only 13 buffer visits. (Note: the arithmetic is made a little more obscure by the way in which index branch blocks may be pinned during nested loop joins.)

Then there’s a further question – not visible in the plan – the A-Rows value in a  “BITMAP INDEX SINGLE VALUE” operation is the number of bitmap chunks in the rowsource and we can see that the key values for index fact_dim2 have a significant number of bitmap chunks for a single key (5 starts returned 140 bitmap chunks). This scale, though, is true of all three indexes – in fact a follow-up email pointed out that a typical key value in every single one of the three indexes consisted of about 100 bitmap chunks, so why don’t those hundreds of chunks appear in the execution plan ?

To sum up so far: we have an execution plan where we haven’t visited all the bitmap chunks for a bitmap key, and the order in which the bitmap indexes are used in the plan seems to have some effect on the number of leaf-blocks you visit when accessing the chunks. This raises two questions

  • Could a change in the order of indexes make a significant difference to the number of bitmap chunks you visit and the resulting performance?
  • Is there a way to control the order in which you visit the indexes?

This is where the note starts to get a bit technical – if you don’t want to read any more the answers are: (a) yes but probably not significantly and (b) yes.

Demo

To investigate what goes on inside the “BITMAP AND” operation I created a table with two bitmap indexes and used a very large setting for pctfree for the indexes so that they had to be stored with a large number of bitmap chunks per key. Here’s the code that I used, with some results from an instance of 12.1.0.2:


rem
rem     Script:         bitmap_control_02.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2012
rem

create table people
nologging
as
with generator as (
        select  --+ materialize 
                rownum id 
        from dual
        connect by
                level <= 1e4    -- > hint to avoid wordpress format issue
)
select
        rownum                  id,
        mod(rownum-1, 1e2)      id_town_home,
        trunc((rownum-1)/1e4)   id_town_work,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6    -- > hint to avoid wordpress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'PEOPLE',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index pe_home on people(id_town_home) nologging pctfree 95;
create bitmap index pe_work on people(id_town_work) nologging pctfree 95;

select
        index_name, distinct_keys, num_rows, leaf_blocks, avg_leaf_blocks_per_key
from
        user_indexes
where
        table_name = 'PEOPLE'
order by
        index_name
;


INDEX_NAME           DISTINCT_KEYS   NUM_ROWS LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY
-------------------- ------------- ---------- ----------- -----------------------
PE_HOME                        100      30399       15200                     152
PE_WORK                        100       1800         907                       9

As you can see I’ve generated two columns (id_town_home, id_town_work) with 100 distinct values and 10,000 rows each, but with very different data distributions. The rows for any given value for id_town_home are uniformly spread across the entire table (every hundredth row) while the rows for any given value of id_town_work are very tightly clustered as a group of 10,000 consecutive rows. As a consequence of the different patterns the index entry (bitmap string) for a typical key value for id_town_home is enormous and has to be broken into 304 chunks spread across 152 leaf blocks (2 index entries per leaf block) while the index entry for a typical key value for id_town_work is much smaller requiring only 18 chunks spread across 9 leaf blocks.

So what will I see if I run the following query twice, forcing it to use a BITMAP AND of the two indexes in the two possible orders:

select
        /*+ index_combine(pe) */
        max(small_vc)
from
        people pe
where
        id_town_home = 50
and     id_town_work = 50
;

Based on a very simple interpretation of the typical execution plan and using the index stats shown above we might expect to see roughly A-Rows = 18 with 9 buffer gets (plus a few more for segment headers and branch blocks) on the id_town_work index and A-Rows = 304 with 152 buffer gets on the id_town_home index to allow Oracle to generate and compare the two bit strings – but here are the two plans with their execution stats, generated in 12.1.0.2, and each run after flushing the buffer cache:

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |      1 |        |      1 |00:00:00.01 |     118 |    117 |
|   1 |  SORT AGGREGATE                      |         |      1 |      1 |      1 |00:00:00.01 |     118 |    117 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| PEOPLE  |      1 |    100 |    100 |00:00:00.01 |     118 |    117 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |         |      1 |        |    100 |00:00:00.01 |      18 |     17 |
|   4 |     BITMAP AND                       |         |      1 |        |      1 |00:00:00.01 |      18 |     17 |
|*  5 |      BITMAP INDEX SINGLE VALUE       | PE_WORK |      1 |        |     18 |00:00:00.01 |      14 |     13 |
|*  6 |      BITMAP INDEX SINGLE VALUE       | PE_HOME |      1 |        |      4 |00:00:00.01 |       4 |      4 |
-------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |      1 |        |      1 |00:00:00.01 |     122 |    120 |
|   1 |  SORT AGGREGATE                      |         |      1 |      1 |      1 |00:00:00.01 |     122 |    120 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| PEOPLE  |      1 |    100 |    100 |00:00:00.01 |     122 |    120 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |         |      1 |        |    100 |00:00:00.01 |      22 |     20 |
|   4 |     BITMAP AND                       |         |      1 |        |      1 |00:00:00.01 |      22 |     20 |
|*  5 |      BITMAP INDEX SINGLE VALUE       | PE_HOME |      1 |        |      5 |00:00:00.01 |       8 |      7 |
|*  6 |      BITMAP INDEX SINGLE VALUE       | PE_WORK |      1 |        |     18 |00:00:00.01 |      14 |     13 |
-------------------------------------------------------------------------------------------------------------------

We have not touched anything like the entire bit-string for the id_town_home index – a bit-string that spans 152 leaf blocks! Clearly Oracle is doing something clever to minimise the work, and it’s so clever that switching the order of these two extremely different indexes in the plan has made virtually no difference to the work done. Obviously I can’t tell you exactly what the code is doing, but I think I can suggest a reasonable hypothesis about what’s going on.

The pe_work index has the smaller number of leaf blocks per key, which makes it the better starting choice for the BITMAP AND in this case, so the optimizer’s default starting action was to pick the first couple of chunks of that index key value. The run-time code then sees that the first rowid that it could possibly need in its result set is roughly in the middle of the table – remember that the “key” columns of a bitmap index are actually (real_key, first_rowid_of chunk, last_rowid_of_chunk, compressed_bitstring).

Since it now knows the lowest possible rowid that it could need Oracle can now use this rowid to probe the pe_home index by (id_town_home=50, {target_rowid}) – which will let it go to a bitmap index chunk that’s roughly in the middle of the full range of 152 leaf blocks. Then Oracle can expand the bitstrings from the chunks it has, reading new chunks as needed from each of the indexes until the 18 chunks / 9 leaf block from the pe_work index have been used up (and that range would have aligned with just two or three chunks from the pe_home index) at which point Oracle can see there’s no more rows in the table that could match both predicates and it doesn’t need to read the next 70+ chunks of the pe_home index.

Conversely when I forced Oracle to use the (inappropriate) pe_home index first it read the first couple of chunks, then read the first couple of chunks of the pe_work index, at which point it discovered that it didn’t need any of the pe_home index prior to (roughly) chunk 75, so it jumped straight to the right chunk (without reading all the leaf blocks in between) to align with pe_work and carried on from there. That’s why the forced, less efficient, plan that visited pe_home first visited just a couple more leaf blocks than the plan the optimizer selected for itself.

Bottom line on performance

Oracle is sufficiently smart about checking the start and end ranges on bitmap indexes (rather then arbitrarily expanding the entire bit string for each key) that even for very large bitmap index entries it will probably only access a couple of “redundant” leaf blocks per index even if it picks the worst possible order for using the indexes.

You’re far more likely (if you know your data) to notice Oracle picking the wrong indexes than you are to spot it using the right indexes in the wrong order – and given that bitmap indexes tend to be relatively small and well buffered (compared to the tables), and given the relatively large number of rows we pick by random I/O from fact tables, a little extra work in the bitmap indexes is unlikely to make a significant difference to the overall performance of most queries.

Footnote

In the unlikely circumstances that you do find a special case where it will make a difference (and it will probably be a difference in CPU usage) then you can dictate the order of the indexes with the undocumented bitmap_tree() hint.  I may get round to writing up the variations one day but, for this simple case, the index_combine() hint that I initially used to force the BITMAP AND turned into the following bitmap_tree() hint in the outline:


bitmap_tree(@sel$1            pe@sel$1                   and( (people.id_town_work)      (people.id_town_home)       ) )

bitmap_tree(@query_block      table_alias@query_block    and( ({first index definition}) ({second index definition}) ) )

Obviously this is not something you should throw into production code casually – check with Oracle support if you think it’s really necessary – but if you wanted to reverse the order of index usage in this case you could just swap the order of the index definitions. If you thought there was a third index that should be used you could include its definition (note that “index definition” consists of a list of “table_name.column_name”  in the brackets).

4 Comments »

  1. The rowid-based pruning sounds rather similar to a trick I’ve used from time to time when trying to speed up semijoin-using queries: Give the min and max join values separately where the CBO can recognize for sure what range of values it won’t need to go outside of, which helps with e.g. partition elimination. (This was a bigger deal in 10g; the 11g partition elimination supposedly got smarter IIRC.)

    WITH q AS (...)
    SELECT *
    FROM mytab t
    WHERE t.lookup_key IN (SELECT q.lookup_key FROM q)
      AND t.lookup_key >= (SELECT min(q.lookup_key) FROM q)
      AND t.lookup_key <= (SELECT max(q.lookup_key) FROM q)
    

    The extra subqueries are logically redundant but I’ve found them useful once in a while for performance.

    Comment by Jason Bucata — February 3, 2016 @ 6:42 pm GMT Feb 3,2016 | Reply

    • Jason,

      Thanks for the comment.
      I know this is a very slow follow-up – I must have missed it when you first posted it.

      It’s a good cunning trick and an excellent example of the general principle “can you state things differently to eliminate more data early). This one can be especially helpful when the lookup_key table has multiple lookup_types with very different ranges of values.

      Rediscovering your comment 5 years on, though, one thing I’d have to say is that with the newest versions you might have to be a bit careful about the optimizer getting too keen on unnesting subqueries and playing around with aggregate transformations.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — January 26, 2021 @ 11:37 am GMT Jan 26,2021 | Reply

  2. I traced both queries…it seems Oracle will alternate between the 2 indexes, but the end result is a similar count of blocks read.

    1. For the first explain plan (PE_HOME at the bottom of the explain plan):- 10 reads
    – Read 1 block from the PE_WORK
    – Read 2 blocks from PE_HOME
    – Read 6 blocks from PE_WORK
    – Read 1 block from PE_HOME

    2. For the second explain plan (PE_WORK at the bottom of the explain plan):- 11 reads
    – Read 1 block from the PE_HOME
    – Read 1 blocks from PE_WORK
    – Read 2 blocks from PE_HOME
    – Read 6 block from PE_WORK
    – Read 1 block from PE_HOME

    Note: Using 12.2 version of Oracle with 16K blocksize

    Comment by Brian — January 26, 2021 @ 1:23 am GMT Jan 26,2021 | Reply

    • Brian,

      Thanks for the comment.

      Allowing for the slight reduction in blocks visited in the change from 8KB blocksize to 16KB blocksize this shows us that 12.2 is still using the same strategy to minimise the index leaf blocks accessed.

      Inevitably your note prompted a thought about how big a bitmap index chunk could get. For an 8KB block it’s about half a block, but I don’t know whether that’s because there’s a “half block” limitation, or because there’s (e.g.) a 4KB limit which coincidentally looks like half a block.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — January 26, 2021 @ 11:15 am GMT Jan 26,2021 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.