Oracle Scratchpad

May 18, 2018

Bitmap Join Indexes

Filed under: bitmaps,CBO,Execution plans,Indexing,Oracle,Statistics — Jonathan Lewis @ 2:29 pm BST May 18,2018

I’ve been prompted by a recent question on the ODC database forum to revisit a note I wrote nearly five years ago about bitmap join indexes and their failure to help with join cardinalities. At the time I made a couple of unsupported claims and suggestions without supplying any justification or proof. Today’s article finally fills that gap.

The problem is this – I have a column which exhibits an extreme skew in its data distribution, but it’s in a “fact” table where most columns are meaningless ids and I have to join to a dimension table on its primary key to translate an id into a name. While there is a histogram on the column in the fact table the information in the histogram ceases to help if I do the join to the dimension and query by name, and the presence of a bitmap join index doesn’t make any difference. Let’s see this in action – some of the code follows a different pattern and format from my usual style because I started by copying and editing the example supplied in the database forum:


rem
rem     Script:         bitmap_join_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2018
rem
rem     Last tested 
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem
rem     Notes:
rem     Bitmap join indexes generate virtual columns on the fact table
rem     but you can't get stats on those columns - which means if the
rem     data is skewed you can have a histogram on the raw column but
rem     you don't have a histogram on the bitmap virtual column.
rem

drop table t1;
drop table dim_table;

create table dim_table (type_code number, object_type varchar2(10));

insert into dim_table values (1,'TABLE');
insert into dim_table values (2,'INDEX');
insert into dim_table values (3,'VIEW');
insert into dim_table values (4,'SYNONYM');
insert into dim_table values (5,'OTHER');

alter table dim_table add constraint dim_table_pk primary key (type_code) using index;

exec dbms_stats.gather_table_stats(user,'dim_table',cascade=>true);

create table t1 
nologging
as 
select 
        object_id, object_name, 
        decode(object_type, 'TABLE',1,'INDEX',2,'VIEW',3,'SYNONYM',4,5) type_code 
from 
        all_objects
where
        rownum <= 50000 -- > comment to bypass wordpress format issue
;

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;


create  bitmap index t1_b1 on t1(dt.object_type)
from    t1, dim_table dt
where   t1.type_code = dt.type_code
;

exec dbms_stats.gather_table_stats(null, 't1', cascade=>true, method_opt=>'for all columns size 254');


select
        dt.object_type, count(*)
from
        t1, dim_table  dt
where
        t1.type_code   = dt.type_code
group by
        dt.object_type
order by
        dt.object_type
;

I’ve started with a dimension table that lists 5 type codes and has a primary key on that type code; then I’ve used all_objects to generate a table of 400,000 rows using those type codes, and I’ve created a bitmap join index on the fact (t1) table based on the dimension (dim_table) table column. By choice the distribution of the five codes is massively skewed so after gathering stats (including histograms on all columns) for the table I’ve produced a simple aggregate report of the data showing how many rows there are of each type – by name. Here are the results – with the execution plan from 12.1.0.2 showing the benefit of the “group by placement” transformation:


OBJECT_TYP   COUNT(*)
---------- ----------
INDEX           12960
OTHER          150376
SYNONYM        177368
TABLE           12592
VIEW            46704

5 rows selected.

-------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |       |       |   735 |
|   1 |  SORT GROUP BY        |           |     5 |   125 |   735 |
|*  2 |   HASH JOIN           |           |     5 |   125 |   720 |
|   3 |    VIEW               | VW_GBF_7  |     5 |    80 |   717 |
|   4 |     HASH GROUP BY     |           |     5 |    15 |   717 |
|   5 |      TABLE ACCESS FULL| T1        |   400K|  1171K|   315 |
|   6 |    TABLE ACCESS FULL  | DIM_TABLE |     5 |    45 |     2 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ITEM_1"="DT"."TYPE_CODE")

Having established the basic result we can now examine some execution plans to see how well the optimizer is estimating cardinality for queries relating to that skewed distribution. I’m going to generate the execution plans for a simple select of all the rows of type ‘TABLE’ – first by code, then by name, showing the execution plan of each query:


explain plan for
select  t1.object_id
from
        t1
where
        t1.type_code = 1
;

select * from table(dbms_xplan.display(null,null,'outline'));


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 12592 |    98K|   281   (8)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 12592 |    98K|   281   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T1"."TYPE_CODE"=1)

Thanks to the histogram I generated on the type_code table the optimizer’s estimate of the number of rows is very accurate. So how well does the optimizer handle the join statistics:


prompt  =============
prompt  Unhinted join
prompt  =============

explain plan for
select  t1.object_id
from
        t1, dim_table  dt
where
        t1.type_code   = dt.type_code 
and     dt.object_type = 'TABLE'
;

select * from table(dbms_xplan.display(null,null,'outline'));

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           | 80000 |  1328K|   287  (10)| 00:00:01 |
|*  1 |  HASH JOIN         |           | 80000 |  1328K|   287  (10)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| DIM_TABLE |     1 |     9 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T1        |   400K|  3125K|   277   (7)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."TYPE_CODE"="DT"."TYPE_CODE")
   2 - filter("DT"."OBJECT_TYPE"='TABLE')

Taking the default execution path the optimizer’s estimate of rows identified by type name is 80,000 – which is one fifth of the total number of rows. Oracle knows that the type_code is skewed in t1, but at compile time doesn’t have any idea which type_code corresponds to type ‘TABLE’, so it’s basically using the number of distinct values to dictate the estimate.

We could try hinting the query to make sure it uses the bitmap join index – just in case this somehow helps the optimizer (and we’ll see in a moment why we might have this hope, and why it is forlorn):


prompt  ===================
prompt  Hinted index access
prompt  ===================

explain plan for
select 
        /*+ index(t1 t1_b1) */
        t1.object_id
from
        t1, dim_table dt
where
        t1.type_code   = dt.type_code 
and     dt.object_type = 'TABLE'
;

select * from table(dbms_xplan.display(null,null,'outline'));

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       | 80000 |   625K|   687   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    | 80000 |   625K|   687   (1)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE        | T1_B1 |       |       |            |          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."SYS_NC00004$"='TABLE')

The plan tells us that the optimizer now realises that it doesn’t need to reference the dimension table at all – all the information it needs is in the t1 table and its bitmap join index – but it still comes up with an estimate of 80,000 for the number of rows. The predicate section tells us what to do next – it identifies a system-generated column, which is the virtual column underlying the bitmap join index: let’s see what the stats on that column look like:


select
        column_name, histogram, num_buckets, num_distinct, num_nulls, sample_size
from
        user_tab_cols
where
        table_name = 'T1'
order by
        column_id
;


COLUMN_NAME          HISTOGRAM       NUM_BUCKETS NUM_DISTINCT  NUM_NULLS SAMPLE_SIZE
-------------------- --------------- ----------- ------------ ---------- -----------
OBJECT_ID            HYBRID                  254        50388          0        5559
OBJECT_NAME          HYBRID                  254        29224          0        5560
TYPE_CODE            FREQUENCY                 5            5          0      400000
SYS_NC00004$         NONE

4 rows selected.

There are no stats on the virtual column – and Oracle won’t try to collect any, and even if you write some in (using dbms_stats.set_column_stats) it won’t use them for the query. The optimizer seems to be coded to use the number of distinct keys from the index in this case.

 

Workaround

It’s very disappointing that there seems to be no official way to work around this problem – but Oracle has their own (undocumented) solution to the problem that comes into play with OLAP – the hint /*+ precompute_subquery() */. It’s possible to tell the optimizer to execute certain types of subquery as the first stage of optimising a query, then changing the query to take advantage of the resulting data:


explain plan for
select
        /*+
                qb_name(main)
                precompute_subquery(@subq)
        */
        t1.object_id
from
        t1
where
        t1.type_code in (
                select
                        /*+
                                qb_name(subq)
                        */
                        dt.type_code
                from    dim_table dt
                where   dt.object_type = 'TABLE'
        )
;

select * from table(dbms_xplan.display(null,null,'outline'));

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 12592 |    98K|   281   (8)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 12592 |    98K|   281   (8)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T1"."TYPE_CODE"=1)

Oracle hasn’t optimized the query I wrote, instead it has executed the subquery, derived a (very short, in this case) list of values, then optimized and executed the query I first wrote using the constant(s) returned by the subquery. And you can’t see the original subquery in the execution plan. Of course, with the literal values in place, the cardinality estimate is now correct.

It’s such a pity that this hint is undocumented, and one that you shouldn’t use in production.

 

June 13, 2016

Bitmap Counts

Filed under: bitmaps,Indexing,Oracle — Jonathan Lewis @ 12:40 pm BST Jun 13,2016

A question came up on the Oracle-L list server a few days ago about a query whose plan showed several bitmap operations. The problem was that the A-Rows column reported by a call to dbms_xplan.display_cursor() was showing numbers that semed to be far too small. In fact the query was producing a parallel execution plan, so the “actuals” for the parallel server operations were reporting zeros because the OP had used the “allstats last” formatting option rather than just “allstats” – but the numbers were still far too small even after this error had been corrected.

This was a detail I’d not noticed before but there was an obvious(footnote 1) guess to explain why this apparent anomaly had appeared, viz: A-Rows counts rows in the rowsource for a table, index entries for a B-tree rowsource, and strings of bits when producing bitmaps. But even the most obvious guess should be checked so here’s some code that will  (at least) corroborate the hypothesis:


rem     Script:         bitmap_counts.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2016

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum as number(8,0)                             id,
        mod(rownum - 1,2) as number(8,0)                  n2,
        mod(rownum - 1,100) as number(8,0)                n100,
        lpad(rownum,10,'0') as varchar2(10)               v1,
        lpad('x',100,'x') as varchar2(100)                padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
; 
create bitmap index t1_b2 on t1(n2) nologging;
create bitmap index t1_b100 on t1(n100) nologging;

begin 
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    =>'T1',
                method_opt => 'for all columns size 1'
        );
end;
/

I’ve created a table that includes two columns that I’ve indexed with bitmap indexes. Because of the size of the table and the pattern of the data each distinct value for the two columns will require multiple bitmap index entries in the bitmap index.

The multiple bitmap chunks are important but before I comment further, here’s what the program does after creating the data:


select  index_name, distinct_keys, num_rows, clustering_factor
from    user_indexes
where   table_name = 'T1'
order by
        index_name
;

alter session set statistics_level = all;
set serveroutput off

select
        count(*)
from    t1
where   n2 = 0
and     n100 between 20 and 29
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

The first query reports some index stats – to confirm my comment about multiple index entries per key – the second query is the one that’s going to give me a useful plan. Let’s just check the results of the two queries first – the index stats are the only ones needing any comment:


INDEX_NAME           DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR
-------------------- ------------- ---------- -----------------
T1_B100                        100        800               800
T1_B2                            2         94                94


  COUNT(*)
----------
     50000

Index t1_b100 reports 800 index entries – the bitmap chunk for each value had to be split into 8 rowid ranges; we’re interested in 10 key values from this index.

Index t1_b2 shows 94 index entries – the bitmap for each value had to be split into 47 rowid ranges: we’re interested in one key value from this index.

And this is what the plan shows:


-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |      1 |        |      1 |00:00:00.01 |      69 |       |       |          |
|   1 |  SORT AGGREGATE              |         |      1 |      1 |      1 |00:00:00.01 |      69 |       |       |          |
|   2 |   BITMAP CONVERSION COUNT    |         |      1 |  55455 |      2 |00:00:00.01 |      69 |       |       |          |
|   3 |    BITMAP AND                |         |      1 |        |      2 |00:00:00.01 |      69 |       |       |          |
|*  4 |     BITMAP INDEX SINGLE VALUE| T1_B2   |      1 |        |     47 |00:00:00.01 |      25 |       |       |          |
|   5 |     BITMAP MERGE             |         |      1 |        |      2 |00:00:00.01 |      44 |  1024K|   512K|  291K (0)|
|*  6 |      BITMAP INDEX RANGE SCAN | T1_B100 |      1 |        |     80 |00:00:00.01 |      44 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("N2"=0)
   6 - access("N100">=20 AND "N100"<=29)


I can’t explain why the in-memory manipulation of the the bitstrings apparently produces two bitstrings at operations 3 and 5, but given the index stats we can understand that the 47 “rows” reported for operation 4 allow for one of the two key values in the index and the 80 “rows” reported for operation 6 allows for 10 of the key values in the index.   Q.E.D.

Bonus commentary

If you want to see table row counts (or their equivalent) appearing in a bitmap plan you’ll need to run a query that does a “bitmap conversion to rowids”, e.g.:


select
        count(n100)
from    t1
where   n2 = 0
and     n100 between 20 and 29
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |      1 |        |      1 |00:00:05.04 |      67 |       |       |          |
|   1 |  SORT AGGREGATE                |                  |      1 |      1 |      1 |00:00:05.04 |      67 |       |       |          |
|*  2 |   VIEW                         | index$_join$_001 |      1 |  55455 |  50000 |00:00:04.94 |      67 |       |       |          |
|*  3 |    HASH JOIN                   |                  |      1 |        |  50000 |00:00:04.75 |      67 |    25M|  4150K|   27M (0)|
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |      1 |  55455 |    500K|00:00:00.94 |      25 |       |       |          |
|*  5 |      BITMAP INDEX SINGLE VALUE | T1_B2            |      1 |        |     47 |00:00:00.01 |      25 |       |       |          |
|   6 |     BITMAP CONVERSION TO ROWIDS|                  |      1 |  55455 |    100K|00:00:00.19 |      42 |       |       |          |
|*  7 |      BITMAP INDEX RANGE SCAN   | T1_B100          |      1 |        |     80 |00:00:00.01 |      42 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("N100"<=29 AND "N2"=0 AND "N100">=20))
   3 - access(ROWID=ROWID)
   5 - access("N2"=0)
   7 - access("N100">=20 AND "N100"<=29)


Note how 80 “rows” at operation 7 turn into 100,000 rows at operation 6, and 47 “rows” at opration 5 turn into 500,000 rows at operation 4.

I’ve tested 11.2.0.4 and 12.1.0.2 with this code and they both behave the same way. Interestingly the final query (count with index hash join) took about 0.12 seconds to run with rowsource execution statistics disabled, but roughly 5 seconds with statistics enabled – and the extra time was all in the hash join.

Footnote 1:

“Obvious”: provided you’ve been working with the relevent bits of the Oracle software for several years(footnote 2) or have been reading the right books; even then something that’s “obvious” isn’t necessarily correct. ‘Oh, obvious,’ said Granny [Weatherwax]. ‘I’ll grant you it’s obvious. Trouble is, just because things are obvious doesn’t mean they’re true.’ – Wyrd Sisters: Terry Pratchett.

Footnote 2:

‘Oh, it’s largely intuitive, Archchancellor,’ said Ponder.  ‘Obviously you have to spend a lot of time learning it first, though.’ – Hogfather: Terry Pratchett

 

February 2, 2016

Partitioned Bitmap Join

Filed under: bitmaps,Bugs,Indexing,Infrastructure,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 8:32 am GMT Feb 2,2016

If you don’t want to read the story, the summary for this article is:

If you create bitmap join indexes on a partitioned table and you use partition exchanges to load data into the table then make sure you create the bitmap join indexes on the loading tables in exactly the same order as you created them on the partitioned table or the exchange will fail with the (truthful not quite complete) error: ORA-14098: index mismatch for tables in ALTER TABLE EXCHANGE PARTITION.

My story starts with this OTN posting from John Hall where he found after a year of successful batch loading one of his partition exchanges was raising error 14098. After an exchange of ideas, user rp0428 came up with a query against sys.jijoin$ (one of the tables behind bitmap join indexes) that allowed John Hall to see that the indexes on the exchange table had been created in a different order from that of the partitioned table. I did a quick test to see if this might be relevant (it shouldn’t be, it isn’t with “normal” indexes or function-based indexes, or virtual columns) and didn’t manage to reproduce the problem with two dimension tables and two bitmap join indexes.

Fortunately John didn’t take my word for it and tested the idea on a clone of the production system – and found that the order of creation did matter. His system, however, had 9 dimension tables and 33 bitmap join indexes – which shouldn’t have made any difference in principle, but maybe it was something to do with having several indexes on the same table,  maybe it was something to do with have far more tables or far more indexes than I had. So I built a larger test case with 6 dimension tables and six indexes per table – and reproduced the problem.

Then I started cutting back to see where the problem appeared, and found that all it took was one dimension with two indexes, or two dimensions with one index each – whatever I had done in my “quick test” I had clearly done it too quickly and done something wrong. (Unfortunately I had overwritten most of the code from the original quick test while building the larger test, so I couldn’t go back and see where the error was.)

Here, then, is the minimal test case that I finally ran to demonstrate that switching the order of index creation on the exchange table causes the exchange to fail:


rem
rem     Script:         bitmap_join_pt_bug.sql
rem     Dated:          Feb 2016
rem     Author:         J.P.Lewis
rem
rem     Last tested
rem             19.3.0.0
rem             18.1.0.0 (LiveSQL)
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem             10.2.0.5
rem

drop table pt_range purge;
drop table t1 purge;
drop table dim_1 purge;
drop table dim_2 purge;

prompt  =================
prompt  Partitioned table
prompt  =================

create table pt_range (
        id,
        grp1,
        grp2,
        padding
)
nologging
partition by range(id) (
        partition p2001 values less than (2001),
        partition p4001 values less than (4001),
        partition p6001 values less than (6001),
        partition p8001 values less than (8001)
)
as
select
        rownum                          id,
        trunc(rownum/100)               grp1,
        trunc(rownum/100)               grp2,
        rpad('x',100)                   padding
from
        all_objects
where
        rownum &lt;= 8000 -- &gt; comment to avoid wordpress format issue.
;

prompt  ================================================
prompt  Exchange table - loaded to match partition p8001
prompt  ================================================

alter table pt_range
add constraint pt_pk primary key (id) using index local;

create table t1 (
        id,
        grp1,
        grp2,
        padding
)
as
select
        rownum + 6000                   id,
        trunc(rownum/100)               grp1,
        trunc(rownum/100)               grp2,
        rpad('x',100)                   padding
from
        all_objects
where
        rownum &lt;= 2000 -- &gt; comment to avoid wordpress format issue.
;

alter table t1
add constraint t1_pk primary key (id);

execute dbms_stats.gather_table_stats(user,'pt_range')
execute dbms_stats.gather_table_stats(user,'t1')

prompt  ================
prompt  dimension tables
prompt  ================

create table dim_1
as
select distinct
        grp1,
        cast('A'||grp1 as varchar2(3)) agrp1,
        cast('B'||grp1 as varchar2(3)) bgrp1
from
        t1
;

create table dim_2 as select * from dim_1;

prompt  ===============================
prompt  Primary keys required for BMJIs
prompt  ===============================

alter table dim_1 add constraint d1_pk primary key (grp1);
alter table dim_2 add constraint d2_pk primary key (grp1);

execute dbms_stats.gather_table_stats(user,'dim_1')
execute dbms_stats.gather_table_stats(user,'dim_2')

prompt  ============================
prompt  Creating bitmap join indexes
prompt  ============================

create bitmap index pt_1a on pt_range(d1.agrp1) from pt_range pt, dim_1 d1 where d1.grp1 = pt.grp1 local ;
create bitmap index pt_2a on pt_range(d2.agrp1) from pt_range pt, dim_2 d2 where d2.grp1 = pt.grp2 local ;

prompt  ====================================================
prompt  Pick your index creation order on the exchange table
prompt  ====================================================

-- create bitmap index t1_1a on t1(d1.agrp1) from t1, dim_1 d1 where d1.grp1 = t1.grp1 ;
create bitmap index t1_2a on t1(d2.agrp1) from t1, dim_2 d2 where d2.grp1 = t1.grp2 ;
create bitmap index t1_1a on t1(d1.agrp1) from t1, dim_1 d1 where d1.grp1 = t1.grp1 ;

prompt  ==================
prompt  Exchanging (maybe)
prompt  ==================

alter table pt_range
        exchange partition p8001 with table t1
        including indexes
        without validation
;

I’ve got the same create statement twice for one of the bitmap join indexes – as it stands the indexes will be created in the right order and the exchange will work; if you comment out the first t1_1a create and uncomment the second the exchange will fail. (If you comment out the ‘including indexes’ then the exchange will succeed irrespective of the order of index creation, but that rather defeats the point of being able to exchange partitions.)

I’ve reproduced the problem in 12.2.0.1, 12.1.0.2, 11.2.0.4 and 10.2.0.5

Footnote

Running an extended trace didn’t help me work out how Oracle is detecting the mismatch, presumably it’s something that gets into the dictionary cache in a general “load the index definition” step; but it did show me that (in the “without validation” case) the code seems to check the correctness of the exchange table’s primary key data BEFORE checking whether the indexes match properly.

Update (May 2018)

This problem still exists in 18.1 (tested on LiveSQL). Of course you might be thinking at this point that you would avoid the problem by using the command “create table t1 for exchange with table pt_range;” (a possibility introduced in 12.2) but that command only creates the correct column structure for the table and doesn’t create any indexes or, even, constraints.

Funnily enough, though, the command fails anyway with Oracle error: ORA-14424: CREATE TABLE FOR EXCHANGE cannot be used with certain types of tables.” According to the oraus.msg file “certain types of tables” includes: “cluster, temporary, index-organized table (IOT), external tables”. I haven’t yet found anything that says tables with bitmap join indexes will be excluded.

Update (May 2020)

The behaviour is unchanged in 19.3

January 28, 2016

Bitmap Efficiency

Filed under: bitmaps,Indexing,Infrastructure,Oracle — Jonathan Lewis @ 1:02 pm GMT Jan 28,2016

An interesting observation came up on the Oracle-L list server a few days ago that demonstrated how clever the Oracle software is at minimising run-time work, and how easy it is to think you know what an execution plan means when you haven’t actually thought through the details – and the details might make a difference to performance.

The original question was about a very large table with several bitmap indexes and an anomaly that appeared as a query changed its execution plan.  Here are the critical sections from the plans supplied. They had been pulled from memory with rowsource execution statistics enabled:

--------------------------------------------------------------------------------------------------------
|  Id |Operation                        | Name       | Starts | E-Rows | A-Rows |     A-Time | Buffers |
--------------------------------------------------------------------------------------------------------
|   6 |    TABLE ACCESS BY INDEX ROWID  |       FACT |      1 |      1 |     24 |00:00:00.01 |      31 |
|   7 |     BITMAP CONVERSION TO ROWIDS |            |      1 |        |     24 |00:00:00.01 |       7 |
|   8 |      BITMAP AND                 |            |      1 |        |      1 |00:00:00.01 |       7 |
|*  9 |       BITMAP INDEX SINGLE VALUE |     FACT_0 |      1 |        |      1 |00:00:00.01 |       3 |
|* 10 |       BITMAP INDEX SINGLE VALUE |  FACT_DIM1 |      1 |        |      4 |00:00:00.01 |       4 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     9 - access("FACT"."C0"=243001)
    10 - access("FACT"."C1"="DIMENSION1"."ID")


-------------------------------------------------------------------------------------------------------
|  Id | Operation                      | Name       | Starts | E-Rows | A-Rows |     A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
|   7 |    BITMAP CONVERSION TO ROWIDS |            |      5 |        |      8 |00:00:00.01 |     119 |
|   8 |     BITMAP AND                 |            |      5 |        |      1 |00:00:00.01 |     119 |
|*  9 |      BITMAP INDEX SINGLE VALUE |  FACT_DIM1 |      5 |        |     20 |00:00:00.01 |      28 |
|* 10 |      BITMAP INDEX SINGLE VALUE |  FACT_DIM2 |      5 |        |    140 |00:00:00.01 |      78 |
|* 11 |      BITMAP INDEX SINGLE VALUE |     FACT_0 |      5 |        |      5 |00:00:00.01 |      13 |
|  12 |   TABLE ACCESS BY INDEX ROWID  |       FACT |      8 |      1 |      8 |00:00:00.01 |       8 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
     9 - access("FACT"."C1"="DIMENSION1"."ID")
    10 - access("FACT"."C2"="DIMENSION2"."ID")
    11 - access("FACT"."C0"=243001)

The first plan shows the steps leading to a single access (Starts = 1) to the fact table after combining two bitmap indexes; the second plan shows the second child of a nested loop join where Oracle has combined three bitmaps indexes to access the fact table – operation 7 (and its descendants) execute 5 times in this case. I’ve included the related parts of the predicate section so that you can see that the predicates at operations 9 and 10 of the first plan are the same as the predicates at operations 9 and 11 of the second plan.

So here’s the question – if one access to index fact_dim1 requires 4 buffer visits, why does it take 28 buffer visits to do the same thing 5 times (and it is with the same value every time); conversely if one access to index fact_0 requires 3 buffer visits, why do 5 visits to do the same thing take only 13 buffer visits. (Note: the arithmetic is made a little more obscure by the way in which index branch blocks may be pinned during nested loop joins.)

Then there’s a further question – not visible in the plan – the A-Rows value in a  “BITMAP INDEX SINGLE VALUE” operation is the number of bitmap chunks in the rowsource and we can see that the key values for index fact_dim2 have a significant number of bitmap chunks for a single key (5 starts returned 140 bitmap chunks). This scale, though, is true of all three indexes – in fact a follow-up email pointed out that a typical key value in every single one of the three indexes consisted of about 100 bitmap chunks, so why don’t those hundreds of chunks appear in the execution plan ?

To sum up so far: we have an execution plan where we haven’t visited all the bitmap chunks for a bitmap key, and the order in which the bitmap indexes are used in the plan seems to have some effect on the number of leaf-blocks you visit when accessing the chunks. This raises two questions

  • Could a change in the order of indexes make a significant difference to the number of bitmap chunks you visit and the resulting performance?
  • Is there a way to control the order in which you visit the indexes?

This is where the note starts to get a bit technical – if you don’t want to read any more the answers are: (a) yes but probably not significantly and (b) yes.

Demo

To investigate what goes on inside the “BITMAP AND” operation I created a table with two bitmap indexes and used a very large setting for pctfree for the indexes so that they had to be stored with a large number of bitmap chunks per key. Here’s the code that I used, with some results from an instance of 12.1.0.2:


rem
rem     Script:         bitmap_control_02.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2012
rem

create table people
nologging
as
with generator as (
        select  --+ materialize 
                rownum id 
        from dual
        connect by
                level <= 1e4    -- > hint to avoid wordpress format issue
)
select
        rownum                  id,
        mod(rownum-1, 1e2)      id_town_home,
        trunc((rownum-1)/1e4)   id_town_work,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6    -- > hint to avoid wordpress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'PEOPLE',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index pe_home on people(id_town_home) nologging pctfree 95;
create bitmap index pe_work on people(id_town_work) nologging pctfree 95;

select
        index_name, distinct_keys, num_rows, leaf_blocks, avg_leaf_blocks_per_key
from
        user_indexes
where
        table_name = 'PEOPLE'
order by
        index_name
;


INDEX_NAME           DISTINCT_KEYS   NUM_ROWS LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY
-------------------- ------------- ---------- ----------- -----------------------
PE_HOME                        100      30399       15200                     152
PE_WORK                        100       1800         907                       9

As you can see I’ve generated two columns (id_town_home, id_town_work) with 100 distinct values and 10,000 rows each, but with very different data distributions. The rows for any given value for id_town_home are uniformly spread across the entire table (every hundredth row) while the rows for any given value of id_town_work are very tightly clustered as a group of 10,000 consecutive rows. As a consequence of the different patterns the index entry (bitmap string) for a typical key value for id_town_home is enormous and has to be broken into 304 chunks spread across 152 leaf blocks (2 index entries per leaf block) while the index entry for a typical key value for id_town_work is much smaller requiring only 18 chunks spread across 9 leaf blocks.

So what will I see if I run the following query twice, forcing it to use a BITMAP AND of the two indexes in the two possible orders:

select
        /*+ index_combine(pe) */
        max(small_vc)
from
        people pe
where
        id_town_home = 50
and     id_town_work = 50
;

Based on a very simple interpretation of the typical execution plan and using the index stats shown above we might expect to see roughly A-Rows = 18 with 9 buffer gets (plus a few more for segment headers and branch blocks) on the id_town_work index and A-Rows = 304 with 152 buffer gets on the id_town_home index to allow Oracle to generate and compare the two bit strings – but here are the two plans with their execution stats, generated in 12.1.0.2, and each run after flushing the buffer cache:

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |      1 |        |      1 |00:00:00.01 |     118 |    117 |
|   1 |  SORT AGGREGATE                      |         |      1 |      1 |      1 |00:00:00.01 |     118 |    117 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| PEOPLE  |      1 |    100 |    100 |00:00:00.01 |     118 |    117 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |         |      1 |        |    100 |00:00:00.01 |      18 |     17 |
|   4 |     BITMAP AND                       |         |      1 |        |      1 |00:00:00.01 |      18 |     17 |
|*  5 |      BITMAP INDEX SINGLE VALUE       | PE_WORK |      1 |        |     18 |00:00:00.01 |      14 |     13 |
|*  6 |      BITMAP INDEX SINGLE VALUE       | PE_HOME |      1 |        |      4 |00:00:00.01 |       4 |      4 |
-------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |      1 |        |      1 |00:00:00.01 |     122 |    120 |
|   1 |  SORT AGGREGATE                      |         |      1 |      1 |      1 |00:00:00.01 |     122 |    120 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| PEOPLE  |      1 |    100 |    100 |00:00:00.01 |     122 |    120 |
|   3 |    BITMAP CONVERSION TO ROWIDS       |         |      1 |        |    100 |00:00:00.01 |      22 |     20 |
|   4 |     BITMAP AND                       |         |      1 |        |      1 |00:00:00.01 |      22 |     20 |
|*  5 |      BITMAP INDEX SINGLE VALUE       | PE_HOME |      1 |        |      5 |00:00:00.01 |       8 |      7 |
|*  6 |      BITMAP INDEX SINGLE VALUE       | PE_WORK |      1 |        |     18 |00:00:00.01 |      14 |     13 |
-------------------------------------------------------------------------------------------------------------------

We have not touched anything like the entire bit-string for the id_town_home index – a bit-string that spans 152 leaf blocks! Clearly Oracle is doing something clever to minimise the work, and it’s so clever that switching the order of these two extremely different indexes in the plan has made virtually no difference to the work done. Obviously I can’t tell you exactly what the code is doing, but I think I can suggest a reasonable hypothesis about what’s going on.

The pe_work index has the smaller number of leaf blocks per key, which makes it the better starting choice for the BITMAP AND in this case, so the optimizer’s default starting action was to pick the first couple of chunks of that index key value. The run-time code then sees that the first rowid that it could possibly need in its result set is roughly in the middle of the table – remember that the “key” columns of a bitmap index are actually (real_key, first_rowid_of chunk, last_rowid_of_chunk, compressed_bitstring).

Since it now knows the lowest possible rowid that it could need Oracle can now use this rowid to probe the pe_home index by (id_town_home=50, {target_rowid}) – which will let it go to a bitmap index chunk that’s roughly in the middle of the full range of 152 leaf blocks. Then Oracle can expand the bitstrings from the chunks it has, reading new chunks as needed from each of the indexes until the 18 chunks / 9 leaf block from the pe_work index have been used up (and that range would have aligned with just two or three chunks from the pe_home index) at which point Oracle can see there’s no more rows in the table that could match both predicates and it doesn’t need to read the next 70+ chunks of the pe_home index.

Conversely when I forced Oracle to use the (inappropriate) pe_home index first it read the first couple of chunks, then read the first couple of chunks of the pe_work index, at which point it discovered that it didn’t need any of the pe_home index prior to (roughly) chunk 75, so it jumped straight to the right chunk (without reading all the leaf blocks in between) to align with pe_work and carried on from there. That’s why the forced, less efficient, plan that visited pe_home first visited just a couple more leaf blocks than the plan the optimizer selected for itself.

Bottom line on performance

Oracle is sufficiently smart about checking the start and end ranges on bitmap indexes (rather then arbitrarily expanding the entire bit string for each key) that even for very large bitmap index entries it will probably only access a couple of “redundant” leaf blocks per index even if it picks the worst possible order for using the indexes.

You’re far more likely (if you know your data) to notice Oracle picking the wrong indexes than you are to spot it using the right indexes in the wrong order – and given that bitmap indexes tend to be relatively small and well buffered (compared to the tables), and given the relatively large number of rows we pick by random I/O from fact tables, a little extra work in the bitmap indexes is unlikely to make a significant difference to the overall performance of most queries.

Footnote

In the unlikely circumstances that you do find a special case where it will make a difference (and it will probably be a difference in CPU usage) then you can dictate the order of the indexes with the undocumented bitmap_tree() hint.  I may get round to writing up the variations one day but, for this simple case, the index_combine() hint that I initially used to force the BITMAP AND turned into the following bitmap_tree() hint in the outline:


bitmap_tree(@sel$1            pe@sel$1                   and( (people.id_town_work)      (people.id_town_home)       ) )

bitmap_tree(@query_block      table_alias@query_block    and( ({first index definition}) ({second index definition}) ) )

Obviously this is not something you should throw into production code casually – check with Oracle support if you think it’s really necessary – but if you wanted to reverse the order of index usage in this case you could just swap the order of the index definitions. If you thought there was a third index that should be used you could include its definition (note that “index definition” consists of a list of “table_name.column_name”  in the brackets).

January 24, 2016

Semijoin_driver

Filed under: bitmaps,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 11:42 am GMT Jan 24,2016

Here’s one of those odd little tricks that (a) may help in a couple of very special cases and (b) may show up at some future date – or maybe it already does – in the optimizer if it is recognised as a solution to a more popular problem. It’s about an apparent restriction on how the optimizer uses the bitmap merge operation so, to demonstrate a very simple case, I’ll start with a data set with just one bitmap index:

rem
rem     Script:         bitmap_merge.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2016
rem

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4    -- > comment to avoid wordpress format issue
)
select
        rownum                  id,
        mod(rownum,1000)        n1,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6   -- > comment to avoid wordpress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index t1_b1 on t1(n1);

select index_name, leaf_blocks, num_rows from user_indexes;

/*
INDEX_NAME           LEAF_BLOCKS   NUM_ROWS
-------------------- ----------- ----------
T1_B1                        500       1000
*/

Realistically we don’t usually plan to use a single bitmap index to access data from a large table, usually we expect to have queries that give the optimizer the option to choose and combine several bitmap indexes (possibly driving through dimension tables first) to reduce the target row set in the table to a cost-effective level.

In this example, though, I’ve created a column data set that many people might view as “inappropriate” as the target for a bitmap index – in one million rows I have one thousand distinct values, it’s not a “low cardinality” column but, as Richard Foote (among others) has often had to point out, it’s wrong to think that bitmap indexes are only suitable for “columns with a very small number of distinct values”. Moreover, it’s the only index on the table, so no chance of combining bitmaps.

Another thing to notice about my data set is that the n1 column has been generated by the mod() function; because of this the column cycles through the 1,000 values I’ve allowed for it and this means that the rows for any given value are scattered widely across the table; but it also means that if I find a row with the value X in it then there could well be a row with the value X+4 (say) in the same block.

I’ve reported the statistics from user_indexes at the end of the sample code. This shows you that the index holds 1,000 “rows” – i.e. each key value requires only one bitmap entry to cover the whole table, and there are 500 leaf blocks so two rows per leaf block.  (When I created the equivalent B-tree index on the column its size was 2,077 leaf block uncompressed, or 1,538 leaf blocks when compressed).

So here’s the query I want to play with, followed by the run-time execution plan with rowsource execution statistics (in this case from a 12.1.0.2 instance):


alter session set statistics_level = all;

select
        /*+
                qb_name(main)
        */
        max(small_vc)
from
        t1
where
        n1 in (1,5)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      1 |        |      1 |00:00:00.03 |    2006 |      4 |
|   1 |  SORT AGGREGATE                       |       |      1 |      1 |      1 |00:00:00.03 |    2006 |      4 |
|   2 |   INLIST ITERATOR                     |       |      1 |        |   2000 |00:00:00.03 |    2006 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      2 |   2000 |   2000 |00:00:00.02 |    2006 |      4 |
|   4 |     BITMAP CONVERSION TO ROWIDS       |       |      2 |        |   2000 |00:00:00.01 |       6 |      4 |
|*  5 |      BITMAP INDEX SINGLE VALUE        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |      4 |
------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("N1"=1 OR "N1"=5))

The query is selecting 2,000 rows from the table, for n1 = 1 and n1 = 5, and the plan shows us that the optimizer probes the bitmap index twice (operation 5), once for each value, fetching all the rows for n1 = 1, then fetching all the rows for n1 = 5. This entails 2,000 buffer gets. However, we know that for every row where n1 = 1 there is another row nearby (probably in the same block) where n1 = 5 – it would be nice if we could pick up the 1 and the 5 at the same time and do less work.

Technically the optimizer has the necessary facility to do this – it’s known as the BITMAP MERGE – Oracle can read two or more entries from a bitmap index, superimpose the bits (effectively a BITMAP OR), then convert to rowids and visit the table. Unfortunately there are cases (and it seems to be only the simple cases) where this doesn’t appear to be allowed even when we – the users – can see that it might be a very effective strategy. So can we make it happen? Since I’ve asked the question you’ve probably guessed that the answer is going to be yes.

Here’s an alternate (messier) SQL statement that achieves the same result:


select
        /*+
                qb_name(main)
                semijoin_driver(@subq)
        */
        id, n1, rowid
from
        t1
where
        n1 in ( 
                select /*+ qb_name(subq) */
                        *
                from    (
                        select 1 from dual
                        union all
                        select 5 from dual
                        )
        )
;


-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |      1 |00:00:00.02 |    1074 |       |       |          |
|   1 |  SORT AGGREGATE                      |       |      1 |      1 |      1 |00:00:00.02 |    1074 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |   2000 |   2000 |00:00:00.02 |    1074 |       |       |          |
|   3 |    BITMAP CONVERSION TO ROWIDS       |       |      1 |        |   2000 |00:00:00.01 |       6 |       |       |          |
|   4 |     BITMAP MERGE                     |       |      1 |        |      1 |00:00:00.01 |       6 |  1024K|   512K| 8192  (0)|
|   5 |      BITMAP KEY ITERATION            |       |      1 |        |      2 |00:00:00.01 |       6 |       |       |          |
|   6 |       VIEW                           |       |      1 |      2 |      2 |00:00:00.01 |       0 |       |       |          |
|   7 |        UNION-ALL                     |       |      1 |        |      2 |00:00:00.01 |       0 |       |       |          |
|   8 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   9 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|* 10 |       BITMAP INDEX RANGE SCAN        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  10 - access("N1"="from$_subquery$_002"."1")

Key points from this plan – and I’ll comment on the SQL in a moment: The number of buffer visits is roughly halved (In many cases we picked up two rows as we visited each buffer); operation 4 shows us that we did a BITMAP MERGE, and we can see in operations 5 to 10 that we did a BITMAP KEY ITERATION (which is a bit like a nested loop join – “for each row returned by child 1 (operation 6) we executed child 2 (operation 10)”) to probe the index twice and get two strings of bits that operation 4 could merge before operation 3 converted to rowids.

For a clearer picture of how we visit the table, here are the first few rows and last few rows from a version of the two queries where we simply select the ID column rather than aggregating on the small_vc column:

Results from the original query structure

         1
      1001
      2001
      3001
...
    997005
    998005
    999005

2000 rows selected.

Results from the rewritten query structure:

         1
         5
      1001
      1005
      2001
      2005
...
    998001
    998005
    999001
    999005
    
2000 rows selected.

As you can see, one query returns all the n1 = 1 rows then all the n1 = 5 rows while the other query alternates as it walks through the merged bitmap. You may recall the Exadata indexing problem (now addressed, of course) from a few years back where the order in which rows were visited after a (B-tree) index range scan made a huge difference to performance. This is the same type of issue – when the optimizer’s default plan gets the right data but visits blocks in an inefficient order to acquire it we may be able to find ways of modifying the SQL to acquire the data in a more efficient order. In this case we save only fractions of a second because all the data is buffered, but it’s possible that in a production environment with much larger tables many, or all, of the re-visits could turn into physical reads.

Coming back to the SQL, the key to the re-write is to turn my IN-list into a subquery, and then tell the optimizer to use that subquery as a “semijoin driver”. This is essentially the mechanism used by the Star Tranformation, where the optimizer rewrites a simple join so that each dimension table in the join appears twice in the rewrite, first as an IN subquery driving the bitmap selection then as a “joinback”. But (according to the manuals) a star transformation requires at least two dimension tables to be involved in a join to the central fact table – and that may be why the semi-join approach is not considered in this (and slightly more complex) cases.

Update (August 2021)

Inevitably I end up finding my own blog notes when searching for stuff on the Internet, and when I found this one I re-ran the test on 19.11.0.0 to see if it would automatically take the bitmap merge strategy: it didn’.

 

 

April 23, 2015

Golden Oldies

Filed under: bitmaps,Indexing,Oracle — Jonathan Lewis @ 8:45 am BST Apr 23,2015

I’ve just been motivated to resurrect a couple of articles I wrote for DBAZine about 12 years ago on the topic of bitmap indexes. All three links point to Word 97 documents which I posted on my old website (now saved on the Wayback Machine archive) in September 2003.

Despite their age they’re still surprisingly good.

Update: 26th April 2015

Prompted by my reply to comment #2 below to look at what I said about bitmap indexes in Practical Oracle 8i (published 15 years ago), and found this gem:

An interesting feature of bitmap indexes is that it is rather hard to predict how large the index segment will be. The size of a B-tree index is based very closely on the number of rows and the typical size of the entries in the index column. The size of a bitmap index is dictated by a fairly small number of bit-strings which may have been compressed to some degree depending upon the number of consecutive 1’s and 0’s.

To pick an extreme example, imagine a table of one million rows that has one column that may contain one of eight values ‘A’ to ‘H’ say, which has been generated in one of of the two following extreme patterns:

  • All the rows for a given value appear together, so scanning down the table we get 125,000 rows with ‘A’ followed by 125,000 rows of ‘B’ and so on.
  • The rows cycle through the values in turn, so scanning down the table we get ‘A’,’B’. . . ‘H’ repeated 125,000 times.

What will the bitmap indexes look like in the two cases case?

For the first example, the basic map for the ‘A’ value will be 125,000 one-bits, followed by 875,000 zero bits – which will be trimmed off. Splitting the 125,000 bits into bytes and adding the necessary overhead of about 12% we get an entry of the ‘A’ rows of 18K. A similar argument applies for each of the values ‘B’ to ‘H’, so we get a total index size of around 8 x 18K – giving 156K.

For the second example, the basic map for the ‘A’ value will be a one followed by 7 zeros, repeated 125,000 times. There is no chance of compression here, so the ‘A’ entry will start at 125,000 bytes. Adding the overhead this goes up to 140K, and repeating the argument for the values ‘B’ to ‘H’ we get a total index of 1.12 MB.

This wild variation in size looks like a threat, but to put this into perspective, a standard B-tree index on this column would run to about 12 Mb irrespective of the pattern of the data. It would probably take about ten times as long to build as well.

As we can see, the size of a bitmap index can be affected dramatically by the packing of the column it depends upon as well as the number of different possible values the column can hold and the number of rows in the table. The compression that is applied before the index is stored, and the amazing variation in the resulting index does mean that the number of different values allowed in the column can be much larger than you might first expect. In fact it is often better to think of bitmap indexes in terms of how many occurrences of each value there are, rather than in terms of how many different values exist. Viewing the issue from this direction, a bitmap is often better than a B-tree when each value occurs more than a few hundred times in the table (but see the note below following the description of bitmap index entries).

 

January 19, 2015

Bitmap Counts

Filed under: bitmaps,Indexing,Oracle,Performance,Troubleshooting — Jonathan Lewis @ 12:15 pm GMT Jan 19,2015

In an earlier (not very serious) post about count(*) I pointed out how the optimizer sometimes does a redundant “bitmap conversion to rowid” when counting. In the basic count(*) example I showed this wasn’t a realistic issue unless you had set cursor_sharing to “force” (or the now-deprecated “similar”). There are, however, some cases where the optimizer can do this in more realistic circumstances and this posting models a scenario I came across a few years ago. The exact execution path has changed over time (i.e. version) but the anomaly persists, even in 12.1.0.2.

First we create a “fact” table and a dimension table, with a bitmap index on the fact table and a corresponding primary key on the dimension table:


rem
rem     Script:         bitmap_conversion.sql    
rem     Dated:          Nov 2008
rem     Author:         Jonathan Lewis
rem

create table area_sales (
	area		varchar2(10)	not null,
	dated		date		not null,
	category	number(3)	not null,
	quantity	number(8,0),
	value		number(9,2),
	constraint as_pk primary key (dated, area),
	constraint as_area_ck check (area in ('England','Ireland','Scotland','Wales'))
)
;

insert into area_sales
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 3000 -- > comment to avoid wordpress format issue
)
select
	decode(mod(rownum,4),
		0,'England',
		1,'Ireland',
		2,'Scotland',
		3,'Wales'
	),
	sysdate + 0.0001 * rownum,
	mod(rownum-1,300),
	rownum,
	rownum
from
	generator,
	generator
where
	rownum <= 1e6 -- > comment to avoid wordpress format issue
;

create bitmap index as_bi on area_sales(category) pctfree 0;

create table dim (
	id	number(3) not null,
	padding	varchar2(40)
)
;

alter table dim add constraint dim_pk primary key(id);

insert into dim
select
	distinct category, lpad(category,40,category)
from	area_sales
;

commit;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'AREA_SALES',
		method_opt 	 => 'for all columns size 1',
		cascade		 => true
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'DIM',
		method_opt 	 => 'for all columns size 1',
		cascade		 => true
	);
end;
/

Now we run few queries and show their execution plans with rowsource execution statistics. First a query to count the number of distinct categories used in the area_sales tables, then a query to list the IDs from the dim table that appear in the area_sales table, then the same query hinted to run efficiently.


set trimspool on
set linesize 156
set pagesize 60
set serveroutput off

alter session set statistics_level = all;

select
	distinct category
from
	area_sales
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

==========================================
select  distinct category from  area_sales
==========================================
---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |      1 |        |    300 |00:00:00.01 |     306 |       |       |          |
|   1 |  HASH UNIQUE                 |       |      1 |    300 |    300 |00:00:00.01 |     306 |  2294K|  2294K| 1403K (0)|
|   2 |   BITMAP INDEX FAST FULL SCAN| AS_BI |      1 |   1000K|    600 |00:00:00.01 |     306 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

As you can see, Oracle is able to check the number of distinct categories very quickly by scanning the bitmap index and extracting ONLY the key values from each of the 600 index entries that make up the whole index (the E-rows figure effectively reports the number of rowids identified by the index, but Oracle doesn’t evaluate them to answer the query).


=======================================================================
select  /*+   qb_name(main)  */  dim.* from dim where  id in (   select
   /*+     qb_name(subq)    */    distinct category   from
area_sales  )
========================================================================

----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    300 |00:00:10.45 |     341 |       |       |          |
|*  1 |  HASH JOIN SEMI               |       |      1 |    300 |    300 |00:00:10.45 |     341 |  1040K|  1040K| 1260K (0)|
|   2 |   TABLE ACCESS FULL           | DIM   |      1 |    300 |    300 |00:00:00.01 |      23 |       |       |          |
|   3 |   BITMAP CONVERSION TO ROWIDS |       |      1 |   1000K|    996K|00:00:02.64 |     318 |       |       |          |
|   4 |    BITMAP INDEX FAST FULL SCAN| AS_BI |      1 |        |    599 |00:00:00.01 |     318 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------

What we see here is that (unhinted) oracle has converted the IN subquery to an EXISTS subquery then to a semi-join which it has chosen to operate as a HASH semi-join. But in the process of generating the probe (sescond) table Oracle has converted the bitmap index entries into a set of rowids – all 1,000,000 of them in my case – introducing a lot of redundant work. In the original customer query (version 9 or 10, I forget which) the optimizer unnested the subquery and converted it into an inline view with a distinct – but still performed a redundant bitmap conversion to rowids. In the case of the client, with rather more than 1M rows, this wasted a lot of CPU.


=====================================================================
select  /*+   qb_name(main)  */  dim.* from (  select   /*+
qb_name(inline)    no_merge    no_push_pred   */   distinct category
from   area_sales  ) sttv,  dim where  dim.id = sttv.category
=====================================================================

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |      1 |        |    300 |00:00:00.02 |     341 |       |       |          |
|*  1 |  HASH JOIN                     |       |      1 |    300 |    300 |00:00:00.02 |     341 |  1969K|  1969K| 1521K (0)|
|   2 |   VIEW                         |       |      1 |    300 |    300 |00:00:00.01 |     306 |       |       |          |
|   3 |    HASH UNIQUE                 |       |      1 |    300 |    300 |00:00:00.01 |     306 |  2294K|  2294K| 2484K (0)|
|   4 |     BITMAP INDEX FAST FULL SCAN| AS_BI |      1 |   1000K|    600 |00:00:00.01 |     306 |       |       |          |
|   5 |   TABLE ACCESS FULL            | DIM   |      1 |    300 |    300 |00:00:00.01 |      35 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------

By introducing a manual unnest in the original client code I avoided the bitmap conversion to rowid, and the query executed much more efficiently. As you can see the optimizer has predicted the 1M rowids in the inline view, but used only the key values from the 600 index entries. In the case of the client it really was a case of manually unnesting a subquery that the optimizer was automatically unnesting – but without introducing the redundant rowids.

In my recent 12c test I had to include the no_merge and no_push_pred hints. In the absence of the no_merge hint Oracle did a join then group by, doing the rowid expansion in the process; if I added the no_merge hint without the no_push_pred hint then Oracle did a very efficient nested loop semi-join into the inline view. Although this still did the rowid expansion (predicting 3,333 rowids per key) it “stops early” thanks to the “semi” nature of the join so ran very quickly:


=========================================================================
select  /*+   qb_name(main)  */  dim.* from (  select   /*+
qb_name(inline)    no_merge   */   distinct category  from   area_sales
 ) sttv,  dim where  dim.id = sttv.category
=========================================================================

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |    300 |00:00:00.02 |     348 |
|   1 |  NESTED LOOPS SEMI            |       |      1 |    300 |    300 |00:00:00.02 |     348 |
|   2 |   TABLE ACCESS FULL           | DIM   |      1 |    300 |    300 |00:00:00.01 |      35 |
|   3 |   VIEW PUSHED PREDICATE       |       |    300 |   3333 |    300 |00:00:00.01 |     313 |
|   4 |    BITMAP CONVERSION TO ROWIDS|       |    300 |   3333 |    300 |00:00:00.01 |     313 |
|*  5 |     BITMAP INDEX SINGLE VALUE | AS_BI |    300 |        |    300 |00:00:00.01 |     313 |
-------------------------------------------------------------------------------------------------

Bottom line on all this – check your execution plans that use bitmap indexes – if you see a “bitmap conversion to rowids” in cases where you don’t then visit the table it may be a redundant conversion, and it may be expensive. If you suspect that this is happening then dbms_xplan.display_cursor() may confirm that you are doing a lot of CPU-intensive work to produce a very large number of rowids that you don’t need.

Update Feb 2020

The redundant rowid conversion is still present in 19.3 if you leave the optimizer to choose its own path. The manual unnest with no_merge and no_push_pred hints is still the fastest option.

 

January 9, 2015

count(*) – again !

Filed under: bitmaps,humour,Indexing,Oracle,Troubleshooting,Tuning — Jonathan Lewis @ 12:56 pm GMT Jan 9,2015

Because you can never have enough of a good thing.

Here’s a thought – The optimizer doesn’t treat all constants equally.  No explanations, just read the code – execution plans at the end:


SQL> drop table t1 purge;
SQL> create table t1 nologging as select * from all_objects;
SQL> create bitmap index t1_b1 on t1(owner);

SQL> alter session set statistics_level = all;

SQL> set serveroutput off
SQL> select count(*) from t1;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

SQL> select count(1) from t1;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

SQL> select count(-1) from t1;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

SQL> alter session set cursor_sharing = force;
SQL> alter system flush shared_pool;

SQL> select count(1) from t1;
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

So, are you expecting to see the same results and performance from every single one of those queries ?


select count(*) from t1
----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |00:00:00.01 |       9 |      5 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |00:00:00.01 |       9 |      5 |
|   2 |   BITMAP CONVERSION COUNT     |       |      1 |  84499 |     31 |00:00:00.01 |       9 |      5 |
|   3 |    BITMAP INDEX FAST FULL SCAN| T1_B1 |      1 |        |     31 |00:00:00.01 |       9 |      5 |
----------------------------------------------------------------------------------------------------------

select count(1) from t1
-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |00:00:00.01 |       9 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |00:00:00.01 |       9 |
|   2 |   BITMAP CONVERSION COUNT     |       |      1 |  84499 |     31 |00:00:00.01 |       9 |
|   3 |    BITMAP INDEX FAST FULL SCAN| T1_B1 |      1 |        |     31 |00:00:00.01 |       9 |
-------------------------------------------------------------------------------------------------

select count(-1) from t1
-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |00:00:00.43 |       9 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |00:00:00.43 |       9 |
|   2 |   BITMAP CONVERSION TO ROWIDS |       |      1 |  84499 |  84499 |00:00:00.22 |       9 |
|   3 |    BITMAP INDEX FAST FULL SCAN| T1_B1 |      1 |        |     31 |00:00:00.01 |       9 |
-------------------------------------------------------------------------------------------------

SQL> alter session set cursor_sharing = force;
SQL> alter system flush shared_pool;

select count(1) from t1
select count(:"SYS_B_0") from t1    -- effect of cursor-sharing
-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |00:00:00.46 |       9 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |00:00:00.46 |       9 |
|   2 |   BITMAP CONVERSION TO ROWIDS |       |      1 |  84499 |  84499 |00:00:00.23 |       9 |
|   3 |    BITMAP INDEX FAST FULL SCAN| T1_B1 |      1 |        |     31 |00:00:00.01 |       9 |
-------------------------------------------------------------------------------------------------

Check operation 2 in each plan – with the bitmap index in place there are two possible ways to count the rows referenced in the index – and one of them converts to rowids and does a lot more work.

The only “real” threat in this set of examples, of course, is the bind variable one – there are times when count(*) WILL be faster than count(1). Having said that, there is a case where a redundant “conversion to rowids” IS a threat – and I’ll write that up some time in the near future.

Trick question: when is 1+1 != 2 ?
Silly answer: compare the plan for: “select count (2) from t1” with the plan for “select count(1+1) from t1”

Note: All tests above run on 12.1.0.2

June 16, 2014

Bitmap Nulls

Filed under: bitmaps,Indexing,Infrastructure,NULL,Oracle — Jonathan Lewis @ 9:08 am BST Jun 16,2014

It’s fairly well known that in Oracle B-tree indexes on heap tables don’t hold entries where all the indexed columns are all null, but that bitmap indexes will hold such entries and execution plans can for predicates like “column is null” can use bitmap indexes. Here’s a little test case to demonstrate the point (I ran this last on 12.1.0.1):


create table t1 (val number, n1 number, padding varchar2(100));

insert into t1
select
	decode(rownum,1,to_number(null),rownum), rownum, rpad('x',100)
from
	all_objects
where
	rownum <= 1000
;

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;

commit;

create bitmap index t1_b1 on t1(val);

execute dbms_stats.gather_table_stats(user,'t1');

set autotrace traceonly explain

select * from t1 where val is null;

set autotrace off

The execution plan for this query, in the system I happened to be using, looked like this:


Execution Plan
----------------------------------------------------------
Plan hash value: 1201576309

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |    32 |  3488 |     8   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    32 |  3488 |     8   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |       |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE        | T1_B1 |       |       |            |          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("VAL" IS NULL)

Note that the predicate section shows us that we used the “column is null” predicate as an access predicate into the index.

Of course, this is a silly little example – I’ve only published it to make a point and to act as a reference case if you ever need to support a claim. Normally we don’t expect a single bitmap index to be a useful way to access a table, we tend to use combinations of bitmap indexes to combine a number of predicates so that we can minimise the trips to a table as efficiently as possible. (And we certainly DON’T create a bitmap index on an OLTP system because it lets us access NULLs by index — OLTP and bitmaps don’t get on well together.)

If you do a symbolic block dump, by the way, you’ll find that the NULL is represented by the special “length byte” of 0xFF with no following data.

April 18, 2014

Bitmap loading

Filed under: bitmaps,Indexing,Oracle — Jonathan Lewis @ 12:43 pm BST Apr 18,2014

Everyone “knows” that bitmap indexes are a disaster (compared to B-tree indexes) when it comes to DML. But at an event I spoke at recently someone made the point that they had observed that their data loading operations were faster when the table being loaded had bitmap indexes on it than when it had the equivalent B-tree indexes in place.

There’s a good reason why this can be the case.  No prizes for working out what it is – and I’ll supply an answer in a couple of days time.  (Hint – it may also be the reason why Oracle doesn’t use bitmap indexes to avoid the “foreign key locking” problem).

Answer

As Martin (comment 3) points out, there’s a lot of interesting information in the statistics once you start doing the experiment. So here’s some demonstration code, first we create a table with one of two possible indexes:

rem
rem     Script:         bitmap_v_btree-load.sql
rem     Author:         Jonathan Lewis
rem     Dated:          April 2014
rem 

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1000)        btree_col,
        mod(rownum,1000)        bitmap_col,
        rpad('x',100)           padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create        index t1_btree on t1(btree_col) nologging;
-- create bitmap index t1_bitmap on t1(bitmap_col) nolog

ging;

You’ll note that the two columns I’m going to build indexes on hold the same data in the same order – and it’s an order with maximum scatter because of the mod() function I’ve used to create it. It’s also very repetitive data, having 1000 distinct values over 1,000,0000 rows. With the data and (one of) the indexes in place I’m going to insert another 10,000 rows:

execute snap_my_stats.start_snap

insert /* append */ into t1
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        1e6 + rownum            id,
        mod(rownum,1000)        btree_col,
        mod(rownum,1000)        bitmap_col,
        rpad('x',100)           padding
from
        generator
;

execute snap_my_stats.end_snap

You’ll note that I’ve got an incomplete append hint in the code – I’ve tested the mechanism about eight different ways, and left the append in as a convenience, but the results I want to talk about (first) are with the hint disabled so that the insert is a standard insert. The snap_my_stats calls are my standard mechanism to capture deltas of my session statistics (v$mystat) – one day I’ll probably get around to using Tanel’s snapper routine everywhere – and here are some of the key results produced in the two tests:


11.2.0.4 with B-tree
====================
Name                                                                     Value
----                                                                     -----
session logical reads                                                   31,403
DB time                                                                     64
db block gets                                                           31,195
consistent gets                                                            208
db block changes                                                        21,511
redo entries                                                            10,873
redo size                                                            3,591,820
undo change vector size                                                897,608
sorts (memory)                                                               2
sorts (rows)                                                                 1

11.2.0.4 with bitmap
====================
Name                                                                     Value
----                                                                     -----
session logical reads                                                   13,204
DB time                                                                     42
db block gets                                                            8,001
consistent gets                                                          5,203
db block changes                                                         5,911
redo entries                                                             2,880
redo size                                                            4,955,896
undo change vector size                                              3,269,932
sorts (memory)                                                               3
sorts (rows)                                                            10,001

As Martin has pointed out, there are a number of statistics that show large differences between the B-tree and bitmap approaches, but the one he didn’t mention was the key: sorts (rows). What is this telling us, and why could it matter so much ? If the B-tree index exists when the insert takes place Oracle locates the correct place for the new index entry as each row is inserted which is why you end up with so many redo entries, block gets and block changes; if the bitmap index exists, Oracle postpones index maintenance until the table insert is complete, but accumulates the keys and rowids as it goes then sorts them to optimize the rowid to bitmap conversion and walks the index in order updating each modified key just once.

The performance consequences of the two different strategies depends on the number of indexes affected, the number of rows modified, the typical number of rows per key value, and the ordering of the new data as it arrives; but it’s possible that the most significant impact could come from ordering.  As each row arrives, the relevant B-tree indexes are modified – but if you’re unlucky, or have too many indexes on the table, then each index maintenance operation could result in a random disk I/O to read the necessary block (how many times have you seen complaints like: “we’re only inserting 2M rows but it’s taking 45 minutes and we’re always waiting on db file sequential reads”). If Oracle sorts the index entries before doing the updates it minimises the random I/O because it need only update each index leaf block once and doesn’t run the risk of re-reading many leaf blocks many times for a big insert.

Further Observations

The delayed maintenance for bitmap indexes (probably) explains why they aren’t used to avoid the foreign key locking problem.  On a large insert, the table data will be arriving, the B-tree indexes will be maintained in real time, but a new child row of some parent won’t appear in the bitmap index until the entire insert is complete – so another session could delete the parent of a row that exists, is not yet committed, but is not yet visible. Try working out a generic strategy to deal with that type of problem.

It’s worth noting, of course, that when you add the /*+ append */ hint to the insert then Oracle uses exactly the same optimization strategy for B-trees as it does for bitmaps – i.e. postpone the index maintenance, remember all the keys and rowids, then sort and bulk insert them.  And when you’ve remembered that, you may also remember that the hint is (has to be) ignored if there are any enabled foreign key constraints on the table. The argument for why the hint has to be ignored and why bitmap indexes don’t avoid the locking problem is (probably) the same argument.

You may also recall, by the way, that when you have B-tree indexes on a table you can choose the optimal update or delete strategy by selecting a tablescan or index range scan as the execution path.  If you update or delete through an index range scan the same “delayed maintenance” trick is used to optimize the index updates … except for any indexes being used to support foreign key constraints, and they are maintained row by row.

In passing, while checking the results for this note I re-ran some tests that I had originally done in 2006 and added one more test that I hadn’t considered at the time; as a result I can also point out that index will see delayed maintenance if you drive the update or delete with an index() hint, but not if you drive it with an index_desc() hint.

January 17, 2014

Bitmap question

Filed under: bitmaps,Indexing,Oracle — Jonathan Lewis @ 7:06 pm GMT Jan 17,2014

If you know anything about bitmap indexes you probably know that a single entry in a bitmap index takes the form (key_value, starting rowid, ending rowid, BBC compressed bit string). So an entry covers a single value for a column over a range of rowids  in the table, and the string of bits for that (notional) range is reduce to a minimum by a compression mechanism that eliminate repeated zeros in multiples of 8.

So here’s a question – to which I don’t know the answer, although you may be surprised when you try to find it:

If you have a very large table and in one of its columns the first row and the last row (and no others) hold the value 0 (say) and you create a bitmap index on this column, what’s the largest number of rows you could have in the table before Oracle would HAVE to create two index entries in order to cover both rows ?

Follow-up question – once you start getting close to working out the answer, can you think of a way to provide an example without actually creating a table with that many rows in it ?

 

October 21, 2009

Bitmap Updates

Filed under: bitmaps,Indexing,Infrastructure,Oracle,Performance,redo,undo — Jonathan Lewis @ 5:33 pm BST Oct 21,2009

It is fairly well-known that bitmap indexes are very dense structures that can behave badly if their underlying tables are subject to even fairly low levels of insert, update or delete activity. Problems include contention, space management and performance, and these problens have spawned a couple of well-known guidelines relating to bitmap indexes:

  • Avoid concurrent modification of data by multiple processes – otherwise you can end up with processes dead-locking
  • Drop/disable bitmap indexes before data loads and rebuild them afterwards.

Of course, with a little care and experimentation, you may find that you don’t need to apply the second guideline in all cases – especially for bulk inserts.
(more…)

Website Powered by WordPress.com.