Oracle Scratchpad

July 1, 2011

Partitioned Bitmaps

Filed under: Index Joins,Indexing,Oracle,Partitioning — Jonathan Lewis @ 5:19 pm BST Jul 1,2011

The following question appeared in a comment to an earlier posting on multi-column bitmap indexes and the inability of Oracle to create a bitmap index join when (to the human eye) the strategy was an obvious choice.

    I have a query which uses two bitmap indexes (37MB and 24MB) while the table is 17GB in size. Although the following query could be satisfied entirely from the indexes, it takes around 6-8 minutes to run and uses around 3GB of PGA.

    Could you please explain why?

SQL_ID  5z0a50a2yzdky, child number 0
-------------------------------------
select count(1) from (select distinct SRC_SYS,PROD_KEY from dds.REV_F)

Plan hash value: 867747470

--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                      |       |       |   221K(100)|          |       |       |
|   1 |  SORT AGGREGATE                   |                      |     1 |       |            |          |       |       |
|   2 |   VIEW                            |                      | 24533 |       |   221K  (6)| 00:44:22 |       |       |
|   3 |    HASH UNIQUE                    |                      | 24533 |   479K|   221K  (6)| 00:44:22 |       |       |
|   4 |     VIEW                          | index$_join$_002     |    63M|  1209M|   212K  (2)| 00:42:28 |       |       |
|*  5 |      HASH JOIN                    |                      |       |       |            |          |       |       |
|   6 |       PARTITION LIST ALL          |                      |    63M|  1209M|  3591   (1)| 00:00:44 |     1 |   145 |
|   7 |        BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M|  3591   (1)| 00:00:44 |       |       |
|   8 |         BITMAP INDEX FULL SCAN    | REV_F_IDX1           |       |       |            |          |     1 |   145 |
|   9 |       PARTITION LIST ALL          |                      |    63M|  1209M| 13724   (1)| 00:02:45 |     1 |   145 |
|  10 |        BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M| 13724   (1)| 00:02:45 |       |       |
|  11 |         BITMAP INDEX FULL SCAN    | REV_F_IDX5           |       |       |            |          |     1 |   145 |
--------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(ROWID=ROWID)

28 rows selected.

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2    610.89    1464.86     707459      17090          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4    610.90    1464.87     707459      17090          0           1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=17090 pr=707459 pw=446115 time=1464867976 us)
  26066   VIEW  (cr=17090 pr=707459 pw=446115 time=1464795748 us)
  26066    HASH UNIQUE (cr=17090 pr=707459 pw=446115 time=1464769678 us)
63422824     VIEW  index$_join$_002 (cr=17090 pr=707459 pw=446115 time=1084846545 us)
63422824      HASH JOIN  (cr=17090 pr=707459 pw=446115 time=958000889 us)
63422824       PARTITION LIST ALL PARTITION: 1 145 (cr=3561 pr=0 pw=0 time=63423134 us)
63422824        BITMAP CONVERSION TO ROWIDS (cr=3561 pr=0 pw=0 time=9554 us)
   7112         BITMAP INDEX FULL SCAN REV_F_IDX1 PARTITION: 1 145 (cr=3561 pr=0 pw=0 time=155525 us)(object id 366074)
63422824       PARTITION LIST ALL PARTITION: 1 145 (cr=13529 pr=8864 pw=0 time=190268771 us)
63422824        BITMAP CONVERSION TO ROWIDS (cr=13529 pr=8864 pw=0 time=63553723 us)
 432700         BITMAP INDEX FULL SCAN REV_F_IDX5 PARTITION: 1 145 (cr=13529 pr=8864 pw=0 time=3157351 us)(object id 366658)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  direct path write temp                      29741        1.62        107.05
  db file sequential read                      8864        0.20          2.35
  direct path read temp                       46573        0.79        211.02
  SQL*Net message from client                     2       29.22         29.22

In this case Oracle is clearly doing the bitmap join, but it’s managing to do something very inefficient. The problem lies in the partitioning or, to be more precise, in Oracle’s failure to take advantage of partitioning. The OP complains of using 3GB of memory and several minutes of elapsed time. The plan shows us that we have 145 partitions (PARTITION LIST ALL PARTITION: 1 145), and we have been told that the table is about 17GB in size, so the “average” partition is about 120MB – so why isn’t Oracle using a partition-wise approach and processing the data one partition at a time ? The answer is simple – it can’t be done (at present).

An index join works by doing a hash join between rowids – and since we are using bitmap indexes we have to convert bitmaps to rowids as part of the plan. In this query we then want to count the number of distinct combinations of (SRC_SYS, PROD_KEY) – and since the same combination may appear in different partitions, Oracle has had to generate a plan that handles the entire data set in a single join rather than handling each partition separately (notice how the “partition list all” operator appears twice, once for each index).

The “Row Source Operation” section tells us that we only had to scan a few thousand blocks, but we had to build a hash table of 64 million entries:

63422824        BITMAP CONVERSION TO ROWIDS (cr=3561 pr=0 pw=0 time=9554 us)

At 10 bytes per rowid (for a partitioned table), plus the length of the input column, plus linked list overheads (8 bytes per pointer), you can see the 3GB beginning to appear out of the volume of work being done. And then Oracle dumped the whole thing to disc (perhaps in anticipation of the impending “hash unique”, perhaps because it still needed to do a one-pass operation – it would have been interesting to see the full row source execution statistics from a call to dbms_xplan.display_cursor()).
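As a rough sanity check – the average column length and the linked-list overhead per entry below are my assumptions for illustration, not measured values – the arithmetic looks something like this:

```python
# Back-of-envelope estimate of the hash table size for the big join.
# The 10-byte rowid figure comes from the article; the column length
# and pointer overhead are assumed values, not measurements.
rows          = 63_422_824      # entries reported by the row source statistics
rowid_bytes   = 10              # extended rowid on a partitioned table
column_bytes  = 10              # assumed average length of the input column
pointer_bytes = 2 * 8           # assumed linked-list overhead per entry

total_gb = rows * (rowid_bytes + column_bytes + pointer_bytes) / 1024 ** 3
print(round(total_gb, 2))       # prints 2.13 - the right order of magnitude
```

Add the probe data, workarea management overheads, and the one-pass spill to temp, and the OP’s 3GB no longer looks surprising.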

What we need to see is a plan more like the following:

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                      |       |       |   221K(100)|          |       |       |
|   1 |  SORT AGGREGATE                    |                      |     1 |       |            |          |       |       |
|   2 |   VIEW                             |                      | 24533 |       |   221K  (6)| 00:44:22 |       |       |
|   3 |    HASH UNIQUE                     |                      | 24533 |   479K|   221K  (6)| 00:44:22 |       |       |
|   4 |     PARTITION LIST ALL             |                      |    63M|  1209M|  3591   (1)| 00:00:44 |     1 |   145 |
|*  5 |      HASH UNIQUE                   |                      |       |       |            |          |       |       |
|   6 |       VIEW                         | index$_join$_002     |    63M|  1209M|   212K  (2)| 00:42:28 |       |       |
|*  7 |        HASH JOIN                   |                      |       |       |            |          |       |       |
|   8 |         BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M|  3591   (1)| 00:00:44 |       |       |
|   9 |          BITMAP INDEX FULL SCAN    | REV_F_IDX1           |       |       |            |          |     1 |   145 |
|  10 |         BITMAP CONVERSION TO ROWIDS|                      |    63M|  1209M| 13724   (1)| 00:02:45 |       |       |
|  11 |          BITMAP INDEX FULL SCAN    | REV_F_IDX5           |       |       |            |          |     1 |   145 |
---------------------------------------------------------------------------------------------------------------------------

Line 4 shows us that we do something for each partition in turn. The sub-plan to line 4 tells us that we are collecting the unique combinations of (SRC_SYS,PROD_KEY) for a given partition. Line 3 tells us that we are collating the results from the different partitions and generating the set of values that is unique across all partitions.
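As a sanity check on the logic (toy data, invented values – nothing here comes from the OP’s system), collating per-partition distinct sets really does give the same answer as a single global distinct:

```python
# Toy model: each inner list is one partition's (SRC_SYS, PROD_KEY) rows.
partitions = [
    [("A", 1), ("A", 1), ("B", 2)],          # partition 1
    [("A", 1), ("C", 3)],                    # partition 2 repeats ("A", 1)
    [("B", 2), ("C", 3), ("D", 4)],          # partition 3
]

# First level: distinct within each partition (small hash tables).
per_partition = [set(rows) for rows in partitions]

# Second level: distinct across the collated per-partition results.
two_phase = set().union(*per_partition)

# One global distinct over all the raw rows, for comparison.
one_phase = {row for rows in partitions for row in rows}

assert two_phase == one_phase
print(len(two_phase))                        # prints 4
```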

The problem is this: can we engineer a strange piece of SQL that makes this plan appear – because the optimizer isn’t going to do it automatically (yet) ?

Obviously it would be pretty easy to write some sort of solution using pl/sql and pipelined functions – perhaps a function that takes a table name and loops through each partition of the table in turn, returning the distinct combinations for that partition; this would allow you to “select count(distinct(…)) from table_function(…);” to get your result.

You might be able to avoid pl/sql by creating a piece of SQL joining the table’s metadata to the table by partition identifier – except you would probably need to use a lateral view, which Oracle doesn’t support, and make the partition extended syntax part of the lateral dependency .. which Oracle definitely doesn’t support.

So is there an alternative, purely SQL, strategy ?

I’m thinking about it – I have a cunning plan, but I haven’t had time to test it yet.

to be continued …

Update (7th July 2011):

I’ve finally managed to make a bit of time to write up my notes about this – and in the interim Randolf Geist has said everything that needs to be said, and the OP has pointed out that a full tablescan works faster anyway. However, here goes …

My cunning plans involved finding ways of forcing Oracle into doing a single partition hash join for each partition in turn by playing clever tricks with dbms_mview.pmarker or the dba_tab_partitions view, or even the tbl$or$idx$part$num() function. But I realised I was in trouble as soon as I wrote the first little test that was supposed to do an index hash join on a single explicitly referenced partition. Here’s my table description, index definitions, query and execution plan:

SQL> desc t1
 Name                    Null?    Type
 ----------------------- -------- ----------------
 NVCUSTATUS                       VARCHAR2(20)        -- list partitioned on this column, multiple values in some partitions
 FREQ_COLUMN                      NUMBER(4)
 HBAL_COLUMN                      NUMBER(8)
 RAND_COLUMN                      NUMBER(8)

create bitmap index t1_freq on t1(freq_column) local;
create bitmap index t1_hbal on t1(hbal_column) local;

select
	count(1)
from	(
	select
		/*+index_join(t1) */
		distinct freq_column, hbal_column
	from t1 partition (p6)
	)
;

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                  |   894K|  6112K|  2966   (2)| 00:00:36 |       |       |
|   1 |  VIEW                          | index$_join$_001 |   894K|  6112K|  2966   (2)| 00:00:36 |       |       |
|*  2 |   HASH JOIN                    |                  |       |       |            |          |       |       |
|   3 |    PARTITION LIST SINGLE       |                  |   894K|  6112K|   208   (0)| 00:00:03 |   KEY |   KEY |
|   4 |     BITMAP CONVERSION TO ROWIDS|                  |   894K|  6112K|   208   (0)| 00:00:03 |       |       |
|   5 |      BITMAP INDEX FULL SCAN    | T1_FREQ          |       |       |            |          |     6 |     6 |
|   6 |    PARTITION LIST SINGLE       |                  |   894K|  6112K|   299   (0)| 00:00:04 |   KEY |   KEY |
|   7 |     BITMAP CONVERSION TO ROWIDS|                  |   894K|  6112K|   299   (0)| 00:00:04 |       |       |
|   8 |      BITMAP INDEX FULL SCAN    | T1_HBAL          |       |       |            |          |     6 |     6 |
-------------------------------------------------------------------------------------------------------------------

Even though we address a single partition explicitly, and even though the optimizer recognises it as partition 6 in lines 5 and 8, the plan isn’t doing a “partition-wise” join between the two indexes – the hash join calls two independent partition operations. In effect, Oracle is joining two different tables, and there isn’t a join condition on the partitioning column.

If we change the index definitions, though, to append the partitioning column (and given the small number of distinct values in each partition this didn’t affect the index size by much), and then rewrite the query to include the join, we get a better result – but only if we do a manual rewrite of the query (and if the partitioning column is declared not null).


alter table t1 modify nvcustatus not null;

drop index t1_freq;
drop index t1_hbal;

create bitmap index t1_freq on t1(freq_column, nvcustatus) local;
create bitmap index t1_hbal on t1(hbal_column, nvcustatus) local;

select
	count(*)
from	(
	select
		/*+
			qb_name(main)
			no_eliminate_join(@SEL$96BB32CC t1@hbal)
			no_eliminate_join(@SEL$96BB32CC t1@freq)
		*/
		distinct ftb.freq_column, htb.hbal_column
	from
		(
		select
			/*+ qb_name(freq) index(t1 t1_freq) */
			nvcustatus, freq_column, rowid frid
		from
			t1
		)	ftb,
		(
		select
			/*+ qb_name(hbal) index(t1 t1_hbal) */
			nvcustatus, hbal_column, rowid hrid
		from
			t1
		)	htb
	where
		ftb.frid = htb.hrid
	and	ftb.nvcustatus= htb.nvcustatus
	)
;

-------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
-------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |         |      1 |        |      1 |00:00:05.44 |     719 |   4380 |   3660 |       |       |          |         |
|   1 |  SORT AGGREGATE                  |         |      1 |      1 |      1 |00:00:05.44 |     719 |   4380 |   3660 |       |       |          |         |
|   2 |   VIEW                           |         |      1 |  35118 |  23223 |00:00:05.50 |     719 |   4380 |   3660 |       |       |          |         |
|   3 |    HASH UNIQUE                   |         |      1 |  35118 |  23223 |00:00:05.46 |     719 |   4380 |   3660 |  1268K|  1268K| 2647K (0)|         |
|   4 |     PARTITION LIST ALL           |         |      1 |    330K|   1000K|00:00:18.44 |     719 |   4380 |   3660 |       |       |          |         |
|*  5 |      HASH JOIN                   |         |      8 |    330K|   1000K|00:00:15.34 |     719 |   4380 |   3660 |   283M|    16M|   27M (1)|   31744 |
|   6 |       BITMAP CONVERSION TO ROWIDS|         |      8 |   1000K|   1000K|00:00:00.99 |     296 |    289 |      0 |       |       |          |         |
|   7 |        BITMAP INDEX FULL SCAN    | T1_FREQ |      8 |        |   1831 |00:00:00.30 |     296 |    289 |      0 |       |       |          |         |
|   8 |       BITMAP CONVERSION TO ROWIDS|         |      7 |   1000K|   1000K|00:00:06.48 |     423 |    416 |      0 |       |       |          |         |
|   9 |        BITMAP INDEX FULL SCAN    | T1_HBAL |      7 |        |   7353 |00:00:00.41 |     423 |    416 |      0 |       |       |          |         |
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("NVCUSTATUS"="NVCUSTATUS" AND ROWID=ROWID)

As you can see from the plan, I ran one of my tests with rowsource execution statistics enabled, so that I could count starts.

My query basically breaks the implicit join of the two indexes into an explicit join so that I can include the join on the partitioning column. You’ll notice that I now iterate through each partition – line 4 – and do a hash join for each pair of partitions. This is the target I was hoping for. The theory was that by doing (up to) eight small hash joins, each join would operate in memory rather than flooding one large build table to disc. Of course, the thing I’m building the hash table with is now three values (rowid, freq_value, nvcustatus) rather than the two values from the implicit join (rowid, freq_value), so if I’m unlucky this version of the query could take even longer than the original because it may have to do more I/O.
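To make the mechanism concrete, here’s a toy model of the partition-wise version of the join (Python rather than SQL, with invented data): the build table only ever holds one partition’s worth of rowids.

```python
# Toy per-partition "index" contents: partition -> [(rowid, column_value)].
freq_index = {
    "P1": [(1, 10), (2, 10), (3, 20)],
    "P2": [(4, 10), (5, 30)],
}
hbal_index = {
    "P1": [(1, 100), (2, 200), (3, 100)],
    "P2": [(4, 200), (5, 100)],
}

distinct_pairs = set()
for part in freq_index:                  # PARTITION LIST ALL
    build = dict(freq_index[part])       # small, per-partition build table
    for rid, hbal in hbal_index[part]:   # probe with the second index
        if rid in build:                 # access(ROWID=ROWID) within one partition
            distinct_pairs.add((build[rid], hbal))

print(sorted(distinct_pairs))            # prints [(10, 100), (10, 200), (20, 100), (30, 100)]
```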

If you’re wondering why lines 8 and 9 started only seven times instead of eight, it’s because one of my partitions was empty: Oracle had to scan it once as the build table to find that it was empty, then didn’t need to scan it as the probe table.

Footnote: Although I got the plan I wanted, it really didn’t make much difference – and even with the million rows I was using it was quicker to solve the problem with a tablescan. The relative benefit of the different plans would be dictated by the length of the rows, the lengths of the relevant columns, the number of partitions, and the memory available for hashing.

7 Comments »

  1. Well, the strategy of generating sub-results that are then propagated to a parent operation for the final result across all sub-results can already be seen when performing aggregates with parallel execution: for example, a MAX() will be applied by each parallel slave on its own set, and a second, final MAX() will be done by the query coordinator across all sub-aggregates provided by the parallel slaves.

    It is a pity that Oracle doesn’t support this (yet) for the above partition-wise join case.

    Although not the same as what you’ve proposed and not with the same efficiency, it would be interesting to see the plan when requesting parallel execution, something like

    select count(1) from (select /*+ parallel_index(REV_F) index_join(REV_F REV_F_IDX1 REV_F_IDX5) */ distinct SRC_SYS,PROD_KEY from dds.REV_F);

    It will probably still suffer from the same problem, but at least it will very likely show two levels of HASH UNIQUE operations (although not at the coordinator level, so I’m not entirely sure if it is the same as described above for the MAX() case), so it comes at least closer to your proposal.

    Randolf

    Comment by Randolf Geist — July 1, 2011 @ 9:03 pm BST Jul 1,2011 | Reply

  2. [...] Jonathan Lewis answers another burning question on multi-column bitmap indexes and the inability of Oracle to create a bitmap index join when (to the human eye) the strategy was an obvious choice. [...]

    Pingback by Log Buffer #227, A Carnival of the Vanities for DBAs | The Pythian Blog — July 2, 2011 @ 6:25 am BST Jul 2,2011 | Reply

  3. On second thoughts I’m not entirely convinced that the problem has anything to do with the HASH UNIQUE operation. It seems to me more like the general issue that the index join is simply not capable of taking advantage of partition-wise joins, probably because the partition key is not part of the join condition (although it could be implied).

    Of course your plan proposed would be ideal since it could run the main HASH UNIQUE operation as part of the partition-wise operation, but since according to the original trace the main problem is the HASH JOIN, not the HASH UNIQUE, I think turning the HASH JOIN into a partition-wise join would eliminate most of the excess work.

    I did a quick test with a manual re-write of an index join as you’ve proposed in your series about index joins and managed to arrive at a plan that does a partition-wise index join based on the two bitmap indexes with a single HASH UNIQUE operation on top of that. It’s not the plan you’ve proposed (since it does not do a split HASH UNIQUE with one step inside the partition-wise operation) but I think it solves the main problem described here.

    Simple index join without any other operation does not make use of partition-wise operations:

    ---------------------------------------------------------------------------
    | Id  | Operation                      | Name             | Pstart| Pstop |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT               |                  |       |       |
    |   1 |  VIEW                          | index$_join$_001 |       |       |
    |*  2 |   HASH JOIN                    |                  |       |       |
    |   3 |    PARTITION LIST ALL          |                  |     1 |     4 |
    |   4 |     BITMAP CONVERSION TO ROWIDS|                  |       |       |
    |   5 |      BITMAP INDEX FULL SCAN    | T_BIDX1          |     1 |     4 |
    |   6 |    PARTITION LIST ALL          |                  |     1 |     4 |
    |   7 |     BITMAP CONVERSION TO ROWIDS|                  |       |       |
    |   8 |      BITMAP INDEX FULL SCAN    | T_BIDX2          |     1 |     4 |
    ---------------------------------------------------------------------------
    

    The plan I’ve arrived at:

    ---------------------------------------------------------------------
    | Id  | Operation                         | Name    | Pstart| Pstop |
    ---------------------------------------------------------------------
    |   0 | SELECT STATEMENT                  |         |       |       |
    |   1 |  SORT AGGREGATE                   |         |       |       |
    |   2 |   VIEW                            |         |       |       |
    |   3 |    HASH UNIQUE                    |         |       |       |
    |   4 |     PARTITION LIST ALL            |         |     1 |     4 |
    |*  5 |      HASH JOIN                    |         |       |       |
    |   6 |       BITMAP CONVERSION TO ROWIDS |         |       |       |
    |   7 |        BITMAP INDEX FAST FULL SCAN| T_BIDX4 |     1 |     4 |
    |   8 |       BITMAP CONVERSION TO ROWIDS |         |       |       |
    |   9 |        BITMAP INDEX FAST FULL SCAN| T_BIDX3 |     1 |     4 |
    ---------------------------------------------------------------------
    

    It required, however, the bitmap indexes to include the partition key (and to use it as part of the join condition in addition to the ROWID), which is certainly a nuisance.

    Randolf

    Comment by Randolf Geist — July 2, 2011 @ 2:38 pm BST Jul 2,2011 | Reply

  4. Jonathan

    I only asked this question in the previous thread. The table size is 35GB and the bitmap indexes on SRC_SYS and PROD_KEY are below 40MB; if I created B-tree indexes they would be around 1GB in size. While reading documents about tuning DWH systems, I read that individual bitmap indexes on columns can participate in bitmap AND/OR operations and will do much better than B-tree indexes for ad-hoc queries with multiple columns in the predicates.

    I thought that instead of scanning the 35GB table, or scanning 1GB of B-tree indexes on these columns, creating bitmap indexes and joining a 35MB bitmap index with a 40MB bitmap index would be better.

    But running this query with the full tablescan plan finishes faster than the plan above, and the FTS plan did not need this much sort area. In the bitmap join plan, converting to rowids took around 3GB of sort space and a long time.

    Will bitmap indexes on DWH fact tables not benefit from the bitmap AND/OR operation?

    Comment by goiyala3 — July 6, 2011 @ 8:27 am BST Jul 6,2011 | Reply

    • Goiyala3,

      In general, bitmap indexes are very useful in data warehouse systems, and the most dramatic benefit you get from bitmap indexes appears when Oracle combines multiple bitmap indexes (which are often just single column indexes) to reduce the visits to table blocks to the smallest necessary set.

      In this case, though, it’s unfortunate that there is an “obvious” optimisation that should apply when you do a bitmap index join that simply hasn’t been allowed for in the optimizer code.

      Bear in mind, though, that the query you’re looking at would have to visit every row (hence block) in the table if the bitmap indexes weren’t there – so you’re not using bitmap indexes in the “standard” way of avoiding unnecessary visits to table blocks; you’ve just been a bit unlucky at the edges of the code when you combine bitmap indexes, partitioned tables, and count(*) optimisation.

      Comment by Jonathan Lewis — July 7, 2011 @ 10:47 am BST Jul 7,2011 | Reply

      • Thanks for the reply, Jonathan. I’m still confused about my actual plan.
        1. Did my query plan visit the actual table blocks?
        2. If a bitmap index join can only take place with single-column bitmap indexes, will all bitmap join operations suffer from this problem of building a huge hash table?

        Comment by goiyala3 — July 8, 2011 @ 2:21 am BST Jul 8,2011 | Reply

        • 1. Your plan is not visiting the table – if it were you would (typically) see a “table access by rowid” operation being fed by the index operation.
          2. A bitmap index join is not restricted to single-column bitmap indexes – your problem here is that the optimizer is losing sight of the “partition-wise” type of strategy that could be used.

          Comment by Jonathan Lewis — July 25, 2011 @ 7:40 am BST Jul 25,2011

