Oracle Scratchpad

December 9, 2013

Bitmap join indexes

Filed under: Indexing,Oracle,Statistics — Jonathan Lewis @ 6:01 pm BST Dec 9,2013

Here’s another of my “draft” notes that needs some exapansion and, most importantly, proof.

I have a fact table with a status id column that shows a massive skew. But I also have a dimension table that holds the “status code” so (in theory, at least) I have to do a join from the statuses table to the facts table to find rows of a given status. Unfortunately the join hides the skew:

select  f.*
from    facts f, statuses s
where   s.code = 'C'
and     f.status_id = s.status_id
;

The optimizer knows that the status_id column on the facts table has a highly skewed distribution and will create a histogram on it, but it can’t know which status code corresponds to which status_id, so the histogram doesn’t help in calculating the join cardinality.

Will a bitmap join index help ? Answer – NO.

It creates a virtual column on the facts table, but you can’t collect stats on the virtual column. The optimizer won’t use the stats even if you create them, all it will use is the number of distinct keys in the index.

For the optimizer to collect the stats the Oracle developers would have to come up with a new way to collect column stats by reading an index on that column – but it would be useful.

Possible workaround – don’t do this join, rewrite the query with an IN subquery with a /*+ precompute_subquery */ hint to generate an IN-list at optimisation time.

3 Comments »

  1. Is rewriting the code an option? If yes, you could select status_id for the given code in advance. Then include the selected value in the where-clause and avoid the join.

    Comment by Todor Botev — December 10, 2013 @ 6:12 pm BST Dec 10,2013 | Reply

  2. I have seen this issue during my investigation about star transformation feature (thanks Jonathan about the nice articles, which helped me a lot !). I ended up with using “faked” statistics on the dimension attribute columns, which represents the fact table’s skew into the dimension column’s histogram. In this way, without touching the code (which is the case with pre-build BI tools), I was able to “pass” the right cardinality estimation to the Optimizer.

    Comment by Edward Hayrabedian — December 11, 2013 @ 2:02 pm BST Dec 11,2013 | Reply

  3. Interesting post, maybe you’ll find useful to take a look at this : http://www.areaetica.com/index-tuning/

    Comment by Fra — December 12, 2013 @ 11:23 am BST Dec 12,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,877 other followers