Here’s another of my “draft” notes that needs some expansion and, most importantly, proof.
I have a fact table with a status id column that shows a massive skew. But I also have a dimension table that holds the “status code” so (in theory, at least) I have to do a join from the statuses table to the facts table to find rows of a given status. Unfortunately the join hides the skew:
select f.* from facts f, statuses s where s.code = 'C' and f.status_id = s.status_id ;
The optimizer knows that the status_id column on the facts table has a highly skewed distribution and will create a histogram on it, but it can’t know which status code corresponds to which status_id so the histogram doesn’t help in calculating the join cardinality.
Will a bitmap join index help? Answer: NO.
A bitmap join index will produce a virtual column on the facts table, but you can’t collect stats on the virtual column. In fact, the optimizer won’t use the stats even if you program them into existence with calls to dbms_stats.set_column_stats(); the only thing the optimizer will use is the number of distinct keys in the index.
For the optimizer to collect the stats the Oracle developers would have to come up with a new way to collect column stats by reading an index on that column – but it would be useful.
Possible workaround – don’t do this join, rewrite the query with an IN subquery with a /*+ precompute_subquery */ hint to make the optimizer generate an IN-list of literals at optimisation time.
Is rewriting the code an option? If yes, you could select status_id for the given code in advance. Then include the selected value in the where-clause and avoid the join.
Comment by Todor Botev — December 10, 2013 @ 6:12 pm GMT Dec 10,2013 |
I have seen this issue during my investigation about star transformation feature (thanks Jonathan about the nice articles, which helped me a lot !). I ended up with using “faked” statistics on the dimension attribute columns, which represents the fact table’s skew into the dimension column’s histogram. In this way, without touching the code (which is the case with pre-build BI tools), I was able to “pass” the right cardinality estimation to the Optimizer.
Comment by Edward Hayrabedian — December 11, 2013 @ 2:02 pm GMT Dec 11,2013 |
Edward,
This is a very late reply – but only because it’s breaking news.
You may find that this strategy doesn’t work anymore when you upgrade to 12.2 – there seems to be a sanity check that makes the optimizer ignore an “impossible” histogram and go back to “num distinct”.
Comment by Jonathan Lewis — September 27, 2018 @ 5:49 pm BST Sep 27,2018 |
[…] that f(500) will be evaluated in the same way that we can trigger in-list calculation with the precompute_subquery […]
Pingback by CBO catchup | Oracle Scratchpad — August 7, 2015 @ 1:10 pm BST Aug 7,2015 |
[…] been prompted by a recent question on the ODC database forum to revisit a note I wrote nearly five years ago about bitmap join indexes and their failure to help with join cardinalities. At the time I made a […]
Pingback by Bitmap Join Indexes | Oracle Scratchpad — May 18, 2018 @ 2:29 pm BST May 18,2018 |
[…] was over (in fact someone else had described their use of exactly his suggested approach in a comment on a much older blog note about this problem): take the histogram from the id_status column on the facts table and […]
Pingback by Hacking for Skew | Oracle Scratchpad — September 28, 2018 @ 1:24 pm BST Sep 28,2018 |