In an earlier article introducing the index join I raised a question that came up at the first ES2N virtual conference:
“If you hint an index hash join, is there any way of telling Oracle the order in which it should use the indexes?”
Consider the following example:
create table indjoin nologging as select rownum id, rownum val1, rownum val2, rownum val3, rpad('x',500) padding from all_objects where rownum <= 5000 ; /* alter table indjoin add constraint ij_pk primary key (id) */ create unique index ij_v1 on indjoin(id, val1); create unique index ij_v2 on indjoin(id, val2); create unique index ij_v3 on indjoin(id, val3); -- gather statistics: without histograms select /*+ index_join(ij) */ count(*) from indjoin ij where val1 between 100 and 200 and val2 between 50 and 150 and val3 between 250 and 550 ;
The query plan for this query is (thanks to the hint) a three-way index hash join:
----------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ----------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 12 | 74 | | 1 | SORT AGGREGATE | | 1 | 12 | | |* 2 | VIEW | index$_join$_001 | 1 | 12 | 74 | |* 3 | HASH JOIN | | | | | |* 4 | HASH JOIN | | | | | |* 5 | INDEX FAST FULL SCAN| IJ_V1 | 1 | 12 | 18 | |* 6 | INDEX FAST FULL SCAN| IJ_V2 | 1 | 12 | 18 | |* 7 | INDEX FAST FULL SCAN | IJ_V3 | 1 | 12 | 18 | ----------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("VAL1">=100 AND "VAL1"<=200 AND "VAL2">=50 AND "VAL2"<=150 AND "VAL3">=250 AND "VAL3"<=550) 3 - access(ROWID=ROWID) 4 - access(ROWID=ROWID) 5 - filter("VAL1"<=200 AND "VAL1">=100) 6 - filter("VAL2"<=150 AND "VAL2">=50) 7 - filter("VAL3"<=550 AND "VAL3">=250)
But what if you know the data better than Oracle, and know that the join order for the three indexes should be different – there are no extra direct hints you can add to the code to tell Oracle the best order for the hash join. (You might, of course, be able to make use of the cardinality() hint – or plan around with the undocumented, hence unsupported, opt_estimate() or column_stats() or index_stats() hints, but I wouldn’t be keen to use such an indirect approach.)
But you CAN rewrite the query to get the same mechanism working under your control. The code looks more complex – but we often have to make a trade between clarity (simplicity) and speed in critical cases, so you may find some examples where the complexity is acceptable:
select count(*) from ( select /*+ no_merge */ rowid from indjoin ij where val1 between 100 and 200 ) v1, ( select /*+ no_merge */ rowid from indjoin ij where val2 between 50 and 150 ) v2, ( select /*+ no_merge */ rowid from indjoin ij where val3 between 250 and 550 ) v3 where v2.rowid = v1.rowid and v3.rowid = v1.rowid ;
It’s another example of referencing a table twice (or three times) in the query because multiple references allow you to define a better execution path than a single reference. The execution we get from this plan (running under 10.2.0.3) is as follows:
------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost | ------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 36 | 14 | | 1 | SORT AGGREGATE | | 1 | 36 | | |* 2 | HASH JOIN | | 102 | 3672 | 14 | |* 3 | HASH JOIN | | 102 | 2448 | 9 | | 4 | VIEW | | 102 | 1224 | 4 | |* 5 | INDEX FAST FULL SCAN| IJ_V1 | 102 | 1632 | 4 | | 6 | VIEW | | 102 | 1224 | 4 | |* 7 | INDEX FAST FULL SCAN| IJ_V2 | 102 | 1632 | 4 | | 8 | VIEW | | 302 | 3624 | 4 | |* 9 | INDEX FAST FULL SCAN | IJ_V3 | 302 | 4832 | 4 | ------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("V3".ROWID="V1".ROWID) 3 - access("V2".ROWID="V1".ROWID) 5 - filter("VAL1">=100 AND "VAL1"<=200) 7 - filter("VAL2">=50 AND "VAL2"<=150) 9 - filter("VAL3">=250 AND "VAL3"<=550)
By creating three explicit query blocks (which I’ve ring-fenced with no_merge hints), one for each index, I’ve made Oracle extract the same three sets of data that it was using in the index hash join. I’ve then left Oracle to join the three result sets – which it has done with hash joins. Interestingly this seems to have done a little less work than the original index join – the final complex filter action doesn’t appear in the manual rewrite.
Since I’ve now got a query that seems to be “just” a three table join, I can dictate the join order, guarantee the hash joins, and dictate which rowsources should be used as build rowsources, and which as probe. For example, let’s apply the following hints:
select /*+ leading (v1 v3 v2) use_hash(v3) no_swap_join_inputs(v3) use_hash(v2) swap_join_inputs(v2) */ count(*) from ....
The resulting plan is as follows:
------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost | ------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 36 | 14 | | 1 | SORT AGGREGATE | | 1 | 36 | | |* 2 | HASH JOIN | | 102 | 3672 | 14 | | 3 | VIEW | | 102 | 1224 | 4 | |* 4 | INDEX FAST FULL SCAN | IJ_V2 | 102 | 1632 | 4 | |* 5 | HASH JOIN | | 102 | 2448 | 9 | | 6 | VIEW | | 102 | 1224 | 4 | |* 7 | INDEX FAST FULL SCAN| IJ_V1 | 102 | 1632 | 4 | | 8 | VIEW | | 302 | 3624 | 4 | |* 9 | INDEX FAST FULL SCAN| IJ_V3 | 302 | 4832 | 4 | ------------------------------------------------------------------
The join order (you can check the trace file to confirm this) is: ij_v1, ij_v3, ij_v2 – but because of the swap_join_inputs(v2) hint the ij_v2 index appears first in the plan.
We build a hash table with ij_v2, then build a hash table with ij_v1 with we probe (join) ij_v3.
We then use the result of joining ij_v1/ij_v3 to probe (join) ij_v2 – which means v2 really is the last object in the join order.
It may look complex – but all we’ve done is describe an index-join in detail, and that has allowed us to specify which indexes are joined when. I’ve already pointed out that the manual version appears to be slightly more efficien than the original. It’s also more powerful, and addresses a defect in the current implementation of the index join. But that’s a topic for another blog.
Update Feb 2012: It occurred to me recently that when you hint an index_join, the full syntax includes a list of index identifiers. For a two-index join, the order in which you list the indexes seems to be the order of the hash join (and Oracle doesn’t swap the join inputs for the two indexes). I need to do further investigations into this – in particular with cases of more than two indexes since there is no (documented) option for telling the optimizer to swap join inputs (or not) when joining more than two indexes. For the moment, though, rewriting two-index joins as two-table-joins, isn’t always necessary – but do take a look at the other articles I’ve written about index hash joins.
In 11.2.0.2 both queries results in exactly the same number of buffer gets, so CBO works better in a simple case. I wonder why it is so – Column Projection info says both versions are the same.
Another thing to note is “convenient” version of the SQL is able to use cardinality feedback (although plan doesn’t change in this case).
Comment by Timur Akhmadeev — November 26, 2010 @ 7:34 pm GMT Nov 26,2010 |
Timur, I think the number of buffer gets has to be the same even in Jonathan’s 10.2.0.3. All three indexes are read by FFS in all cases. Indexes are small, they fit into hash workarea of your server process. There is no reason to have different number of buffer gets even in 10.2.0.3 or what ever version.
Comment by Pavol Babel — November 26, 2010 @ 10:18 pm GMT Nov 26,2010 |
Timur,
Thanks for checking the newer version.
As Pavol suggests, I wasn’t expecting the buffer visits to be different, it’s simply the execution of the extra predicate that appeared to be using unnecessary CPU. I’ve come across a few cases where generic strategies seem to do redundant work (or miss the obvious) in one version of Oracle, then gradually eliminate the redundancies as time passes. This looks like another example of that development process.
Nice thought about the cardinality feedback.
Comment by Jonathan Lewis — November 27, 2010 @ 6:57 pm GMT Nov 27,2010 |
Jonathan,
thank You for great explanation.
One more thing that bothers me is what to do in Oracle 9i .
I mean the leading hint can take only one table in 9i , so looks like
we should use ordered hint as a solution.
Am I right ?
Regards
Greg
Comment by Greg — November 26, 2010 @ 8:42 pm GMT Nov 26,2010 |
Greg,
I think you’ll have to look at each example individually and handle with care – but the ordered hint (perhaps inside an inline view containing this pseudo-hash-join) may be the only option you have. When using the ordered hint, though, always be careful of the side-effects of subqueries unnesting and confusing the issue.
Comment by Jonathan Lewis — November 27, 2010 @ 6:55 pm GMT Nov 27,2010 |
Jonathan,
we are manually rewriting SQL queries in similar way. But there are few problems with NO_MERGE hint in more complex queries from time to time. So you have to use QB_NAME hints to allow global hints work. Recently I faced a problem with QB_NAME, too. I’ll try explain my problem on your example.
Imagine you have view defined as follows and you are not allowed to change it’s definition:
How would you get exactly same execution plan as in your post? You can use any HINTS except of the NO_MERGE one.
I’m able to reproduce your execution plan only by using those ugly system generated query block names
Comment by Pavol Babel — November 26, 2010 @ 9:58 pm GMT Nov 26,2010 |
Here is the only way I’m able to reproduce your explain plan without NO_MERGE hint:
I’d like to get rid of system generated query block names, but do not know how. Referencing “ij@ind1” is not documented (maybe event not supported), too.
Comment by pbabel — November 26, 2010 @ 10:29 pm GMT Nov 26,2010 |
Pavel,
I think you’re just trying to make things difficult for me.
Do you have a rule against no_merge hints everywhere because of some bad experiences in the past, or is it exactly this type of code where things go wrong ? I’d be interested to see an example of no_merge misbehaviing (other than in ANSI SQL where virtually everything seems to go wrong with hints.)
Here’s a solution that you probably won’t like and probably shouldn’t use because it takes advantage of an undocumented hint (I wouldn’t use “main” as a name inside a view, by the way; too much risk of the same name being used twice and making Oracle eliminate names in complex code):
Comment by Jonathan Lewis — November 27, 2010 @ 7:02 pm GMT Nov 27,2010 |
Jonathan,
great job by using undocumented hint outline_leaf. I’m not going to use it in my production database, of course. But to be honest, it still looks much better than my hint using “@SEL$9B45E752”.
Actually, I’m quite surprised you’ve never faced any problems with NO_MERGE hint. I’ll try to show you another simplified example. Since my english is far from perfect, I hope you will get the point.
The first oddity is that optimizer on my 10.2.0.5 database is not able to use transitive rule int this simple case, that’s why both predicates have been used (t1.join1 = v.join2 AND t1.join1 = v.join3) . The query plan for this query (hope to be same on most 10g databases)
The optimizer does not have many opportunities to discover the little “hole” in table2, even with histograms on table t2. But the correct join order should be t1 -> t2 -> t3 .
Since only tables at the “same level” in FROM clause should be used in LEADING HINT, you are not able to change the join order between tables t2 and t3.
Unfortunately, the NO_MERGE hint on view would make it even worse, it would prevent from pushing the join predicate into VIEW. When NO_MERGE used, I’m unable to use any further documented hint to make things work. Here is my best try, but no success.
Note the INDEX FULL SCAN on line 8 making execution plan suboptimal.
The only way I’m able to force optimizer choose the best optimal execution plan is using system generated query block names, again.
Comment by pbabel — November 28, 2010 @ 2:23 am GMT Nov 28,2010
Pavel,
I didn’t say I hadn’t had problems with the no_merge hint – just that I was interested to see what problems you have had … and your’s is the same as the one that irritates me most: using the no_merge hint blocks join predicate pushdown. Things have improved in 11g, but even then the restriction has only been relaxed to cover “complex view merging (CVM)” – if you use the no_merge hint to block “simple view merging (SVM)” then join predicate push down (JPPD) is still blocked.
I’m not sure your complaint about transitive closure is valid (or at least, not in the context of the optimizer). Transitive closure only applies in Oracle for: “if colA = colB and colB = constant then colA = constant” – and I think you’re hoping that your example will cover “colA = colB and colB = colC”, where it doesn’t apply.
Comment by Jonathan Lewis — November 28, 2010 @ 12:21 pm GMT Nov 28,2010 |
Pavel,
I’ve run your example on my laptop and checked a few details.
The plan you want to see is one that the optimizer considers – but it doesn’t like the results of arithmetic. This means we can get the optimizer to do what you want by changing some numbers. The key thing is that Oracle thinks using the index on t3 is cheaper than the index on t2 and that the join result from t3 is much smaller than the join result on t2. So we change its mind.
If this is a general problem with many queries, we could get the effect we want by adjusting the clustering_factor (or blevel) on the t2 and t3 indexes. But if we just want to fix this specific example we’re a bit lucky because we can put in cardinality hints that Oracle can apply without having to use derived query block names.
We tell the optimizer how many rows we think will come from t1, then how many will come from each of the next joins – claiming that the join to t3 will produce a lot more data than the join to t2. With these figures, this is the plan I got:
Comment by Jonathan Lewis — November 28, 2010 @ 2:49 pm GMT Nov 28,2010 |
Now I understand, you had some problems with no_merge in the past, that’s what I initially expected. Maybe I just didn’t understand your remark “I think you’re just trying to make things difficult for me” Join push predicate is probably the biggest issue correlated to no_merge hint.
You’re absolutely right with transitive closure my complaint is really wrong, sorry for that.
Back to my example. The optimizer is considering my preferred execution plan, of course. I would like to point out – I just wanted to create some simple example to show join predicate pushing problem. My initial intention was just to say “imagine you need use index on t2, but optimizer chooses index on t3″. But then I invented simple trick with data in table t2, and oracle really did not choose the best index, making by example better.
Real queries in our productional system are much more complex, imagine joining 5 views, each of them based on join of two or three tables. Some of them are still causing me headaches. I would desperately like to have control over join order of tables in some cases, but it seems to be not doable without some little help of undocumented hints (t@qb_name reference is undocumented, too and not approved by Oracle Support in SR).
Hacking, or let me say “improving”, statistics with dbms_stats.set_%_stats procedures is really great technique helping in many situations. But as you suggests, it is not usable every time.
Your next suggestion is to use CARDINALITY hint. We have tried to use OPT_ESTIMATE hint (or the older one CARDINALITY) hints several times in the past, but we didn’t get the approval from Oracle Support. To be honest, I do not know why and it is quite irritating us. There are many situations where you know optimizer’s arithmetic is wrong, so why are you not allowed to correct it in simple way? Most of our customers are not going to pay extra licenses for SQL TUNING ADVISOR (part of TUNING PACK, which requires also DIAGNOSTICS PACK) just because of lack opt_estimate/cardinality hint support.
Comment by Pavol Babel — November 28, 2010 @ 11:49 pm GMT Nov 28,2010 |
One more remark. You have made me running 10053 on query from my example just by using abbreviation “JPPD”. Last time I ran 10053 on query with blocked join predicate pushing (by no_merge hint), it was few years ago, in 9iR2 or 10gR1. As you know, there were no explanations of abbrevations Like JPPD, OJPPD, SPJ (etc.) even in 10gR. It was sometimes really difficult to address any Query Transformation related issue. The LEGEND in 10053 in 10gR2 and in 11g is great, I wish I had it few years ago! :)
Comment by Pavol Babel — November 29, 2010 @ 12:02 am GMT Nov 29,2010 |
Pavel,
“(t@qb_name reference is undocumented, too and not approved by Oracle Support in SR).”
In the 11.2 Performance Tuning Guide Section 19.2.2, there is an example of using the execution plan to discover the query block name of a query block that you have not named (admittedly the example happens to be a boring “sel$4”)
Then, in Section 19.2.3, example 19-4, we see this:
To avoid this issue, Oracle recommends that you specify a query block in the hint using the @SEL notation:
SELECT /*+ LEADING(A@SEL$1 B@SEL$2 C@SEL$2) */
FROM a a, v v
WHERE a.id = v.id;
I find it faintly amazing that Oracle support should tell you that “alias@qb_name” is not approved.
Comment by Jonathan Lewis — November 29, 2010 @ 11:02 pm GMT Nov 29,2010 |
Jonathan,
the SR I’ve mentioned had been created few years ago. I’m not sure if it was created before the 11gr1 or not. I’ll try to find it in my archives.
But things have obviously changed in 11gR2, thank you very much for this link Most of our databases are still running 10.2.0.4 or 10.2.0.5, I have never cheked this part of documentation in 11g.
The most crazy thing about example 19-4 is, that it is exactly the select taken from SR, which I was talking about. Tables A, B, C, where view V is defined as join of tables B and C. And join of table A with view V. And alias a for table a, alias v for view v. No one in the world would be so crazy to use such aliases, except of me. Yes it must be taken from my SR, I’m not joking!!! :-) :-)
Comment by Pavol Babel — November 29, 2010 @ 11:57 pm GMT Nov 29,2010 |
Jonathan,
I’ve just found mentioned SR. It was created by me as SR 2-4140118 (legacy SR number 6614847.993, Opened Dec 4, 2007 9:14 PM, Last Updated Feb 13, 2008 1:40 AM). Unfortunately, I’m not able to see the content (oracle has already deleted it ), but following bug was created because of my SR: BUG 6692628 IGNORED LEADING HINTS WITH VIEWS . The bug is available to public.
So I’m pleased to now that I have now my own example in 11gR2 documentation, thank you Jonathan!! :-) :-) (btw, it is not in 11gR1 documentation). The bug was closed as “not a bug”, but someone has obviously added this “issue” to 11gR2 documentation without any mention in BUG document.
Comment by Pavol Babel — November 30, 2010 @ 12:38 am GMT Nov 30,2010 |