There’s a question on the ODC database forum about column groups that throws up an interesting side point. The OP is looking at a query like the following and asking about which column groups might help the optimizer get the best plan:
select a.*, b.*, c.* from a, b, c where a.id = b.id and a.id1 = b.id1 and a.id = c.id and b.id2 = c.id2 and a.id4 = 66 and b.id7 = 44 and c.id88 = 88 ;
I’m going to start by being a bit boring about presentation and meaning (although this query has fairly obviously being engineered to conceal any meaningful column and table names) and do a cosmetic edit of the query because if I had a from clause reading “a, b, c” it would be because I thought the optimizer should identify that as the best join order, in which case I would also have written the predicate section to display the order in which the predicates would be used:
select a.*, b.*, c.* from a, b, c where a.id4 = 66 -- and b.id = a.id and b.id1 = a.id1 and b.id7 = 44 -- and c.id = a.id and c.id2 = b.id2 and c.id88 = 88 ;
Having (to my mind) cosmetically enhanced the query, I’ll now ask the question: “Would it make sense to create column groups on a(id, id1), b(id, id1) and c(id, id2) ?”
I’ve written various articles on cases where column groups have effects (or not): “out of range” predicates, “is null” predicates, “histograms issues”, “statistics at both ends of the join”, and “multi-column indexes vs. column groups” are just some of the key areas. Assuming there are no reasons to stop a particular column group from working we can look at the join from table A to table B: it’s a two-column join so if there’s some strong correlation between the id and id1 columns of these two tables then creating the two column groups (one at each end of the join) can make a difference to the optimizer’s calculations with the most likely effect that the cardinality estimate on the join will go up and, as a side effect the join order and join method may change.
If we then consider the join to table C – we note that it involves two columns from table C being joined to one column from table A and one from table B so, while we could create a column group on those two columns at the table C end of the join, a column group is simply not possible at the A/B of the join. This means that one end of the join may have a selectivity that is hugely increased (far fewer combinations) because the column group has quantified the correlation, but the selectivity at the other end is simply based on the two separate selectivities from a.id and b.id2, and that’s likely to be smaller than the selectivity of (c.id, c.id2), and the optimizer will choose the smaller join selectivity hence producing a lower cardinality estimate.
This is where a collateral point appears – and it’s a point which also justifies the careful rearrangement of the SQL text – there is an opportunity for transitive closure that the human eye can see but the optimizer is not coded to manipulate. We have two predicates: “b.id = a.id” and “c.id = a.id” but they can only both be true when “c.id = b.id”, so let’s replace “c.id = a.id” with “c.id = b.id” and the join predicate to table C becomes:
and c.id = b.id and c.id2 = b.id2
Both left hand sides reference table C, both right hand sides reference table B – so if we now create a column group on c(id, id2) and an additional column group on b(id, id2) then we may give Oracle some better information about this join as well. In fact, even if we create NO column groups at all this specific change may be enough to result in a change in the selectivity calculations with a subsequent change in cardinality estimate and execution plan.