Oracle Scratchpad

August 19, 2020

Subquery with OR

Filed under: 12c,Execution plans,Hints,Oracle,Performance — Jonathan Lewis @ 2:03 pm BST Aug 19,2020

I’ve written a couple of notes in the past about the problems of optimising queries with predicates of the form “or exists {subquery}”. A recent question on the Oracle Developer Community forum brought to my attention an improvement in this area in (very precisely) 12.2, as well as giving me a cute example of how the first cut of a new feature doesn’t always cover every detail, and creating a nice example of how the new technology enhances the old technology.

We start with some data and a simple query running under 12.2.0.1:

rem
rem     Script:         exists_with_or_4.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem             12.1.0.2  -- feature not implemented
rem

create table cat_contact(
        contact_method_id       varchar2(1) not null,
        contact_id              number(8,0) not null,
        small_vc                varchar2(10),
        padding                 varchar2(100)
);

alter table cat_contact add constraint cc_pk primary key(contact_id);
create index cc_i1 on cat_contact(contact_method_id);

insert into cat_contact
select
        chr(64 + case when rownum <= 10 then rownum else 26 end),
        rownum,
        lpad(rownum,10),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 10000
;

select count(*) from cat_contact where contact_method_id in ('A','B','C');

create table cat_item(
        contact_id      number(8,0) not null,
        item_category   varchar2(1) not null,
        small_vc        varchar2(10),
        padding         varchar2(100),
        constraint ci_ref_cc foreign key(contact_id) references cat_contact
)
;

alter table cat_item add constraint ci_pk primary key(contact_id, item_category);
create index ci_i1 on cat_item(item_category);

insert into cat_item 
select
        rownum,
        chr(88 + case when rownum <= 10 then mod(rownum,2) else 2 end),
        lpad(rownum,10),
        rpad('x',100,'x')
from
        all_objects
where
        rownum <= 10000
;

select count(*) from cat_item where item_category in ('X','Y');

execute dbms_stats.gather_table_stats(user,'cat_contact')
execute dbms_stats.gather_table_stats(user,'cat_item')

I’ve created and populated two tables (the table and column names come from the ODC thread). There’s a foreign key relationship defined between cat_item and cat_contact, both tables have primary keys declared, with a couple of extra columns declared not null.

I’ve populated the two tables with a small amount of data and each table has one column rigged with very skewed data:

  • cat_contact.contact_method_id is mostly ‘Z’ with one row each of ‘A’ to ‘J’,
  • cat_item.item_category (the second column in the primary key) is mostly ‘Z’ with 5 rows each of ‘X’ and ‘Y’.

After populating each table I’ve queried it in a way which means the subsequent stats gathering will create frequency histograms on these two columns and the optimizer will be able to take advantage of the skew in its arithmetic, which means it may choose to use the indexes I’ve created on those skewed columns if the right values appear in the queries.
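A quick way to confirm that the histograms really were created is to query the column statistics after the calls to dbms_stats. This is only a sketch of the check, assuming the tables are in the current schema and that the default method_opt was in place; if the column usage had been recorded as described, the two skewed columns would be expected to report FREQUENCY.

```sql
-- Report the histogram type on the two deliberately skewed columns.
-- Assumes cat_contact and cat_item belong to the current user.
select
        table_name, column_name, num_distinct, num_buckets, histogram
from
        user_tab_col_statistics
where
        table_name  in ('CAT_CONTACT', 'CAT_ITEM')
and     column_name in ('CONTACT_METHOD_ID', 'ITEM_CATEGORY')
order by
        table_name, column_name
;
```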

So here’s the query we’re interested in:

SELECT  /*+ 
                qb_name(main) 
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        OR      c.contact_method_id IN ('A', 'B', 'C')  
        )
;  

select * from table(dbms_xplan.display_cursor);

Here’s the default execution plan (in 12.2.0.1 with my settings for system stats and various other optimizer-related figures that MIGHT make a difference) pulled from memory after executing the query to return 10 rows.


-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |       |       |    34 (100)|          |
|*  1 |  FILTER             |             |       |       |            |          |
|   2 |   TABLE ACCESS FULL | CAT_CONTACT | 10000 |  1152K|    34   (6)| 00:00:01 |
|   3 |   INLIST ITERATOR   |             |       |       |            |          |
|*  4 |    INDEX UNIQUE SCAN| CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter((INTERNAL_FUNCTION("C"."CONTACT_METHOD_ID") OR  IS NOT NULL))
   4 - access("I"."CONTACT_ID"=:B1 AND (("I"."ITEM_CATEGORY"='X' OR
              "I"."ITEM_CATEGORY"='Y')))

For every row in the cat_contact table Oracle has checked whether or not the contact_method_id is an ‘A’, ‘B’, or ‘C’ and passed any such rows up to its parent; for all other rows it has then executed the subquery to see if the row with the matching contact_id in cat_item has an ‘X’ or ‘Y’ as the item_category. It has had to run the subquery 9,997 times (there were only three rows matching ‘A’, ‘B’, ‘C’) and the INLIST ITERATOR at operation 3 means that it has probed the index nearly 20,000 times. This does not look efficient.
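The execution counts can be checked directly from the rowsource execution statistics. The following is a sketch rather than part of the original script: it adds the gather_plan_statistics hint to the query and then reports the plan with the 'allstats last' format, so that the Starts column shows how many times each operation was called.

```sql
set serveroutput off

SELECT  /*+
                qb_name(main)
                gather_plan_statistics
        */
        *
FROM    cat_contact c
WHERE   (
                exists  (
                        SELECT  /*+ qb_name(subq) */
                                *
                        FROM    cat_item i
                        WHERE   i.contact_id = c.contact_id
                        AND     i.item_category in ('X', 'Y')
                )
        OR      c.contact_method_id IN ('A', 'B', 'C')
        )
;

-- The Starts column for the INLIST ITERATOR and the INDEX UNIQUE SCAN
-- shows the number of subquery executions and index probes.
select * from table(dbms_xplan.display_cursor(format => 'allstats last'));
```

If the plan is the FILTER plan shown above, the Starts figure against the INLIST ITERATOR should correspond to the number of times the subquery ran, and the figure against the INDEX UNIQUE SCAN to the number of index probes.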

I’ve said in previous articles that when you need to optimize queries of this shape you need to rewrite them as UNION ALL queries to separate the two parts of the complex OR predicate and then make sure that you don’t report any items twice – which you do by making use of the lnnvl() function. So let’s do this – but let’s do it the lazy “new technology” way by upgrading to 19c and executing the query there; here’s the plan I got in 19.3.0.0:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    14 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    14   (8)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    INLIST ITERATOR                        |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                     | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|   6 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   7 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   8 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   9 |       INLIST ITERATOR                     |                 |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|* 12 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|* 13 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  11 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
  12 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  13 - filter((LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C')))

The optimizer has used the new “cost-based OR-expansion” transformation to rewrite the query as a UNION ALL query. We can see an efficient access into cat_contact to identify the ‘A’, ‘B’, ‘C’ rows in the first branch. The second branch of the union all handles the existence subquery, but the optimizer has unnested the subquery to select the 10 rows from cat_item where the item_category is ‘X’ or ‘Y’ and used those rows in a nested loop to drive into the cat_contact table using the primary key. We can also see the use of the lnnvl() function in operation 13 that ensures we don’t accidentally report the ‘A’, ‘B’, ‘C’ rows again.
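For comparison, the manual rewrite that this plan corresponds to would look something like the following. It is a hand-written sketch based on the predicate section of the plan, not the optimizer's actual transformed text: the first branch picks up the ‘A’, ‘B’, ‘C’ rows, and the second branch uses lnnvl() to exclude them so that nothing is reported twice.

```sql
-- Branch 1: the simple predicate.
select  c.*
from    cat_contact c
where   c.contact_method_id in ('A', 'B', 'C')
union all
-- Branch 2: the existence subquery, excluding rows already reported by branch 1.
select  c.*
from    cat_contact c
where   exists (
                select  null
                from    cat_item i
                where   i.contact_id = c.contact_id
                and     i.item_category in ('X', 'Y')
        )
and     lnnvl(c.contact_method_id = 'A')
and     lnnvl(c.contact_method_id = 'B')
and     lnnvl(c.contact_method_id = 'C')
;
```

Since contact_method_id is declared not null in this example a simple “not in” would give the same result, but lnnvl() is the general-purpose form because it also handles the case where the comparison evaluates to null.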

So let’s go back to 12.2.0.1 and see what happens if we just add the /*+ or_expand(@main) */ hint to the query. Here’s the resulting execution plan:


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    14 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    14   (8)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    INLIST ITERATOR                        |                 |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED   | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                     | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|   6 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   7 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   8 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   9 |       INLIST ITERATOR                     |                 |       |       |            |          |
|  10 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|* 11 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|* 12 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|* 13 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  11 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
  12 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  13 - filter((LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C')))

We get exactly the plan we want – with the same cost as the 19c plan, which happens to be less than half the cost of the default plan that we got from 12.2.0.1. So it looks like there may be cases where you will need to hint OR-expansion because it might not appear by default.

Other Observations 1 – ordering

You may have noticed that my query has, unusually for me, put the existence subquery first and the simple filter predicate second in the where clause. I don’t like this pattern as (over time, and with different developers modifying queries) it’s too easy in more complex cases to “lose” the simple predicate; a one-liner can easily drift, change indents, get bracketed with another predicate that it shouldn’t be connected with and so on. I’ve actually seen production systems producing wrong results because little editing accidents like this (counting brackets is the classic error) have occurred – so I’m going to rerun the test on 12.2.0.1 with the predicates in the order I would normally write them.

Here’s the “corrected” query with its execution plan:


SELECT  /*+ 
                qb_name(main) 
                or_expand(@main)
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                c.contact_method_id IN ('A', 'B', 'C')  
        OR
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        )
;  


-------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                 |       |       |    16 (100)|          |
|   1 |  VIEW                                     | VW_ORE_231AD113 |    13 |   962 |    16   (7)| 00:00:01 |
|   2 |   UNION-ALL                               |                 |       |       |            |          |
|   3 |    NESTED LOOPS                           |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   4 |     NESTED LOOPS                          |                 |    10 |  1240 |    10  (10)| 00:00:01 |
|   5 |      SORT UNIQUE                          |                 |    10 |    60 |     4   (0)| 00:00:01 |
|   6 |       INLIST ITERATOR                     |                 |       |       |            |          |
|   7 |        TABLE ACCESS BY INDEX ROWID BATCHED| CAT_ITEM        |    10 |    60 |     4   (0)| 00:00:01 |
|*  8 |         INDEX RANGE SCAN                  | CI_I1           |    10 |       |     3   (0)| 00:00:01 |
|*  9 |      INDEX UNIQUE SCAN                    | CC_PK           |     1 |       |     0   (0)|          |
|  10 |     TABLE ACCESS BY INDEX ROWID           | CAT_CONTACT     |     1 |   118 |     1   (0)| 00:00:01 |
|* 11 |    FILTER                                 |                 |       |       |            |          |
|  12 |     INLIST ITERATOR                       |                 |       |       |            |          |
|  13 |      TABLE ACCESS BY INDEX ROWID BATCHED  | CAT_CONTACT     |     3 |   354 |     4   (0)| 00:00:01 |
|* 14 |       INDEX RANGE SCAN                    | CC_I1           |     3 |       |     3   (0)| 00:00:01 |
|  15 |     INLIST ITERATOR                       |                 |       |       |            |          |
|* 16 |      INDEX UNIQUE SCAN                    | CI_PK           |     1 |     6 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))
   9 - access("I"."CONTACT_ID"="C"."CONTACT_ID")
  11 - filter(LNNVL( IS NOT NULL))
  14 - access(("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C'))
  16 - access("I"."CONTACT_ID"=:B1 AND (("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y')))

The execution plan has jumped from 14 lines to 17 lines, the cost has gone up from 14 to 16, and both branches of the plan now report access to cat_contact and cat_item (though only through its primary key index in the second branch). What’s happened?

Oracle 12.2.0.1 has rewritten the query as a UNION ALL working from the bottom up – so in this case the first branch of the rewrite handles the original filter subquery, unnesting it to drive efficiently from cat_item to cat_contact. This means the second branch of the rewrite has to find the ‘A’, ‘B’, ‘C’ rows in cat_contact and then check that the filter subquery hadn’t previously reported them – so the optimizer has applied the lnnvl() function to the filter subquery – which you can nearly see in the Predicate Information for operation 11.

To make it clearer, here’s what you get as the predicate information for that operation after calling explain plan and dbms_xplan.display():

  11 - filter(LNNVL( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE
              ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1)))

In 12.2 the order of predicates in your query seems to be important – unless told otherwise the optimizer is working from the bottom-up (then rewriting top-down). But there is hope (though not documented hope). I added the /*+ or_expand(@main) */ hint to the query to force OR-expansion. Checking the Outline Information of the plan I could see that this had been expanded to /*+ or_expand(@main (1) (2)) */. Taking a wild guess at the significance of the numbers and changing the hint to /*+ or_expand(@main (2) (1)) */ I re-ran the test and got back to the more efficient plan – with the filter subquery branch appearing second in the UNION ALL view and the lnnvl() applied to the simpler predicate.
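Put together, the reordered test looks something like this. It is a sketch only: the extended form of the or_expand() hint is the undocumented one quoted above, and the final call simply asks dbms_xplan to include the Outline Information so that the expanded hint can be inspected.

```sql
set serveroutput off

SELECT  /*+
                qb_name(main)
                or_expand(@main (2) (1))
        */
        *
FROM    cat_contact c
WHERE   (
                c.contact_method_id IN ('A', 'B', 'C')
        OR
                exists  (
                        SELECT  /*+ qb_name(subq) */
                                *
                        FROM    cat_item i
                        WHERE   i.contact_id = c.contact_id
                        AND     i.item_category in ('X', 'Y')
                )
        )
;

-- Report the plan from memory, including the Outline Information section.
select * from table(dbms_xplan.display_cursor(format => 'typical +outline'));
```

Comparing the or_expand() entry in the Outline Information with the hint as written is the easiest way to check whether the optimizer has accepted the requested ordering.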

So the OR-expansion code is not fully cost-based in 12.2.0.1, but you can modify the behaviour through hinting: first to force it to appear (which may not happen even if it seems to be the lower cost option), and secondly to control the ordering of the components of the UNION ALL. As with all things relating to hints, though, act with extreme caution: we do not have sufficient documentation explaining exactly how they work, and with some of them we don’t even know whether the code path is complete yet.

Other Observations 2 – 12cR1

The or_expand() hint and cost-based OR-expansion appeared specifically in 12.2.0.1; prior to that we had a similar option in the use_concat() hint and concatenation – which also attempts to rewrite your query to produce a union all of disjoint data sets. But there are restrictions on what concatenation can do. I rarely remember what all the restrictions are, but there are two critical restrictions:

  • first, it will only appear by default if there is an indexed access path available to drive every branch of the rewrite
  • secondly, it will not apply further transformations to the separate branches that it produces

If we try adding the or_expand() hint to our query in 12.1.0.2 it will have no effect, so let’s add a suitable use_concat() hint and see what happens:

explain plan for
SELECT  /*+ 
                qb_name(main) 
                use_concat(@main 8 or_predicates(1))
--              use_concat(@main   or_predicates(1))
        */ 
        *  
FROM    cat_contact c  
WHERE   (
                exists  (  
                        SELECT  /*+ qb_name(subq) */
                                *  
                        FROM    cat_item i  
                        WHERE   i.contact_id = c.contact_id  
                        AND     i.item_category in ('X', 'Y')  
                )
        OR
                c.contact_method_id IN ('A', 'B', 'C')  
        )
;  

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             | 10000 |  1152K|    40   (3)| 00:00:01 |
|   1 |  CONCATENATION                        |             |       |       |            |          |
|   2 |   INLIST ITERATOR                     |             |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| CAT_CONTACT |     3 |   354 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | CC_I1       |     3 |       |     3   (0)| 00:00:01 |
|*  5 |   FILTER                              |             |       |       |            |          |
|*  6 |    TABLE ACCESS FULL                  | CAT_CONTACT |  9997 |  1151K|    35   (6)| 00:00:01 |
|   7 |    INLIST ITERATOR                    |             |       |       |            |          |
|*  8 |     INDEX UNIQUE SCAN                 | CI_PK       |     1 |     6 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C"."CONTACT_METHOD_ID"='A' OR "C"."CONTACT_METHOD_ID"='B' OR
              "C"."CONTACT_METHOD_ID"='C')
   5 - filter( EXISTS (SELECT /*+ QB_NAME ("SUBQ") */ 0 FROM "CAT_ITEM" "I" WHERE
              ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y') AND "I"."CONTACT_ID"=:B1))
   6 - filter(LNNVL("C"."CONTACT_METHOD_ID"='A') AND LNNVL("C"."CONTACT_METHOD_ID"='B') AND
              LNNVL("C"."CONTACT_METHOD_ID"='C'))
   8 - access("I"."CONTACT_ID"=:B1 AND ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y'))

26 rows selected.

As you can see, by forcing concatenation I’ve got my “union all” view with lnnvl() applied in the second branch. But the second branch was the “select where exists()” branch and the optimizer has not been able (allowed?) to do the unnesting that would let it drive efficiently from the cat_item table to the cat_contact table. The effect of this is that the plan still ends up with a full tablescan of cat_contact running a filter subquery on virtually every row, so concatenation doesn’t save us anything.

The significance of the “8” in the hint, by the way, is (I believe) that it tells the optimizer to use inlist iterators when possible. If I had omitted the “8” the plan would have had 4 branches – one each for ‘A’, ‘B’, and ‘C’ and the fourth for the filter subquery. I could also have added a hint /*+ use_concat(@subq or_predicates(1)) */ to replace operations 7 and 8 with a single index range scan with a filter predicate for the ‘X’/‘Y’ check (and that might, in any case, be slightly more efficient than the iteration approach).

Footnote(s)

The “legacy” OR-expansion (“concatenation” a.k.a. LORE in the optimizer trace file) can be controlled through the hints use_concat() and no_expand().

The new cost-based OR-expansion (now ORE in the optimizer trace file) can be controlled through the hints or_expand() and no_or_expand().
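As an illustration of the negative hint, the test query can be told not to use cost-based OR-expansion. This is a sketch rather than something tested in the article; on a version that chooses the transformation by default (such as the 19.3 test above) the expectation would be a return to the FILTER plan.

```sql
SELECT  /*+
                qb_name(main)
                no_or_expand(@main)
        */
        *
FROM    cat_contact c
WHERE   (
                exists  (
                        SELECT  /*+ qb_name(subq) */
                                *
                        FROM    cat_item i
                        WHERE   i.contact_id = c.contact_id
                        AND     i.item_category in ('X', 'Y')
                )
        OR      c.contact_method_id IN ('A', 'B', 'C')
        )
;
```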

The new cost-based OR-expansion has some restrictions, for example it is explicitly blocked in a MERGE statement, even in 19c, as reported in this blog note by Nenad Noveljic. As the blog note shows, concatenation is still possible but you (may) have to disable cost based OR-expansion.

I scanned the executable for the phrase “ORE: bypassed” to see if there were any messages that would suggest other reasons why cost-based OR-expansion would not be used; unfortunately the only relevant string was “ORE: bypassed – %s” [update (see comment 5 below): after ignoring case there was a second option: “ORE: Bypassed for disjunct chain: %s.”] – in other words all the possible bypass messages would be filled in on demand. I found a list of messages that might be relevant; I’d be a little cautious about trusting it but if you don’t see the feature appearing when you’re expecting it then it might be worth checking whether one of these could apply.

  • Old OR expansion hints present
  • Semi join hint present
  • QB has FALSE predicate
  • QB marked for NO Execution
  • Full Outer join QB
  • Rownum found in disjunction
  • Anti/semi/outer join in disjunction
  • Opposite Range in disjunction
  • No Index or Partition driver found
  • Predicate chain has all constant predicates
  • Negated predicate found
  • Long bitmap inlist in OR predicate
  • PRIOR expression in OR predicate
  • All 1-row tables found
  • No valid predicate for OR expansion
  • Disjunctive subquery unnesting possible
  • Subquery unnesting possible
  • Subquery coalesced query block
  • Merge view query block

Finally – here’s another reference blog note comparing LORE with ORE from Mohamed Houri.

 

14 Comments »

  1. Jonathan,

    Isn’t it strange that the total cost of the FILTER operation in the first execution plan is lower than the sum of its children? In other words, it seems that the cost of the second child hasn’t been factored in in the parent.

    Best regards,
    Nenad

    Comment by Nenad Noveljic — August 20, 2020 @ 7:38 pm BST Aug 20,2020 | Reply

    • Nenad,

      Oracle has a long history of not catering for the costs of scalar subqueries. Often the cost of executing the subquery once shows up close to the subquery and MAY be added once to the total cost, sometimes (as here) the cost simply disappears. Only in fairly recent versions of Oracle do you see some cases where the total cost of a query includes an estimate of “cost of subquery x estimated calls to subquery”.

      Your question prompted me to check the CBO trace file in 12.2.0.1 to see if the total reported cost of 34 and the full tablescan cost of 34 were two different values with rounding confusing the issue. (e.g. tablescan = 33.6, total cost = 34.4, cost of filter = 0.0001).

      What I found, most interestingly, was that Oracle had examined cost-based OR expansion, with the following transformation appearing in the trace file (edited for readability):

      
      ORE: Starting iteration 2, state space = [{ 2 }]
      ORE: Transformed query
      ******* UNPARSED QUERY IS *******
      SELECT 
              /*+ QB_NAME ("MAIN") */ 
              "C"."PADDING" "ITEM_1","C"."SMALL_VC" "ITEM_2","C"."CONTACT_ID" "ITEM_3","C"."CONTACT_METHOD_ID" "ITEM_4" 
      FROM    "TEST_USER"."CAT_CONTACT" "C" 
      WHERE   EXISTS (
                      SELECT /*+ QB_NAME ("SUBQ") */ 0 
                      FROM "TEST_USER"."CAT_ITEM" "I" 
                      WHERE "I"."CONTACT_ID"="C"."CONTACT_ID" 
                      AND ("I"."ITEM_CATEGORY"='X' OR "I"."ITEM_CATEGORY"='Y')
              ) 
      AND     LNNVL("C"."CONTACT_METHOD_ID"='A') 
      AND     LNNVL("C"."CONTACT_METHOD_ID"='B') 
      AND     LNNVL("C"."CONTACT_METHOD_ID"='C')
      
      

      So 12.2 is heading in the right direction – but then loses its way.

      Comment by Jonathan Lewis — August 22, 2020 @ 4:15 pm BST Aug 22,2020 | Reply

      • Jonathan,

        We should see CBORE in 12.2, too – Oracle introduced the transformation in this release. Underestimated FILTER cost is a reason for discarding CBORE.

        Unfortunately, I also observed the problem in higher releases. A similar test case produces a plan without CBORE in 19.7: https://nenadnoveljic.com/blog/disjunctive-subquery-optimization/ .

        CBO trace is the only place where we can see if the transformation was bypassed or discarded.

        Comment by Nenad Noveljic — August 22, 2020 @ 6:58 pm BST Aug 22,2020 | Reply

        • Nenad,

          Just checking details:
          In the 12.2 trace:
          * ORE now means cost-based OR-expansion – which is (apparently) what was reported in my trace.
          * LORE is the “legacy” OR-expansion, i.e.concatenation.

          Comment by Jonathan Lewis — August 22, 2020 @ 7:00 pm BST Aug 22,2020

  2. Jonathan,

    Exactly. Here are further details for understanding ORE steps in CBO trace: https://nenadnoveljic.com/blog/dissecting-cost-based-or-expansion/

    Comment by Nenad Noveljic — August 22, 2020 @ 7:13 pm BST Aug 22,2020 | Reply

    • Nenad,

      Thanks for the follow-up.
      I misinterpreted your comment to mean “it’s not happening”, rather than “it’s happening but not working properly”.

      I see two problems in my 12c example:
      a) The ORE mechanism doesn’t consider unnesting after the first transformation
      b) The original path (filter subquery, no OR expansion) is under-costed so the ORE costed path doesn’t (thanks to (a)) have a lower cost – so shouldn’t be taken.

      In your example I’m not sure you have a fair test of ORE because of (a). If you add the or_expand() hint to the Oracle code then OR-expansion takes place, but the first branch is a full tablescan of t_large and the second branch is a full tablescan of t_large with a filter subquery – so the ORE cost is clearly going to be higher than the original full tablescan with filter. If you add an index on t_large(a) then 19c will automatically use the ORE path because it does unnest after expanding, though 12c will stick with the original path.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — August 23, 2020 @ 8:15 am BST Aug 23,2020 | Reply

      • Jonathan,

        The filter cost looks *massively* underestimated – even in 19c. Let me explain. The real cost isn’t just a single execution of the subquery (the FTS of t_small). The subquery has to be executed for every row in t_large. Therefore, a calculation similar to the one described in https://jonathanlewis.wordpress.com/2019/06/06/scalar-subquery-costing/ would be appropriate in this case as well, wouldn’t it?

        Comment by Nenad Noveljic — August 23, 2020 @ 2:51 pm BST Aug 23,2020 | Reply

        • Nenad,

          Totally agree with your comment, but the point I wanted to make was that you can’t say “ORE isn’t working” when actually the problem is more like: “Filter subquery costing isn’t working but ORE believes that it is.”

          Regards
          Jonathan Lewis

          Comment by Jonathan Lewis — August 23, 2020 @ 3:45 pm BST Aug 23,2020

    • Nenad,

      Thanks for the “dissecting ORE costing” link. I’ll be taking a close look at that.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — August 23, 2020 @ 8:16 am BST Aug 23,2020 | Reply

  3. Jonathan,

    Your formulation is, of course, 100% correct.

    It wasn’t until I read your article that I realized that wrong filter costing is actually the reason for ORE not appearing in the execution plan.

    Comment by Nenad Noveljic — August 23, 2020 @ 4:23 pm BST Aug 23,2020 | Reply

    • Jonathan,

      After careful rereading of your answers and some additional testing with peeking into the optimizer trace, I have to correct my statement above:
      – The main issue in 12.2 (and in 18c, too, by the way) is, as you said, that the optimizer doesn’t unnest the subquery – unless with the or_expand() hint.
      – Wrong FILTER costing is a secondary problem without any impact on the execution plan.

      Thanks a lot for this very insightful discussion.

      Comment by Nenad Noveljic — August 24, 2020 @ 3:26 pm BST Aug 24,2020 | Reply

  4. […] to hint it with the or_expand() hint in both 12.2 and 18c to make the transformation appear.  See this blog note for further details and an example, but be a little careful about bugs in the earlier […]

    Pingback by Subquery with OR | Oracle Scratchpad — August 27, 2020 @ 9:06 am BST Aug 27,2020 | Reply

  5. The following bypass reason seems to be missing on your list in the footnote: “ORE: Bypassed for disjunct chain: No Index or Partition driver found.”
    I observed it on 19.7.0.0.200414

    Comment by Nenad Noveljic — August 27, 2020 @ 11:21 am BST Aug 27,2020 | Reply

    • Nenad,

      Thanks for that – I must have forgotten to use “-i” in my “grep”. There’s a second starting string in the executable: “ORE: Bypassed for disjunct chain: %s.” (note the capital B). Your example uses one of the items in my list with that prefix.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — August 27, 2020 @ 12:02 pm BST Aug 27,2020 | Reply

