12c Scalar Subquery

December 9, 2015

12c Scalar Subquery

Filed under: 12c,Execution plans,Oracle — Jonathan Lewis @ 2:25 pm GMT Dec 9,2015

Every version of the optimizer enhances existing mechanisms and introduces new features and 12c has introduced some of the most sophisticated transformation to date; in this note I want to demonstrate an enhancement to subquery unnesting that could give a significant performance boost to a certain query pattern but which might, unfortunately, result in worse performance.

Historically subquery unnesting turned subqueries (correlated or not) in the where clause into joins. In 12c subquery unnesting can also turn scalar subqueries in the select list into joins – we’ll discuss why this could be a good thing but might occasionally be a bad thing later on in the article, but let’s start with a test case.

Sample data.

In my demonstration I’m going to use three tables which, for convenience, are three clones of the same data.


rem
rem     Script: 12c_scalar_subq.sql
rem     Dated:  April 2014
rem     Author: J.P.Lewis
rem

create table t1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                          id,     
        mod(rownum-1,100)               mod100,
        trunc((rownum - 1)/100)         trunc100,
        rownum                          n1,
        rownum                          n2,
        lpad(rownum,6,'0')              vc1,
        rpad('x',100)                   padding
from
        generator
where
        rownum <= 10000
;

create table t2 as select * from t1;
create table t3 as select * from t1;

create index t1_i1 on t1(id);
create index t2_i1 on t2(id,mod100);
create index t3_i1 on t3(id,trunc100);

begin
        dbms_stats.gather_table_stats(user,'t1');
        dbms_stats.gather_table_stats(user,'t2');
        dbms_stats.gather_table_stats(user,'t3');
end;
/

I’ll be examining a query against t1 that includes two correlated scalar subqueries in the select list that reference one each of t2 and t3:


explain plan for
select
        /*+
                qb_name(main)
        */
        n1, n2,
        (
                select  /*+ qb_name(sq1) */
                        max(mod100)
                from    t2
                where   t2.id = t1.id
        ) new_n1,
        (
                select  /*+ qb_name(sq2) */
                        max(trunc100)
                from    t3
                where   t3.id = t1.id
        ) new_n2
from
        t1
where
        t1.id between 101 and 200
;

select * from table(dbms_xplan.display);

11g Plan

This is the execution plan you might expect to see from 11g – in my case, with my system stats etc. and running 11.2.0.4:


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |   101 |  1212 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |       |     1 |     7 |            |          |
|   2 |   FIRST ROW                  |       |     1 |     7 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| T2_I1 |     1 |     7 |     2   (0)| 00:00:01 |
|   4 |  SORT AGGREGATE              |       |     1 |     7 |            |          |
|   5 |   FIRST ROW                  |       |     1 |     7 |     2   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN (MIN/MAX)| T3_I1 |     1 |     7 |     2   (0)| 00:00:01 |
|   7 |  TABLE ACCESS BY INDEX ROWID | T1    |   101 |  1212 |     4   (0)| 00:00:01 |
|*  8 |   INDEX RANGE SCAN           | T1_I1 |   101 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."ID"=:B1)
   6 - access("T3"."ID"=:B1)
   8 - access("T1"."ID">=101 AND "T1"."ID"<=200)

As you can see, the operations for the subqueries appear first in the plan (lines 1-3, and 4-6), with the operations for the main query appearing as the last section of the plan (lines 7-8). You might note that the total cost of the plan doesn’t cater for the cost of the subqueries – technically we might expect to see the optimizer producing a cost of something like 408 on the basis that it’s going to run each subquery an estimated 101 times and each subquery has a cost of 2, and the 101 rows are generated from a query with a cost of 4 giving: 4 + 101 * (2 + 2) = 408.

12c Plan

On the upgrade to 12c, the same code produces the following plan:


--------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |   101 |  6464 |     8   (0)| 00:00:01 |
|*  1 |  HASH JOIN OUTER                      |          |   101 |  6464 |     8   (0)| 00:00:01 |
|*  2 |   HASH JOIN OUTER                     |          |   101 |  3838 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |   101 |  1212 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | T1_I1    |   101 |       |     2   (0)| 00:00:01 |
|   5 |    VIEW                               | VW_SSQ_2 |   101 |  2626 |     2   (0)| 00:00:01 |
|   6 |     HASH GROUP BY                     |          |   101 |   707 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                 | T2_I1    |   101 |   707 |     2   (0)| 00:00:01 |
|   8 |   VIEW                                | VW_SSQ_1 |   101 |  2626 |     2   (0)| 00:00:01 |
|   9 |    HASH GROUP BY                      |          |   101 |   707 |     2   (0)| 00:00:01 |
|* 10 |     INDEX RANGE SCAN                  | T3_I1    |   101 |   707 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("ITEM_1"(+)="T1"."ID")
   2 - access("ITEM_2"(+)="T1"."ID")
   4 - access("T1"."ID">=101 AND "T1"."ID"<=200)
   7 - access("T2"."ID">=101 AND "T2"."ID"<=200)
  10 - access("T3"."ID">=101 AND "T3"."ID"<=200)

As you can see, the separate plans for the subqueries have disappeared and the plan is showing a three-table join (with two outer joins, and two of the “tables” being the non-mergeable view vw_ssq_2 and vw_ssq_1).

There are several details to pick up in this plan (apart from the unnesting). The cost is only 8 – but in this case it isn’t the effect of the optimizer “ignoring” the cost of the subqueries, it’s the optimizer correctly working out the cost of the unnested subqueries with joins. The cost happens to be low in this case because the optimizer has used transitive closure to pass the predicate from the driving query into the subqueries – so we need only do a couple of short index range scans to get all the data we need in the unnested subqueries.

The outer joins are needed because it is valid for the original scalar subquery mechanism to return no data for a subquery and still report a row (with nulls) for t1. If the rewrite didn’t introduce the outer join then t1 rows for which there were no matching t2 or t3 rows would disappear from the result set.

Threats and workarounds

In this (lightweight) example it looks as if this transformation is a good idea, but it’s always possible that the optimizer might choose to do this when it’s a bad idea. In fact, a quick check of the optimizer trace (10053) suggests that this is an uncosted transformation that will take place “because it can”. Here are six highly suggestive consecutive lines from the trace file:


SU: Unnesting query blocks in query block MAIN (#1) that are valid to unnest.
Subquery Unnesting on query block MAIN (#1)
SU: Performing unnesting that does not require costing.
SU: Considering subquery unnest on query block MAIN (#1).
SU:   Unnesting  scalar subquery query block SQ2 (#2)
Registered qb: SEL$2E540226 0x50b1a950 (SUBQ INTO VIEW FOR COMPLEX UNNEST SQ2)

Even if this transformation is cost-based rather than heuristic it’s always possible for the optimizer to make a very poor estimate of cost and do the wrong thing. Fortunately it’s possible to block the unnesting with the “traditional” /*+ no_unnest */ hint:


select
        /*+
                qb_name(main)
                no_unnest(@sq1)
                no_unnest(@sq2)
        */
        n1, n2, 
...

With these hints in place the execution plan changes back to the 11g form – though there is a notable change in the estimated final cost of the query:


---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |   101 |  1212 |   206   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                     |       |     1 |     7 |            |          |
|   2 |   FIRST ROW                         |       |     1 |     7 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)       | T2_I1 |     1 |     7 |     2   (0)| 00:00:01 |
|   4 |  SORT AGGREGATE                     |       |     1 |     7 |            |          |
|   5 |   FIRST ROW                         |       |     1 |     7 |     2   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN (MIN/MAX)       | T3_I1 |     1 |     7 |     2   (0)| 00:00:01 |
|   7 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   101 |  1212 |     4   (0)| 00:00:01 |
|*  8 |   INDEX RANGE SCAN                  | T1_I1 |   101 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."ID"=:B1)
   6 - access("T3"."ID"=:B1)
   8 - access("T1"."ID">=101 AND "T1"."ID"<=200)

It’s a little surprising that the final cost is (4 + 202) rather than the (4 + 404) that we calculated earlier, but a few variations on the test suggest that the optimizer is using half the cost of each of the scalar subqueries in the final cost estimate – perhaps as a nod in the direction of scalar subquery caching.

As always it is important to remember that you cannot look at a query like this and know which plan is going to perform better unless you know the data content, the pattern of data distribution, and the probably effect of scalar subquery caching. In some cases it may be that an extremely “lucky” data pattern will mean that a scalar subquery will run a very small number of times while, with the same data arranged in a different order the benefit of scalar subquery caching will disappear and the unnesting approach may be much better.

It is a convenience to be able to recognize that there is a new optimizer feature in 12c that might give you a performance boost but might hit you with an unlucky performance penalty (especially in cases, perhaps, where you have previously engineered your code very carefully to take advantage of scalar subquery caching) and know that there is a workaround available. In your search for pre-empting production performance problems, this code structure is another pattern to scan for in your source code.

Comments (16)

16 Comments »

Hmm… that puts a dampener on one of my tricks for effectively forcing a late nested loop on some rowsources when i KNOW that the final result will always be a relatively small number of rows or highly repetitive.

Comment by Dom Brooks — December 9, 2015 @ 5:40 pm GMT Dec 9,2015 | Reply
Away from my desk and therefore just wondering aloud… what about bad subqueries which might normally return a too many rows error? Or perhaps the transformation is restricted to those guaranteed to return one row either through uniqueness or aggregation? And is there a fix control, event or param to turn it off?

Comment by Dom Brooks — December 9, 2015 @ 6:06 pm GMT Dec 9,2015 | Reply
- Dom,
  
  Thanks for the comments – the earlier one means that it was worth writing the article: it looks like exactly the example that demonstrates why you need to know about this transformation in advance of the upgrade.
  
  The “bad subquery” question is an interesting one – I would assume that (at least in the first releases of the transformation) there’s a fairly stringent single-row requirement.
  
  I think the (or a) related parameter is “_optimizer_unnest_scalar_sq” which appeared in 12.1.0.2 and defaults to true. The description is “enables unnesting of of [sic] scalar subquery”
  
  Comment by Jonathan Lewis — December 9, 2015 @ 10:07 pm GMT Dec 9,2015 | Reply
  - Yes, the parameter is “_optimizer_unnest_scalar_sq”, using the same test case I confirmed it was effective when set to false.
    
    For the bad subquery, I did a simple / silly modification of your test case above:
    
    ( select /*+ qb_name(sq1) */ max(mod100) from t2 --where t2.id = t1.id group by t2.id ) new_n1
    
    Subquery was not unnested.
    
    Note in Optimizer trace was:
    
    SU: bypassed: Scalar subquery may return more than one row.
    
    Comment by Dom Brooks — December 10, 2015 @ 9:49 am GMT Dec 10,2015 | Reply
    - I also played around with a few variations of unique indexes, unique constraints and primary keys on a different scalar subquery.
      None of them unnested. All trace files had the same note as above.
      
      Comment by Dom Brooks — December 10, 2015 @ 10:03 am GMT Dec 10,2015
    - Dom,
      
      Thanks for the follow-up.
      
      Comment by Jonathan Lewis — December 11, 2015 @ 12:41 pm GMT Dec 11,2015
[…] problems if only you can spot them in time in your code. (The enhancement to subquery unnesting I described in my previous post is such an example – do you know of any code in your application that was carefully […]

Pingback by Upgrades | Oracle Scratchpad — December 10, 2015 @ 8:42 am GMT Dec 10,2015 | Reply
I’ve just discovered I wrote something about this feature a few months ago, and in that posting I said that the transformation was costed. Clearly I’m going to have to find some time to work out whether or not it is.

Comment by Jonathan Lewis — December 11, 2015 @ 12:42 pm GMT Dec 11,2015 | Reply
[…] version of the optimizer enhances existing mechanisms and introduces new features, while 12c has introduced […]

Pingback by Log Buffer #453: A Carnival of the Vanities for DBAs | InsideMySQL — December 11, 2015 @ 10:40 pm GMT Dec 11,2015 | Reply
[…] explain how this transformation works because these guys did it better than I would ever do…: Jonathan Lewis Tanel […]

Pingback by Oracle 12c: Scalar Subquery Unnesting and ROWNUM | Paweł Chyliński's Oracle Blog — September 18, 2017 @ 11:50 am BST Sep 18,2017 | Reply
[…] picture: the inline scalar subqueries have disappeared. In 12c the optimimzer can unnest scalar subqueries in the select list and turn them into outer joins, and we can see that there are 6 “Nested […]

Pingback by Case Study | Oracle Scratchpad — September 30, 2018 @ 7:59 pm BST Sep 30,2018 | Reply
Good explanation

Comment by Anonymous — December 16, 2018 @ 5:00 pm GMT Dec 16,2018 | Reply
Interesting addition to the query transformation by optimizer. So does it mean that a sql plan baseline created for the sql with scalar subqueries, can be attached to and will be used by the sql that uses left outer join(s) instead of scalar subqueries? Don’t think that would work but I am guessing it would help if we were able to influence sql using left outer join to follow the execution plan of sql using scalar subqueries and vice versa.

Comment by Narendra — February 8, 2020 @ 10:10 pm GMT Feb 8,2020 | Reply
- Narenda,
  
  If you mean – could you use one of the “outline swapping” routines to attach the baseline from a query that had been written with scalar subqueries but run with outer joins to a query that had been run with outer joins, I think the basic answer is no. By the time the baseline was produced the query block names for the tables in the scalar subqueries would have gone through a couple of internal generation steps, while the hand-written views for the outer join would have query block names that were either the ones you supplied or the default pattern of sel$NNN – so the optimizer wouldn’t be able to line up the detailed hints in the baseline.
  
  Regards
  Jonathan Lewis
  
  Comment by Jonathan Lewis — February 11, 2020 @ 10:12 am GMT Feb 11,2020 | Reply
[…] join. That’s a lot of scalar subqueries so it’s worth asking whether the code should be rewritten to use joins (though in newer vesions of Oracle some of the subqueries might be transformed to outer joins […]

Pingback by Case Study | Oracle Scratchpad — September 30, 2022 @ 12:21 pm BST Sep 30,2022 | Reply
[…] features have been set back to 11.2.0.4. In 12.2.0.1 (possibly 12.1.0.1) Oracle introduced code to unnest scalar subqueries in the select list and it’s possible that an internal rewrite might manage to do something […]

Pingback by Case Study | Oracle Scratchpad — June 26, 2023 @ 11:39 am BST Jun 26,2023 | Reply

RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome. Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Oracle Scratchpad

December 9, 2015