Oracle Scratchpad

January 29, 2020

SSQ Unnesting

Filed under: Execution plans,Oracle,subqueries — Jonathan Lewis @ 12:17 pm GMT Jan 29,2020

I hesitate to call something a bug simply because Oracle doesn’t do what I thought it would do; but when a trace file says:

“I’m not going to do X because P is not true

followed a little later by

“I’m going to do Y because P is true

then I think it’s safe to say there’s a bug there somewhere – even if it’s only a bug in the code that writes the trace file.

The example is this note is a feature that appeared in 12c (possibly only 12.2) – the ability to unnest scalar subqueries in the select list and transform them into outer joins. Here’s an example to demonstrate the mechanism:

rem
rem     Script:         ssq_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table companies as
select
        rownum  id,
        dbms_random.string('U',30)      name
from
        all_objects
where
        rownum <= 1e4 -- > comment to avoid wordpress format issue
;

alter table companies add constraint com_pk primary key(id);

create table orders (
        id              number(10,0),
        id_company      number(10,0)    not null,
        date_placed     date,
        status          varchar2(1),
        items           number(3,0),
        valuation       number(6,2),
        constraint ord_fk_com foreign key(id_company) references companies,
        constraint ord_pk primary key(id)
)
/

insert into orders
select
        rownum,
        trunc(dbms_random.value(1,1e4))  id_company,
        sysdate - 100 + rownum/100        date_placed,
        chr(64+dbms_random.value(1,6))    status,
        trunc(dbms_random.value(1,11))    items,
        round(dbms_random.value(1,250),2) valuation
from
        all_objects
where
        rownum <= 1e4 -- > comment to avoid wordpress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'orders',
                method_opt  => 'for all columns size 1'
        );
end;
/

set serveroutput off
set linesize 180
set pagesize 60
set trimspool on

select
        /*+ 
                qb_name(main)
        */
        ord.id,
        (
        select 
                /*+ qb_name(company) */ 
                max(com.name)
        from    companies  com  
        where   com.id  = ord.id_company
        )   company,
        ord.valuation,
        ord.status
from
        orders          ord
where
        ord.date_placed > trunc(sysdate) - 1
/

select * from table(dbms_xplan.display_cursor(null,null,'alias'))
/

I’ve created an orders table with an id_company column that is declared as a foreign key to a companies table. When I’ve queried the orders table and reported the company associated with an order I’ve been a little silly and used a correlated scalar subquery in the select list to query the companies table instead of doing a simple join. In fact I’ve been more than a little silly because I’ve used an aggregate when the query is by primary key and can only return one row.

Here’s the execution plan (produced by 12.2 or 19.3)


--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |       |       |    33 (100)|          |
|*  1 |  HASH JOIN OUTER   |           |   143 |  8294 |    33  (22)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| ORDERS    |   143 |  3289 |    22  (23)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| COMPANIES | 10000 |   341K|    10  (10)| 00:00:01 |
--------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$EBD4C958
   2 - SEL$EBD4C958 / ORD@MAIN
   3 - SEL$EBD4C958 / COM@COMPANY

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("COM"."ID"="ORD"."ID_COMPANY")
   2 - filter("ORD"."DATE_PLACED">TRUNC(SYSDATE@!)-1)

Note
-----
   - this is an adaptive plan

The optimizer has taken my query and turned it into a simple (outer) join between the two tables. I’ve included the Query Block / Alias information in the output so that you can see that Oracle really has generated a new query block by transforming the two separate query blocks in the original query.

Oracle has been very clever here – it has even recognised that the join is going to use a unique scan of a unique key so it has taken out the redundant aggregation step. In many cases where this type of scalar subquery unnesting is used you’re more likely to see a plan with one of the following shapes:


----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |       |       |    36 (100)|          |
|*  1 |  HASH JOIN OUTER     |           |   144 |   286K|    36  (28)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | ORDERS    |   144 |  3312 |    22  (23)| 00:00:01 |
|   3 |   VIEW               | VW_SSQ_1  | 10000 |    19M|    13  (31)| 00:00:01 |
|   4 |    HASH GROUP BY     |           | 10000 |   341K|    13  (31)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| COMPANIES | 10000 |   341K|    10  (10)| 00:00:01 |
----------------------------------------------------------------------------------

-----------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |       |       |    36 (100)|          |
|*  1 |  HASH JOIN OUTER      |           |   144 |   286K|    36  (28)| 00:00:01 |
|   2 |   JOIN FILTER CREATE  | :BF0000   |   144 |  3312 |    22  (23)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL  | ORDERS    |   144 |  3312 |    22  (23)| 00:00:01 |
|   4 |   VIEW                | VW_SSQ_1  | 10000 |    19M|    13  (31)| 00:00:01 |
|   5 |    HASH GROUP BY      |           | 10000 |   341K|    13  (31)| 00:00:01 |
|   6 |     JOIN FILTER USE   | :BF0000   | 10000 |   341K|    10  (10)| 00:00:01 |
|*  7 |      TABLE ACCESS FULL| COMPANIES | 10000 |   341K|    10  (10)| 00:00:01 |
-----------------------------------------------------------------------------------

---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |       |       |    34 (100)|          |
|   1 |  HASH GROUP BY      |           |   144 |  8784 |    34  (24)| 00:00:01 |
|*  2 |   HASH JOIN OUTER   |           |   144 |  8784 |    33  (22)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| ORDERS    |   144 |  3744 |    22  (23)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| COMPANIES | 10000 |   341K|    10  (10)| 00:00:01 |
---------------------------------------------------------------------------------

The first variation shows the creation of an aggregate view that is used in the join – note the generated view name vw_ssq_1 (ssq = scalar sub query). In the second variation Oracle has used a Bloom filter to reduce the volume of data passed up from the view to the hash join operator, and in the third variation Oracle has used complex view merging to handle the join before performing the aggregation.

The anomaly

Clearly the optimizer is smarter than I am with this query – it spotted that I didn’t need that max() aggregation and took it out. So maybe I should take a hint from the optimizer and edit the query to remove the max(). Here’s the plan I get if I do:


----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   167 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| COMPANIES |     1 |    35 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | COM_PK    |     1 |       |     1   (0)| 00:00:01 |
|*  3 |  TABLE ACCESS FULL          | ORDERS    |   145 |  3335 |    22  (23)| 00:00:01 |
-----------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - COMPANY / COM@COMPANY
   2 - COMPANY / COM@COMPANY
   3 - MAIN    / ORD@MAIN

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("COM"."ID"=:B1)
   3 - filter("ORD"."DATE_PLACED">TRUNC(SYSDATE@!)-1)

The optimizer doesn’t unnest the subquery (and it produces an execution plan that has a far higher cost than the unnested version). You can see from the Query Block information that the final query still consists of the two original query blocks, and the plan shows the standard “subquery in the select list” pattern – the main query block at the end of the plan with the scalar subquery(ies) above it. Even if I insert an /*+ unnest */ hint in the subquery the optimizer will not unnest the scalar subquery.

This really looks like a bug – which means I have to take a look at the CBO (10053) trace file; and here are the critical lines (from the 19.3 trace file which is significantly more informative than the 12.2 file):

...
SU: Checking validity of scalar subquery unnesting for query block COMPANY (#0)
SU: bypassed: Scalar subquery may return more than one row.
...

When we used an aggregate subquery the optimizer knew the aggregation was redundant because it was querying with equality on the primary key, so it eliminated the aggregation step from the plan; but when we don’t specify an aggregate the optimizer thinks the primary key will return more than one row!

So do we have a “documentation” bug where the trace file is simply reporting the wrong reason for bypassing unnesting, or do we have a logic bug where the optimizer makes a mistake when checking for uniqueness ? (I’d lile it to be the latter, and see a fix some time soon, of course.)

Conclusion

There are cases where the optimizer ought to be unnesting scalar subqueries in the select list but fails to do so for what seems to be a completely spurious reason. Unfortunately some people find it very convenient to write SQL that does primary key lookups as in-line scalar subqueries instead of joins; so if you find examples like this then (for the short term, at least) you might see some performance benefit by introducing a redundant max() operation in the scalar subquery.

Footnote

If you want to see more examples of how Oracle does, or doesn’t handle scalar subqueries in the select list there are a couple of articles on Nenad Noveljic’s blog one comparing how Oracle and SQL Server handle a particular case, the other looking at several other cases.

 

3 Comments »

  1. There have been times when I’ve used a SELECT-list SSQ merely out of convenience–some GROUP BY situations come to mind.

    But more than a few times, I’ve found it important to use one for performance. There have been times when I’ve needed an SSQ because a more ordinary join would cause the CBO to start picking some bad plan, or to fail to notice some other optimization. I also seem to recall that this was more likely to happen with a large number of tables being joined in the main query.

    My experience has been on older database versions, up through 11.2.0.4. It could be that 12c+ does a better job such that SSQs for performance aren’t nearly as important. But I’m not surprised to hear about them being somewhat common in the wild. If 12c+ does a better job of handling the ostensibly better way to write the query, even so it should do at least as good of a job of handling SSQs since it’s still likely to encounter them in older queries.

    Comment by Jason Bucata — January 29, 2020 @ 5:07 pm GMT Jan 29,2020 | Reply

    • Jason,

      Thanks for the comment.

      As ever it’s the problem of trying to balance the various evils.

      Aggregate in-line scalar subqueries were really quite useful as a tuning tool because the alternative was either a join with massive aggregate or a no-mergeable aggregate view in the FROM clause with a join that may or may not have used a predicate pushdown – and very often both options had a nasty downside that you could avoid with the inline SSQ.

      Putting a lightweight (basicaly a non-aggregate) SSQ in the select list to reduce the number of joins in the FROM clause was (I felt) always a bit of a last-ditch strategy for helping the optimizer by reducing the number of tables it had to consider in the join – better than stacking the query with loads of hints, but not as nice as working out why a bad join order had appeared. I’m not sure, though, that many people were thinking that when they used the strategy, I suspect many of them were simply trying to find what (to them) was an easier way of expressing the requirement. That’s also a totally valid argument for doing it, of course,, but it’s nice that the optimizer is now (in theory) able to decide whether or not to run the query that way or move the inline to be a join.

      It definitely a good thing that the /*+ unnest */ and /*+ no_unnest */ hints can be used so that people can write the query the way they’d like to read it but tell the optimizer which strategy to adopt for executing.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — January 29, 2020 @ 5:32 pm GMT Jan 29,2020 | Reply

  2. I can’t help but quickly try Nenad’s code in Postgres. I have only 9.6.2 on my laptop, so one should check what has changed in newer versions. But still looks like Postgres does SSQ it in its own special way :)

    create table t_1k ( n1 integer ) ;
    
    create table t_100k ( n1 integer ) ;
    
    insert into t_1k
      select *
        from generate_series(1,1000);
    
    insert into t_100k
     select *
        from generate_series(1,100000);
    
    
    analyze t_1k;
    analyze t_100k;
    
    
    explain analyze
    select 
      case when exists ( 
        select null
    	  from t_1k 
    	  where t_1k.n1 = t_100k.n1 
      ) then 1 else 0 end
      from t_100k ;
    
    "Seq Scan on t_100k  (cost=0.00..1751443.00 rows=100000 width=4) (actual time=0.482..31.024 rows=100000 loops=1)"
    "  SubPlan 1"
    "    ->  Seq Scan on t_1k  (cost=0.00..17.50 rows=1 width=0) (never executed)"
    "          Filter: (n1 = t_100k.n1)"
    "  SubPlan 2"
    "    ->  Seq Scan on t_1k t_1k_1  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.004..0.207 rows=1000 loops=1)"
    "Planning time: 0.054 ms"
    "Execution time: 46.570 ms"
    
    

    We can see two SubPlan branches, one for full access (Seq Scan) with filter, another one that returns all rows from t_1k. SubPlan is a plan tree driven by a parent row source. Loops=1 (it is analog of Starts in Oracle’s DISPLAY_CURSOR) in SubPlan 2 suggests that Postgres fetched all rows in one go, though it also considered running the subquery for every row from the driving table in SubPlan 1, but left this idea, the branch is “(never executed)”.
    Though, it still not clear in this version which exact algorithm Postgres used to join the data

    Comment by Viacheslav Andzhich — February 15, 2020 @ 12:04 pm GMT Feb 15,2020 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply to Jason Bucata Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.