I hesitate to call something a bug simply because Oracle doesn’t do what I thought it would do; but when a trace file says:
“I’m not going to do X because P is not true“
followed a little later by
“I’m going to do Y because P is true“
then I think it’s safe to say there’s a bug there somewhere – even if it’s only a bug in the code that writes the trace file.
The example in this note is a feature that appeared in 12c (possibly only 12.2) – the ability to unnest scalar subqueries in the select list and transform them into outer joins. Here’s an example to demonstrate the mechanism:
rem rem Script: ssq_anomaly.sql rem Author: Jonathan Lewis rem Dated: Jan 2020 rem rem Last tested rem 19.3.0.0 rem 12.2.0.1 rem create table companies as select rownum id, dbms_random.string('U',30) name from all_objects where rownum <= 1e4 -- > comment to avoid wordpress format issue ; alter table companies add constraint com_pk primary key(id); create table orders ( id number(10,0), id_company number(10,0) not null, date_placed date, status varchar2(1), items number(3,0), valuation number(6,2), constraint ord_fk_com foreign key(id_company) references companies, constraint ord_pk primary key(id) ) / insert into orders select rownum, trunc(dbms_random.value(1,1e4)) id_company, sysdate - 100 + rownum/100 date_placed, chr(64+dbms_random.value(1,6)) status, trunc(dbms_random.value(1,11)) items, round(dbms_random.value(1,250),2) valuation from all_objects where rownum <= 1e4 -- > comment to avoid wordpress format issue ; begin dbms_stats.gather_table_stats( ownname => null, tabname => 'orders', method_opt => 'for all columns size 1' ); end; / set serveroutput off set linesize 180 set pagesize 60 set trimspool on select /*+ qb_name(main) */ ord.id, ( select /*+ qb_name(company) */ max(com.name) from companies com where com.id = ord.id_company ) company, ord.valuation, ord.status from orders ord where ord.date_placed > trunc(sysdate) - 1 / select * from table(dbms_xplan.display_cursor(null,null,'alias')) /
I’ve created an orders table with an id_company column that is declared as a foreign key to a companies table. When I’ve queried the orders table and reported the company associated with an order I’ve been a little silly and used a correlated scalar subquery in the select list to query the companies table instead of doing a simple join. In fact I’ve been more than a little silly because I’ve used an aggregate when the query is by primary key and can only return one row.
Here’s the execution plan (produced by 12.2 or 19.3)
-------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 33 (100)| | |* 1 | HASH JOIN OUTER | | 143 | 8294 | 33 (22)| 00:00:01 | |* 2 | TABLE ACCESS FULL| ORDERS | 143 | 3289 | 22 (23)| 00:00:01 | | 3 | TABLE ACCESS FULL| COMPANIES | 10000 | 341K| 10 (10)| 00:00:01 | -------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - SEL$EBD4C958 2 - SEL$EBD4C958 / ORD@MAIN 3 - SEL$EBD4C958 / COM@COMPANY Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("COM"."ID"="ORD"."ID_COMPANY") 2 - filter("ORD"."DATE_PLACED">TRUNC(SYSDATE@!)-1) Note ----- - this is an adaptive plan
The optimizer has taken my query and turned it into a simple (outer) join between the two tables. I’ve included the Query Block / Alias information in the output so that you can see that Oracle really has generated a new query block by transforming the two separate query blocks in the original query.
Oracle has been very clever here – it has even recognised that the join is going to use a unique scan of a unique key so it has taken out the redundant aggregation step. In many cases where this type of scalar subquery unnesting is used you’re more likely to see a plan with one of the following shapes:
---------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 36 (100)| | |* 1 | HASH JOIN OUTER | | 144 | 286K| 36 (28)| 00:00:01 | |* 2 | TABLE ACCESS FULL | ORDERS | 144 | 3312 | 22 (23)| 00:00:01 | | 3 | VIEW | VW_SSQ_1 | 10000 | 19M| 13 (31)| 00:00:01 | | 4 | HASH GROUP BY | | 10000 | 341K| 13 (31)| 00:00:01 | | 5 | TABLE ACCESS FULL| COMPANIES | 10000 | 341K| 10 (10)| 00:00:01 | ---------------------------------------------------------------------------------- ----------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 36 (100)| | |* 1 | HASH JOIN OUTER | | 144 | 286K| 36 (28)| 00:00:01 | | 2 | JOIN FILTER CREATE | :BF0000 | 144 | 3312 | 22 (23)| 00:00:01 | |* 3 | TABLE ACCESS FULL | ORDERS | 144 | 3312 | 22 (23)| 00:00:01 | | 4 | VIEW | VW_SSQ_1 | 10000 | 19M| 13 (31)| 00:00:01 | | 5 | HASH GROUP BY | | 10000 | 341K| 13 (31)| 00:00:01 | | 6 | JOIN FILTER USE | :BF0000 | 10000 | 341K| 10 (10)| 00:00:01 | |* 7 | TABLE ACCESS FULL| COMPANIES | 10000 | 341K| 10 (10)| 00:00:01 | ----------------------------------------------------------------------------------- --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 34 (100)| | | 1 | HASH GROUP BY | | 144 | 8784 | 34 (24)| 00:00:01 | |* 2 | HASH JOIN OUTER | | 144 | 8784 | 33 (22)| 00:00:01 | |* 3 | TABLE ACCESS FULL| ORDERS | 144 | 3744 | 22 (23)| 00:00:01 | | 4 | TABLE ACCESS FULL| COMPANIES | 10000 | 341K| 10 (10)| 00:00:01 | ---------------------------------------------------------------------------------
The first variation shows the creation of an aggregate view that is used in the join – note the generated view name vw_ssq_1 (ssq = scalar sub query). In the second variation Oracle has used a Bloom filter to reduce the volume of data passed up from the view to the hash join operator, and in the third variation Oracle has used complex view merging to handle the join before performing the aggregation.
The anomaly
Clearly the optimizer is smarter than I am with this query – it spotted that I didn’t need that max() aggregation and took it out. So maybe I should take a hint from the optimizer and edit the query to remove the max(). Here’s the plan I get if I do:
---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 167 (100)| | | 1 | TABLE ACCESS BY INDEX ROWID| COMPANIES | 1 | 35 | 2 (0)| 00:00:01 | |* 2 | INDEX UNIQUE SCAN | COM_PK | 1 | | 1 (0)| 00:00:01 | |* 3 | TABLE ACCESS FULL | ORDERS | 145 | 3335 | 22 (23)| 00:00:01 | ----------------------------------------------------------------------------------------- Query Block Name / Object Alias (identified by operation id): ------------------------------------------------------------- 1 - COMPANY / COM@COMPANY 2 - COMPANY / COM@COMPANY 3 - MAIN / ORD@MAIN Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("COM"."ID"=:B1) 3 - filter("ORD"."DATE_PLACED">TRUNC(SYSDATE@!)-1)
The optimizer doesn’t unnest the subquery (and it produces an execution plan that has a far higher cost than the unnested version). You can see from the Query Block information that the final query still consists of the two original query blocks, and the plan shows the standard “subquery in the select list” pattern – the main query block at the end of the plan with the scalar subquery(ies) above it. Even if I insert an /*+ unnest */ hint in the subquery the optimizer will not unnest the scalar subquery.
This really looks like a bug – which means I have to take a look at the CBO (10053) trace file; and here are the critical lines (from the 19.3 trace file which is significantly more informative than the 12.2 file):
... SU: Checking validity of scalar subquery unnesting for query block COMPANY (#0) SU: bypassed: Scalar subquery may return more than one row. ...
When we used an aggregate subquery the optimizer knew the aggregation was redundant because it was querying with equality on the primary key, so it eliminated the aggregation step from the plan; but when we don’t specify an aggregate the optimizer thinks the primary key will return more than one row!
So do we have a “documentation” bug where the trace file is simply reporting the wrong reason for bypassing unnesting, or do we have a logic bug where the optimizer makes a mistake when checking for uniqueness ? (I’d lile it to be the latter, and see a fix some time soon, of course.)
Conclusion
There are cases where the optimizer ought to be unnesting scalar subqueries in the select list but fails to do so for what seems to be a completely spurious reason. Unfortunately some people find it very convenient to write SQL that does primary key lookups as in-line scalar subqueries instead of joins; so if you find examples like this then (for the short term, at least) you might see some performance benefit by introducing a redundant max() operation in the scalar subquery.
Footnote
If you want to see more examples of how Oracle does, or doesn’t handle scalar subqueries in the select list there are a couple of articles on Nenad Noveljic’s blog one comparing how Oracle and SQL Server handle a particular case, the other looking at several other cases.
Update (May 2021)
Running a quick check on 19.11.0.0 (and 21.3.0.0) – the 10053 trace file still shows the message:
SU: Considering subquery unnesting in query block MAIN (#0) ******************** Subquery Unnest (SU) ******************** SU: Considering bottom-up subquery unnesting SU: Checking validity for Null Accepting Semi JoinUnnesting for query block MAIN(#0) SU: Checking validity of scalar subquery unnesting for query block COMPANY (#0) SU: bypassed: Scalar subquery may return more than one row.
Even though the subquery is accessing a table by primary key.
There have been times when I’ve used a SELECT-list SSQ merely out of convenience–some GROUP BY situations come to mind.
But more than a few times, I’ve found it important to use one for performance. There have been times when I’ve needed an SSQ because a more ordinary join would cause the CBO to start picking some bad plan, or to fail to notice some other optimization. I also seem to recall that this was more likely to happen with a large number of tables being joined in the main query.
My experience has been on older database versions, up through 11.2.0.4. It could be that 12c+ does a better job such that SSQs for performance aren’t nearly as important. But I’m not surprised to hear about them being somewhat common in the wild. If 12c+ does a better job of handling the ostensibly better way to write the query, even so it should do at least as good of a job of handling SSQs since it’s still likely to encounter them in older queries.
Comment by Jason Bucata — January 29, 2020 @ 5:07 pm GMT Jan 29,2020 |
Jason,
Thanks for the comment.
As ever it’s the problem of trying to balance the various evils.
Aggregate in-line scalar subqueries were really quite useful as a tuning tool because the alternative was either a join with massive aggregate or a no-mergeable aggregate view in the FROM clause with a join that may or may not have used a predicate pushdown – and very often both options had a nasty downside that you could avoid with the inline SSQ.
Putting a lightweight (basicaly a non-aggregate) SSQ in the select list to reduce the number of joins in the FROM clause was (I felt) always a bit of a last-ditch strategy for helping the optimizer by reducing the number of tables it had to consider in the join – better than stacking the query with loads of hints, but not as nice as working out why a bad join order had appeared. I’m not sure, though, that many people were thinking that when they used the strategy, I suspect many of them were simply trying to find what (to them) was an easier way of expressing the requirement. That’s also a totally valid argument for doing it, of course,, but it’s nice that the optimizer is now (in theory) able to decide whether or not to run the query that way or move the inline to be a join.
It definitely a good thing that the /*+ unnest */ and /*+ no_unnest */ hints can be used so that people can write the query the way they’d like to read it but tell the optimizer which strategy to adopt for executing.
Regards
Jonathan Lewis
Comment by Jonathan Lewis — January 29, 2020 @ 5:32 pm GMT Jan 29,2020 |
I can’t help but quickly try Nenad’s code in Postgres. I have only 9.6.2 on my laptop, so one should check what has changed in newer versions. But still looks like Postgres does SSQ it in its own special way :)
We can see two SubPlan branches, one for full access (Seq Scan) with filter, another one that returns all rows from t_1k. SubPlan is a plan tree driven by a parent row source. Loops=1 (it is analog of Starts in Oracle’s DISPLAY_CURSOR) in SubPlan 2 suggests that Postgres fetched all rows in one go, though it also considered running the subquery for every row from the driving table in SubPlan 1, but left this idea, the branch is “(never executed)”.
Though, it still not clear in this version which exact algorithm Postgres used to join the data
Comment by Viacheslav Andzhich — February 15, 2020 @ 12:04 pm GMT Feb 15,2020 |
[…] should be rewritten to use joins (though in newer vesions of Oracle some of the subqueries might be transformed to outer joins anyway). We also know that a distinct may be a hint that there’s a possible logic error that […]
Pingback by Case Study | Oracle Scratchpad — September 30, 2022 @ 10:45 am BST Sep 30,2022 |