Here’s a little script to demonstrate an observation that appeared in my email this morning about a missed opportunity for avoiding work:
rem rem Script: local_minus.sql rem Author: Jonathan Lewis rem Dated: Feb 2014 rem create table t1 as select * from all_objects where rownum = 1; delete from t1; commit; create table t2 as select * from all_objects where rownum <= 100000 -- > comment to avoid wordpress format issue ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for all columns size 1' ); dbms_stats.gather_table_stats( ownname => user, tabname =>'T2', method_opt => 'for all columns size 1' ); end; / alter system flush buffer_cache; alter session set events '10046 trace name context forever, level 8'; prompt ====================== prompt And now the test query prompt ====================== select * from t1 minus select * from t2 ; select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); alter session set events '10046 trace name context off';
Clearly the first query block in the test query will return no rows, and since the MINUS operator returns rows from the first result set that do not appear in the second result set there is no need for Oracle to run the second query block. Well, guess what …
The ‘create where rownum = 1’ followed by ‘delete’ is a lazy workaround to avoid side effects of deferred segment creation so that you can run the script on any (recent) version of Oracle. The flush, combined with 10046 trace, allowed me to see waits that showed which objects Oracle scanned and when, and the display_cursor() was just icing on the cake.
I’ve checked 11.2.0.4, 12.1.0.2, and 12.2.0.1 and all three scan t1 first and then scan t2 unnecessarily. [Ed: but see footnote – later tests showed 12.2 behaving nicely]
This surprised me slightly given how smart the optimizer can be, but I guess it’s one of those boundary cases where the optimizer has just one strategy for an entire class of queries. I couldn’t think of any “legal” way to control the effect, but here’s the first dirty trick that came to my mind. If you’re sure that the first subquery is going to be cheap and you’re worried that the second subquery is expensive, you could do the following:
select v2.* from (select * from t1 where rownum = 1) v1, ( select * from t1 minus select * from t2 ) v2 ;
Introduce a spurious query to return one row from the first subquery and join it do the MINUS query. If the inline view doesn’t return any rows Oracle short-circuits the join, as shown by the following execution path with stats:
----------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem | ----------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 2 | 32 | | | | | 1 | MERGE JOIN CARTESIAN | | 1 | 1 | 0 |00:00:00.01 | 2 | 32 | | | | | 2 | VIEW | | 1 | 1 | 0 |00:00:00.01 | 2 | 32 | | | | |* 3 | COUNT STOPKEY | | 1 | | 0 |00:00:00.01 | 2 | 32 | | | | | 4 | TABLE ACCESS FULL | T1 | 1 | 1 | 0 |00:00:00.01 | 2 | 32 | | | | | 5 | BUFFER SORT | | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | 73728 | 73728 | | | 6 | VIEW | | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | | | | | 7 | MINUS | | 0 | | 0 |00:00:00.01 | 0 | 0 | | | | | 8 | SORT UNIQUE | | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | 73728 | 73728 | | | 9 | TABLE ACCESS FULL| T1 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | | | | | 10 | SORT UNIQUE | | 0 | 70096 | 0 |00:00:00.01 | 0 | 0 | 73728 | 73728 | | | 11 | TABLE ACCESS FULL| T2 | 0 | 70096 | 0 |00:00:00.01 | 0 | 0 | | | | ----------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter(ROWNUM=1)
The only thing to watch out for is that the “rownum = 1” doesn’t make the optimizer switch to an unsuitable “first_rows(1)” execution plan.
Footnote:
If you follow the pingback in comment 4 below you’ll see a note that this strategy was a deliberate choice by the developers of the optimizer.
Update (Sept 2020)
I’ve just tested the script against 19.3.0.0 and t2 is no longer scanned if t1 is empty.
Before running the 19.3.0.0 test I re-ran the script against 12.2.0.1 and found that that version also behave properly (i.e. no redundant scan of t2). I”m not sure if this is a consequence of some patches being applied to the 12.2, or whether it was an earlier mistake. In both versions t2 was scanned redundantly if the query was hint to run parallel.
the same holds for “intersect”
replace all occurrencies of “minus” by “intersect” in your article and it’s also true
Comment by Matthias Rogel — February 5, 2014 @ 7:26 pm GMT Feb 5,2014 |
the second trick would probably be
?
( avoiding probable first_rows-side-effects )
Comment by Matthias Rogel — February 5, 2014 @ 7:41 pm GMT Feb 5,2014 |
Matthias,
That looks viable, but I think the optimizer will still use first_rows(1) for existence.
Comment by Jonathan Lewis — February 5, 2014 @ 7:46 pm GMT Feb 5,2014 |
Jonathan,
I think it’s just yet another case when set-to-join conversions are more preferrable:
Comment by Sayan Malakshinov — February 12, 2014 @ 2:01 pm GMT Feb 12,2014 |
[…] little while ago I highlighted a special case with the MINUS operator (that one of the commentators extended to include the INTERSECT operator) relating to the way the […]
Pingback by Empty Hash | Oracle Scratchpad — February 28, 2014 @ 6:45 pm GMT Feb 28,2014 |