No, not really – but sometimes the optimizer gets better and gives you worse performance as a side effect when you upgrade. Here’s an example where 11.2.0.4 recognised (with a few hints) the case for a nested loop semi-join and 12c went a bit further and recognised the opportunity for doing a cunning “semi_to_inner” transformation … which just happened to do more work than the 11g plan.
Here’s a data set to get things going: I’ve got “parent” and “child” tables, but in this particular demonstration I won’t be invoking referential integrity:
rem
rem     Script:         semi_join_caching.sql
rem     Dated:          July 2015
rem     Author:         J.P.Lewis
rem

create table chi
as
with generator as (
        select  --+ materialize
                rownum  id
        from    dual
        connect by level <= 1e4
)
select
        rownum - 1                              id,
        trunc((rownum-1)/10)                    n1,
        trunc(dbms_random.value(0,1000))        n2,
        rpad('x',1000)                          padding
from
        generator
;

create table par
as
with generator as (
        select  --+ materialize
                rownum  id
        from    dual
        connect by level <= 1e4
)
select
        rownum - 1      id,
        rpad('x',1000)  padding
from
        generator
where
        rownum <= 1e3
;

alter table par modify id not null;

alter table par add constraint par_pk primary key(id)
-- deferrable
;

-- Now gather stats on the tables.
The code uses my standard framework that could generate a few million rows even though it’s only generating 1,000 in par and 10,000 in chi. The presence of the commented “deferrable” for the primary key constraint is for a secondary demonstration.
You’ll notice that the 1,000 values that appear in chi.n1 and chi.n2 are matched by the 1,000 rows that appear in the primary key of par – in some other experiment I’ve got two foreign keys from chi to par. Take note that the values in n1 are very well clustered because of the call to trunc() while the values in n2 are evenly scattered because of the call to dbms_random() – the data patterns are very different although the data content is very similar (the randomised data will still produce, on average, 10 rows per value).
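As a side note, the clustered/scattered contrast is easy to visualise outside the database. This little Python sketch (my own illustration, not part of the original test) builds the same two patterns and counts how often the value changes from one row to the next:

```python
import random

# A toy model (Python, not the SQL above) of the two child columns:
# n1 is well clustered -- rows with the same value sit next to each other;
# n2 is evenly scattered -- the same 1,000 values in random order.
rows = 10_000
n1 = [i // 10 for i in range(rows)]                 # runs of 10 identical values
n2 = [random.randrange(1000) for _ in range(rows)]  # ~10 rows per value on average

# Count how often the value changes from one row to the next:
changes_n1 = sum(a != b for a, b in zip(n1, n1[1:]))  # 999 -- one change per run
changes_n2 = sum(a != b for a, b in zip(n2, n2[1:]))  # nearly every row changes
print(changes_n1, changes_n2)
```

The value in n1 changes only 999 times across 10,000 rows while n2 changes on almost every row, which is exactly the difference that the caching mechanism described below will care about.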
So here’s the test code:
set serveroutput off
set linesize 156
set trimspool on
set pagesize 60

alter session set statistics_level = all;

prompt  =============================
prompt  Strictly ordered driving data
prompt  =============================

select  /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from    chi
where   exists (
                select  null
                from    par
                where   par.id = chi.n1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

prompt  =============================
prompt  Randomly ordered driving data
prompt  =============================

select  /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from    chi
where   exists (
                select  null
                from    par
                where   par.id = chi.n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

set serveroutput on
alter session set statistics_level = typical;
In both cases I’ve hinted the query quite heavily, using internally generated query block names, into running with a nested loop semi-join from chi to par. Since there are 10,000 rows in chi with no filter predicates, you might expect to see the probe into the par table starting 10,000 times returning (thanks to our perfect data match) one row for each start. Here are the run-time plans with rowsource execution stats from 11.2.0.4
=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.14 |    1450 |   1041 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.14 |    1450 |   1041 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |    190   (4)|  10000 |00:00:00.12 |    1450 |   1041 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |    186   (2)|  10000 |00:00:00.07 |    1434 |   1037 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   1000 |   1048 |      0   (0)|   1000 |00:00:00.01 |      16 |      4 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.12 |    5544 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.12 |    5544 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |    190   (4)|  10000 |00:00:00.10 |    5544 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |    186   (2)|  10000 |00:00:00.02 |    1434 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   4033 |   1048 |      0   (0)|   4033 |00:00:00.02 |    4110 |
-----------------------------------------------------------------------------------------------------
Notice how we do 1,000 starts of operation 4 when the data is well ordered, and 4,033 starts when the data is randomly ordered. For a semi-join nested loop the run-time engine uses the same caching mechanism as it does for scalar subqueries – a fact you can corroborate by removing the current hints and putting the /*+ no_unnest */ hint into the subquery so that you get a filter subquery plan, in which you will note exactly the same number of starts of the filter subquery.
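To illustrate why the ordering matters, here’s a toy Python model of the caching idea. The real mechanism and its hash-table size are internal and undocumented, so the 256-slot table, the modulo hash, and the “colliding values are never cached” rule below are pure assumptions for the sketch; the point is only that a “most recent value” cache plus a fixed-size hash table reproduces the shape of the results – ordered data needs exactly one probe per run of duplicates, while scattered data keeps re-probing every value that failed to get into the table:

```python
import random

def semi_join_starts(driving_values, slots=256):
    """Toy model of the run-time cache used for filter subqueries and
    nested loop semi-joins: a 'most recent value' cache plus a
    direct-mapped hash table in which a value that collides with an
    existing entry is never cached. Purely illustrative -- the real
    mechanism and sizes are not documented."""
    starts = 0
    last = None      # most recent driving value
    table = {}       # slot number -> cached value
    for v in driving_values:
        if v == last:
            continue                      # most-recent-value cache hit
        slot = v % slots                  # stand-in for Oracle's hash
        if table.get(slot) == v:
            last = v
            continue                      # hash-table cache hit
        starts += 1                       # probe the par_pk index
        if slot not in table:
            table[slot] = v               # collisions are not cached
        last = v
    return starts

ordered = [i // 10 for i in range(10_000)]
scattered = [random.randrange(1000) for _ in range(10_000)]
print(semi_join_starts(ordered))    # 1000 -- one probe per run of duplicates
print(semi_join_starts(scattered))  # several thousand -- uncached values re-probe
```

The exact figure for the scattered data depends on which values win a slot in the table, which is why the real test showed an in-between figure (4,033) rather than either extreme.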
As an extra benefit you’ll notice that the index probes for the well-ordered data have managed to take advantage of buffer pinning (statistic “buffer is pinned count”) – keeping the root block and most recent leaf block of the par_pk index pinned almost continually through the query; while the randomised data access unfortunately required Oracle to unpin and repin the index leaf blocks (even though there were only 2 in the index) as the scan of chi progressed.
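Again as a toy illustration (assuming, purely for the sketch, an index of just two leaf blocks with the key values split evenly between them, and ignoring the interaction with the probe cache above), you can model the pinning effect like this:

```python
import random

# Toy model of index leaf-block pinning: the par_pk index is assumed to
# have two leaf blocks, values 0-499 in one and 500-999 in the other.
# A probe can keep the previously used leaf pinned; touching the other
# leaf forces an unpin/repin.
def repins(probe_values):
    pinned, count = None, 0
    for v in probe_values:
        leaf = 0 if v < 500 else 1
        if leaf != pinned:
            count += 1        # had to (re)pin a leaf block
            pinned = leaf
    return count

ordered = [i // 10 for i in range(10_000)]
scattered = [random.randrange(1000) for _ in range(10_000)]
print(repins(ordered))     # 2 -- each leaf pinned once, visited in sequence
print(repins(scattered))   # thousands -- probes bounce between the two leaves
```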
Time to upgrade to 12.1.0.2 and see what happens:
=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |    1448 |   1456 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |    1448 |   1456 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |    189   (4)|  10000 |00:00:00.20 |    1448 |   1456 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |    185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |      0   (0)|  10000 |00:00:00.06 |      16 |     27 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |   11588 |   1429 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |   11588 |   1429 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |    189   (4)|  10000 |00:00:00.19 |   11588 |   1429 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |    185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |      0   (0)|  10000 |00:00:00.07 |   10156 |      0 |
--------------------------------------------------------------------------------------------------------------
Take a close look at operation 2 – it’s no longer a NESTED LOOPS SEMI: the optimizer has got so smart (recognising the nature of the primary key on par) that it’s done a “semi_to_inner” transformation. But a side effect of the transformation is that the scalar subquery caching mechanism no longer applies, so we probe the par table 10,000 times. When the driving data is well-ordered this hasn’t made much difference to the buffer gets (and related latch activity), but when the data is randomised the extra probes ramp the buffer gets up even further.
The timings (A-Time) in these experiments are not particularly trustworthy – the variation between cached reads and direct path reads introduced more noise than the difference in Starts and Buffers, and the total CPU load is pretty small anyway – and I suspect that this change won’t make much difference to most people most of the time. No doubt, though, there will be a few cases where a small change like this could have a noticeable effect on some important queries.
Footnote
There is a hint /*+ no_semi_to_inner(@queryblock object_alias) */ that I thought might persuade the optimizer to stick with the semi-join, but it didn’t have any effect. Since the “semi to inner” transformation (and its associated hints) is available in 11.2.0.4 I was a little puzzled that (a) I didn’t see the same transformation in the 11g test, and (b) I couldn’t hint the transformation. This makes me wonder if there’s a defect in 11g that might be fixed in a future patch.
It’s also nice to think that the scalar subquery caching optimisation used in semi-joins might eventually become available to standard joins (in cases such as “join to parent”, perhaps).
Hello Jonathan,
There is a fix_control 17088819 which can be used to prevent semi-to-inner transformation.
I have performed my tests in 12.1.0.2 with DBBP 12.1.0.2.7 applied (Patch 20698050: DATABASE PATCH FOR ENGINEERED SYSTEMS AND DB IN-MEMORY 12.1.0.2.7 (APR2015)):
Best regards,
Mikhail.
Comment by Mikhail Velikikh — August 14, 2015 @ 6:17 am BST Aug 14,2015 |
Mikhail,
Thanks for that. It’s nice to know there’s a workaround.
Comment by Jonathan Lewis — August 18, 2015 @ 12:43 pm BST Aug 18,2015 |
Hi Jonathan,
I hope you don’t mind me posting one of the cases where the optimizer chooses NESTED LOOPS over NESTED LOOPS SEMI.
My original investigation was motivated by your blog post.
Some background info: our application developers mostly prefer IN over EXISTS.
I checked the applicability of your blog post to queries with IN and found some cases where the optimizer prefers NESTED LOOPS over NESTED LOOPS SEMI.
I slightly modified your test to emulate the problem which I discovered in my application:
Notice that the primary key on the PAR table contains the ID and PADDING columns.
Now I execute a query:
Optimizer chooses NESTED LOOPS:
But NESTED LOOPS SEMI is more efficient (fewer Starts at operation 4, and less LIO as a consequence):
I observe the same behaviour in 11.2.0.4/12.1.0.2 database instances.
Best regards,
Mikhail.
Comment by Mikhail Velikikh — August 19, 2015 @ 9:33 am BST Aug 19,2015 |
Mikhail,
Thanks for the follow-up.
Comment by Jonathan Lewis — August 19, 2015 @ 10:11 am BST Aug 19,2015 |
[…] will be using a slightly modified version of the example used by Jonathan Lewis here to demonstrate the enhancement introduced in 12c to the semi-join to inner-join […]
Pingback by Semi-join to Inner-join enhacement in 12C | Hatem Mahmoud Oracle's blog — March 11, 2016 @ 11:03 am GMT Mar 11,2016 |
[…] (In fact, it’s also capable of using the same caching mechanism as scalar subquery caching so it can be even more efficient than just “stop on first match” – it can even stop before trying because […]
Pingback by Execution Plans | Oracle Scratchpad — April 27, 2020 @ 11:57 am BST Apr 27,2020 |