Oracle Scratchpad

July 20, 2015

12c Downgrade

Filed under: 12c,Bugs,CBO,Oracle — Jonathan Lewis @ 1:12 pm BST Jul 20,2015

No, not really – but sometimes the optimizer gets better and gives you worse performance as a side effect when you upgrade. Here’s an example where 11.2.0.4 recognised (with a few hints) the case for a nested loop semi-join and 12c went a bit further and recognised the opportunity for doing a cunning “semi_to_inner” transformation … which just happened to do more work than the 11g plan.

Here’s a data set to get things going, I’ve got “parent” and “child” tables, but in this particular demonstration I won’t be invoking referential integrity:


rem
rem     Script:   semi_join_caching.sql
rem     Dated:    July 2015
rem     Author:   J.P.Lewis
rem

create table chi
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1                              id,
        trunc((rownum-1)/10)                    n1,
        trunc(dbms_random.value(0,1000))        n2,
        rpad('x',1000)                          padding
from
        generator
;

create table par
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1      id,
        rpad('x',1000)  padding
from
        generator
where
        rownum <= 1e3
;

alter table par modify id not null;
alter table par add constraint par_pk primary key(id)
-- deferrable
;

-- Now gather stats on the tables.

The code uses my standard framework that could generate a few million rows even though it’s only generating 1,000 in par and 10,000 in chi. The presence of the commented “deferrable” for the primary key constraint is for a secondary demonstration.

You’ll notice that the 1,000 values that appear in chi.n1 and chi.n2 are matched by the 1,000 rows that appear in the primary key of par – in some other experiment I’ve got two foreign keys from chi to par. Take note that the values in n1 are very well clustered because of the call to trunc() while the values in n2 are evenly scattered because of the call to dbms_random() – the data patterns are very different although the data content is very similar (the randomised data will still produce, on average, 10 rows per value).

So here’s the test code:


set serveroutput off
set linesize 156
set trimspool on
set pagesize 60

alter session set statistics_level = all;

prompt  =============================
prompt  Strictly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

prompt  =============================
prompt  Randomly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

set serveroutput on
alter session set statistics_level = typical;

In both cases I’ve hinted the query quite heavily, using internally generated query block names, into running with a nested loop semi-join from chi to par. Since there are 10,000 rows in chi with no filter predicates, you might expect to see the probe into the par table starting 10,000 times returning (thanks to our perfect data match) one row for each start. Here are the run-time plans with rowsource execution stats from 11.2.0.4

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.14 |    1450 |   1041 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.14 |    1450 |   1041 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.12 |    1450 |   1041 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.07 |    1434 |   1037 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   1000 |   1048 |     0   (0)|   1000 |00:00:00.01 |      16 |      4 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.12 |    5544 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.12 |    5544 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.10 |    5544 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.02 |    1434 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   4033 |   1048 |     0   (0)|   4033 |00:00:00.02 |    4110 |
-----------------------------------------------------------------------------------------------------

Notice how we do 1,000 starts of operation 4 when the data is well ordered, and 4,033 starts when the data is randomly ordered. For a semi-join nested loop the run-time engine uses the same caching mechanism as it does for scalar subqueries – a fact you can corroborate by removing the current hints and putting the /*+ no_unnest */ hint into the subquery so that you get a filter subquery plan, in which you will note exactly the same number of starts of the filter subquery.

As an extra benefit you’ll notice that the index probes for the well-ordered data have managed to take advantage of buffer pinning (statistic “buffer is pinned count”) – keeping the root block and most recent leaf block of the par_pk index pinned almost continually through the query; while the randomised data access unfortunately required Oracle to unpin and repin the index leaf blocks (even though there were only 2 in the index) as the scan of chi progessed.

Time to upgrade to 12.1.0.2 and see what happens:

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |    1448 |   1456 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |    1448 |   1456 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.20 |    1448 |   1456 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.06 |      16 |     27 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |   11588 |   1429 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |   11588 |   1429 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.19 |   11588 |   1429 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.07 |   10156 |      0 |
--------------------------------------------------------------------------------------------------------------

Take a close look at operation 2 – it’s no longer a NESTED LOOP SEMI, the optimizer has got so smart (recognising the nature of the primary key on par) that it’s done a “semi_to_inner” transformation. But a side effect of the transformation is that the scalar subquery caching mechanism no longer applies so we probe the par table 10,000 times. When the driving data is well-ordered this hasn’t made much difference to the buffer gets (and related latch activity), but when the data is randomised the extra probes ramp the buffer gets up even further.

The timings (A-time) on these experiments are not particularly trustworthy – the differences between cached reads and direct path reads introduced more variation than the difference in Starts and Buffers, and the total CPU load is pretty small anyway – and I suspect that this difference won’t make much difference to most people most of the time. No doubt, though, there will be a few cases where a small change like this could have a noticeable effect on some important queries.

Footnote

There is a hint /*+ no_semi_to_inner(@queryblock object_alias) */ that I thought might persuade the optimizer to stick with the semi-join, but it didn’t have any effect. Since the “semi to inner” transformation (and the associated hints) are available in 11.2.0.4 I was a little puzzled that (a) I didn’t see the same transformation in the 11g test, and (b) that I couldn’t hint the transformation.  This makes me wonder if there’s a defect in 11g that might be fixed in a future patch.

It’s also nice to think that the scalar subquery caching optimisation used in semi-joins might eventually become available  to standard joins (in cases such as “join to parent”, perhaps).

5 Comments »

  1. Hello Jonathan,

    There is a fix_control 17088819 which can be used to prevent semi-to-inner transformation.
    I have performed my tests in 12.1.0.2 with DBBP 12.1.0.2.7 applied (Patch 20698050: DATABASE PATCH FOR ENGINEERED SYSTEMS AND DB IN-MEMORY 12.1.0.2.7 (APR2015)):

    SQL> select * from v$system_fix_control where bugno=17088819;
    
         BUGNO   VALUE SQL_FEATURE                    DESCRIPTION                                                      OPTIMIZER_FEATURE_ENABLE       EVENT IS_DEFAULT     CON_ID
    ---------- ------- ------------------------------ ---------------------------------------------------------------- ------------------------- ---------- ---------- ----------
      17088819       1 QKSFM_FIRST_ROWS_13848786      inner join for unnesting exists subquery with unique column      12.1.0.2                           0          1          0
    
    select
         /*+ 
             use_nl(@sel$5da710d3 par@sel$2)
             leading(@sel$5da710d3 chi@sel$1 par@sel$2)
             full(@sel$5da710d3 chi@sel$1)
             index(@sel$5da710d3 par@sel$2 (par.id))
             opt_param( '_fix_control' '17088819:0')
         */
            count(*)
      from chi
     where exists (
             select null
               from par
              where par.id = chi.n1)
    
    Plan hash value: 1111382874
    -----------------------------------------------------------------------------------------------------
    | Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
    -----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |        |      1 |        |   399 (100)|      1 |00:00:00.02 |    1450 |
    |   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.02 |    1450 |
    |   2 |   NESTED LOOPS SEMI |        |      1 |  10000 |   399   (1)|  10000 |00:00:00.02 |    1450 |
    |   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   398   (1)|  10000 |00:00:00.01 |    1434 |
    |*  4 |    INDEX UNIQUE SCAN| PAR_PK |   1000 |   1000 |     0   (0)|   1000 |00:00:00.01 |      16 |
    -----------------------------------------------------------------------------------------------------
    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------
    1 - SEL$5DA710D3
    3 - SEL$5DA710D3 / CHI@SEL$1
    4 - SEL$5DA710D3 / PAR@SEL$2
    Outline Data
    -------------
    /*+
    BEGIN_OUTLINE_DATA
    IGNORE_OPTIM_EMBEDDED_HINTS
    OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
    DB_VERSION('12.1.0.2')
    OPT_PARAM('_optimizer_aggr_groupby_elim' 'false')
    OPT_PARAM('_optimizer_reduce_groupby_key' 'false')
    OPT_PARAM('_fix_control' '17088819:0')
    ALL_ROWS
    OUTLINE_LEAF(@"SEL$5DA710D3")
    UNNEST(@"SEL$2")
    OUTLINE(@"SEL$1")
    OUTLINE(@"SEL$2")
    FULL(@"SEL$5DA710D3" "CHI"@"SEL$1")
    INDEX(@"SEL$5DA710D3" "PAR"@"SEL$2" ("PAR"."ID"))
    LEADING(@"SEL$5DA710D3" "CHI"@"SEL$1" "PAR"@"SEL$2")
    USE_NL(@"SEL$5DA710D3" "PAR"@"SEL$2")
    END_OUTLINE_DATA
    */
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    4 - access("PAR"."ID"="CHI"."N1")
    

    Best regards,
    Mikhail.

    Comment by Mikhail Velikikh — August 14, 2015 @ 6:17 am BST Aug 14,2015 | Reply

  2. Hi Jonathan,

    If you dont mind me posting one of the case when optimizer chooses NESTED LOOPS over NESTED LOOPS SEMI.
    My original investigations was motivated by your blog post.
    Some background info: our application developers mostly prefer IN over EXISTS.
    I checked applicability of your blog post to queries with IN and found some cases when optimizer prefers NESTED LOOPS over NESTED LOOPS SEMI.

    I slightly modified your test to emulate the problem which I discover in my application:

    SQL> create table chi
      2  as
      3  with generator as (
      4          select  --+ materialize
      5                  rownum  id
      6          from dual
      7          connect by
      8                  level <= 1e4
      9  )
     10  select
     11          rownum - 1                              id,
     12          trunc((rownum-1)/10)                    n1,
     13          trunc(dbms_random.value(0,1000))        n2,
     14          rpad('x',1000)                          padding
     15  from
     16          generator
     17  ;
    SQL> 
    SQL> create table par
      2  as
      3  with generator as (
      4          select  --+ materialize
      5                  rownum  id
      6          from dual
      7          connect by
      8                  level <= 1e4
      9  )
     10  select
     11          rownum - 1      id,
     12          rpad('x',1000)  padding
     13  from
     14          generator
     15  where
     16          rownum <= 1e3
     17  ;
    SQL> 
    SQL> alter table par add constraint par primary key (id, padding);
    SQL> 
    SQL> exec dbms_stats.gather_table_stats( '', 'chi')
    SQL> exec dbms_stats.gather_table_stats( '', 'par')
    

    Notice that primary key on PAR table contains ID and PADDING columns.
    Now I execute a query:

    select
      /*+ gather_plan_statistics
          use_nl(@sel$5da710d3 par@sel$2)
          leading(@sel$5da710d3 chi@sel$1 par@sel$2)
          index(@sel$5da710d3 par@sel$2 (par.id par.padding))
      */
           count(*)
      from chi
     where n2 in (
             select id
               from par
              where padding = 'x'
            )
    /
    

    Optimizer chooses NESTED LOOPS:

    SQL_ID  46s6n42zuar9t, child number 0
    -------------------------------------
    select   /*+ gather_plan_statistics       use_nl(@sel$5da710d3
    par@sel$2)       leading(@sel$5da710d3 chi@sel$1 par@sel$2)
    index(@sel$5da710d3 par@sel$2 (par.id par.padding))   */
    count(*)   from chi  where n2 in (          select id            from
    par           where padding = 'x'         )
    
    Plan hash value: 337306170
    
    ---------------------------------------------------------------------------------------------------
    | Id  | Operation           | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
    ---------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |      |      1 |        |   399 (100)|      1 |00:00:00.06 |   11436 |
    |   1 |  SORT AGGREGATE     |      |      1 |      1 |            |      1 |00:00:00.06 |   11436 |
    |   2 |   NESTED LOOPS      |      |      1 |     10 |   399   (1)|      0 |00:00:00.06 |   11436 |
    |   3 |    TABLE ACCESS FULL| CHI  |      1 |  10000 |   398   (1)|  10000 |00:00:00.01 |    1434 |
    |*  4 |    INDEX UNIQUE SCAN| PAR  |  10000 |      1 |     0   (0)|      0 |00:00:00.04 |   10002 |
    ---------------------------------------------------------------------------------------------------
    
    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------
    
       1 - SEL$5DA710D3
       3 - SEL$5DA710D3 / CHI@SEL$1
       4 - SEL$5DA710D3 / PAR@SEL$2
    
    Outline Data
    -------------
    
      /*+
          BEGIN_OUTLINE_DATA
          IGNORE_OPTIM_EMBEDDED_HINTS
          OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
          DB_VERSION('12.1.0.2')
          OPT_PARAM('_optimizer_aggr_groupby_elim' 'false')
          OPT_PARAM('_optimizer_reduce_groupby_key' 'false')
          ALL_ROWS
          OUTLINE_LEAF(@"SEL$5DA710D3")
          UNNEST(@"SEL$2")
          OUTLINE(@"SEL$1")
          OUTLINE(@"SEL$2")
          FULL(@"SEL$5DA710D3" "CHI"@"SEL$1")
          INDEX(@"SEL$5DA710D3" "PAR"@"SEL$2" ("PAR"."ID" "PAR"."PADDING"))
          LEADING(@"SEL$5DA710D3" "CHI"@"SEL$1" "PAR"@"SEL$2")
          USE_NL(@"SEL$5DA710D3" "PAR"@"SEL$2")
          END_OUTLINE_DATA
      */
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       4 - access("N2"="ID" AND "PADDING"='x')
    

    But NESTED LOOPS SEMI is more efficient (lesser Starts column in line 4, lesser LIO as a consequence):

    SQL_ID  5vxb7c4dk1xbh, child number 5
    -------------------------------------
    select   /*+ gather_plan_statistics       opt_param('_fix_control'
    '599680:0')       use_nl(@sel$5da710d3 par@sel$2)
    leading(@sel$5da710d3 chi@sel$1 par@sel$2)       index(@sel$5da710d3
    par@sel$2 (par.id par.padding))   */        count(*)   from chi  where
    n2 in (          select id            from par           where padding
    = 'x'         )
    
    Plan hash value: 514621541
    
    ---------------------------------------------------------------------------------------------------
    | Id  | Operation           | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
    ---------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |      |      1 |        |   399 (100)|      1 |00:00:00.04 |    5438 |
    |   1 |  SORT AGGREGATE     |      |      1 |      1 |            |      1 |00:00:00.04 |    5438 |
    |   2 |   NESTED LOOPS SEMI |      |      1 |     10 |   399   (1)|      0 |00:00:00.04 |    5438 |
    |   3 |    TABLE ACCESS FULL| CHI  |      1 |  10000 |   398   (1)|  10000 |00:00:00.02 |    1434 |
    |*  4 |    INDEX UNIQUE SCAN| PAR  |   4002 |      1 |     0   (0)|      0 |00:00:00.02 |    4004 |
    ---------------------------------------------------------------------------------------------------
    
    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------
    
       1 - SEL$5DA710D3
       3 - SEL$5DA710D3 / CHI@SEL$1
       4 - SEL$5DA710D3 / PAR@SEL$2
    
    Outline Data
    -------------
    
      /*+
          BEGIN_OUTLINE_DATA
          IGNORE_OPTIM_EMBEDDED_HINTS
          OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
          DB_VERSION('12.1.0.2')
          OPT_PARAM('_optimizer_aggr_groupby_elim' 'false')
          OPT_PARAM('_optimizer_reduce_groupby_key' 'false')
          OPT_PARAM('_fix_control' '599680:0')
          ALL_ROWS
          OUTLINE_LEAF(@"SEL$5DA710D3")
          UNNEST(@"SEL$2")
          OUTLINE(@"SEL$1")
          OUTLINE(@"SEL$2")
          FULL(@"SEL$5DA710D3" "CHI"@"SEL$1")
          INDEX(@"SEL$5DA710D3" "PAR"@"SEL$2" ("PAR"."ID" "PAR"."PADDING"))
          LEADING(@"SEL$5DA710D3" "CHI"@"SEL$1" "PAR"@"SEL$2")
          USE_NL(@"SEL$5DA710D3" "PAR"@"SEL$2")
          END_OUTLINE_DATA
      */
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       4 - access("N2"="ID" AND "PADDING"='x')
    

    I observe the same behaviour in 11.2.0.4/12.1.0.2 database instances.

    Best regards,
    Mikhail.

    Comment by Mikhail Velikikh — August 19, 2015 @ 9:33 am BST Aug 19,2015 | Reply

  3. Mikhail,

    Thanks for the follow-up.

    Comment by Jonathan Lewis — August 19, 2015 @ 10:11 am BST Aug 19,2015 | Reply

  4. […] will be using a slightly modified version of the example used by Jonathan Lewis  here to demonstrate the enhancement introduced in 12c to the semi-join to inner-join […]

    Pingback by Semi-join to Inner-join enhacement in 12C | Hatem Mahmoud Oracle's blog — March 11, 2016 @ 11:03 am BST Mar 11,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.