Oracle Scratchpad

June 6, 2016

Merge Precision

Filed under: Oracle,Performance — Jonathan Lewis @ 12:39 pm GMT Jun 6,2016

This note is about a little detail I hadn’t noticed about the merge command until a question came up on the OTN database forum a few days ago. The question was about the impact of the clustering_factor on the optimizer’s choice of execution plan – but the example supplied in the question displayed an oddity I couldn’t explain. Here’s the code and execution plan as originally supplied:


MERGE INTO gtt_ord t1
    USING X t2 ON (t1.global_ext_id = t2.ext_id)
    WHEN MATCHED THEN
    UPDATE SET t1.ord_id = t2.ord_id;
 
-------------------------------------------------------------------------------------------
| Id  | Operation            | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |                    |       |       |   832 (100)|          |
|   1 |  MERGE               | GTT_ORD            |       |       |            |          |
|   2 |   VIEW               |                    |       |       |            |          |
|*  3 |    HASH JOIN         |                    |  1156 |   706K|   832   (2)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| GTT_ORD            |  1152 |   589K|    36   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| X                  |   188K|    18M|   794   (2)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("t1"."GLOBAL_EXT_ID"="t2"."EXT_ID")

The plan doesn’t seem at all surprising so far but the OP had also told us that the X table had an index on (ext_id, ord_id) for which the stored statistics reported 699 leaf blocks. Look carefully at the query, and especially the columns used from table X, and ask yourself: why has the optimizer chosen a full tablescan at a cost of 794 when it could have done an index fast full scan on an index with only 699 leaf blocks.

Naturally I had to build a model (using 11.2.0.4, because that’s what the OP declared) to see if the behaviour was typical:


rem
rem     Script:         merge_precision.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2016
rem     Purpose:
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem

create  table ord(
        ord_id          number(8,0),
        global_ext_id   number(8,0),
        v1              varchar2(10),
        padding         varchar2(100)
)
;


create table x (
        ord_id          number(8,0),
        ext_id          number(8,0),
        v1              varchar2(10),
        padding         varchar2(100)
);

alter table x add constraint x_pk primary key(ord_id);

create index x_idx1 on x(ext_id);
create unique index x_idx2 on x(ext_id, ord_id);

insert into x
select
        rownum,
        trunc(dbms_random.value(0,5e5)),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        dual
connect by
        level <= 1e5
;

insert into ord
select
        to_number(null),
        trunc(dbms_random.value(0,5e5)),
        lpad(rownum,10,'0'),
        rpad('x',100,'x')
from
        dual
connect by
        level <= 1e3 ; execute dbms_stats.gather_table_stats(user,'x',method_opt=>'for all columns size 1')
execute dbms_stats.gather_table_stats(user,'ord',method_opt=>'for all columns size 1')

explain plan for
merge
into    ord
using   x
on      (ord.global_ext_id = x.ext_id)
when matched then
        update set ord.ord_id = x.ord_id
;

select * from table(dbms_xplan.display(null,null,'projection'));

I’ve modified the table names a little (and I’m not using the global temporary table hinted at by the original table name), and I’ve made a couple of cosmetic changes to the merge statement. The three indexes I’ve created model the three indexes reported by the OP (with the assumption that the index with PK” in its name was the primary key and that any index including the primary key column would have been declared unique).

You’ll notice that in my call to dbms_xplan I’ve included the ‘projection’ formatting option – and that’s quite important in this case, even though I hardly ever find a need to use it. Here’s the plan I got:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      |  1100 | 28600 |   234   (4)| 00:00:02 |
|   1 |  MERGE               | ORD  |       |       |            |          |
|   2 |   VIEW               |      |       |       |            |          |
|*  3 |    HASH JOIN         |      |  1100 |   256K|   234   (4)| 00:00:02 |
|   4 |     TABLE ACCESS FULL| ORD  |  1000 |   114K|     4   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| X    |   100K|    11M|   228   (4)| 00:00:02 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ORD"."GLOBAL_EXT_ID"="X"."EXT_ID")

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
   2 - "X"."ORD_ID"[NUMBER,22]
   3 - (#keys=1) "ORD"."GLOBAL_EXT_ID"[NUMBER,22],
       "X"."EXT_ID"[NUMBER,22], "ORD".ROWID[ROWID,10],
       "ORD"."ORD_ID"[NUMBER,22], "ORD"."PADDING"[VARCHAR2,100],
       "ORD"."V1"[VARCHAR2,10], "X"."ORD_ID"[NUMBER,22],
       "X"."PADDING"[VARCHAR2,100], "X"."V1"[VARCHAR2,10]
   4 - "ORD".ROWID[ROWID,10], "ORD"."ORD_ID"[NUMBER,22],
       "ORD"."GLOBAL_EXT_ID"[NUMBER,22], "ORD"."V1"[VARCHAR2,10],
       "ORD"."PADDING"[VARCHAR2,100]
   5 - "X"."ORD_ID"[NUMBER,22], "X"."EXT_ID"[NUMBER,22],
       "X"."V1"[VARCHAR2,10], "X"."PADDING"[VARCHAR2,100]

The anomaly appeared in my model. I have a statement that could have been satisfied by a fast full scan of the x_idx2 index but Oracle did a full tablescan instead. That’s where the projection information shows its value. Look at the columns projected at operation 5 – it’s the full set of columns in the table including some that I definitely don’t need. I had never checked the details of a merge command before, but if you had asked me I would have assumed that the optimizer would have worked out which columns were actually needed and optimized for those columns – but it doesn’t seem to do that here. My next step was to tell Oracle which columns I needed by expanding my ‘using’ clause:


explain plan for
merge
into    ord ord
using   (select ext_id, ord_id from x) x
on      (ord.global_ext_id = x.ext_id)
when matched then
        update set ord.ord_id = x.ord_id
;

select * from table(dbms_xplan.display(null,null,'projection'));

----------------------------------------------------------------------------------
| Id  | Operation               | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | MERGE STATEMENT         |        |  1100 | 28600 |    55  (10)| 00:00:01 |
|   1 |  MERGE                  | ORD    |       |       |            |          |
|   2 |   VIEW                  |        |       |       |            |          |
|*  3 |    HASH JOIN            |        |  1100 |   136K|    55  (10)| 00:00:01 |
|   4 |     TABLE ACCESS FULL   | ORD    |  1000 |   114K|     4   (0)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN| X_IDX2 |   100K|   976K|    49   (7)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ORD"."GLOBAL_EXT_ID"="EXT_ID")
   
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
   2 - "X"."ORD_ID"[NUMBER,22]
   3 - (#keys=1) "ORD"."GLOBAL_EXT_ID"[NUMBER,22], "EXT_ID"[NUMBER,22],
       "ORD".ROWID[ROWID,10], "ORD"."ORD_ID"[NUMBER,22],
       "ORD"."PADDING"[VARCHAR2,100], "ORD"."V1"[VARCHAR2,10],
       "X".ROWID[ROWID,10], "ORD_ID"[NUMBER,22]
   4 - "ORD".ROWID[ROWID,10], "ORD"."ORD_ID"[NUMBER,22],
       "ORD"."GLOBAL_EXT_ID"[NUMBER,22], "ORD"."V1"[VARCHAR2,10],
       "ORD"."PADDING"[VARCHAR2,100]
   5 - "X".ROWID[ROWID,10], "ORD_ID"[NUMBER,22], "EXT_ID"[NUMBER,22]

Surprise, surprise! If you do the projection manually in the using clause you get the column elimination you need and the optimizer can take advantage of the covering index. I’ve no doubt that other people have discovered this in the past – and possibly even written about it – but when I checked the merge command in the SQL Reference manual there was no indication that it was a good idea to be as precise as possible in the using clause.

It seems likely that this observation isn’t going to be useful in many “real-life” examples of using the merge command – I think I’ve only ever seen it used when most of the columns in the source table are used, and I don’t often seen cases of people creating indexes that hold a large fraction of the columns in a table – but it’s worth knowing about, especially when you realize that you’re not just giving the optimizer the option for using “an index” or “an index fast full scan” in the execution plan, you’re making it possible for several other plans to appear. For example, if I didn’t have the x_idx2 covering index, here’s a possible plan for the statement:


----------------------------------------------------------------------------------------------
| Id  | Operation                 | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT           |                  |  1106 | 28756 |   598   (4)| 00:00:03 |
|   1 |  MERGE                    | ORD              |       |       |            |          |
|   2 |   VIEW                    |                  |       |       |            |          |
|*  3 |    HASH JOIN              |                  |  1106 |   137K|   598   (4)| 00:00:03 |
|   4 |     TABLE ACCESS FULL     | ORD              |  1000 |   114K|     4   (0)| 00:00:01 |
|   5 |     VIEW                  | index$_join$_006 |   100K|   976K|   591   (3)| 00:00:03 |
|*  6 |      HASH JOIN            |                  |       |       |            |          |
|   7 |       INDEX FAST FULL SCAN| X_PK             |   100K|   976K|   240   (3)| 00:00:02 |
|   8 |       INDEX FAST FULL SCAN| X_IDX1           |   100K|   976K|   329   (2)| 00:00:02 |
----------------------------------------------------------------------------------------------


More on an anomaly with this plan later, though. It should have appeared automatically when I dropped the x_idx2 index, but it didn’t.

Footnote:

The answer to the OP’s original questions are: the clustering_factor for the indexes starting with ext_id was probably high because of the order and degree of concurrency with which  the different values for ext_id arrived, combined with the effects of ASSM. If the ext_id values were arriving in a fairly well ordered fashion then setting the table preference table_cached_blocks to a value around 16 (or 16 x N for an N-node RAC cluster) and re-gathering stats on the indexes would probably produce a much more realistic clustering_factor that might persuade the optimizer to use an indexed access path into his table X.

The plans shown above were produced on an instance of 11.2.0.4; but the information is also accurate for 12.1.0.2

Update (about an hour after publication)

It didn’t take long for someone to point out that Alexander Anokhin had written about this phenomenon nearly four years ago, and had further commented on the fact that it wasn’t just the USING (source) table that projected irrelevant columns – the target table did as well, as did the join. Take a look at the projection on operations 3 and 4 in the original plan: you’ll see ord.padding and ord.v1 appearing in both of them (as well as x.padding and x.v1 appearing in operation 3).

Alexander showed the same workaround that I have above – but also highlighted the fact that it could be (and ought to be) applied to BOTH tables.


explain plan for
merge
into    (select ord.ord_id, ord.global_ext_id from ord) ord
using   (select ext_id, ord_id from x) x
on      (ord.global_ext_id = x.ext_id)
when matched then
        update set ord.ord_id = x.ord_id
;

select * from table(dbms_xplan.display(null,null,'outline projection'));

----------------------------------------------------------------------------------
| Id  | Operation               | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | MERGE STATEMENT         |        |  1100 | 28600 |    45  (12)| 00:00:01 |
|   1 |  MERGE                  | ORD    |       |       |            |          |
|   2 |   VIEW                  |        |       |       |            |          |
|*  3 |    HASH JOIN            |        |  1100 | 44000 |    45  (12)| 00:00:01 |
|   4 |     TABLE ACCESS FULL   | ORD    |  1000 | 30000 |     4   (0)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN| X_IDX2 |   100K|   976K|    39   (8)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ORD"."GLOBAL_EXT_ID"="EXT_ID")

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
   2 - "X"."ORD_ID"[NUMBER,22]
   3 - (#keys=1) "ORD"."GLOBAL_EXT_ID"[NUMBER,22], "EXT_ID"[NUMBER,22],
       "ORD".ROWID[ROWID,10], "ORD"."ORD_ID"[NUMBER,22], "X".ROWID[ROWID,10],
       "ORD_ID"[NUMBER,22]
   4 - "ORD".ROWID[ROWID,10], "ORD"."ORD_ID"[NUMBER,22],
       "ORD"."GLOBAL_EXT_ID"[NUMBER,22]
   5 - "X".ROWID[ROWID,10], "ORD_ID"[NUMBER,22], "EXT_ID"[NUMBER,22]

Apart from the change in the list of column names in the projection content for operations 3 and 4, take note of the reduction in the Bytes column of the execution plan body.

3 Comments »

  1. Jonathan,
    Alexander Anohkin mentioned a related problem in https://alexanderanokhin.wordpress.com/2012/07/18/dont-forget-about-column-projection/: it seems that the superfluos columns are also used in the workarea operation – so I would guess that the use of only the necessary columns can have an impact on the performance.
    Regards
    Martin

    Comment by Martin Preiss — June 6, 2016 @ 1:30 pm GMT Jun 6,2016 | Reply

    • Martin,

      Thanks for the link. I hadn’t concerned myself with the target table and other side efffects of the projections, but the projection information for operations 4 and 3 does show that the “extra” colums in the target table are also carried through a couple of steps of the plan – and Alexander shows that that can be eliminated too.

      Comment by Jonathan Lewis — June 6, 2016 @ 1:45 pm GMT Jun 6,2016 | Reply

  2. Very nice informative posting.

    Comment by Pavan Kumarn — June 8, 2016 @ 7:59 am GMT Jun 8,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.