Upgrades

November 3, 2014

Upgrades

Filed under: Execution plans,Oracle,Upgrades — Jonathan Lewis @ 6:31 pm GMT Nov 3,2014

One of the worst problems with upgrades is that things sometimes stop working. A particular nuisance is the execution plan that suddenly stops appearing, to be replaced by an alternative plan that is much less efficient.

Apart from the nuisance of the time spent trying to force the old plan to re-appear, plus the time spent working out a way of rewriting the query when you finally decide the old plan simply isn’t going to re-appear, there’s also the worry about WHY the old plan won’t appear. Is it some sort of bug, is it that some new optimizer feature has disabled some older optimizer feature, or is it that someone in the optimizer group realised that the old plan was capable of producing the wrong results in some circumstances … it’s that last possibility that I find most worrying.

Here’s an example that appeared recently on OTN that’s still got me wondering about the possibility of wrong results (in the general case). We start with a couple of tables, a view, and a pipelined function. This example is a simple model of the problem that showed up on OTN; it’s based on generated data so that anyone who wants to can play around with it to see if they can bypass the problem without making any significant changes to the shape of the code:

rem
rem     Script:         collection_jppd.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2014
rem

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 --> comment to bypass WordPress format issue
)
select
        rownum                  id,
        rownum                  n1,
        mod(rownum,100)         n_100,
        rpad('x',100)           padding
from
        generator       v1
;

create table t2
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4 --> comment to bypass WordPress format issue
)
select
        rownum                  id,
        rownum                  n1,
        mod(rownum,100)         n_100,
        rpad('x',100)           padding
from
        generator       v1
;

alter table t2 add constraint t2_pk primary key(id);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );

end;
/

create or replace type myScalarType as object (
        x int,
        y varchar2(15),
        d date
)
/

create or replace type myArrayType as table of myScalarType
/

create or replace function t_fun1(i_in number)
return myArrayType
pipelined
as
begin
        pipe row (myscalartype(i_in,     lpad(i_in,15),     trunc(sysdate) + i_in    ));
        pipe row (myscalartype(i_in + 1, lpad(i_in + 1,15), trunc(sysdate) + i_in + 1));
        return;
end;
/

create or replace view v1
as
select
        --+ leading(t2 x) index(t2)
        x.x, x.y, x.d,
        t2.id, t2.n1
from
        t2,
        table(t_fun1(t2.n_100)) x
where
        mod(t2.n1,3) = 1
union all
select
        --+ leading(t2 x) index(t2)
        x.x, x.y, x.d,
        t2.id, t2.n1
from
        t2,
        table(t_fun1(t2.n_100)) x
where
        mod(t2.n1,3) = 2
;

A key part of the problem is the UNION ALL view, where each subquery holds a join to a pipeline function. We’re about to write a query that joins to this view, and wants to push a join predicate into the view. Here’s the SQL:


select
        /*+ leading(t1 v1) use_nl(v1) */
        v1.x, v1.y, v1.d,
        v1.n1,
        t1.n1
from
        t1,
        v1
where
        t1.n_100 = 0
and     v1.id = t1.n1
;

You’ll notice that the join v1.id = t1.n1 could (in principle) be pushed inside the view to become t2.id = t1.n1 in the two branches of the UNION ALL; this would make it possible for the nested loop that I’ve hinted between t1 and v1 to operate efficiently – and in 11.1.0.7 this is exactly what happens:


------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |        | 16336 |   733K|   123   (1)| 00:00:01 |
|   1 |  NESTED LOOPS                         |        | 16336 |   733K|   123   (1)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                   | T1     |   100 |   700 |    23   (5)| 00:00:01 |
|   3 |   VIEW                                | V1     |   163 |  6357 |     1   (0)| 00:00:01 |
|   4 |    UNION-ALL PARTITION                |        |       |       |            |          |
|   5 |     NESTED LOOPS                      |        |  8168 |   103K|    16   (0)| 00:00:01 |
|*  6 |      TABLE ACCESS BY INDEX ROWID      | T2     |     1 |    11 |     2   (0)| 00:00:01 |
|*  7 |       INDEX UNIQUE SCAN               | T2_PK  |     1 |       |     1   (0)| 00:00:01 |
|   8 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |       |       |            |          |
|   9 |     NESTED LOOPS                      |        |  8168 |   103K|    16   (0)| 00:00:01 |
|* 10 |      TABLE ACCESS BY INDEX ROWID      | T2     |     1 |    11 |     2   (0)| 00:00:01 |
|* 11 |       INDEX UNIQUE SCAN               | T2_PK  |     1 |       |     1   (0)| 00:00:01 |
|  12 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |       |       |            |          |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N_100"=0)
   6 - filter(MOD("T2"."N1",3)=1)
   7 - access("T2"."ID"="T1"."N1")
  10 - filter(MOD("T2"."N1",3)=2)
  11 - access("T2"."ID"="T1"."N1")

For each row returned by the tablescan at line 2 we call the view operator at line 3 to generate a rowsource, but we can see in the predicate sections for lines 7 and 11 that the join value has been pushed inside the view, allowing us to access t2 through its primary key index. Depending on the data definitions, constraints, view definition, and version of Oracle, you might see the UNION ALL operator displaying the PARTITION option or the PUSHED PREDICATE option in cases of this type.

So now we upgrade to 11.2.0.4 (probably any 11.2.x.x version) and get the following plan:


------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |        |  1633K|    99M|   296K  (4)| 00:24:43 |
|   1 |  NESTED LOOPS                         |        |  1633K|    99M|   296K  (4)| 00:24:43 |
|*  2 |   TABLE ACCESS FULL                   | T1     |   100 |   700 |    23   (5)| 00:00:01 |
|*  3 |   VIEW                                | V1     | 16336 |   909K|  2966   (4)| 00:00:15 |
|   4 |    UNION-ALL                          |        |       |       |            |          |
|   5 |     NESTED LOOPS                      |        |   816K|    10M|  1483   (4)| 00:00:08 |
|*  6 |      TABLE ACCESS BY INDEX ROWID      | T2     |   100 |  1100 |   187   (2)| 00:00:01 |
|   7 |       INDEX FULL SCAN                 | T2_PK  | 10000 |       |    21   (0)| 00:00:01 |
|   8 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |  8168 | 16336 |    13   (0)| 00:00:01 |
|   9 |     NESTED LOOPS                      |        |   816K|    10M|  1483   (4)| 00:00:08 |
|* 10 |      TABLE ACCESS BY INDEX ROWID      | T2     |   100 |  1100 |   187   (2)| 00:00:01 |
|  11 |       INDEX FULL SCAN                 | T2_PK  | 10000 |       |    21   (0)| 00:00:01 |
|  12 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |  8168 | 16336 |    13   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N_100"=0)
   3 - filter("V1"."ID"="T1"."N1")
   6 - filter(MOD("T2"."N1",3)=1)
  10 - filter(MOD("T2"."N1",3)=2)

In this plan the critical join predicate appears at line 3; the predicate hasn’t been pushed. On the other hand the index() hints in the view have, inevitably, been obeyed (resulting in index full scans), as has the use_nl() hint in the main query – leading to a rather more expensive and time-consuming execution plan.

The first, quick, debugging step is simply to set the optimizer_features_enable back to 11.1.0.7 – with no effect; the second is to try adding the push_pred() hint to the query – with no effect; the third is to generate the outline section of the execution plans and copy the entire set of hints from the good plan into the bad plan, noting as we do so that the good plan actually uses the hint OLD_PUSH_PRED(@”SEL$1″ “V1″@”SEL$1” (“T2″.”ID”)) – still no effect.

Since I happen to know a few things about what is likely to appear in the 10053 (optimizer) trace file, my next step would be to flush the shared pool, enable the trace, and then check the trace file (using grep or find depending on whether I was running UNIX or Windows) for the phrase “JPPD bypassed”; this is what I got:


test_ora_9897.trc:OJPPD:     OJPPD bypassed: View contains TABLE expression.
test_ora_9897.trc:JPPD:     JPPD bypassed: View not on right-side of outer-join.
test_ora_9897.trc:JPPD:     JPPD bypassed: View not on right-side of outer-join.

So 11.1.0.7 had a plan that used the old_push_pred() hint, but 11.2.0.4 explicitly bypassed the option (the rubric near the top of the trace file translates OJPPD to “old-style (non-cost-based) JPPD”, where JPPD translates to “join predicate push-down”). It looks like the plan we got from 11.1.0.7 has been deliberately blocked in 11.2.0.4. So now it’s time to worry whether or not that means I could have been getting wrong results from 11.1.0.7.

In my test case, of course, I can bypass the problem by explicitly rewriting the query – but I’ll have to move the join with t1 inside the view for both subqueries; alternatively, given the trivial nature of the pipeline function, I could replace the table() operator with a join to another union all view. In real life such changes are not always so easy to implement.

Footnote: somewhere I’ve published a short note explaining that one of my standard pre-emptive strikes on an upgrade is to run the following command to extract useful information from the executable: “strings -a oracle | grep -v bypass”: it can be very helpful to have a list of situations in which some query transformation is bypassed.

Update (Mar 2018)

The restriction is still in place on 12.1.0.2 (unpatched version), but has been removed by 12.2.0.1

Comments (9)

9 Comments »

I had a similar issue when upgrading from 11.2.0.3 to 11.2.0.4. The Optimizer was choosing a different plan in the newer version, even though this was a minor patchset upgrade. A 10053 trace both versions to be almost identical except in one calculation which ultimately led the CBO to take a different path. Changing OPTIMIZER_FEATURES_ENABLE didn’t help in the new version. It turned out that the issue was due to datatype conversion, but it was curious to see the two versions handle it differently. I blogged about it here. http://www.peasland.net/2014/08/19/sticky-upgrade-problem/

Comment by Brian Peasland — November 3, 2014 @ 6:50 pm GMT Nov 3,2014 | Reply
Thanks for blogging about the upgrade issue along with the example. I liked the Footnote to identify these type of bypass information from the 10053 trace files for future upgrades.

Comment by Aswath Rao — November 4, 2014 @ 2:01 pm GMT Nov 4,2014 | Reply
I think it is relevant here your twit (or is it tweet?) about Mauro Pagano’s SQLT XPLORE blog entry:

“When the going gets tough, the tough get going” aka SQLT XPLORE

Looks like it would be able to take you to the root cause of the change, although I don’t have enough Oracle versions available to do the testing.

Comment by Jose Rodriguez — November 4, 2014 @ 2:49 pm GMT Nov 4,2014 | Reply
I have similar issue and we cannot change the query so please suggest whats the work around,union all works fine.

Comment by abinasroy — November 4, 2014 @ 6:29 pm GMT Nov 4,2014 | Reply
- abinssroy,
  
  I haven’t yet found a workaround – there may not be one if the change was a deliberate error correction strategy from Oracle Corp.
  
  You might compare v$system_fix_control on 11.1.0.7 with the same view on 11.2.0.4 to see if there are any new bug fix number with a description that looks relevant – or old ones with changed values that don’t revert with the setting of optimizer_features_enable. (See this note for an example of the approach). Possibly there is a fix_control that could force the change back, but your best bet is to raise an SR with Oracle to find out why the change is there, and if there’s a workaround.
  
  Comment by Jonathan Lewis — November 29, 2014 @ 11:54 am GMT Nov 29,2014 | Reply
Hi
what are the necessary steps/ analysis to be done before upgrade of database from 10.2.0.4 to 11.2.0.4

Comment by Kailash — November 22, 2014 @ 7:34 pm GMT Nov 22,2014 | Reply
- Kailash,
  
  First, make sure you read the “how to upgrade” document completely before you start doing the upgrade.
  To minimise surprises after the upgrade, make sure you’ve read and thought about all the “new features” documentation that might be relevant.
  
  As an aid to dealing with unexpected changes in execution plans on upgrade you could also enable event 10132 for at least 24 hours some time before you upgrade. This will capture the text and execution plan for every statement you execute (which might result in a lot of large files in the trace directory). See https://jonathanlewis.wordpress.com/2006/11/27/event-10132/
  
  Comment by Jonathan Lewis — November 27, 2014 @ 8:58 am GMT Nov 27,2014 | Reply
it might be bug 19174639

Comment by madhukar — April 4, 2016 @ 10:46 pm BST Apr 4,2016 | Reply
- Madhukar,
  
  That looks like a possible match – with patches available for 11.2.0.4 and 12.1.0.2.
  
  It would be interesting to hear from someone with an earlier version of 11.2 whether or not the test case above shows predicate pushing since the references to the bug describe it as appearing on upgrade from 11.2.0.3 to 11.2.0.4.
  
  Comment by Jonathan Lewis — April 5, 2016 @ 9:23 am BST Apr 5,2016 | Reply