Oracle Scratchpad

November 9, 2015

Wrong Results

Filed under: 12c,Bugs,Oracle — Jonathan Lewis @ 6:23 am BST Nov 9,2015

Here’s a little gem in 12c that arrived in my email a few days ago: a query where the result depends on the SQL*Plus arraysize!

The email had a short description, and a script to create a small data set that would demonstrate the problem. I’m not going to show you the query, or the result set, but here’s a sample of the output from an SQL*Plus session after creating the data. This is, by the way, on a “single-user” system – there is no way that some other session is changing the data – especially after the opening “set transaction”:

SQL> set transaction read only;

Transaction set.

SQL> set arraysize 1
SQL> select ...

...

541 rows selected.

SQL> set arraysize 4
SQL> select ...

...

599 rows selected.

SQL> set arraysize 10
SQL> select ...

...

620 rows selected.

SQL> set arraysize 32
SQL> select ...

...

616 rows selected.

The correct result set should have had the 616 rows reported when the arraysize was set to 32 (of, course, it’s possible with an arraysize of 32 the 616 rows returned weren’t the correct 616 rows – rows seemed to get multiplied or deleted fairly arbitrarily as the arraysize changed).

The execution plan was a little unusual in that it forced a nested loop join with a tablescan on the inner table; and when I ran the query with rowsource execution statistics enabled the number of starts of the inner tablescan was 1,597 but the number of rows actually returned varied. My first thought was that some new mechanical optimisation of the tablescan code was losing track of where it had got to in the repeated tablescans – but it turned out I was wrong.

Here’s the execution plan (with camouflage) – the key detail is in a section I didn’t look at intially, the column projection:


select * from table(dbms_xplan.display_cursor('0dh0kh9qa88mz',1,'-note +projection'));

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       | 14118 (100)|          |
|   1 |  NESTED LOOPS       |      |    29 |  2958 | 14118   (2)| 00:00:01 |
|*  2 |   HASH JOIN         |      |   892 | 57088 |    35   (3)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| ABC  |   549 | 21411 |    17   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| DEF  |   892 | 22300 |    17   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS FULL | XYZ  |     1 |    38 |    16   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEF"."ABC_FK"="ABC"."ABC_ID")
   5 - filter(("DEF"."COL0"="XYZ"."COL0" AND "XYZ"."COL1"="ABC"."COL1"
              AND "XYZ"."COL2"="ABC"."COL2"))

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ABC"."ABC_ID"[NUMBER,22], "DEF"."ABC_FK"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "DEF"."COL0"[VARCHAR2,20], "XYZ"."COL1"[NUMBER,22],
       "XYZ"."COL2"[NUMBER,22], "XYZ"."COL0"[VARCHAR2,20]
   2 - (#keys=1) "ABC"."ABC_ID"[NUMBER,22], "DEF"."ABC_FK"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "DEF"."COL0"[VARCHAR2,20]
   3 - (rowset=200) "ABC"."ABC_ID"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22]
   4 - (rowset=200) "DEF"."ABC_FK"[NUMBER,22], "DEF"."COL0"[VARCHAR2,20]
   5 - "XYZ"."COL1"[NUMBER,22], "XYZ"."COL2"[NUMBER,22],
       "XYZ"."COL0"[VARCHAR2,20]

The predicate section is irrelevant in this case, and I’ve camouflaged the names of the tables and columns – the only interesting bit is the appearance of the (rowset=200) in the projection information. This is reporting a feature new in 12c (and not to be confused with Oracle Java Rowsets) that should improve the performance of some queries.

I didn’t actually look at the projection information until after I’d asked the Oak Table members if they had ever seen this type of anomaly before – and Stefan Koehler emailed back to suggest that the problem might be related to rowsets (there are a couple of similar bugs on MoS, e.g: 17016479 and 20960570) – so I checked the projection, then repeated my tests after disabling the feature with a call to: ‘alter session set “_rowsets_enabled”=false;’

Problem solved – although I’ve told the person who emailed me to report this discovery and workaround to Oracle support and see what they supply as the approved solution.

It is possible to affect the feature through event 10055 – different levels disable it at different locations in the code; the list of options is given in the $ORACLE_HOME/rdbms/mesg/oraus.msg file (if you’re not running Windows):


//            Level:
//            0x00000001 - turn off for table scan
//            0x00000002 - turn off for hash join consume
//            0x00000004 - turn off for hash join produce
//            0x00000008 - turn off for group by
//            0x00000010 - turn off for sort
//            0x00000020 - turn off for table-queue out
//            0x00000040 - turn off for table-queue in
//            0x00000080 - turn off for identity
//            0x00000100 - turn off for granule iterator
//            0x00000200 - turn off for EVA functions
//            0x00000400 - turn off for PL/SQL
//            0x00000800 - turn off for upgrade
//            0x00001000 - turn off for database startup
//            0x00002000 - turn off for blobs and clobs
//            0x00004000 - turn off for tracing row source
//            0x00008000 - turn off rowset information in explain plan
//            0x00010000 - disable hash join rowsets fast path
//            0x00020000 - turn off for bloom create
//            0x00040000 - turn off for bloom use
//            0x00080000 - disable prefetch for hash join
//            0x00100000 - disable prefetch for bloom
//            0x00200000 - disable semi blocking hash join
//            0x00400000 - turn off rowset for fixed table

I tried the first few levels and found that both levels 1 and 2 eliminated the problem (and eliminated the appearance of the (rowset=200) entry in the projection information). Given the shape of the plan I had thought that just one of 1,2 or 4 might have been relevant so I was a little surprised to find that both 1 and 2 were effective – but that’s probably just a question of interpretation of the brief descriptions.

Update

I’ve asked the owner of the problem if it’s okay to post the script to create the tables and data – and the answer was yes: the content was already heavily camouflaged anyway. So here’s a file you can download if you want to test other environments: insert_script_12c_bug

It’s declared as a “.doc” file to get past the upload process, but it’s really a flat text file.

Update 16th Nov

The official “minimum impact” workaround is to set event 10055 at level 2097152 (disable semi blocking hash join). Alternatively there is also a patch available. See Mike Dietriech’s blog for details: https://blogs.oracle.com/UPGRADE/entry/update_for_switch_off_rowsets

Update 7th Dec

Mike Dietriech has updated his second posting to add details of a patch for this bug.

 

17 Comments »

  1. Does this affect setting the “fetch size” in JDBC as well? The JDBC fetch size is pretty much the same as the arraysize in SQL*Plus, so I wonder if queries done from the application would from this bug as well.

    Comment by Hans — November 9, 2015 @ 6:53 am BST Nov 9,2015 | Reply

    • Hi Hans,

      > so I wonder if queries done from the application would from this bug as well.

      Yes, you probably hit the bug with JDBC as well as it is not a client side issue. The bug is in rowset optimization on server side (OPI layer and above), which is basically a new (optimized) way how data is moved around between the different execution plan steps.

      Regards
      Stefan

      Comment by Stefan Koehler — November 9, 2015 @ 8:54 am BST Nov 9,2015 | Reply

    • I haven’t tested this, so made no comment, but I agree with Stefan’s analysis. The problem seems to be server side so you should be able to trigger it from any client, though one could imagine that some tiny client-side implementation detail may make the symptoms vary slightly with client (e.g. setting the fetch size to N in one client might actually mean it was using N+1, resulting in the error being present but apparently showing up on different fetch sizes).

      Comment by Jonathan Lewis — November 9, 2015 @ 9:57 am BST Nov 9,2015 | Reply

  2. In that case showing a sample query that triggers this bug would be helpful. Currently I have no idea how I to test this.

    Comment by Hans — November 9, 2015 @ 9:24 am BST Nov 9,2015 | Reply

  3. Hello

    I have a simular bug posted at Oracle support but still waiting for the right patch since 3 months

    It’s more an mission impossible for obtaining solutions/parches by ORACLE

    We have the problem with sqlplus, sql detective , sql developer

    Kind regards

    Comment by Erik Van Driessche — November 9, 2015 @ 7:38 pm BST Nov 9,2015 | Reply

  4. […] Wrong Results // Oracle Scratchpad Here’s a little gem in 12c that arrived in my email a few days ago: a query where the result depends on the SQL*Plus arraysize! […]

    Pingback by Wrong Results | Dinesh Ram Kali. — November 9, 2015 @ 8:56 pm BST Nov 9,2015 | Reply

  5. Thanks for the script. This bug shows up with JDBC as well. However setting the fetch size to 1 (one) returns the correct number of rows (in contrast to the tests done with SQL*Plus). With increasing fetch size, the possibility of a correct result increases, but I also got a wrong result with a fetch size of 98. Anything beyond 100 seems to be OK (I tested it up to 500). As this bug happens on the server, it’s also not a surprise that the JDBC driver version doesn’t influence this. Using _rowsets_enabled=false works as a workaround. What would be the implication of changing the “_rowsets_enabled” property globally (using ALTER SYSTEM)?

    Comment by Hans — November 10, 2015 @ 8:29 am BST Nov 10,2015 | Reply

    • Hello Hans

      What about the performance when changing this hidden parameter on system wide?

      We have correct results when setting array_size to 5000 .

      My support ticket, engineer says: Bug 21853752 – WRONG RESULTS CHANGING ARRAYSIZE OPTIMIZER_ADAPTIVE_FEATURES SET TO FALSE

      Why it takes so long for making a patch for something critical … and at first glance a lot of people suffers from it.

      Kind regards

      Comment by Erik Van Driessche — November 10, 2015 @ 8:41 am BST Nov 10,2015 | Reply

      • Erik,

        The title for that bug number says it’s about the Partial Join transformation. Have you copied the right number ?

        If the problem is the partial join I would be inclined to disable the feature “_optimizer_partial_join_eval”=false on the basis that it’s a performance enhancer that you didn’t have before the upgrade. Given the type of the queries (distinct aggregation) it’s aimed at, though, I think I’d also look for cases where the benefit of the enhancement was so great that it would be worth re-writing the queries to avoid the duplication that led up to the distinct.

        If the problem is wrong results with changing arraysize, as the title suggests, then I’d check the plan for the (rowsets) projection and see if disabling that feature helped – it’s likely to have a much smaller performance impact than disabling partial joins.

        Comment by Jonathan Lewis — November 10, 2015 @ 9:00 am BST Nov 10,2015 | Reply

        • Hi Jonathan,

          For data dictionary contention and performance reason we have optimizer_adaptive_features=false system wide.

          When I set
          alter session set “_optimizer_partial_join_eval”=false ;
          => wrong results

          alter session set “_rowsets_enabled”=false;
          => correct results

          When I set (after restarting my sql session each time !)
          alter session set optimizer_adaptive_features=true ;
          => correct results

          Searching _rowsets_enabled on metalink gives multiple answers.

          Kind regards

          Comment by Erik Van Driessche — November 10, 2015 @ 9:39 am BST Nov 10,2015

        • Hi,

          Forgotten: Yes I have copied the right number/text from my SR.

          Kin regards

          Comment by Erik Van Driessche — November 10, 2015 @ 9:49 am BST Nov 10,2015

    • Hans,

      Thanks for the investigation. Here are a couple of thoughts:

      It’s possible that setting the fetch size to 1 in JDBC results in the server side disabling the “rowsets” features for the query – you can check this in the projection information of the actual execution plan. We might not see the same effect in SQL*Plus because it cheats and uses an arraysize of 2 if you try to set the arraysize to 1.

      I agree with your comment about larger arraysizes decrease the chances of error appearing. To me this has the flavour of an array pointer being recorded incorrectly in some part of the code: the larger the number of times we call that code (i.e. the smaller the fetch size) the more likely we are to hit a problem because of that pointer error. Possibly if you scaled the data sizes up by a factor of 10 or more (so that you have to call the fetch a larger number of times – maybe scale by a prime number, or set the arraysize to a prime) you might see the error re-appearing at larger arraysizes.

      I can’t give you a complete picture of the implication of changing a hidden parameter would be – and you should contact Oracle Support before doing so – but it’s a performance-related feature that can give wrong results for simple SQL; and I’d choose right results over a small performance benefit.

      Comment by Jonathan Lewis — November 10, 2015 @ 8:42 am BST Nov 10,2015 | Reply

      • Hi Jonathan,

        > To me this has the flavour of an array pointer being recorded incorrectly in some part of the code

        This assumption seems to be in the right direction as i asked Tanel Poder some time ago about the implementation of rowsets and he answered something like that: “… basically looks like a different array structure (and #rows) passed around in plan by reference. it’s too low level and there’s zero info about it out there, so can’t say yet confidently how exactly it’s been implemented. so basically a different interface for returning data to parent plan step with less data move-around.”

        Regards
        Stefan

        Comment by Stefan Koehler — November 10, 2015 @ 9:14 am BST Nov 10,2015 | Reply

      • Hans,

        Advice from Mike Dietrich (Master Product Manager – Database Upgrade & Migrations – Oracle) – switch the feature off for the present.

        Comment by Jonathan Lewis — November 10, 2015 @ 2:41 pm BST Nov 10,2015 | Reply

  6. […] the post by Jonathan Lewis and on the upgrade blog for further information: […]

    Pingback by Some Parameter Recommendations for Oracle 12c (Upgrade your Database — NOW!) — orabase.org — February 20, 2016 @ 1:09 pm BST Feb 20,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.