Oracle Scratchpad

November 9, 2015

Wrong Results

Filed under: 12c,Bugs,Oracle — Jonathan Lewis @ 6:23 am GMT Nov 9,2015

Here’s a little gem in 12c that arrived in my email a few days ago: a query where the result seems to depend on the SQL*Plus arraysize!

The email had a short description, and a script to create a small data set that would demonstrate the problem. I’m not going to show you the query, or the result set, but here’s a sample of the output from an SQL*Plus session after creating the data. This is, by the way, on a “single-user” system – there is no way that some other session is changing the data – especially after the opening “set transaction”:

SQL> set transaction read only;

Transaction set.

SQL> set arraysize 1
SQL> select ...

...

541 rows selected.

SQL> set arraysize 4
SQL> select ...

...

599 rows selected.

SQL> set arraysize 10
SQL> select ...

...

620 rows selected.

SQL> set arraysize 32
SQL> select ...

...

616 rows selected.

The correct result set should have had the 616 rows reported when the arraysize was set to 32 (of, course, it’s possible with an arraysize of 32 the 616 rows returned weren’t the correct 616 rows – rows seemed to get multiplied or deleted fairly arbitrarily as the arraysize changed).

The execution plan was a little unusual in that it forced a nested loop join with a tablescan on the inner table; and when I ran the query with rowsource execution statistics enabled the number of starts of the inner tablescan was 1,597 but the number of rows actually returned varied. My first thought was that some new mechanical optimisation of the tablescan code was losing track of where it had got to in the repeated tablescans – but it turned out I was wrong.

Here’s the execution plan (with camouflage) – the key detail is in a section I didn’t look at intially, the column projection:


select * from table(dbms_xplan.display_cursor('0dh0kh9qa88mz',1,'-note +projection'));

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       | 14118 (100)|          |
|   1 |  NESTED LOOPS       |      |    29 |  2958 | 14118   (2)| 00:00:01 |
|*  2 |   HASH JOIN         |      |   892 | 57088 |    35   (3)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| ABC  |   549 | 21411 |    17   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| DEF  |   892 | 22300 |    17   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS FULL | XYZ  |     1 |    38 |    16   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEF"."ABC_FK"="ABC"."ABC_ID")
   5 - filter(("DEF"."COL0"="XYZ"."COL0" AND "XYZ"."COL1"="ABC"."COL1"
              AND "XYZ"."COL2"="ABC"."COL2"))

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ABC"."ABC_ID"[NUMBER,22], "DEF"."ABC_FK"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "DEF"."COL0"[VARCHAR2,20], "XYZ"."COL1"[NUMBER,22],
       "XYZ"."COL2"[NUMBER,22], "XYZ"."COL0"[VARCHAR2,20]
   2 - (#keys=1) "ABC"."ABC_ID"[NUMBER,22], "DEF"."ABC_FK"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "DEF"."COL0"[VARCHAR2,20]
   3 - (rowset=200) "ABC"."ABC_ID"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22]
   4 - (rowset=200) "DEF"."ABC_FK"[NUMBER,22], "DEF"."COL0"[VARCHAR2,20]
   5 - "XYZ"."COL1"[NUMBER,22], "XYZ"."COL2"[NUMBER,22],
       "XYZ"."COL0"[VARCHAR2,20]

The predicate section is irrelevant in this case, and I’ve camouflaged the names of the tables and columns – the only interesting bit is the appearance of the (rowset=200) in the projection information. This is reporting a feature new in 12c (and not to be confused with Oracle Java Rowsets) that should improve the performance of some queries.

I didn’t actually look at the projection information until after I’d asked the Oak Table members if they had ever seen this type of anomaly before – and Stefan Koehler emailed back to suggest that the problem might be related to rowsets (there are a couple of similar bugs on MoS, e.g: 17016479 and 20960570) – so I checked the projection, then repeated my tests after disabling the feature with a call to: ‘alter session set “_rowsets_enabled”=false;’

Problem solved – although I’ve told the person who emailed me to report this discovery and workaround to Oracle support and see what they supply as the approved solution.

It is possible to affect the feature through event 10055 [update – and with event 10123 in 18.3] – different levels disable it at different locations in the code; the list of options is given in the $ORACLE_HOME/rdbms/mesg/oraus.msg file (if you’re not running Windows):


//            Level:
//            0x00000001 - turn off for table scan
//            0x00000002 - turn off for hash join consume
//            0x00000004 - turn off for hash join produce
//            0x00000008 - turn off for group by
//            0x00000010 - turn off for sort
//            0x00000020 - turn off for table-queue out
//            0x00000040 - turn off for table-queue in
//            0x00000080 - turn off for identity
//            0x00000100 - turn off for granule iterator
//            0x00000200 - turn off for EVA functions
//            0x00000400 - turn off for PL/SQL
//            0x00000800 - turn off for upgrade
//            0x00001000 - turn off for database startup
//            0x00002000 - turn off for blobs and clobs
//            0x00004000 - turn off for tracing row source
//            0x00008000 - turn off rowset information in explain plan
//            0x00010000 - disable hash join rowsets fast path
//            0x00020000 - turn off for bloom create
//            0x00040000 - turn off for bloom use
//            0x00080000 - disable prefetch for hash join
//            0x00100000 - disable prefetch for bloom
//            0x00200000 - disable semi blocking hash join
//            0x00400000 - turn off rowset for fixed table
//  12.2 adds 2 new levels
//            0x00800000 - turn off rowset for view
//            0x01000000 - turn off rowset for filter
//  18.3 moves to event 10123 and adds 7 new levels 
//            0x02000000 - disable producing rowsets in scan with one-row eva
//            0x04000000 - turn off rowsets for non-inner hash join
//            0x08000000 - turn off rowsets for update
//            0x10000000 - turn off rowsets for delete
//            0x20000000 - disable prefetch for hash aggregation
//            0x40000000 - turn off rowsets for window function
//            0x80000000 - turn off rowsets for count


I tried the first few levels and found that both levels 1 and 2 eliminated the problem (and eliminated the appearance of the (rowset=200) entry in the projection information). Given the shape of the plan I had thought that just one of 1,2 or 4 might have been relevant so I was a little surprised to find that both 1 and 2 were effective – but that’s probably just a question of interpretation of the brief descriptions.

Update

I’ve asked the owner of the problem if it’s okay to post the script to create the tables and data – and the answer was yes: the content was already heavily camouflaged anyway. So here’s a file you can download if you want to test other environments: insert_script_12c_bug

It’s declared as a “.doc” file to get past the upload process, but it’s really a flat text file.

Update 16th Nov 2015

The official “minimum impact” workaround is to set event 10055 at level 2097152 (disable semi blocking hash join). Alternatively there is also a patch available. See Mike Dietriech’s blog for details.

Update 7th Dec 2015

Mike Dietriech has updated his second posting to add details of a patch for this bug.

Update May 2019

I’ve learned from a recent comment (and made a couple of errors that needed correcting in writing up this simple update) that an alternative workaround is available through event 10123 – so the advice above is version dependent. If you’re running 12c the event is 10055 and you can set lots of different levels, and in 12.2.0.1 the event adds a couple of extra levels, then Oracle 18.3 adds a few more levels. (All the levels are listed in the body of the note, with break points for the different versions) but also introduced event 10123 with just level 1 to turn off the feature for “top select” – whatever that means.

 

26 Comments »

  1. Does this affect setting the “fetch size” in JDBC as well? The JDBC fetch size is pretty much the same as the arraysize in SQL*Plus, so I wonder if queries done from the application would from this bug as well.

    Comment by Hans — November 9, 2015 @ 6:53 am GMT Nov 9,2015 | Reply

    • Hi Hans,

      > so I wonder if queries done from the application would from this bug as well.

      Yes, you probably hit the bug with JDBC as well as it is not a client side issue. The bug is in rowset optimization on server side (OPI layer and above), which is basically a new (optimized) way how data is moved around between the different execution plan steps.

      Regards
      Stefan

      Comment by Stefan Koehler — November 9, 2015 @ 8:54 am GMT Nov 9,2015 | Reply

    • I haven’t tested this, so made no comment, but I agree with Stefan’s analysis. The problem seems to be server side so you should be able to trigger it from any client, though one could imagine that some tiny client-side implementation detail may make the symptoms vary slightly with client (e.g. setting the fetch size to N in one client might actually mean it was using N+1, resulting in the error being present but apparently showing up on different fetch sizes).

      Comment by Jonathan Lewis — November 9, 2015 @ 9:57 am GMT Nov 9,2015 | Reply

  2. In that case showing a sample query that triggers this bug would be helpful. Currently I have no idea how I to test this.

    Comment by Hans — November 9, 2015 @ 9:24 am GMT Nov 9,2015 | Reply

  3. Hello

    I have a simular bug posted at Oracle support but still waiting for the right patch since 3 months

    It’s more an mission impossible for obtaining solutions/parches by ORACLE

    We have the problem with sqlplus, sql detective , sql developer

    Kind regards

    Comment by Erik Van Driessche — November 9, 2015 @ 7:38 pm GMT Nov 9,2015 | Reply

  4. […] Wrong Results // Oracle Scratchpad Here’s a little gem in 12c that arrived in my email a few days ago: a query where the result depends on the SQL*Plus arraysize! […]

    Pingback by Wrong Results | Dinesh Ram Kali. — November 9, 2015 @ 8:56 pm GMT Nov 9,2015 | Reply

  5. Thanks for the script. This bug shows up with JDBC as well. However setting the fetch size to 1 (one) returns the correct number of rows (in contrast to the tests done with SQL*Plus). With increasing fetch size, the possibility of a correct result increases, but I also got a wrong result with a fetch size of 98. Anything beyond 100 seems to be OK (I tested it up to 500). As this bug happens on the server, it’s also not a surprise that the JDBC driver version doesn’t influence this. Using _rowsets_enabled=false works as a workaround. What would be the implication of changing the “_rowsets_enabled” property globally (using ALTER SYSTEM)?

    Comment by Hans — November 10, 2015 @ 8:29 am GMT Nov 10,2015 | Reply

    • Hello Hans

      What about the performance when changing this hidden parameter on system wide?

      We have correct results when setting array_size to 5000 .

      My support ticket, engineer says: Bug 21853752 – WRONG RESULTS CHANGING ARRAYSIZE OPTIMIZER_ADAPTIVE_FEATURES SET TO FALSE

      Why it takes so long for making a patch for something critical … and at first glance a lot of people suffers from it.

      Kind regards

      Comment by Erik Van Driessche — November 10, 2015 @ 8:41 am GMT Nov 10,2015 | Reply

      • Erik,

        The title for that bug number says it’s about the Partial Join transformation. Have you copied the right number ?

        If the problem is the partial join I would be inclined to disable the feature “_optimizer_partial_join_eval”=false on the basis that it’s a performance enhancer that you didn’t have before the upgrade. Given the type of the queries (distinct aggregation) it’s aimed at, though, I think I’d also look for cases where the benefit of the enhancement was so great that it would be worth re-writing the queries to avoid the duplication that led up to the distinct.

        If the problem is wrong results with changing arraysize, as the title suggests, then I’d check the plan for the (rowsets) projection and see if disabling that feature helped – it’s likely to have a much smaller performance impact than disabling partial joins.

        Comment by Jonathan Lewis — November 10, 2015 @ 9:00 am GMT Nov 10,2015 | Reply

        • Hi Jonathan,

          For data dictionary contention and performance reason we have optimizer_adaptive_features=false system wide.

          When I set
          alter session set “_optimizer_partial_join_eval”=false ;
          => wrong results

          alter session set “_rowsets_enabled”=false;
          => correct results

          When I set (after restarting my sql session each time !)
          alter session set optimizer_adaptive_features=true ;
          => correct results

          Searching _rowsets_enabled on metalink gives multiple answers.

          Kind regards

          Comment by Erik Van Driessche — November 10, 2015 @ 9:39 am GMT Nov 10,2015

        • Hi,

          Forgotten: Yes I have copied the right number/text from my SR.

          Kin regards

          Comment by Erik Van Driessche — November 10, 2015 @ 9:49 am GMT Nov 10,2015

    • Hans,

      Thanks for the investigation. Here are a couple of thoughts:

      It’s possible that setting the fetch size to 1 in JDBC results in the server side disabling the “rowsets” features for the query – you can check this in the projection information of the actual execution plan. We might not see the same effect in SQL*Plus because it cheats and uses an arraysize of 2 if you try to set the arraysize to 1.

      I agree with your comment about larger arraysizes decrease the chances of error appearing. To me this has the flavour of an array pointer being recorded incorrectly in some part of the code: the larger the number of times we call that code (i.e. the smaller the fetch size) the more likely we are to hit a problem because of that pointer error. Possibly if you scaled the data sizes up by a factor of 10 or more (so that you have to call the fetch a larger number of times – maybe scale by a prime number, or set the arraysize to a prime) you might see the error re-appearing at larger arraysizes.

      I can’t give you a complete picture of the implication of changing a hidden parameter would be – and you should contact Oracle Support before doing so – but it’s a performance-related feature that can give wrong results for simple SQL; and I’d choose right results over a small performance benefit.

      Comment by Jonathan Lewis — November 10, 2015 @ 8:42 am GMT Nov 10,2015 | Reply

      • Hi Jonathan,

        > To me this has the flavour of an array pointer being recorded incorrectly in some part of the code

        This assumption seems to be in the right direction as i asked Tanel Poder some time ago about the implementation of rowsets and he answered something like that: “… basically looks like a different array structure (and #rows) passed around in plan by reference. it’s too low level and there’s zero info about it out there, so can’t say yet confidently how exactly it’s been implemented. so basically a different interface for returning data to parent plan step with less data move-around.”

        Regards
        Stefan

        Comment by Stefan Koehler — November 10, 2015 @ 9:14 am GMT Nov 10,2015 | Reply

      • Hans,

        Advice from Mike Dietrich (Master Product Manager – Database Upgrade & Migrations – Oracle) – switch the feature off for the present.

        Comment by Jonathan Lewis — November 10, 2015 @ 2:41 pm GMT Nov 10,2015 | Reply

  6. […] the post by Jonathan Lewis and on the upgrade blog for further information: […]

    Pingback by Some Parameter Recommendations for Oracle 12c (Upgrade your Database — NOW!) — orabase.org — February 20, 2016 @ 1:09 pm GMT Feb 20,2016 | Reply

  7. It seems Oracle 18c introduced new issue with rowsets. https://community.oracle.com/message/15355712#15355712. Reproducible on Database Versions 18.4, 18.5 and 18.6, Linux x86_64 and AIX (18.3 we did not test, it has never been released for AIX and AIX is our main platform for Oracle databases).

    Comment by Pavol Babel — May 7, 2019 @ 12:39 pm BST May 7,2019 | Reply

    • Pavol,

      Thanks for the heads-up.

      Comment by Jonathan Lewis — May 7, 2019 @ 12:50 pm BST May 7,2019 | Reply

      • Hi Jonathan,

        This time the official minimum impact workaround (proposed by development) seems to be setting event 10123, level 1.

        10123, 00000, "Rowsets: turn off rowsets for various operations"
        // *Document: NO
        // *Cause:    N/A
        // *Action:   Turns off rowsets for various operations
        //            Level:
        //            0x00000001 - turn off for top select
        //
        

        Comment by Anonymous — May 14, 2019 @ 12:18 am BST May 14,2019 | Reply

        • Anonymous,

          Thanks for the update. Comments like this that allow articles to evolve with changes Oracle’s code are most helpful.

          I’ve checked the oraus.msg files for various versions and the change from event 10055 to 10123 seems to have occurred in 18.3.
          I’ve noted the change (and the extra levels available) in the body of the article.

          Comment by Jonathan Lewis — May 14, 2019 @ 10:27 am BST May 14,2019

        • Hi Jonathan,

          the “anonymouus” post was mine. The event 10055 still takes place in 18.x, however several new levels added when compared to 12.2. The event 10123 seems to turn off rowsets for top level query only (maybe for top level operator in execution plan, which returns data to client).

          Regards
          Pavol Babel
          OCM 10g/11g/12f

          Comment by Pavol Babel — May 15, 2019 @ 3:23 pm BST May 15,2019

  8. Hi Jonathan,

    Maybe I just do not understand what you mean by movement 10055 to 10123 in 18.3. From oraus.msg (18.6). I still can see 10055 in 18.6, several levels added. In the new event 10123 I can only one particular level, turning off for top level select.

    10123, 00000, "Rowsets: turn off rowsets for various operations"
    // *Document: NO
    // *Cause:    N/A
    // *Action:   Turns off rowsets for various operations
    //            Level:
    //            0x00000001 - turn off for top select
    //
    
    10055, 00000, "Rowsets: turn off rowsets for various operations"
    // *Document: NO
    // *Cause:    N/A
    // *Action:   Turns off rowsets for various operations
    //            Level:
    //            0x00000001 - turn off for table scan
    //            0x00000002 - turn off for hash join consume
    //            0x00000004 - turn off for hash join produce
    //            0x00000008 - turn off for group by
    //            0x00000010 - turn off for sort
    //            0x00000020 - turn off for table-queue out
    //            0x00000040 - turn off for table-queue in
    //            0x00000080 - turn off for identity
    //            0x00000100 - turn off for granule iterator
    //            0x00000200 - turn off for EVA functions
    //            0x00000400 - turn off for PL/SQL
    //            0x00000800 - turn off for upgrade
    //            0x00001000 - turn off for database startup
    //            0x00002000 - turn off for blobs and clobs
    //            0x00004000 - turn off for tracing row source
    //            0x00008000 - turn off rowset information in explain plan
    //            0x00010000 - disable hash join rowsets fast path
    //            0x00020000 - turn off for bloom create
    //            0x00040000 - turn off for bloom use
    //            0x00080000 - disable prefetch for hash join
    //            0x00100000 - disable prefetch for bloom
    //            0x00200000 - disable semi blocking hash join
    //            0x00400000 - turn off rowset for fixed table
    //            0x00800000 - turn off rowset for view
    //            0x01000000 - turn off rowset for filter
    //            0x02000000 - disable producing rowsets in scan with one-row eva
    //            0x04000000 - turn off rowsets for non-inner hash join
    //            0x08000000 - turn off rowsets for update
    //            0x10000000 - turn off rowsets for delete
    //            0x20000000 - disable prefetch for hash aggregation
    //            0x40000000 - turn off rowsets for window function
    //            0x80000000 - turn off rowsets for count
    
    

    Regards
    Pavol Babel

    Comment by Pavol Babel — May 16, 2019 @ 1:19 pm BST May 16,2019 | Reply

    • Pavol,

      I am now baffled – you’re right, of course, 10055 is still there from 18.3 onwards with all different levels and 10123 has only got the “top level” option. I’m not sure how I managed to see things differently – probably a little finger trouble combined with a failure to look very carefully when I though I knew what I was going to see.

      Thanks for coming back on that one.

      Comment by Jonathan Lewis — May 16, 2019 @ 2:11 pm BST May 16,2019 | Reply

      • Seems like the events 10055 and 10123 are partly related to different functions (query kernel allocate rowsets setup …). Here is the mapping for Oracle 19c:

        ———8<———————
        localhost:scripts istefan$ grep 10055 oracle_function_to_event_mapping19c.txt
        qerblStart dbkdChkEventRdbmsErr EVENT: 10055
        qerhjStart dbkdChkEventRdbmsErr EVENT: 10055
        qerhnStart dbkdChkEventRdbmsErr EVENT: 10055
        qesnhInitHTI dbkdChkEventRdbmsErr EVENT: 10055
        qkadrv2Post dbkdChkEventRdbmsErr EVENT: 10055
        qkaTraceTraverse dbkdChkEventRdbmsErr EVENT: 10055
        qknrBloomRowsets dbkdChkEventRdbmsErr EVENT: 10055
        qknrCheckOpn dbkdChkEventRdbmsErr EVENT: 10055
        qknrCountRowsets dbkdChkEventRdbmsErr EVENT: 10055
        qknrDefaultRowsets dbkdChkEventRdbmsErr EVENT: 10055
        qknrGroupByRowsets dbkdChkEventRdbmsErr EVENT: 10055
        qknrJoinRowsets dbkdChkEventRdbmsErr EVENT: 10055
        qknrScanFixedRowsets dbkdChkEventRdbmsErr EVENT: 10055
        qknrTableQueueRowsets dbkdChkEventRdbmsErr EVENT: 10055
        qknrTableRowsets dbkdChkEventRdbmsErr EVENT: 10055
        qknrWindowRowsets dbkdChkEventRdbmsErr EVENT: 10055
        xplProjToTextPrefix dbkdChkEventRdbmsErr EVENT: 10055

        localhost:scripts istefan$ grep 10123 oracle_function_to_event_mapping19c.txt
        qknrExternalTableRowsets dbkdChkEventRdbmsErr EVENT: 10123
        qknrJoinRowsets dbkdChkEventRdbmsErr EVENT: 10123
        qknrTopRowsets dbkdChkEventRdbmsErr EVENT: 10123

        localhost:scripts istefan$ grep 10055 oracle_function_to_event_mapping19c.txt | grep qknrJoinRowsets
        qknrJoinRowsets dbkdChkEventRdbmsErr EVENT: 10055
        localhost:scripts istefan$ grep 10055 oracle_function_to_event_mapping19c.txt | grep qknrTopRowsets
        ———8<———————

        Comment by Stefan Koehler — May 17, 2019 @ 8:35 am BST May 17,2019 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.