Oracle Scratchpad

March 20, 2011

Upgrade Whoa

Filed under: Infrastructure,Troubleshooting — Jonathan Lewis @ 8:58 pm UTC Mar 20,2011

(The title’s a pun, by the way – an English form of humour that is not considered good unless it’s really bad.)

Very few people try to email me or call me with private problems – which is the way it should be, and I am grateful to my audience for realizing that this blog isn’t trying to compete with AskTom – but I do get the occasional communication and sometimes it’s an interesting oddity that’s worth a little time.

Today’s blog item is one such oddity – it was a surprise, it looked like a nasty change in behaviour, and it came complete with a description of environment, and a neatly formatted, complete, demonstration. For a discussion of the problem in Spanish you can visit the blog of John Ospino Rivas, who sent me the original email and has written his own blog post on the problem.

We start with a simple table, and then query it with a ‘select for update from two different sessions:


drop table tab1 purge;

create table tab1(
	id	number,
	info	varchar2(10),
	constraint tab1_pk primary key (id)
		using index (create index idx_tab1_pk on tab1(id))
);

insert into tab1 values(1,'a');
insert into tab1 values(2,'a');
insert into tab1 values(3,'a');
commit;

execute dbms_stats.gather_table_stats(user,'tab1',cascade=>true)

column id new_value m_id

set autotrace on explain

select  id
from    tab1
where   id = (
            select  min(id)
            from    tab1
        )
for update
;

set autotrace off

prompt	=============================================================
prompt  Now repeat the query in another session and watch it lock
prompt	And use a third session to check v$lock
prompt  Then delete here, commit and see what the second session does
prompt	=============================================================

accept X prompt 'Press return to delete and commit'

set verify on
delete from tab1 where id = &m_id;
commit;

The fact that the primary key index is created as a non-unique index isn’t a factor that affects this demonstration.

Given the query and the data in the table, you won’t be surprised by the result of the query from the first session (for convenience I’ve captured the selected value using the ‘column new_value’ option). Here’s the result of the query and its execution plan:


        ID
----------
         1

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     1 |     3 |     1   (0)| 00:00:01 |
|   1 |  FOR UPDATE                  |             |       |       |            |          |
|*  2 |   INDEX RANGE SCAN           | IDX_TAB1_PK |     1 |     3 |     0   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE            |             |     1 |     3 |            |          |
|   4 |     INDEX FULL SCAN (MIN/MAX)| IDX_TAB1_PK |     3 |     9 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"= (SELECT MIN("ID") FROM "TAB1" "TAB1"))

At this point the program issues instructions to repeat the query from a second session, then waits for you to press Return. When you run the same query from another session it’s going to see the data in read-consistent mode and try to select and lock the row where ID = 1, so the second session is going to hang waiting for the first session to commit or rollback.

Here’s the key question: what’s the second session going to return when you allow the first session to continue, delete the row it has selected, and commit ? Here’s the answer if you’re running 10.2.0.3 or 11.1.0.6 (which is what I happen to have easily available):

SQL> select  id
  2  from    tab1
  3  where   id = (
  4              select  min(id)
  5              from    tab1
  6          )
  7  for update
  8  ;

        ID
----------
         2

1 row selected.

Now, this seems perfectly reasonable to me – especially since I’ve read Tom Kyte’s notes on “write consistency” and seen the “rollback and restart” mechanism that kicks in when updates have to deal with data that’s changed since the start of the update. Session 2 had a (select for) update, and when it finally got to a point where it could lock the data it found that the read-consistent version of the data didn’t match the current version of the data so it restarted the statement at a new SCN. At the new SCN the current highest value was 2.

Now here’s what happened when I ran the test under 11.2.0.2:

SQL> select  id
  2  from    tab1
  3  where   id = (
  4              select  min(id)
  5              from    tab1
  6          )
  7  for update
  8  ;

no rows selected

The upgrade produces a different answer !

At first sight (or guess) it looks as if the query has run in two parts – the first part producing the min(id) of 1 using a read-consistent query block, with the second part then using the resulting “known value” to execute the outer select (shades of “precompute_subquery”) and restarting only the second part when it discovers that the row it has been waiting for has gone away.

It doesn’t really matter whether you think the old behaviour or the new behaviour is correct – the problem is that the behaviour has changed in a way that could silently produce unexpected results. Be careful if any of your code uses select for update with subqueries.

As a defensive measure you might want to change the code to use the serializable isolation level – that way the upgraded code will crash with Oracle error ORA-08177 instead of silently giving different answers:

SQL> alter session set isolation_level = serializable;

Session altered.

SQL> get afiedt.buf
  1  select  /*+ gather_plan_statistics */
  2          id
  3  from    tab1
  4  where   id = (
  5              select  min(id)
  6              from    tab1
  7          )
  8* for update
  9  /
from    tab1
        *
ERROR at line 3:
ORA-08177: can't serialize access for this transaction

It might be a way of avoiding this specific problem, of course, but it’s not a frequently used feature (the first pages of hits on Google are mostly about SQL Server) so who knows what other anomalies this change in isolation level might introduce.

24 Comments »

  1. Definitely an eye-opener.
    > It doesn’t really matter whether you think the old behaviour or the new behaviour is correct
    Just as long as it’s deliberate.

    Comment by Dom Brooks — March 21, 2011 @ 11:18 am UTC Mar 21,2011 | Reply

    • And the worrying thing about being “deliberate” is that sometimes a MOS (Metalink) note can suggest it’s deliberate (the catch phrase is “expected behaviour”) when it isn’t really. I found some (MOS) metalink notes about this one that suggested it was deliberate – which is why I didn’t describe it as a possible bug.

      Comment by Jonathan Lewis — March 26, 2011 @ 12:49 pm UTC Mar 26,2011 | Reply

  2. I wonder if this change is documented. To document the change the writer would have to provide a test case like yours to really explain the change.
    Whether documented or not, it will bite.
    (BTW : Need to test it in 11.2.0.1 ?)

    Comment by Hemant K Chitale — March 22, 2011 @ 7:30 am UTC Mar 22,2011 | Reply

  3. I get “no rows selected” in 11.2.0.1.

    Comment by Hemant K Chitale — March 22, 2011 @ 2:57 pm UTC Mar 22,2011 | Reply

  4. Found two Metalink entries on this :

    Bug 11799526: SUB-QUERY IN SELECT FOR UPDATE EXECUTES PREMATURELY, CAUSES WRONG RESULT

    Bug 10177779: SELECT FOR UPDATE RETURNS WRONG RESULTS WITH SUBQUERY.

    one closed as no-bug

    regards

    Comment by Nilo Segura — March 23, 2011 @ 2:08 pm UTC Mar 23,2011 | Reply

  5. there are some things that changed in 11.2

    “… It doesn’t really matter whether you think the old behaviour or the new behaviour is correct – the problem is that the behaviour has changed in a way that could silently produce unexpected results. …”

    I don’t agree with that.
    It does matter whether the new behaviour is correct or rather a bug – it could influence your decision to use 11.2 at all.

    See for example
    http://marogel.wordpress.com/2011/03/27/prevent-validation-of-check-constraints-in-11-2-0-2/
    about a new behaviour in 11.2 which could lead to the decision “don’t use 11.2 at all”

    Comment by Matthias Rogel — March 27, 2011 @ 5:56 pm UTC Mar 27,2011 | Reply

  6. FYI

    Bug 11869076: CHANGE IN LOCKING BEHAVIOUR FOR DML WITH SUBQUERY IN 11.2

    Comment by Óscar Pérez — April 1, 2011 @ 11:23 am UTC Apr 1,2011 | Reply

  7. Hello,

    Isn’t the behaviour on 11.2.0.2 a violation of read-consistency, i.e. Oracle’s claim that

    Oracle Database always enforces statement-level read consistency, which guarantees that data returned by a single query is committed and consistent with respect to a single point in time

    (excerpt from Oracle Database Concepts, chapter 9 Data Concurrency and Consistency)?

    There was never a point in time where that query should have returned no rows (well there was, right after creation of the table, when the table was empty, but this point in time is potentially a long time before the statement/transaction began).

    In this sense shouldn’t it be considered a bug (difference between documentation and actual behaviour)?

    kind regards
    Martin

    Comment by Martin Maletinsky — April 29, 2011 @ 10:41 am UTC Apr 29,2011 | Reply

    • Martin,

      Personallly I’d call it a bug; it’s clearly incorrect behaviour – but it gets harder to label something as a bug when it has been reported as “expected behaviour”.

      Many years ago (in the v7.2 timeline) I was given the following “definition” by someone on support: “Yes, I know it’s wrong, but it’s in the specification so it’s not a bug.”

      Comment by Jonathan Lewis — April 29, 2011 @ 11:41 am UTC Apr 29,2011 | Reply

  8. Hello Jonathan,

    Thank you for your reply. Do you agree on my point that the behaviour under 11.2.0.2 is contradictory to the read consistency as cited from the Oracle Database Concepts in my initial comment?
    If this is the case, then there is a contradiction in the specification by (a) claiming read consistency and by (b) labelling something “expected behaviour” which contradicts (a).

    kind regards
    Martin

    Comment by Martin Maletinsky — April 29, 2011 @ 1:28 pm UTC Apr 29,2011 | Reply

  9. [...] one here from Jonathan Lewis that may or may not be a [...]

    Pingback by Select For Update fetching across commit « OraStory — June 7, 2011 @ 10:45 am UTC Jun 7,2011 | Reply

  10. The Oracle Support answer was …. “The bug has meanwhile been fixed and the fix is included in 11.2.0.3.”

    Regards.

    Comment by Óscar Pérez — June 17, 2011 @ 11:01 am UTC Jun 17,2011 | Reply

  11. As I’ve commented, Bug 11869076: CHANGE IN LOCKING BEHAVIOUR FOR DML WITH SUBQUERY IN 11.2.

    And RELATED BUGS:
    Bug 10177779: SELECT FOR UPDATE RETURNS WRONG RESULTS WITH SUBQUERY.
    Bug 9176765: THE RECORD COUNT IS DIFFERENT BY THE SPECIFIED COLUMN.

    I only have followed the John Ospino’s SR.
    Regards.

    Comment by Óscar Pérez — June 18, 2011 @ 8:12 am UTC Jun 18,2011 | Reply

  12. The patch is ready and running ok!

    From Metalink:

    The patch is ready, it can be downloaded from
    Patches and Updates -> Patch Search -> Patch Name or Number
    enter 11869076 and choose the HP-UX Itanium version.

    Regards, Óscar.

    Comment by Óscar Pérez — August 11, 2011 @ 9:31 am UTC Aug 11,2011 | Reply

  13. Here is example of write inconsistency:

    drop table t purge;
    create table t as select 1 id, 0 val from dual;
    insert into t values(2, 0);
    update t set val = 1;
    
    -- Then in second session
    update t set val = val + 1;
    -- Session waits for the first
    
    -- Then commit first
    commit;
    
    -- And query in second
    select * from t;

    The result will be

    SQL> select * from t;
    
            ID        VAL
    ---------- ----------
             1          2
             2          1

    Comment by Valentin Nikotin — December 23, 2011 @ 3:49 pm UTC Dec 23,2011 | Reply

    • Valentin,

      That’s an interesting example of something that’s arguably doing the wrong thing.

      According to your test result, the second session has updated (1,1) to (1,2) – which means it has seen some of the effects of the transaction executed by the first session, but it hasn’t updated (2,1) to (2,2) – which means it hasn’t seen all of the effects of the transaction executed by the first session.

      In principle, the second transaction should have done one of three things:
      a) Failed and rolled back (if you have made the second session serializable this would have happened with an ORA-08177)
      b) Rolled back and restarted at an SCN later than the commit SCN generated by the first session.

      As it is, the second session has behaved inconsistently.
      I’ve repeated your test on 8.1.7.4 and 11.2.0.2, by the way, and the result is the same.

      Comment by Jonathan Lewis — December 23, 2011 @ 7:04 pm UTC Dec 23,2011 | Reply

  14. It seems that the original problem described in the post refers to using the Scalar Subquery Caching in inappropriate place. This could be seen from 10200 trace for example. So, I created a simple test based on the same idea to show really wrong results:

    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    
    -- SESS1
    
    SQL> create table t1 (id, max1, max2) as select level, 0, 0 from dual connect by level <= 3;
    
    Table created.
    
    SQL> update t1 set id = 0 where id = 3;
    
    1 row updated.
    
    -- SESS2
    
    SQL> update t1 a
      2  set max1 = +(select max(id) from t1),
      3      max2 = -(select max(id) from t1)
      4  where a.id in (1, 3);
    
    -- waiting for session1
    
    -- SESS1
    
    SQL> commit;
    
    Commit complete.
    
    SQL>
    
    -- SESS2
    
    1 row updated.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> select * from t1 where id = 1;
    
            ID       MAX1       MAX2
    ---------- ---------- ----------
             1          2         -3
    
    SQL>
    

    Comment by Valentin Nikotin — February 17, 2012 @ 10:29 pm UTC Feb 17,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 874 other followers