Merge – argh! | Oracle Scratchpad

June 3, 2011

Merge – argh!

Filed under: Bugs,Oracle — Jonathan Lewis @ 6:21 pm BST Jun 3,2011

I wonder if I ought to create an “Argh” category on my blog for articles about some of the nastier, and sometimes unbelievable, bugs in the Oracle code. Here’s one (that’s just about forgivable in terms of failing to test) published by Sokrates a couple of months ago.

If you’re using the merge command without including insert/delete options then Oracle will ignore check constraints. (What’s the most important requirement of a database – many people would say that it’s the correctness of the data.)

Comments (17)

17 Comments »

Jonathan:

Are you missing an “un” in the line “that’s just about forgivable in terms of failing to test”?

It seems to imply that the update part of a merge uses an entirely different code path than a regular update whixh is worrisome.

John

Comment by John Spencer — June 3, 2011 @ 8:22 pm BST Jun 3,2011 | Reply
- John,
  
  No, the “un” wasn’t missing. I really do mean that when you’re constructing a piece of code it’s easy to overlook things that other people will say you obviously should have checked.
  
  In this case I can imagine that someone could easily manage to write a test suite that proved that the incoming information resulted in the correct modifications taking place, but forgot that the “correct” result of a calculation could still be an illegal result.
  
  I have seen someone destroy a production database because they tested every possible case to show that their code updated every row of their test data that it should have done in exactly the correct way – but forgot to test that it didn’t update anything that it shouldn’t have updated. It’s actually a very common failing (though not often so extreme).
  
  Comment by Jonathan Lewis — June 4, 2011 @ 10:37 am BST Jun 4,2011 | Reply
I think such a category could have the right mix of schadenfreude and technical usefulness to be very popular. Oracle is pretty good about publishing bugs, but must necessarily keep it flat and unemotional and not speculate on other consequences of them. A blog like this has the competency and dynamism to go far beyond what Oracle can reasonably do.

Merge has always been fragile without proper coding and error handling (one silly constraint violation can blow off a huge merge, for example – perhaps the converse of the bug). These kinds of things aren’t addressed well in the docs at all, finding them on the tubes is hit or miss at best, and those on the bleeding edge will miss.

Comment by joel garry — June 3, 2011 @ 8:35 pm BST Jun 3,2011 | Reply
Unbelievable! Remind me of 7.3.3 bug resulting in corrupted redologs.

Comment by al0 — June 3, 2011 @ 8:59 pm BST Jun 3,2011 | Reply
The *argh* to me is that they have the gall to only fix this in 12!

What, those who have upgraded to 11 have to now wait forever or not use merge?

They “don’t count” because 11 is “old hat”?

And Oracle is surprised that folks don’t even want to touch the latest releases?

Comment by Noons — June 4, 2011 @ 12:58 am BST Jun 4,2011 | Reply
- FYI, the way this works is the bug must be fixed in the main code line first (which happened to be 12, which is the “fixed version”). It was then backported to 11.2 and is already available as a BLR on 11.2.0.2 and is marked for inclusion in 11.2.0.3.
  
  Comment by Greg Rahn — June 4, 2011 @ 4:55 am BST Jun 4,2011 | Reply
  - That’s perfectly reasonable, it just looks scary with no qualification on 12, no way to find out when 12 will be available, and no easy way to find backports. Or am I missing some flashy MOS thing? I haven’t missed MOS 742060.1, no 11.2.0.3 there. (Excuse the grumpiness, I’m on Itanium).
    
    Comment by joel garry — June 6, 2011 @ 10:41 pm BST Jun 6,2011 | Reply
  - jgarr: I think it’ll be a while before 11.2.0.3 percolates to the less popular ports, such as Aix. Ah well, that delays our 11g upgade for a while more. No way I’m going to let go of a stable system and replace it with a beta release…
    (no: I don’t care if I’m out of prime support, if this is what passes for prime)
    
    Comment by Noons — June 6, 2011 @ 10:54 pm BST Jun 6,2011 | Reply
As Rafu pointed out on
http://rafudb.blogspot.com/2011/03/merge-ignores-check-constraint.html
Patch 9285259 seems to fix it

Comment by Sokrates — June 4, 2011 @ 6:48 am BST Jun 4,2011 | Reply
- Sokrates,
  
  Thanks for the follow-up.
  
  Comment by Jonathan Lewis — June 4, 2011 @ 11:01 am BST Jun 4,2011 | Reply
Hmm… well, if you don’t like too lazy constraint checking, maybe you prefer the over eagerness of direct path constraint checking…
```
create table my_test (col1 varchar(30) not null, col2 varchar(30) not null);

insert into my_test (col1) select 'x' from dual where 1=0;
```
–> works w/o problem
```
insert /*+ APPEND */ into my_test (col1) select 'x' from dual where 1=0;

ERROR at line 1:
ORA-01400: cannot insert NULL into ("MY_TEST"."COL2")
```
No rows are inserted, but still we get an error.

Once a default value for the second column is defined, the error is not thrown anymore…

Funny, isn’t it?

Comment by Lars — June 4, 2011 @ 12:12 pm BST Jun 4,2011 | Reply
- Lars,
  
  Interesting little oddity – the important thing now is to work out whether this is going to cause a real problem.
  
  Comment by Jonathan Lewis — June 5, 2011 @ 11:51 am BST Jun 5,2011 | Reply
Hi Jonathan,
I’m facing a very weird situation with a MERGE command and I’m quite lost. The database version is 10.2.0.4 and the OS is solaris 64 bits sparc. Basically, I have a particular MERGE command that runs for 30-40 min and suddenly starts to rollback the transaction (sometimes it runs successfully and the execution plan is exactly the same when the job fails). This is part of a sql script executed within an informatica job (it drops the indexes before the indexes, runs the MERGE and recreate the indexes), the application doesn’t fire an error and no error is seen in the alert.log. I tried to trace a 10046 event and nothing is seen, the rollback activity is also traced.
The immediate sql statement after the MERGE command is not executed when this MERGE is rolled back. While the session is rolling back , the status is of the session is ACTIVE, so this is not killed externally. I cannot trap any error from the informatica session since that tool “thinks” the session is still running.
One of the times I found an ORA 12170 in the server’s sqlnet.log, but the next time the job failed this sqlnet error was not observed.
I opened a SR in Metalink a week ago and so far they have no clue about this problem. This issue is also seen once in a while in a QA environment with a completely different network. We run several MERGE’s jobs from informatica and they don’t have any problem.
As a workaround, would it help to develop a procedure including the entire process and invoke the procedure from the client?
I know you are not a magician but you are one of the best on troubleshooting weird issues. Hope you can help me out to dig into this issue.
Thanks in advance

Comment by Rodrigo — July 24, 2012 @ 2:50 pm BST Jul 24,2012 | Reply
- Rodrigo,
  
  In the absence of any reported error, I can’t think of any reason why the merge should roll back.
  
  As a first step I would keep tracing the code and check the trace file for the text “ERR” – it’s possible that an error is being raised and then silently discarded; if so it should still be visible in the trace with the originating ORA- error number.
  
  Without some idea of why this is happening there is little point in speculating whether you can work around it by putting it into a package.
  
  Comment by Jonathan Lewis — August 7, 2012 @ 5:38 pm BST Aug 7,2012 | Reply
Hi,

We are facing the same problem in Oracle data integrator. Was this problem fixed ,if so please share the same . . .

Comment by Vikas G K — December 28, 2012 @ 6:28 am GMT Dec 28,2012 | Reply
- Vikas,
  
  If you’re asking about the problem reported in the main article, then comment 5 answers your question.
  If you’re asking about the problem in comment 7, then I’ll have to leave it to Rodrigo to see WordPress sends him an email prompt about your question.
  
  Comment by Jonathan Lewis — December 28, 2012 @ 11:21 am GMT Dec 28,2012 | Reply
  - Jonathan/Vikas, the issue in comment 7 was fixed. We found the database server lost connection with the client due to a sort of timeout set by the firewall, but the client didn’t show any ORA error. So we enabled expire_time=10 on sqlnet.ora and the issue disappeared.
    BTW, Happy New Year!!
    
    Comment by Rodrigo — December 28, 2012 @ 12:25 pm GMT Dec 28,2012 | Reply