Oracle Scratchpad

January 14, 2020

Drop Column bug

Filed under: 18c,Bugs,Oracle,Upgrades — Jonathan Lewis @ 1:22 pm GMT Jan 14,2020

 

When I was a child I could get lost for hours in an encyclopedia because I’d be looking for one topic, and something in it would make me want to read another, and another, and …

The same thing happens with MOS (My  Oracle Support) – I search for something and the search result throws up a completely irrelvant item that looks much more interesting so I follow a hyperlink, which mentions a couple of other notes, and a couple of hours later I can’t remember what I had started looking for.

Today’s note is a side effect of that process. A comment made yesterday about count(*)/count(1) referenced Oracle bug “19450314: UNNECESSARY INVALIDATIONS IN 12C”, and when I searched MOS for more information on this bug I discovered bug 30404639 : TRIGGER DOES NOT WORK CORRECTLY AFTER ALTER TABLE DROP UNUSED COLUMN. The impact of this bug is easy to demonstrate, and the ramifications are as follows:

Exercise extreme care with the “alter table drop column” command in 18c and above.

The problem is easy to work around, but the impact of not knowing about it could be catastrophic if your pre-production testing wasn’t quite good enough. Here’s a little demonstration script – the bug note says the problem appeared in 18.3 but I ran this test against 19.3. The script is a modified version of the SQL in the bug note:


create table t1 (c0 varchar2(30), c1 varchar2(30), c2 varchar2(30), c3 varchar2(30), c4 varchar2(30));
create table t2 (c_log varchar2(30));

create or replace trigger t1_ariu
after insert or update on t1
for each row
begin
        IF :new.c3 is not null then
                insert into t2 values (:new.c3);
        end if;
end;
/

spool drop_col_bug_18c.lst

insert into t1(c3) values ('Inserting c3 - should log'); 
select * from t2;
 
insert into t1(c4) values ('Inserting c4 - should not log'); 
select * from t2;

prompt  ===================================
prompt  Drop some columns in two steps then
prompt  truncate t2 and repeat the test
prompt  ===================================
 
alter table t1 set unused (c1, c2);
alter table t1 drop unused columns;

truncate table t2;

insert into t1(c3) values ('Inserting c3 - should log'); 
select * from t2;
 
insert into t1(c4) values ('Inserting c4 - should not log'); 
select * from t2;
 

The code is very simple. It creates a couple of tables an “after row” trigger on one of them to copy one column value across to the other table on an insert or update provided the new column value is not null.

To check that the trigger is (at least in part) behaving the code does two inserts – one which should copy a value and one which should not – and we see that the copy takes place as expected.

Now comes the critical part. We mark two of the columns in the table as unused, then drop all unused columns, truncate the second table and repeat the inserts.

If you run the test on 12.2.0.1 then you should find that the second run behaves just like the first run. If you’re running 18c or 19c be prepared for the following:


insert into t1(c3) values ('Inserting c3 - should log')
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [insChkBuffering_1], [4], [4], [], [], [], [], [], [], [], [], []

no rows selected

insert into t1(c4) values ('Inserting c4 - should not log')
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [insChkBuffering_1], [4], [4], [], [], [], [], [], [], [], [], []

This is not good – but it gets worse. If your application starts consistently breaking with an ORA-00600 error that’s going to annoy a lot of users for (we hope) a brief interval, but if your application keeps running and corrupting your data that’s a much bigger problem. Re-run the whole script (dropping the two tables first) but change it to mark just one of the two columns as unused, and you’ll get results for the second pass that look like this:


Table truncated.


1 row created.


no rows selected


1 row created.


C_LOG
------------------------------
Inserting c4 - should not log

1 row selected.

The trigger seems to “lose count” of the columns in the table (presumably it’s compiled to refer to something like “column_position = 3” and doesn’t adjust on the “drop column” – the linked bug notes on MOS refer to the problem being associated with the project to increase fine-grained dependencies) so it manages to survive with one column dropped because there’s still a “column 3” which happens now to be the column that used to be “column 4”.

Workaround

There is a simple workaround if you run into this problem after modifying your production system (and before you’ve corrupted a huge amount of data – recompile the trigger manually immediately after the drop completes: “alter trigger t1_ariu compile”.

Refinement

The problem seems to appear only if the following two conditions are true:

  • you use a two-step approach to dropping a column, viz: set unused then drop. If you simply issue “alter table t1 drop column c1” (with or without a “checkpoint NNN”) then the problem does not appear. It’s a great shame that in the past I’ve given advice that setting columns unused and dropping them later is a better option than doing an immediate drop.
  • you drop columns that appear earlier in the table than the highest position column mentioned in the trigger. But this isn’t something you should gamble on, particularly since the workaround is so easy to implement, because the order the columns appear in the table declaration isn’t necessarily the internal column ordering so you might get it wrong (not that I’ve tried to test for that threat) – and what if there are some selective materialized view logs where you don’t explicitly create triggers and forget to cater for.

I don’t expect anyone to be dropping columns in production systems with any great frequency, and you would expect such a significant operation to be tested quite carefully, but it’s easy to envisage a scenario where the testing might be split into two pieces viz:

  1. test the application on a pre-prod version of the database where a table has been created as a subset of the production data without the column that’s due to be dropped
  2. test how long it takes to execute the actual drop on a (minimal) recovered backup of production, but don’t test the new production code on the resulting table.

Sometimes it’s easy to overlook things that “obviously” won’t need testing, especially when it’s something that has always worked in the past with no special treatment required.

Footnote

If you try running this model on LiveSQL you’ll find that the code stops and the web page reports “Error: Internal Server Error” so you can’t tell that the problem is exactly the same there – but it seems quite likely that it is.

Given how easy it is to bypass the problem I haven’t bothered to do any further research on the issue – is it only related to insert and update trigger, and do they have to be after row for the update, and what about before row delete triggers (with materialized view logs in mind).

Update Aug 2020

I’ve just rediscovered this note and done a quick check on the bug. It’s now reported as fixed in 21.1 and there are a few patches for earlier versions of Oracle including 18.10 and 19.0

 

 

2 Comments »

  1. Hi Jonathan,
    As it is said that this issue is related to fine-grained dependencies, I have just quickly tested it and found that setting “_ignore_fg_deps”=tables can be used as a preventive workaround (the tests were run on a local 19.5 database and on LiveSQL):

    SQL> alter session set "_ignore_fg_deps"=tables;
    
    Session altered.
    
    SQL>
    SQL> alter table t1 set unused (c1, c2);
    
    Table altered.
    
    SQL> alter table t1 drop unused columns;
    
    Table altered.
    
    SQL>
    SQL> truncate table t2;
    
    Table truncated.
    
    SQL>
    SQL> insert into t1(c3) values ('Inserting c3 - should log');
    
    1 row created.
    
    SQL> select * from t2;
    
    C_LOG
    ------------------------------
    Inserting c3 - should log
    
    SQL>
    SQL> insert into t1(c4) values ('Inserting c4 - should not log');
    
    1 row created.
    
    SQL> select * from t2;
    
    C_LOG
    ------------------------------
    Inserting c3 - should log
    

    Comment by Mikhail Velikikh — January 14, 2020 @ 4:07 pm GMT Jan 14,2020 | Reply

    • Mikhail,

      Thanks for the comment.

      Excellent observation.

      Checking MOS I see there have been just a few bugs reported that resort to using _ignore_fg_deps – though the ORA-00600 error they report had a different first parameter. I think the appearance of [kkmmctbf:bad intcoln] as the first parameter is a bit of a clue to which of the “column id” columns in col$ Oracle is having a problem with.

      Regards
      Jonathan Lewis

      Comment by Jonathan Lewis — January 14, 2020 @ 4:29 pm GMT Jan 14,2020 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Powered by WordPress.com.