Many of the questions that appear on OTN are deceptively simple until you start thinking carefully about the implications; one such showed up a little while ago:
What i want to do is to delete rows from table where it matches condition upper(CATEGORY_DESCRIPTION) like ‘%BOOK%’.
At the same time i want these rows to be inserted into other table.
The first problem is this: how carefully does the requirement need to be stated before you can decide how to address it? Trying to imagine awkward scenarios, or boundary conditions, can help to clarify the issue.
If you delete before you insert, how do you find the data to insert ?
If you insert before you delete, what happens if someone updates a row you’ve copied so that it no longer matches the condition. Would it matter if the update changes the row in a way that leaves it matching the condition (what you’ve inserted is not totally consistent with what you’ve deleted).
If you insert before you delete, and someone executes some DML that makes another row match the requirement should you delete it (how do you avoid deleting it) or leave it in place.
Once you start kicking the problem about you’ll probably come to the conclusion that the requirement is for the delete and insert to be self-consistent – in other words what you delete has to be an exact match for what you insert as at the time you inserted it. You’ll ignore rows that come into scope in mid-process due to other activity, and you’ll have to stop people changing rows that are being transferred (in case there’s an audit trail that subsequently says that there was, at some point in time, a row that matched the condition but never arrived – and a row that has arrived that didn’t match the final condition of the rows that disappeared).
Somehow your code needs to lock the set of rows to be transferred and then transfer those rows and eliminate them. There are two “obvious” and simple strategies – readers are invited to propose others (or criticise the two I – or any of the comments – suggest). I’ll start with a simple data setup for testing:
create table t1 as select object_id, object_name, owner from all_objects ; alter table t1 add constraint t1_pk primary key(object_id); create table t2 as select * from t1 where rownum = 0 ; execute dbms_stats.gather_table_stats(user,'t1') execute dbms_stats.gather_table_stats(user,'t2')
The simplest approach is often the best – until, perhaps, you spot the drawbacks – do a basic delete of the data to be transferred (which handles the locking) but wrap the statement in a PL/SQL block that captures the data (using the returning clause) and then inserts it into the target table as efficiently as possible. With thanks to Andrew Sayer who prompted this blog post:
declare type t1_rows is table of t1%rowtype; t1_deleted t1_rows; begin delete from t1 where owner = 'SYSTEM' returning object_id, object_name, owner bulk collect into t1_deleted; forall i in 1..t1_deleted.count insert into t2 values t1_deleted(i); commit; end; /
The drawback to this, of course, is that if the volume to be transferred is large (where “large” is probably a fairly subjective measure) then you might not want to risk the volume of memory (PGA) it takes to gather all the data with the bulk collect.
For large volumes of data we could reduce the threat to the PGA by gathering only the rowids of the rows to be transferred (locking the rows as we do so) then do the insert and delete based on the rowids:
declare type rid_type is table of rowid; r rid_type; cursor c1 is select rowid from t1 where owner = 'SYSTEM' for update; begin open c1; fetch c1 bulk collect into r; close c1; forall i in 1..r.count insert into t2 select * from t1 where rowid = r(i); forall i in 1..r.count delete from t1 where rowid = r(i); commit; end; /
Note, particularly, the “for update” in the driving select.
Inevitably there is a drawback to this strategy as well (on top of the threat that the requirement for memory might still be very large even when the return set is restricted to just rowids). We visit the source data (possibly through a convenient index
and avoid visiting the table, [silly error deleted – see comment 8]) to collect rowids; then we visit the data again by rowid (which is usually quite efficient) to copy it, then we visit it again (by rowid) to delete it. That’s potentially a significant increase in buffer cache activity (especially latching) over the simple “delete returning” strategy; moreover the first strategy gives Oracle the option to use the index-driven optimisation for maintaining indexes and this method doesn’t. You might note, by the way, that you could include an “order by rowid” clause on the select; depending on your data distribution and indexes this might reduce the volume of random I/O you have to do as Oracle re-visits the table for the inserts and deletes.
We can address the PGA threat, of course, by fetching the rowids with a limit:
declare type rid_type is table of rowid; r rid_type; cursor c1 is select rowid from t1 where owner = 'SYSTEM' for update; begin open c1; -- dbms_lock.sleep(60); loop fetch c1 bulk collect into r limit 5; forall i in 1..r.count insert into t2 select * from t1 where rowid = r(i); forall i in 1..r.count delete from t1 where rowid = r(i); exit when r.count != 5; end loop; close c1; commit; end; /
One thing to be aware of is that even though we fetch the rowids in small batches we lock all the relevant rows when we open the cursor, so we don’t run into the problem of inserting thousands of rows into t2 and then finding that the next batch we select from t1 has been changed or deleted by another session. (The commented out call to dbms_lock.sleep() was something I included as a way of checking that this claim was true.) This doesn’t stop us running into a locking (or deadlocking) problem, of course; if it takes us 10 seconds to lock 1M rows in our select for update another user might manage to lock our millionth row before we get there; if, a few seconds later, it then gets stuck in a TX/6 wait trying to lock one of our locked rows after we start waiting in a TX/6 wait for our millionth row our session will time out after 3 further seconds with an ORA-00060 deadlock error.
The limit of 5 is just for demonstration purposes, of course – there were 9 rows in all_objects that matched the select predicate; in a production system I’d probably raise the limit as high as 255 (which seems to be the limit of Oracle’s internal array-processing).
You’ll notice, of course, that we can’t use this limited fetch approach with the delete command – the entire delete would take place as we opened the equivalent cursor and, though we can use the bulk collect with the returning clause, there is no syntax that allows something like the fetch with limit to take place.
My first thought was to play around with the AS OF SCN clause. Select the current SCN from v$database and then do things like delete “as of scn”, or “select for update as of scn” – there were ways of getting close, but invariably I ended up running into Oracle error: “ORA-08187: snapshot expression not allowed here”. But maybe someone else can come up with a way of doing this that doesn’t add significant overheads and doesn’t allow for inconsistent results.