Which piece of code will be faster (clue – the table in question has no indexes):
Option 1 – pure SQL
update join1 set data = data||'#'
where
        key_1 = 500
and     key_2 = 23851
and     key_3 = 57012
and     key_4 = 521
and     key_6 = 1
and     key_7 = 23352
;
Option 2 – a silly PL/SQL row by row approach:
declare
        type rowid_type is table of urowid index by binary_integer;
        tab_rowid       rowid_type;
        lv_rows_updated number := 0;

        cursor my_cursor is
                select  rowid rid
                from    join1
                where
                        key_1 = 500
                and     key_2 = 23851
                and     key_3 = 57012
                and     key_4 = 521
                and     key_6 = 1
                and     key_7 = 23352
                ;

begin
        open my_cursor;

        -- We know that the number of rows to be updated is very small
        fetch my_cursor bulk collect into tab_rowid limit 10000;

        forall lv_row in tab_rowid.first .. tab_rowid.last
                update join1 set data = data||'#' where rowid = tab_rowid(lv_row);

        lv_rows_updated := sql%rowcount;
        close my_cursor;
end;
/
It’s a trick question, of course, and although the automatic response from any DBA-type is likely to be “the SQL”, the correct answer is (as so often) “it depends”.
This question appeared as a problem on the OTN database forum a few days ago. In its original form it asked why a select statement should be much faster than a select for update or an update – even though the volume of data identified and updated was very small (just one row in 100M). The note then went on to show that using PL/SQL to select the rowids of the target rows and then doing the bulk update by rowid was faster than the basic SQL update. The answer didn't spring to mind immediately, but fortunately someone asked for some run-time statistics (v$sesstat) and the supplied statistics told me what was going on.
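If you want to see the same effect for yourself, a query like the following is a minimal sketch of the technique – the statistic names are the ones I'd expect to be relevant; run it before and after each test statement and compare the deltas:

select
        sn.name, ms.value
from
        v$statname sn,
        v$mystat   ms
where
        ms.statistic# = sn.statistic#
and     sn.name in (
                'CPU used by this session',
                'session logical reads',
                'physical reads',
                'physical reads direct',
                'table scans (direct read)'
        )
order by
        sn.name
;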
Conveniently the OP gave us the code to recreate the test case – all 100M rows of it; I cut this back to 16M rows (ca. 1.5GB of disc space), and then ran the tests with my db_cache_size set to 256MB (another clue). I got similar results to the OP – not so dramatic, but the PL/SQL ran faster than the SQL, and the difference was due to an obvious change in the CPU usage.
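For reference, constraining the cache on a test instance looks something like this – a sketch only; if you're running with sga_target or memory_target set you would have to disable automatic memory management for the setting to stick:

alter system set sga_target = 0 scope=spfile;
alter system set db_cache_size = 256m scope=spfile;
-- restart the instance for the spfile changes to take effect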
If you haven't guessed from the clue in the 256MB db_cache_size (which means the table is more than 5 times the size of the cache), the answer is "serial direct path reads". For a sufficiently large table (and that's not easy to define – start here and follow a few links) it's fairly common knowledge that from 11g a tablescan can use a serial direct path read, and that's what the PL/SQL was doing to select the required rowids. However, here's a detail that's not often mentioned: an update has to take place in public where everyone can see it, so when Oracle executed the simple SQL update or select for update statement it had to scan the table through the buffer cache. Pulling all those blocks into the buffer cache, grabbing latches to link them to the right cache buffers chains, pinning them, then unpinning them uses a lot of CPU – none of which is needed for the direct path read. The PL/SQL with its pure select used far less CPU than the basic SQL with its update/select for update, and because the OP had a very high-powered machine with plenty of CPU and loads of (disc-)caching effects all over the place the difference in CPU time was extremely visible as a fraction of the total DB time.
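As a rough sanity check – a sketch, assuming the table is in your own schema and you have access to v$sga_dynamic_components – you can compare the size of the segment with the size of the buffer cache to see whether the table is a plausible candidate for serial direct path reads:

select  blocks, bytes/1048576 mb
from    user_segments
where   segment_name = 'JOIN1'
;

select  component, current_size/1048576 mb
from    v$sga_dynamic_components
where   component = 'DEFAULT buffer cache'
;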
This was, inevitably, a very special case where a little detail became a significant fraction of the workload. The OP scanned 100M rows to update 1 row (in 7 – 13 seconds!). This doesn’t sound like a strategy you would normally want to adopt for frequent use; and for occasional use we might be happy to use the slower (13 second) approach to avoid the coding requirement of the fast (7 second) solution.
It's worth pointing out that the PL/SQL strategy is not safe. In the few seconds between the select statement starting and the row being identified and updated by rowid it's possible that another session could have updated (or deleted) the row. In the former case the update statement would be modifying a row that no longer matches the original predicates; in the latter case the code would raise an exception.
We can make the PL/SQL safer by including the original predicates in the update statement – but that still leaves the question of what the code should do if the select statement finds a row and the update fails to update it. Should it, perhaps, assume that there is still a row in the table that needs an update and re-run (using up all the time you saved by adopting a PL/SQL solution)?
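A minimal sketch of the safer variant – this replaces the forall in the original block; the rowcount check at the end is my assumption about what the application might want to test, and what it does on a mismatch is left as a deliberate gap:

        -- safer variant: repeat the original predicates alongside the rowid
        forall lv_row in tab_rowid.first .. tab_rowid.last
                update  join1
                set     data = data||'#'
                where   rowid = tab_rowid(lv_row)
                and     key_1 = 500
                and     key_2 = 23851
                and     key_3 = 57012
                and     key_4 = 521
                and     key_6 = 1
                and     key_7 = 23352
                ;

        lv_rows_updated := sql%rowcount;

        if lv_rows_updated != tab_rowid.count then
                -- a row disappeared or changed between the select and the update:
                -- the application has to decide whether to retry, report, or ignore
                null;
        end if;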