Oracle Scratchpad

July 6, 2015

SQL vs. PL/SQL

Filed under: Oracle,Performance,Troubleshooting — Jonathan Lewis @ 10:23 am GMT Jul 6,2015

Which piece of code will be faster (clue – the table in question has no indexes):

Option 1 – pure SQL


update join1 set
        data = data||'#'
where   key_1=500
and     key_2=23851
and     key_3=57012
and     key_4=521
and     key_6=1
and     key_7=23352
;

Option 2 – a silly PL/SQL row by row approach:


declare
        type rowid_type is table of urowid index by binary_integer;
        tab_rowid           rowid_type;  

        lv_rows_updated     number :=0;  

        cursor my_cursor is
                select  rowid rid
                from    join1
                where   key_1=500
                and     key_2=23851
                and     key_3=57012
                and     key_4=521
                and     key_6=1
                and     key_7=23352
        ;

begin
        open my_cursor;

        -- We know that the number of rows to be updated is very small
        fetch my_cursor bulk collect into tab_rowid limit 10000;

        forall lv_row in tab_rowid.first .. tab_rowid.last
             update join1 set data = data||'#' where  rowid = tab_rowid(lv_row);

        lv_rows_updated := sql%rowcount;
        close my_cursor;
end;
/

It’s a trick question, of course, and although the automatic response from any DBA-type is likely to be “the SQL”, the correct answer is (as so often) “it depends”.

This question appeared as a problem on the OTN database forum a few days ago. In it’s original form it asked why a select statement should be much faster than a select for update or an update – even though the volume identified and updated was very small (just one row in 100M).The note then went on to show that using PL/SQL to select the rowids of the target rows then doing the bulk update by rowid was faster than the basic SQL update. The answer didn’t spring to mind immediately; but fortunately someone asked for some run-time statistics (v$sesstat) and the supplied statistics told me what was going on.

Conveniently the OP gave us the code to recreate the test case – all 100M rows of it; I cut this back to 16M rows (ca. 1.5GB of disc space), and then ran the tests with ny db_cache_size set to 256MB (another clue). I got similar results to the OP – not so dramatic, but the PL/SQL ran faster than the SQL and the difference was due to an obvious change in the CPU usage.

If you haven’t guess from the clue in the 256MB db_cache_size (which means the table is more than 5 times the size of the cache), the answer is “serial direct path reads”. For a sufficiently large table (and that’s not easy to define – start here and follow a few links) it’s fairly common knowledge that from 11g a tablescan can use a serial direct path read, and that’s what the PL/SQL was doing to select the required rowids. However, here’s a detail that’s not often mentioned: an update has to take place in public where everyone can see it so when Oracle executed the simple SQL update or select for update statement it had to scan the table through the buffer cache. Pulling all those blocks into the buffer cache, grabbing latches to link them to the right cache buffers chains, pinning them, then unpinning them uses a lot of CPU – which isn’t needed for the direct path read. The PL/SQL with its pure select used far less CPU than the basic SQL with its update/select for update, and because the OP had a very high-powered machine with plenty of CPU and loads of (disc-)caching effects all over the place the difference in CPU time was exremely visible as a fraction of the total DB time.

This was, inevitably, a very special case where a little detail became a significant fraction of the workload. The OP scanned 100M rows to update 1 row (in 7 – 13 seconds!). This doesn’t sound like a strategy you would normally want to adopt for frequent use; and for occasional use we might be happy to use the slower (13 second) approach to avoid the coding requirement of the fast (7 second) solution.

Footnote:

It’s worth pointing out that the PL/SQL strategy is not safe. In the few seconds between the select statement starting and the row being identified and updated by rowid it’s possible that another session could have updated (or deleted) the row. In the former case the update statement is now updating a row which doesn’t match the specification; in the latter case the code will raise an exception.

We can make the PL/SQL safer by including the original predicates in the update statement – but that still leaves the question of what the code should do if the select statement finds a row and the update fails to update it. Should it, perhaps, assume that there is still a row in the table that needs an update and re-run (using up all the time you saved by adopting a PL/SQL solution).

 

 

 

8 Comments »

  1. Why is PL/SQL allowed to do a serial direct path read here while SQL must do a conventional read?

    Comment by Markus — July 6, 2015 @ 1:27 pm GMT Jul 6,2015 | Reply

    • The SQL and the PL/SQL are doing different things.

      The PL/SQL is doing a select (which can be done in private)
      The SQL is doing an update (or select for update – which modifies blocks anyway) which apparently HAS to be done in public.

      Comment by Jonathan Lewis — July 6, 2015 @ 1:50 pm GMT Jul 6,2015 | Reply

  2. Could SQL be forced to do the same thing as the PL/SQL by using “update join1 where rowid in (select rowid from join1 where …)”?

    Comment by Andrew Flowers — July 6, 2015 @ 4:36 pm GMT Jul 6,2015 | Reply

    • Andrew,

      Good idea, worth a try – but doesn’t work.
      I even did a silly cast to multiset and back and still got cached reads, even with “_serial_direct_read” set to “always”.

      
      create or replace type jpl_table as table of varchar2(18)
      /
      
      update join1 j1 set
              data  =  data||'#'
      where   rowid in (
                      select  
                              *
                      from    table(
                                      cast(
                                              multiset(
                                                      select  rowid
                                                      from    join1   j2
                                                      where   key_1 = 500
                                                      and     key_2 = 23851
                                                      and     key_3 = 57012
                                                      and     key_4 = 521
                                                      and     key_6 = 1
                                                      and     key_7 = 23352
                                              ) as jpl_table
                                      )
                              )
              )
      ;
      
      
      

      Comment by Jonathan Lewis — July 6, 2015 @ 5:19 pm GMT Jul 6,2015 | Reply

    • Andrew,

      A little further thought produced this — how to separate the select from the update , and this one does work [ed: no it doesn’t, see next comment]: update with direct path reads on the select:

      update join1 j1 set
      data = data||’#’
      where rowid in (
      with rowids as (
      select /*+ materialize */
      rowid row_id
      from join1 j2
      where key_1 = 500
      and key_2 = 23851
      and key_3 = 57012
      and key_4 = 521
      and key_6 = 1
      and key_7 = 23352
      )
      select row_id from rowids
      )
      ;

      Comment by Jonathan Lewis — July 6, 2015 @ 5:31 pm GMT Jul 6,2015 | Reply

      • A useful aspect of scripting to make tests repeatable is that sometimes they don’t repeat – and that makes you look more closely at everything you’ve done.

        At some point I had set “_serial_direct_reads”=always for my session, and had not set it back to auto when testing materialized subquery factoring. When I added this test to the test suite and created a new session to regenerate all the results it didn’t use serial direct reads. This made me repeat the entire test suite with “_serial_direct_reads”=always, and EVERY SINGLE test (select, select for update, update) in simple SQL used direct path reads on the table.

        So Oracle can do serial direct reads on updates (and select for updates) but has a different rule regarding when AUTO applies for updates – which might be “don’t”.

        Comment by Jonathan Lewis — July 6, 2015 @ 7:11 pm GMT Jul 6,2015 | Reply

  3. […] While looking for a posting about efficient updates  I came across another of my posting that compares SQL with PL/SQL for updates – it’s worth a read. […]

    Pingback by Never … | Oracle Scratchpad — June 29, 2016 @ 8:45 am GMT Jun 29,2016 | Reply

  4. […] 昨天读jonathan lewis的https://jonathanlewis.wordpress.com/2015/07/06/sql-vs-plsql/文章,本文的comment部分值得仔细看。对于SQL和PLSQL的使用场景和效率问题,一直都是一个比较有趣的话题。 […]

    Pingback by direct path read和sql(or pl/sql) | leonotes — July 13, 2016 @ 5:46 am GMT Jul 13,2016 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.