Oracle Scratchpad

March 21, 2017

Deception

Filed under: Oracle,Troubleshooting — Jonathan Lewis @ 2:41 pm BST Mar 21,2017

One of the difficulties with trouble-shooting is that’s it very easy to overlook, or forget to go hunting for, the little details that turn a puzzle into a simple problem. Here’s an example showing how you can read a bit of an AWR report and think you’ve found an unpleasant anomaly. I’ve created a little model and taken a couple of AWR snapshots a few seconds apart so the numbers involved are going to be very small, but all I’m trying to demonstrate is a principle. So here’s a few lines of one of the more popular sections of an AWR report:

SQL ordered by Gets                       DB/Inst: OR32/or32  Snaps: 1754-1755
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> %Total - Buffer Gets   as a percentage of Total Buffer Gets
-> %CPU   - CPU Time      as a percentage of Elapsed Time
-> %IO    - User I/O Time as a percentage of Elapsed Time
-> Total Buffer Gets:         351,545
-> Captured SQL account for   65.0% of Total

     Buffer                 Gets              Elapsed
      Gets   Executions   per Exec   %Total   Time (s)  %CPU   %IO    SQL Id
----------- ----------- ------------ ------ ---------- ----- ----- -------------
      8,094          20        404.7    2.3        0.0 114.1   2.3 017r1rur8atzv
Module: SQL*Plus
UPDATE /*+ by_pk */ T1 SET N1 = 0 WHERE ID = :B1

We have a simple update statement which, according to the hint/comment (that’s not a real hint, by the way) and guessing from column names, is doing an update by primary key; but it’s taking 400 buffer gets per execution!

It’s possible, but unlikely, that there are about 60 indexes on the table that all contain the n1 column; perhaps there’s a massive read-consistency effect going on thanks to some concurrent long-running DML on the table; or maybe there are a couple of very hot hotspots in the table that are being constantly modified by multiple sessions; or maybe the table is a FIFO (first-in, first-out) queueing table and something funny is happening with a massively sparse index.

Let’s just check, first of all, that the access path is the “update by PK” that the hint/comment suggests (cut-n-paste):


SQL> select * from table(dbms_xplan.display_cursor('017r1rur8atzv',null));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  017r1rur8atzv, child number 0
-------------------------------------
UPDATE /*+ by_pk */ T1 SET N1 = 0 WHERE ID = :B1

Plan hash value: 1764744892

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |       |       |       |     3 (100)|          |
|   1 |  UPDATE            | T1    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| T1_PK |     1 |    14 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"=:B1)

The plan is exactly as expected – so where do we look next to find out what’s going on? I’m a great believer in trying to make sure I have as much relevant information as possible; but there’s always the compromise when collecting information that balances the benefit of the new information against the difficulty of gathering it – sometimes the information that would be really helpful is just too difficult, or time-consuming, to collect.

Fortunately, in this case, there’s a very quick easy way to enhance the information we’ve got so far. The rest of the AWR report – why not search for that SQL_ID in the rest of the report to see if that gives us a clue ? Unfortunately the value doesn’t appear anywhere else in the report. On the other hand there’s the AWR SQL report (?/rdbms/admin/awrsqrpt.sql – or the equivalent drill-down on the OEM screen), and here’s a key part of what it tells us for this statement:


Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                                36            1.8     0.0
CPU Time (ms)                                    41            2.0     0.1
Executions                                       20            N/A     N/A
Buffer Gets                                   8,094          404.7     2.3
Disk Reads                                        1            0.1     0.0
Parse Calls                                      20            1.0     0.4
Rows                                          2,000          100.0     N/A
User I/O Wait Time (ms)                           1            N/A     N/A
Cluster Wait Time (ms)                            0            N/A     N/A
Application Wait Time (ms)                        0            N/A     N/A
Concurrency Wait Time (ms)                        0            N/A     N/A
Invalidations                                     0            N/A     N/A
Version Count                                     1            N/A     N/A
Sharable Mem(KB)                                 19            N/A     N/A
          -------------------------------------------------------------

Spot the anomaly?

We updated by primary key 20 times – and updated 2,000 rows!

Take another look at the SQL – it’s all in upper case (apart from the hint/comment) with a bind variable named B1 – that means it’s (probably) an example of SQL embedded in PL/SQL. Does that give us any clues ? Possibly, but even if it doesn’t we might be able to search dba_source for the PL/SQL code where that statement appears. And this is what it looks like in the source:

        forall i in 1..m_tab.count
                update  /*+ by_pk */ t1
                set     n1 = 0
                where   id = m_tab(i).id
        ;

It’s PL/SQL array processing – we register one execution of the SQL statement while processing the whole array, so if we can show that there are 100 rows in the array the figures we get from the AWR report now make sense. One of the commonest oversights I (used to) see in places like the Oracle newsgroup or listserver was people reporting the amount of work done but forgetting to consider the equally important “work done per row processed”. To me it’s also one of the irritating little defects with the AWR report – I’d like to see “rows processed” in various of the “SQL ordered by” sections of the report (not just the “SQL ordered by Executions” section), rather than having to fall back on the AWR SQL report.

Footnote:

If you want to recreate the model and tests, here’s the code:


rem
rem     Script:         forall_pk_confusion.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2017
rem
rem     Last tested
rem             12.1.0.2
rem

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        2 * trunc(dbms_random.value(1e10,1e12))         n1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format problem
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1', 
                method_opt       => 'for all columns size 1'
        );
end;

alter table t1 add constraint t1_pk primary key(id);

declare

        cursor c1 is
        select  id
        from    t1
        where   mod(id,10000) = 1
        ;

        type c1_array is table of c1%rowtype index by binary_integer;
        m_tab c1_array;

begin

        open c1;

        fetch c1
        bulk collect
        into m_tab
        ;

        dbms_output.put_line('Fetched: ' || m_tab.count);

        close c1;

        forall i in 1..m_tab.count
                update  /*+ by_pk */ t1
                set     n1 = 0
                where   id = m_tab(i).id
        ;

        dbms_output.put_line('Updated: ' || sql%rowcount);

end;
/

select
        v.plan_table_output
from
        v$sql   sql,
        table(dbms_xplan.display_cursor(sql.sql_id, sql.child_number)) v
where
        sql_text like 'UPDATE%by_pk%'
;

select
        executions, rows_processed, disk_reads, buffer_gets
from    v$sql  
where   sql_id = '017r1rur8atzv'
;

10 Comments »

  1. Jonathan,
    If the update statement had an additional filter that meant that only 1 of the id values actually modified any rows (so executions=rows), would we have anywhere else to look to see that each execution accounted for 100 id values?
    It seems to me as if v$sql could benefit from an additional metric to record this, although maybe this sort of situation is where you’d go straight to the developers scratching your head.

    Comment by Andrew Sayer — March 22, 2017 @ 7:10 am BST Mar 22,2017 | Reply

    • Andrew,

      That’s a very good “what if” question. It’s the style of thing anyone involved in testing should ask, viz:

        “This is what Oracle will report if my (application) code behaves exactly the way I expect – now what sort of thing will Oracle show if my application code does something realistic that isn’t what I was expecting?”

      I’m not sure there is a good answer to the question.

      If rowsource_execution_stats were enabled you would see the “starts” figures related to the size of the array; similarly if monitoring were enabled for the statement then the “Execs” column of the SQL Monitor report would be reporting numbers related to the size of the array.

      If you could isolate the session statistics at exactly the right moment you’d see that the statistic “index fetch by key” would (for this very restricted experiment) be reporting the number of index probes performed by the forall.

      For the avoidance of (immediate) head-scratching, it’s worth noting that if the statement is still in the library cache, and if the code were embedded in a PL/SQL packaged then v$sql.program_id and v$sql.progam_line# would give you the package and line number of the bit of code that first optimised the update statement. For example (though I wouldn’t do it this way on a production system), after I modified the test to run the pl/sql as a packaged procedure:

      
      SQL> select program_id, program_line# from v$sql where sql_id = '8j4da2d9vddzf';
      
      PROGRAM_ID PROGRAM_LINE#
      ---------- -------------
          159810            29
      
      1 row selected.
      
      SQL> select * from user_source where name = (select object_name from user_objects where object_id = 159810) and line = 29;
      
      NAME         TYPE               LINE TEXT                                               ORIGIN_CON_ID
      ------------ ------------ ---------- -------------------------------------------------- -------------
      DO_SOMETHING PACKAGE BODY         29    forall i in 1..m_tab.count                                    0
      
      1 row selected.
      
      
      

      I suppose even with this clue you may be unlucky because the line numbers may not match properly if the PL/SQL optimizer has been too clever with rewriting your code, but at least it’s likely to be close and will be the right package. (The program_id always seems to be zero for procedures that aren’t packaged – another good reason for using code packages.)

      Comment by Jonathan Lewis — March 22, 2017 @ 10:23 am BST Mar 22,2017 | Reply

      • Jonathan, many thanks for those ideas. That’s something I didn’t realise about row source execution statistics: I would have expected with the last option to see starts 1. While verifying your claim, I see that ‘last’ includes all of the bulk binds of the last execution, that does make it obvious.
        Of course, if the bulk binded operation was done using some non-plsql way then we wouldn’t be able to rely on the program_id at all, another win for plsql!

        Comment by Andrew Sayer — March 22, 2017 @ 1:17 pm BST Mar 22,2017 | Reply

        • Andrew,

          Can you clarify the comment about ‘last’ and bulk binds – I thought you might mean from dbms_xplan.display_cursor with peeked binds but that didn’t help me.
          I did find that if I enabled SQL trace with binds on the code the trace file show a list of all the “Bind#0” values before the EXEC line – so tracing a suspect statement by SQL_ID would be another way of resolving the puzzle.

          Comment by Jonathan Lewis — March 22, 2017 @ 2:22 pm BST Mar 22,2017

        • By bulk binds I meant use of forall to execute once with many binds, I thought bulk binds was how it was commonly referred to, array bind is probably the term I was after though.
          I just wanted to convey that when you do use the ‘last’ keyword in display_cursor format, it counts the whole operation as the execution (so you see the multiple starts) – you don’t have to opt for all executions (where you would expect multiple starts anyway). This was probably what you meant in the first place, I was just surprised that ‘last’ behaved like this.
          Your mention of peeked binds gave me the idea that maybe the projection of the plan might give clues about the input – it doesn’t (at least to my eyes).
          I’m not surprised about sql trace having the whole list, it is certainly reassuring.

          Comment by Andrew Sayer — March 22, 2017 @ 2:37 pm BST Mar 22,2017

        • Andrew,

          Thanks – I just misinterpreted the point you were trying to make; I thought you’d jumped ahead to checking peeked_binds and found the output looked something like:

          Peeked Binds (identified by position):
          --------------------------------------
             1 - :1[100] (NUMBER): 1
          

          Naturally I was irritated that I hadn’t thought of that myself – and then I found it didn’t work for me.

          Comment by Jonathan Lewis — March 22, 2017 @ 2:43 pm BST Mar 22,2017

  2. Thats why I don’t stop at AWR reports for understanding more about a query
    Checking v$sql , DBA_HIST* or row source stats helps in these cases. I agree that Oracle should add the rows processed data to the AWR. Nice to know about “index fetch by key” statistics from your comments.

    Comment by sachinperfdba — March 23, 2017 @ 2:17 am BST Mar 23,2017 | Reply

  3. Fyi, real time sql monitoring should report the 100 starts of the statement.
    (If you hint monitor as well as otherwise it should be too quick (which will change the sql id obviously))

    It also only reports the first bind.

    https://orastory.wordpress.com/2015/02/26/real-time-sql-monitoring-bulk-bound-statement/

    Comment by Dom Brooks — March 23, 2017 @ 12:55 pm BST Mar 23,2017 | Reply

    • Dom,

      Thanks for the link.

      Did you find time to write up the explanation for the sessions doing array based deletes of stats, I’ve tried a couple of searches but haven’t been able to find the follow-up.

      Comment by Jonathan Lewis — March 23, 2017 @ 8:38 pm BST Mar 23,2017 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Powered by WordPress.com.