One of the difficulties with trouble-shooting is that’s it very easy to overlook, or forget to go hunting for, the little details that turn a puzzle into a simple problem. Here’s an example showing how you can read a bit of an AWR report and think you’ve found an unpleasant anomaly. I’ve created a little model and taken a couple of AWR snapshots a few seconds apart so the numbers involved are going to be very small, but all I’m trying to demonstrate is a principle. So here’s a few lines of one of the more popular sections of an AWR report:
SQL ordered by Gets DB/Inst: OR32/or32 Snaps: 1754-1755 -> Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code. -> %Total - Buffer Gets as a percentage of Total Buffer Gets -> %CPU - CPU Time as a percentage of Elapsed Time -> %IO - User I/O Time as a percentage of Elapsed Time -> Total Buffer Gets: 351,545 -> Captured SQL account for 65.0% of Total Buffer Gets Elapsed Gets Executions per Exec %Total Time (s) %CPU %IO SQL Id ----------- ----------- ------------ ------ ---------- ----- ----- ------------- 8,094 20 404.7 2.3 0.0 114.1 2.3 017r1rur8atzv Module: SQL*Plus UPDATE /*+ by_pk */ T1 SET N1 = 0 WHERE ID = :B1
We have a simple update statement which, according to the hint/comment (that’s not a real hint, by the way) and guessing from column names, is doing an update by primary key; but it’s taking 400 buffer gets per execution!
It’s possible, but unlikely, that there are about 60 indexes on the table that all contain the n1 column; perhaps there’s a massive read-consistency effect going on thanks to some concurrent long-running DML on the table; or maybe there are a couple of very hot hotspots in the table that are being constantly modified by multiple sessions; or maybe the table is a FIFO (first-in, first-out) queueing table and something funny is happening with a massively sparse index.
Let’s just check, first of all, that the access path is the “update by PK” that the hint/comment suggests (cut-n-paste):
SQL> select * from table(dbms_xplan.display_cursor('017r1rur8atzv',null)); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------ SQL_ID 017r1rur8atzv, child number 0 ------------------------------------- UPDATE /*+ by_pk */ T1 SET N1 = 0 WHERE ID = :B1 Plan hash value: 1764744892 ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | UPDATE STATEMENT | | | | 3 (100)| | | 1 | UPDATE | T1 | | | | | |* 2 | INDEX UNIQUE SCAN| T1_PK | 1 | 14 | 2 (0)| 00:00:01 | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=:B1)
The plan is exactly as expected – so where do we look next to find out what’s going on? I’m a great believer in trying to make sure I have as much relevant information as possible; but there’s always the compromise when collecting information that balances the benefit of the new information against the difficulty of gathering it – sometimes the information that would be really helpful is just too difficult, or time-consuming, to collect.
Fortunately, in this case, there’s a very quick easy way to enhance the information we’ve got so far. The rest of the AWR report – why not search for that SQL_ID in the rest of the report to see if that gives us a clue ? Unfortunately the value doesn’t appear anywhere else in the report. On the other hand there’s the AWR SQL report (?/rdbms/admin/awrsqrpt.sql – or the equivalent drill-down on the OEM screen), and here’s a key part of what it tells us for this statement:
Stat Name Statement Per Execution % Snap ---------------------------------------- ---------- -------------- ------- Elapsed Time (ms) 36 1.8 0.0 CPU Time (ms) 41 2.0 0.1 Executions 20 N/A N/A Buffer Gets 8,094 404.7 2.3 Disk Reads 1 0.1 0.0 Parse Calls 20 1.0 0.4 Rows 2,000 100.0 N/A User I/O Wait Time (ms) 1 N/A N/A Cluster Wait Time (ms) 0 N/A N/A Application Wait Time (ms) 0 N/A N/A Concurrency Wait Time (ms) 0 N/A N/A Invalidations 0 N/A N/A Version Count 1 N/A N/A Sharable Mem(KB) 19 N/A N/A -------------------------------------------------------------
Spot the anomaly?
We updated by primary key 20 times – and updated 2,000 rows!
Take another look at the SQL – it’s all in upper case (apart from the hint/comment) with a bind variable named B1 – that means it’s (probably) an example of SQL embedded in PL/SQL. Does that give us any clues ? Possibly, but even if it doesn’t we might be able to search dba_source for the PL/SQL code where that statement appears. And this is what it looks like in the source:
forall i in 1..m_tab.count update /*+ by_pk */ t1 set n1 = 0 where id = m_tab(i).id ;
It’s PL/SQL array processing – we register one execution of the SQL statement while processing the whole array, so if we can show that there are 100 rows in the array the figures we get from the AWR report now make sense. One of the commonest oversights I (used to) see in places like the Oracle newsgroup or listserver was people reporting the amount of work done but forgetting to consider the equally important “work done per row processed”. To me it’s also one of the irritating little defects with the AWR report – I’d like to see “rows processed” in various of the “SQL ordered by” sections of the report (not just the “SQL ordered by Executions” section), rather than having to fall back on the AWR SQL report.
Footnote:
If you want to recreate the model and tests, here’s the code:
rem rem Script: forall_pk_confusion.sql rem Author: Jonathan Lewis rem Dated: Mar 2017 rem rem Last tested rem 12.1.0.2 rem create table t1 nologging as with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 ) select cast(rownum as number(8,0)) id, 2 * trunc(dbms_random.value(1e10,1e12)) n1, cast(lpad('x',100,'x') as varchar2(100)) padding from generator v1, generator v2 where rownum <= 1e6 -- > comment to avoid WordPress format problem ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt => 'for all columns size 1' ); end; alter table t1 add constraint t1_pk primary key(id); declare cursor c1 is select id from t1 where mod(id,10000) = 1 ; type c1_array is table of c1%rowtype index by binary_integer; m_tab c1_array; begin open c1; fetch c1 bulk collect into m_tab ; dbms_output.put_line('Fetched: ' || m_tab.count); close c1; forall i in 1..m_tab.count update /*+ by_pk */ t1 set n1 = 0 where id = m_tab(i).id ; dbms_output.put_line('Updated: ' || sql%rowcount); end; / select v.plan_table_output from v$sql sql, table(dbms_xplan.display_cursor(sql.sql_id, sql.child_number)) v where sql_text like 'UPDATE%by_pk%' ; select executions, rows_processed, disk_reads, buffer_gets from v$sql where sql_id = '017r1rur8atzv' ;
Jonathan,
If the update statement had an additional filter that meant that only 1 of the id values actually modified any rows (so executions=rows), would we have anywhere else to look to see that each execution accounted for 100 id values?
It seems to me as if v$sql could benefit from an additional metric to record this, although maybe this sort of situation is where you’d go straight to the developers scratching your head.
Comment by Andrew Sayer — March 22, 2017 @ 7:10 am GMT Mar 22,2017 |
Andrew,
That’s a very good “what if” question. It’s the style of thing anyone involved in testing should ask, viz:
“This is what Oracle will report if my (application) code behaves exactly the way I expect – now what sort of thing will Oracle show if my application code does something realistic that isn’t what I was expecting?”
I’m not sure there is a good answer to the question.
If rowsource_execution_stats were enabled you would see the “starts” figures related to the size of the array; similarly if monitoring were enabled for the statement then the “Execs” column of the SQL Monitor report would be reporting numbers related to the size of the array.
If you could isolate the session statistics at exactly the right moment you’d see that the statistic “index fetch by key” would (for this very restricted experiment) be reporting the number of index probes performed by the forall.
For the avoidance of (immediate) head-scratching, it’s worth noting that if the statement is still in the library cache, and if the code were embedded in a PL/SQL packaged then v$sql.program_id and v$sql.progam_line# would give you the package and line number of the bit of code that first optimised the update statement. For example (though I wouldn’t do it this way on a production system), after I modified the test to run the pl/sql as a packaged procedure:
I suppose even with this clue you may be unlucky because the line numbers may not match properly if the PL/SQL optimizer has been too clever with rewriting your code, but at least it’s likely to be close and will be the right package. (The program_id always seems to be zero for procedures that aren’t packaged – another good reason for using code packages.)
Comment by Jonathan Lewis — March 22, 2017 @ 10:23 am GMT Mar 22,2017 |
Jonathan, many thanks for those ideas. That’s something I didn’t realise about row source execution statistics: I would have expected with the last option to see starts 1. While verifying your claim, I see that ‘last’ includes all of the bulk binds of the last execution, that does make it obvious.
Of course, if the bulk binded operation was done using some non-plsql way then we wouldn’t be able to rely on the program_id at all, another win for plsql!
Comment by Andrew Sayer — March 22, 2017 @ 1:17 pm GMT Mar 22,2017 |
Andrew,
Can you clarify the comment about ‘last’ and bulk binds – I thought you might mean from dbms_xplan.display_cursor with peeked binds but that didn’t help me.
I did find that if I enabled SQL trace with binds on the code the trace file show a list of all the “Bind#0” values before the EXEC line – so tracing a suspect statement by SQL_ID would be another way of resolving the puzzle.
Comment by Jonathan Lewis — March 22, 2017 @ 2:22 pm GMT Mar 22,2017
By bulk binds I meant use of forall to execute once with many binds, I thought bulk binds was how it was commonly referred to, array bind is probably the term I was after though.
I just wanted to convey that when you do use the ‘last’ keyword in display_cursor format, it counts the whole operation as the execution (so you see the multiple starts) – you don’t have to opt for all executions (where you would expect multiple starts anyway). This was probably what you meant in the first place, I was just surprised that ‘last’ behaved like this.
Your mention of peeked binds gave me the idea that maybe the projection of the plan might give clues about the input – it doesn’t (at least to my eyes).
I’m not surprised about sql trace having the whole list, it is certainly reassuring.
Comment by Andrew Sayer — March 22, 2017 @ 2:37 pm GMT Mar 22,2017
Andrew,
Thanks – I just misinterpreted the point you were trying to make; I thought you’d jumped ahead to checking peeked_binds and found the output looked something like:
Naturally I was irritated that I hadn’t thought of that myself – and then I found it didn’t work for me.
Comment by Jonathan Lewis — March 22, 2017 @ 2:43 pm GMT Mar 22,2017
Thats why I don’t stop at AWR reports for understanding more about a query
Checking v$sql , DBA_HIST* or row source stats helps in these cases. I agree that Oracle should add the rows processed data to the AWR. Nice to know about “index fetch by key” statistics from your comments.
Comment by sachinperfdba — March 23, 2017 @ 2:17 am GMT Mar 23,2017 |
Fyi, real time sql monitoring should report the 100 starts of the statement.
(If you hint monitor as well as otherwise it should be too quick (which will change the sql id obviously))
It also only reports the first bind.
Comment by Dom Brooks — March 23, 2017 @ 12:55 pm GMT Mar 23,2017 |
Dom,
Thanks for the link.
Did you find time to write up the explanation for the sessions doing array based deletes of stats, I’ve tried a couple of searches but haven’t been able to find the follow-up.
Comment by Jonathan Lewis — March 23, 2017 @ 8:38 pm GMT Mar 23,2017 |
Never figured it out. Tried all sorts of circumstances.
Comment by Dominic — March 23, 2017 @ 9:44 pm GMT Mar 23,2017 |
[…] of the “SQL ordered by…” section. It’s often very helpful to check for summary comments like this – and not just in SQL sections of the AWR/Statspack report – sometimes […]
Pingback by Missing SQL_ID | Oracle Scratchpad — March 6, 2024 @ 12:13 pm GMT Mar 6,2024 |