Quiz Night

August 28, 2009

Quiz Night

Filed under: Infrastructure,Performance,Troubleshooting — Jonathan Lewis @ 5:51 pm BST Aug 28,2009

I was using my “recent sql” check on client site some time ago, and found that one of their programmers had clearly been advised to make use of bind variables when writing SQL for an OLTP system, because I found code like the following in the library cache:

select customer_name from customers where cust_id = :v_104356;

Seems okay so far – but the second statement dumped by my query looked like this:

select customer_name from customers where cust_id = :v_27054;

And so it went on – I found about 30 statements like this that had appeared in the previous 60 seconds. The developer was using bind variables but managing to do it in a way that caused a “hard parse” on every statement.

I couldn’t help smiling when I saw this, then I paused to ask myself if this was a deliberate design decision and if so what possible justification there could be for doing something like this. And then I thought of a rational answer that might explain the findings.

This week’s challenge: can you think of a reasonable explanation for finding 30 statements like this as recent arrivals in the library cache ?

Update – for clarification:

While looking at an OLTP system you find 30 statements like the above (yes, absolutely that simple) in v$sql, and say to the client: “This is a little unusual, why are you doing it?” The client supplies an explanation.

Although you were expecting to respond to any explanation with a comment like “You shouldn’t do it like that” (or possibly “We’ll have to find out why that’s happening and stop it” ) you feel the urge to say “That was a good idea”.

So can you think why something that initially looks like an odd design flaw could actually be a clever idea.

See comment 18 below for my response to this observation (and responses to the many suggestions made by the readers.)

Comments (32)

32 Comments »

I don’t know if I would consider it a reasonable explanation, but are the developers using the CUST_ID in the generation of the bind variable?

Comment by Centinul — August 28, 2009 @ 7:14 pm BST Aug 28,2009 | Reply
Could DDL on customers table (I think even truncate counts as such) force a hard parse? If so, the select is probably inside a loop that also includes the DDL in it.

Comment by prodlife — August 28, 2009 @ 7:55 pm BST Aug 28,2009 | Reply
They’re doing it so the bind variable values aren’t hidden to them when they start looking in v$sql, but not realizing the penalty for doing so.

Comment by Dave Johnson — August 28, 2009 @ 7:55 pm BST Aug 28,2009 | Reply
I am able to think of a couple reasons why bind variables may appear with V_ followed by a unique number, where the SQL statement is otherwise identical:
* ADO replaces ? characters found in a SQL statement (these are bind variable placeholders) with :1, :2, :3, etc. It could be that the data interface used by the client’s application automatically replaces bind variable place holders with V_ followed by a sequential number (never resetting to 1 as the starting number as ADO does when a new SQL statement is submitted).
* The programming environment used to write the client application may require every bind variable to be pre-declared within the application module before the module is compiled, thus the SQL statements that are otherwise the same may be submitted by different application modules for the same application, or different procedures within the same module.
* The SQL statements are executed within a PL/SQL function, and the PL/SQL functions have variable names like v_27054 and v_104356 so that each PL/SQL function has a unique set of input variable names for documentation purposes – the input variable name is then specified in the SELECT SQL statement. Thus, the same SQL statement would be executed from several different PL/SQL functions.
* Essentially the same SQL statement is specified several times in a PL/SQL block using SELECT INTO … :v_27054 or :v_104356, and later a second query is executed looking for additional information about those customers issues select customer_name … cust_id = :v_27054 in one part of the PL/SQL block and select customer_name … cust_id = :v_104356 in another. V_ is apparently a common naming convention for PL/SQL variables.

Comment by Charles Hooper — August 28, 2009 @ 8:33 pm BST Aug 28,2009 | Reply
Code generator appends process id to keep variables unique across instantiations?

Comment by joel garry — August 28, 2009 @ 9:16 pm BST Aug 28,2009 | Reply
I’m not sure what you are asking for Jonathan. The other comments are assuming you are asking for why Oracle decides to use the bind variable yet do a hard parse. But you might be asking why you would be wanting Oracle to do a hard parse each time?

Based on the latter, I could imagine that you would not want Oracle to waste time checking if the syntax of the statement is correct or not but you would want Oracle to re-calculate the execution path for each itteration. Reasons would be:-
(a) use of partitions so Oracle would need a new, very similar exection plan for each partition accessed (they are, after all, different segments)
(b) the column is skewed, so you would want Oracle to consider histogram data on the column.
The thing is, I am not sure if the Parsing engine is smart enough to split the parse into “check syntax” and “generate execution plan”.

It might help me make or more informed wild guess { :-) } if you said if the cursors were seen as separate entries as child cursors (in V$SQL) of a parent or as different parents (V$SQLAREA)each time.

Comment by mwidlake — August 28, 2009 @ 11:57 pm BST Aug 28,2009 | Reply
Based on your question, my first thought, like mwidlake, was data skew. But assuming you haven’t over-simplified the query, and assuming that cust_id is really a unique index on the customers table, that would eliminate the possibility of skew (because of the uniqueness).

So unless one of the above assumptions are incorrect, I fall back to something a bit more far-fetched. Perhaps they would do this to take advantage of a function based index? Perhaps converting a number to varchar2 or vice-versa. If there were a query in the library cache using something like v_custid of the correct data type, then an overloaded procedure using the incorrect data type could dynamically create new bind variables encouraging a hard parse and the use of a function based index that could be used with the incorrect data type. Not the way I would implement it, but it’s a possibility (if a bit far-fetched).

Comment by AnthonyC — August 29, 2009 @ 2:28 pm BST Aug 29,2009 | Reply
I think that this is due to a bind variable mismatch. They are not using the same size for their bind variables. The bind variable names (:v_104356,:v_27054) suggest also that the query is not coming from plsql.

Comment by Mohamed — August 29, 2009 @ 2:54 pm BST Aug 29,2009 | Reply
They have other queries on the customer table that benefit from histograms due to skewed data, including ones that look just like this that have a large variance in the number of rows returned, so they want to overide bind peeking, but only sometimes?

Comment by joel garry — August 30, 2009 @ 2:54 am BST Aug 30,2009 | Reply
6,9: My first idea was programers wanted to take benefit of histograms. They probably do not like BIND VARIABLE PEEKING, so their code could be from PL/SQL (using DBMS_SQL or EXECUTE IMMEDIATE). So every hard parse make a PEEK into each query. They could use literals, but they don’t want to show literals in v$sql v$sqlarea etc.

But if you look at the table, cust_id is probably the primary key of the customers table. So there is no need for histogram and to peek every bind variable…

Jonathan wrote that every sql statemnt was executed just once (for every execution there was a hard parse). I think this code is not from PL/SQL too. Some application code is trying always to use bind variables in not very smar way

Comment by Pavol Babel — August 30, 2009 @ 12:49 pm BST Aug 30,2009 | Reply
Jonathan didn’t wrote, that v$sql.executions was always 1. Just another idea: 104356 and 27054 are no cust_ids, but some kind of session identifiers. There were 30 (application) sessions heavily using this statement in the previous 60 seconds. Having only one cursor for this statement could cause heavy contention (library cache lock/pin) on this cursor. Having one cursor per application session eliminates this contention.

Comment by moha — August 30, 2009 @ 3:15 pm BST Aug 30,2009 | Reply
- Well, Jonathan wrote that: “The developer was using bind variables but managing to do it in a way that caused a “hard parse” on every statement” . I think hard parse on every statment means every stament was executed only once. But if not… It was my second idea:
  104356 could be session_id (or process_id) or some hash of them. I would imagine f.e. some internet based application, which makes a lot of queries. It could use just 30 (or a little bit more sessions) connected to DB (maybe from an application server). Then critical SQL (most often executed) could be written in this way. But you shouldn’t use EXECUTE IMMEDIATE, because it performs some kind of soft parse every time (and it will increase library cach lathes again). The very effiecient way would be using DMBS_SQL. The PL/SQL package could store handles (n_handle := DBMS_SQL.open_corsor(…); ) for each critical SQL in package variables and it wouldn’t ommit closing of the cursor. Then every session wouldn’t have to parse statement (not even soft parse) for every execution. It would always just bind and execute, bind and execute…
  
  Comment by Pavol Babel — August 30, 2009 @ 8:24 pm BST Aug 30,2009 | Reply
  - … “it wouldn’t ommit closing” … it should be “it would ommit(skip) closing cursor” . So I mean if sessions opens a cursor, it’ll keep it opened whole time during the session’s life. It’ll always only bind and execute cursor (with help of stored cursor ID from DBMS_SQL.open_cursor )and eliminate soft parsing.
    
    Comment by Pavol Babel — August 30, 2009 @ 9:03 pm BST Aug 30,2009 | Reply
Hi,

My observation is also close to most of yours, but specifically as below:

1. The bind peeking would not give a efficient plan for all possible cus_id’s.
2.This will also reduce the library cache contention.
3.The bind variable is constructed with the cust_id as the suffix so that when the same cust_id comes in the next time there will not be a hard parse and the bind peeking done already would given them the required plan.

Comment by Balaji Ramachandran — August 30, 2009 @ 7:38 pm BST Aug 30,2009 | Reply
- 1. A “CUSTOMERS” table is likely to store either customers or customer versions. I don’t think we had versions here, what could one do with an unsorted set of historic names of a specific customer?
  cust_id seems to be the (whole) PK, if it weren’t, Jonathan would have informed us.
  When cust_id is the PK, bind variable peeking plays no role, does it? Cardinality = 1. Assuming that the table is “well-designed”, is only one execution plan to be expected, no matter whether histograms exist or not. (Probably not.)
  3. When the goal is 2., the client should have rather few customers, otherwise the statement could have too many versions. Of course, the customers could be grouped, e.g. v_{mod(cust_id, group_cnt)}. They probably have fewer (concurrent) database or application sessions of THIS application than customers. (They could also be grouped.)
  
  If there were a hard parse per customer, had the first user after a “flush” (when no version of the statement is in the cache) bad luck. If there were a hard parse per (application) session, were the user experience probably better.
  
  Comment by moha — August 30, 2009 @ 9:21 pm BST Aug 30,2009 | Reply
Perhaps the DBA is trying to track individual user SQL by means of a user id tag embedded in the bind variable of each SQL. This would be one way to overcome anonymisation of the end user identity due to shared connection pools.

Comment by Jeroen — August 30, 2009 @ 8:19 pm BST Aug 30,2009 | Reply
- Wouldn’t that be better served by comments? I can’t help but think whatever explanation that would be clever would have to justify hard parsing. Hmmm, maybe cursor_sharing is being used, that might override this objection.
  
  Comment by joel garry — August 30, 2009 @ 11:26 pm BST Aug 30,2009 | Reply
The three, admittedly weak, things that I can think of
1) Attempt to flush the shared pool by forcing SQL statements in
2) Generate background CPU noise to simulate the system being under load for testing purposes.
3) Trying to make the %Soft Parse Ratio look bad on their AWR report…

Comment by Yorick Iddon — August 30, 2009 @ 11:26 pm BST Aug 30,2009 | Reply
I can think of two options:
1) “customers” is not a table but a view and the optimal access path is dependent on the user executing the query
2) there are 30 schemas, each having it’s own customers table

Comment by Alen — August 31, 2009 @ 7:09 am BST Aug 31,2009 | Reply
- If there were 30 schmeas, it wouldn’t be neccesary to use these (a little bit unusal) bind variables. There would exist 30 child cursors in library cache by default behaviour
  
  Comment by Pavol Babel — August 31, 2009 @ 7:14 pm BST Aug 31,2009 | Reply
They use the client_id as the number of the bind variable, i.e. :v_12345 means that the client_id is 12345, so if you want to know the value of the bind variable all you have to do is to look at the query.

I saw this “implementation” in a Web Application times ago…

Comment by lascoltodelvenerdi — August 31, 2009 @ 8:14 am BST Aug 31,2009 | Reply
I think comment 11 (moha) is the most likely answer. To avoid library cache contention on a heavily used statement.
If the numbers in the bind are the customer_id, then it would be easier to skip the binding and use constants. And Jonathan finished with saying there might be a good reason. So I’ll join moha.

Comment by Arian — August 31, 2009 @ 9:36 am BST Aug 31,2009 | Reply
Whenever I see something a little unusual, I usually ask myself: “why might that happen – why would anyone want to do that ?” If you don’t get into this habit you can easily end up wasting time trying to fix something that isn’t broken.

My question was deliberately a little vague and open-ended because I didn’t want to push people in any particular direction – I just wanted them to prompt their imagination.

Before I tell you the idea I had to justify this odd pattern, I’ll just commment on a few of the suggestions.

First the people who tried to explain the phenomenont:

Ceninual (#1) and others: when I pointed this out to the on-site DBA, his comment was: “It looks like they’re using the customer ID as the name of the bind variable.” So full marks for good guesswork – and Dave’s comment (#3) helps to explain why it sounds like a nice idea – but this isn’t a reasonable justification for what they’ve done, and in this case the developers need a proper explanation of why bind variables are important.

Charles’ comment (#4) is worth a mention because it is possible that some programming enviroment could be responsible for an unexpected and undeirable pattern of behaviour – and Joel’s follow-up in #5 is an interesting extension of that thought, does the process ID appear in the bind variable somehow … the “multiple schema” idea appearing in a later post also ties in here, as it might be one variable name per schema. (But it’s not a pl/sql thing – a couple of the other comments reminds us that pl/sql turns variable names into :Bnnn, where the nnn represents a numeric value).

I’m going to pick up mwidlake for an error – because we often learn from other people’s mistakes. If two pieces of SQL look different they will appear in the library cache (x$kglob) as different parent objects, and different parents will have different children. So if the texts look the same apart from the “:v_NNNNNN” then each text is a different parent with a child holding the execution plan.

Chen (#2): again a possibility – DDL inside the loop – but a little unlikely and not a good justification for what’s going on. If DDL in the loop is actually happening in production it’s definitely an error [but see comment #18] that should be corrected. (Although global temporary tables could be an exception, and did produce some catastrophic effects in earlier versionsof Oracle). A table name of customers makes DDL a little unlikely though.

A throwaway comment about primary keys and histograms – depending on the nature of the key and the queries, a histogram can be relevant to a PK if the values have an odd distribution, even if the frequencies do not. In this case, though, the nature of the query (customer by ID) seems to suggest that there is no intention of avoiding problems of data skew.

There was some discussion about different technologies, or function features, and their effects on how cursor caching and library cache contention may be affected – this type of discussion is important and Oracle has changed all sorts of details of cursor caching and cursor pinning over the versions from 8i to 10g, so it’s always worth testing your old coding ideas each time you start a new project.

Finally we come to the thought I had – and it’s very close to the idea from Moha (#11) about deliberately making identical queries look (slightly) different to avoid library cache latch contention.

The client has 32 web application servers addressing a single database. If this is a very popular query then the single library cache latch protecting a totally sharable query could get hammered and be the focus of a lot of library cache latch contention. In the past when I’ve seen this happening I’ve advised clients to introduce a mechanism that makes each application server use “different” SQL by creating SQL that includes a “hint” like: /*+ from WAS 23 */

Could it be that this client had done the same sort of thing – but made the SQL look different by generating variable names that were dependent in some way on the identity of the Web application server sending in the query ?

(In this case the answer was no – because every single query was executed just once, and they really were using the input value as the name of the bind variable. The DBA has since explained to them the reason why bind variable are good – but only if your bind variable names don’t keep changing).

Apologies for any typos – I’m on a three-hour train journey and the Wifi link keeps dropping.

Comment by Jonathan Lewis — August 31, 2009 @ 7:58 pm BST Aug 31,2009 | Reply
- @Jonathan
  “If DDL in the loop is actually happening in production it definitely an error that should be corrected.”
  
  Well, it’s not a rule.
  
  I’m working in a project where we must create table partitions or “swap tables” on the fly so we have many loops where you can see “add partitions” or “create tables”.
  
  It’s the only way to do it because partitions and “swap tables” depends on data (i.e. we have things like “partition by sysdate”)
  
  So a DDL in a loop can be OK.
  
  Bye,
  Antonio
  
  Comment by lascoltodelvenerdi — September 1, 2009 @ 6:39 am BST Sep 1,2009 | Reply
  - lascoltodelvenerdi,
    Point taken – I should have been more precise: “if DDL in a loop is actually happening in production as part of a lightweight end-user facing operation … “.
    
    Housekeeping code – particularly code for partition maintenance – could reasonably loop through DDL: I’ve event implemented that type of thing myself.
    
    Comment by Jonathan Lewis — September 1, 2009 @ 6:07 pm BST Sep 1,2009 | Reply
- “I’m going to pick up mwidlake for an error ” – Ahh Damn, I’m going to stop trying!
  No I’m not really, and I’m happy to have my mistakes picked up. Yes, the statement has to give exactly the same hash value, ie be byte-by-byte the same. Even lower/upper case and extra spaces create a different entry so my stupid mistake.
  
  I like Piet’s point further down – OK, so it was generating lots of hard parses, but was it causing an impact to the business? Was there a gain by fixing it? I suspect yes, but Piet’s question is perhaps the core of “Why to Tune”.
  
  Moha’s point was very good. I like that.
  
  Comment by mwidlake — September 1, 2009 @ 8:32 pm BST Sep 1,2009 | Reply
- Neat – is there a way to take advantage of this in PL/SQL?
  
  Comment by Chris — December 6, 2009 @ 5:57 am GMT Dec 6,2009 | Reply
  - It will work in PL/SQL too since PL wouldn’t remove the comment if it looked like a hint. The problem with such a technique is to support many different versions of the same code.
    
    Comment by Timur Akhmadeev — December 7, 2009 @ 8:09 am GMT Dec 7,2009 | Reply
  - Chris,
    
    I think Timur has answered your question about using the “pseudo” hint.
    
    There was a problem in some older versions of Oracle where pl/sql lost the hints if you failed to leave a space between the “+” and the first word of the hint, though; and I think there was a more recent (but still pre-10g) version of one of the ODBC or JDBC drivers that managed to strip hints out of code in transit.
    
    Comment by Jonathan Lewis — December 12, 2009 @ 11:23 am GMT Dec 12,2009 | Reply
a DDL(such as Truncate) in a Loop induces an implicit commit; commit inside the loop(or commit across fetch) represents one of the main reasons for ora-01555 rollback segment snapshot too old.

Comment by Mohamed Houri — September 1, 2009 @ 7:53 am BST Sep 1,2009 | Reply
The use of comment or other way to uniquely identify (find, trace) a statement can be quite handy and it would have been my guess. But it turned out to be a totally un-glamorous oversight then.
The simplest explanation eh ?

If Dynamic-SQL or parsing seems a problem, I tend to look for nr of statements using the same plan to spot possible problem-stmnts. I suspect (nearly) all stmnts had the same plan, but did they ?
And you mention the relatively high number of 30 stmnts in 60 seconds and “causing hard parse on every statement”. But was parsing (or CPU usage, or shared-pool) acutally a problem on this system ?

Comment by PdV — September 1, 2009 @ 4:32 pm BST Sep 1,2009 | Reply
[…] the Oracle Scratchpad, it was quiz night again. In this second one, Jonathan Lewis and his contestants submit their explanations for bundle […]

Pingback by Log Buffer #160: a Carnival of the Vanities for DBAs | Pythian Group Blog — September 4, 2009 @ 5:00 pm BST Sep 4,2009 | Reply