Oracle Scratchpad

July 29, 2007

NLS

Filed under: Indexing,Infrastructure,trace files,Tuning — Jonathan Lewis @ 9:50 pm BST Jul 29,2007

A couple of weeks ago, having just arrived home from Spain, I got a phone call in Heathrow airport from an old client that I hadn’t seen for about 18 months. They had a problem – as far as they could tell a business critical query had suddenly stopped using an index and was doing a tablescan instead. At peak load times the query was taking between 15 and 50 seconds to complete when it used to take less than 2 seconds. Did I have any time to come and help?

Luckily I had a free morning the following day, so we set up a VPN over the Internet to connect to his machine to see what was happening. He had been very thorough – he had a trace file from the end-user process, execution plans (including a trace) from an SQL*Plus session – the works.

From his SQL*Plus session the plan showed an index range scan – but from the end-user process he got a tablescan. It was a really simple query but the optimizer refused to use the obvious index when running from the client machine.

The one thing he hadn’t done was pull a full execution plan from memory. Since this was a 10g machine, I hunted down the query (v$sql.sql_text like …) and called the dbms_xplan.display_cursor() function to show me the full execution plan. It looked a bit like this:

-------------------------------------
SQL_ID  83j0azzjtdjbw, child number 0
-------------------------------------
select  * from t1 where v1 = :nv1         

Plan hash value: 3617692013         

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |       |       |     2 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |    75 |     2 |
----------------------------------------------------------         

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(SYS_OP_C2C("V1")=:NV1)         

(The query was just a tiny bit more complex – it was a multi-column index, but it really was a single-table query).
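The lookup itself can be sketched as follows – the like pattern is illustrative, and the sql_id is the one reported in the output above:

```sql
-- Find the cursor in the library cache...
select sql_id, child_number, sql_text
from   v$sql
where  sql_text like 'select%t1 where v1%';

-- ...then pull its actual plan; display_cursor() is a
-- pipelined function, so it is queried through table():
select *
from   table(dbms_xplan.display_cursor('83j0azzjtdjbw', 0));
```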

Spot the problem – there’s an odd, internal, function which is disabling the index. The DBA had said that when he’d run the trace from the client he’d used 10046 level 12 but Oracle hadn’t printed the bind values – so I had a look at the trace. It looked a bit like this (the database version was 10.1.0.5):


BINDS #3:
 bind 0: dty=96 mxl=32(22) mal=00 scl=00 pre=00 oacflg=01 oacfl2=800010 size=160 offset=0
   bfp=800003f880024ce0 bln=32 avl=12 flg=05
   value=""

Sure enough, the bind trace details were there – and the values were missing.

But look closely at the data type (dty) – it’s 96, which means nchar(). [But see comments below]

When testing from SQL*Plus the DBA had been using bind variables of type varchar2() – and the indexed column was of type varchar2() – so the optimizer selected the index. But the Java client had been using inputs of type nvarchar2() – which resulted in an implicit coercion that disabled the index.

So the index wasn’t being used – and had never been used. But the performance had changed – giving the impression that an index was suddenly ‘no longer working’. When things work well you don’t check what they are doing – so no-one had realised that the query had been running with an in-memory tablescan for months.
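The effect is easy to reproduce from SQL*Plus – a minimal sketch, with illustrative object and bind names (the real system used a multi-column index):

```sql
create table t1 (v1 varchar2(20), padding varchar2(75));
create index t1_i1 on t1(v1);

-- A varchar2() bind matches the column type: index range scan.
variable cv1 varchar2(20)
exec :cv1 := 'ABC'
select * from t1 where v1 = :cv1;

-- An nvarchar2() bind forces coercion of the *column*, so the
-- predicate becomes filter(SYS_OP_C2C("V1") = :NV1): tablescan.
variable nv1 nvarchar2(20)
exec :nv1 := 'ABC'
select * from t1 where v1 = :nv1;
```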

In a way, it’s an example of the dangers of solving a problem with hardware – you can keep throwing CPU and memory at the problem to stave off the disaster but one day you have to fix the real problem.

In this case the available memory was so large that the table had been classed as a “short” table and preferentially loaded into the buffer cache – so the tablescan was pretty quick. But little tables grow, and one day this table (and a few others) became just a bit too big to be classed as “short” – suddenly the only cache benefit was at the disk cabinet, and the response time slowed down dramatically.

Of course, the problem was easy to fix. The table should have been declared with nvarchar2() columns.  But for a quick fix we simply created a function-based index on the relevant columns using the to_nchar() function on each column. Response times for the business critical query suddenly became faster than they had ever been before.
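The quick fix might have looked something like this (index name illustrative) – the idea is to index the coerced value so that the filter SYS_OP_C2C("V1") = :NV1 can turn into an index access:

```sql
-- to_nchar() translates internally to sys_op_c2c(), so this
-- function-based index matches the implicitly coerced predicate.
create index t1_fbi on t1(to_nchar(v1));
```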

The moral of the story:

Hardware had been hiding the problem for a long time – but when the problem hit it was seen as a big problem and too much time was wasted trying to resolve it because the DBA “knew” that the query must have been using the index when things were working.

Viewing it from the opposite perspective – if you do decide to use hardware to “solve” a problem remember that you’re probably just postponing the day when you really have to fix the problem – and it may be bigger, nastier, and harder to fix when you really are forced to fix it.

11 Comments »

  1. Great article, with easy to follow logic, all the way from identification that a problem exists to developing a solution for the problem. Thanks for taking the time to explain the process of finding correct solutions to problems.

    Based on previous research on Metalink, dty=96 seems to imply that the bind data may have been passed as a CHAR rather than a NCHAR. Metalink references: Note:67701.1, Note:108377.1, Note:154170.1, Bug No: 490872.

    On the Oracle Server Performance Technical Forum (found by a Metalink search), there is a thread from 27-Mar-07 with a subject of “Optimizer not using index when it should” that might provide a hint as to why SYS_OP_C2C might appear in the filter predicates – in this case, the packaged application was using columns defined as NVARCHAR2.

    One of the interesting things that I have found programming with ADO with bind variables is that trace files generated on Oracle 10.2.0.2 often show “oacdty=96 mxl=32(08)” regardless of whether the bind variable is declared as a CHAR or VARCHAR, and the max length always seems to show 32 (unless of course the data passed is longer than 32 characters). This data type oddity does not seem to be confined to only programs using ADO.

    Your thoughts and corrections to my comments/understanding are appreciated.

    Comment by Charles Hooper — July 27, 2007 @ 1:14 pm BST Jul 27,2007 | Reply

  2. If the goal of tuning the business critical query was to make it use the index, wouldn’t it be beneficial to modify the SQL to use a hint? Creating a new index might potentially impact (either positively or negatively) other SQL statements against this table.

    Thanks!

    Comment by Nitin Pai — July 27, 2007 @ 9:02 pm BST Jul 27,2007 | Reply

  3. That’s a coincidence.

    10 days ago, I had a similar issue with a client who had bad performance after an application upgrade (Oracle 9iR2, the application written in .NET).
    I started some trace files to check which SQL statements were taking the most time, and came across a query that was doing a full table scan where we expected it to use an index.
    When testing the sql statement with explain plan and with autotrace the query did indeed use the index.
    In the raw trace file I noticed that the bind variable datatype was 1 and that the value was not filled in (although the avl was not zero).
    I had never come across this, and on a hunch I generated two traces, one using nvarchar2 as the datatype for the bind variable and one using varchar2. And indeed, the one where I used the nvarchar2 datatype showed the full table scan and lacked the values for the bind variables in the trace file, while the one with the varchar2 datatype used the index and had the bind values filled in.
    What probably happened was that the developers of the application vendor changed a setting in their development tool, causing it to use nvarchar2 datatypes instead of varchar2.

    Note that when I used the nvarchar2 datatype with explain plan or autotrace, Oracle still showed me an execution plan with the index range scan instead of the full table scan it was actually using.

    In your post you say that dty 96 means nchar, but doesn’t it actually mean char or nchar?
    In the case I had, both trace files had dty=1, but the trace file with the nvarchar2 datatype did not list the bind variable value.

    regards,

    Freek

    Comment by Freek — July 27, 2007 @ 10:17 pm BST Jul 27,2007 | Reply

  4. As usual a good article, thanks Jonathan for all your postings.
    A question: do you think it would be nice if Oracle placed somewhere (for example in v$sql_plan) the information that you get with the 10053 event (for example the peeked bind variable values)? Or is it already in 11g?
    (excuse me for my English)
    Thanks
    Cristian

    Comment by cristiancudizio — July 30, 2007 @ 1:39 pm BST Jul 30,2007 | Reply

  5. [...] 30th, 2007 · No Comments Today, carefully re-reading Jonathan Lewis’s latest post, I was reminded of an old problem and frustration of mine, a problem of “performance [...]

    Pingback by Bind Peeking e lista dei desideri « Oracle and other — July 30, 2007 @ 2:00 pm BST Jul 30,2007 | Reply

  6. Hi,
    very interesting topic – why do execution plans change?
    For about two weeks I had problems where query rewrite was not used by the optimizer,
    producing a very horrible execution plan: http://orcasoracle.squarespace.com/journal/2007/7/23/explain-plan-what-you-see-is-sometimes-not-what-you-get.html

    But I did not check the datatypes – which, for 10.2, I think is also possible with the v$sql_bind_capture view. Good article!!

    A hint to use the index would not have helped in the described issue, because the index cannot be used when comparing nchar and char directly without risking wrong results.

    Karl Reitschuster

    Comment by Karl — August 1, 2007 @ 11:57 am BST Aug 1,2007 | Reply

  7. Charles,
    My error – I didn’t do a complete analysis of the problem. Type 96 is CHAR() or NCHAR(). My sequence of actions was:
    * sys_op_c2c() => “must be NLS”
    * type 96 isn’t varchar2()
    * let’s test with an NCHAR()
    The (fortunately not critical, in this case) step I missed was “how else could you get a type 96”.

    There are a number of interesting points about the mxl entry in this article on bind variables. Most significantly, you will see 32, 128, 2000, or 4000 as the defined length, followed by the actual length in brackets. Since my example used a two-byte fixed-width character set, the mxl=32(22) was telling me that I had an input value of 11 characters (22 bytes).

    Nitin,
    Your warning is, of course, correct. It is often easy to spot the index you need, but find you can’t create it without doing a massive regression test (or code scan) for side-effects. In this case the table definition was wrong – the code was right (it had to use national character set semantics) – and the safety of the temporary fix was easy to check.

    Freek,
    See my reply to Charles Hooper – you’re right it is char() or nchar(). And I agree it looks as if the actual values for the national character sets are not printed even though their byte lengths are reported – and that applies to both nchar() and nvarchar2().

    Cristian,
    In 10g you have v$sql_bind_capture which gives you the actual bind values in many cases. [Update: but see this note which describes my error in making this comment]
    You also have v$sql_optimizer_env which gives you the value of (some of) the parameters that affect the optimizer.
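    A sketch of those two lookups (the substitution variable is illustrative):

```sql
-- Bind values captured for a given statement:
select name, position, datatype_string, value_string
from   v$sql_bind_capture
where  sql_id = '&m_sql_id';

-- Optimizer parameter settings in effect for that statement:
select name, value
from   v$sql_optimizer_env
where  sql_id = '&m_sql_id';
```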

    Comment by Jonathan Lewis — August 2, 2007 @ 3:27 pm BST Aug 2,2007 | Reply

  8. [...] Lewis was on holiday, but still :) on NLS and on [...]

    Pingback by Log Buffer #56: a Carnival of the Vanities for DBAs « H.Tonguç YILMAZ Oracle Blog — August 3, 2007 @ 11:57 am BST Aug 3,2007 | Reply

    Thanks – used this today to good effect.

    Comment by Ron — July 21, 2012 @ 4:38 pm BST Jul 21,2012 | Reply

  10. [...] cases the problem may simply be an execution plan change – but perhaps the machine has been hiding an inherent performance problem and on this occasion a small change in data (a few more errors, perhaps) resulted in the excess work [...]

    Pingback by Analysing Statspack 13 « Oracle Scratchpad — January 7, 2013 @ 6:46 pm BST Jan 7,2013 | Reply

