Oracle Scratchpad

May 6, 2009

Philosophy – 1

Filed under: Infrastructure,Philosophy,Statistics,Troubleshooting — Jonathan Lewis @ 6:15 pm BST May 6,2009

There are some ideas about Oracle that are so fundamental that they should be lodged in your brain and floating behind your eyes whenever you want to investigate SQL performance problems. Here’s one of them:

Histograms and bind variables exist for diametrically opposed reasons – they won’t work well together without help.

You use bind variables because you want everyone to share the same child cursor for an SQL statement because that statement is going to be used very frequently, everyone is going to do the same (small) amount of work with it, the same execution plan should be ideal for everyone, and you don’t want to re-optimise it every time you use it because the overheads for optimisation would probably be greater than the resources need to run the statement. Typically we want to make great (but not exclusive) use of bind variables in OLTP systems.

You create histograms because statements that are virtually identical do enormously different amounts of work, need different execution plans, and the work done in optimising is tiny compared to the work done by the statements, and getting the wrong execution plan would lead to a huge waste of resources. Typically you need to generate histograms in data warehouse or decision support systems where the queries can be brutal and expensive.

Spot the contradiction – one technology is supposed to give you one plan that everyone uses; the other technology is supposed to give each individual the plan that’s best for them.

Remember that and you will remember to be cautious about sticking histograms into OLTP systems, and won’t be tempted to turn absolutely every literal into a bind variable.

Footnote: It seems that this article should have been called “Philosophy – 2″ because I’ve just rediscovered a short note called “Philosophy”

[The Philosophy Series]

18 Comments »

  1. AMEN.

    Comment by Christian Antognini — May 6, 2009 @ 9:13 pm BST May 6,2009 | Reply

  2. yeah, what he said.

    Comment by Bradd Piontek — May 6, 2009 @ 9:34 pm BST May 6,2009 | Reply

  3. I am always amazed how with just a couple of sentences you can make a non-trivial topic like this perfectly clear. The “1” in the title lets me hope there will be more of these…

    Comment by Todor Botev — May 7, 2009 @ 7:28 am BST May 7,2009 | Reply

  4. >> Remember that and you will remember to be cautious about sticking histograms into OLTP systems

    Which makes it all the more disappointing that 10g’s default DBMS_STATS options seem so keen to do so!

    Comment by Doug Burns — May 7, 2009 @ 7:43 am BST May 7,2009 | Reply

  5. This is a perfect example of how a good writing style can make a difficult and confused subject such as the one you exposed in this post, attractive and accessible for normal readers.

    In order to share knowledge, you do not only need to master it, you need also to make it simple via a good writing style.

    This is why I like reading your books and that of Cary Millsap.

    Comment by Houri Mohamed — May 7, 2009 @ 8:02 am BST May 7,2009 | Reply

  6. And there is a distinct lack of intelligence in 10g’s ‘FOR ALL COLUMNS SIZE AUTO’ being the default !

    There are some applications (PeOpLeSoFt !) which mixes SQLs using Literals with SQLs using Bind Variables. The application runs Batch Jobs (running for 10s of minutes to a few hours) and OLTP queries (running for seconds or fractions thereof) on the *same* tables.

    And you suffer when you have significant skew in your data and you know you need, want, Histograms. The right Histograms being used only in the right queries.
    Not have one program use Literals and another use Bind Variables.

    As for your last line :
    “…. you will remember to be cautious about sticking histograms into OLTP systems, and won’t be tempted to turn absolutely every literal into a bind variable.” again you are talking of using both technologies concurrently !

    Your last sentence says :
    Selectively use Histograms.
    Selectively use Bind Variables.

    I know. I know. I seem to be rambling.

    Comment by Hemant K Chitale — May 7, 2009 @ 9:46 am BST May 7,2009 | Reply

    • Hemant:


      Your last sentence says :
      Selectively use Histograms.
      Selectively use Bind Variables.

      No, it doesn’t, it says:

      “Remember that [i.e. the contradiction] and you will remember to be cautious about sticking histograms into OLTP systems, and won’t be tempted to turn absolutely every literal into a bind variable.”

      It is attempting to help people understand the fundamental issue, NOT memorise a sound-bite.

      Comment by Jonathan Lewis — May 8, 2009 @ 9:42 pm BST May 8,2009 | Reply

  7. There is no fundamental difference between the query with bind variables and the one with literals. A Bind Variable is nothing more than a Typed Literal. One of the reasons for introduction of bind variables was performance, but computers changed a lot for the last couple decades! Bind variables peeking is supposed to provide a nice compromise:
    https://www.se.auckland.ac.nz/conferences/VLDB2008resources/presentations/papers/I14.ppt

    Comment by Vadim Tropashko — May 7, 2009 @ 8:30 pm BST May 7,2009 | Reply

    • Vadim,
      There is no fundamental difference in the process of optimisation, but there may be differences in the wisdom of reusing the execution plan the next time the same query re-appears.

      The presentation you references was a nice little item: it reminded me of this comment from page three of chapter 1 of Cost Based Oracle – Fundamentals:

      “Moreover, both 9i and 10g collect run-time statistics in the views v$sql_plan_statistics and v$sql_plan_statistics_all which could, in theory, be fed back to the optimizer to give it a second chance at optimizing a query if the actual statistics differ too much from the assumptions made by the optimizer.”

      In fact, anyone who was at a meeting of the Boston OUG in October 2004 may remember a presentation I did where I started by describing and (apparently) demonstrating this behaviour in a version of Oracle that I claimed was an Alpha release of 10.0 (but was actually 9i playing dirty tricks).

      Comment by Jonathan Lewis — May 8, 2009 @ 9:36 pm BST May 8,2009 | Reply

  8. “be cautious about sticking histograms into OLTP systems”

    Could not understand that Jonatahan – can you kindly elaborate?

    Histograms are useful for skewed data and would be useful when you consiously decide to use literal instead of bind variable due to the data skew, right? Is there a drawback for default histograms – it would just say there is no skew where there is no skew.

    What do I miss?

    Thanks,
    Naresh

    Comment by Naresh Bhandare — May 8, 2009 @ 4:10 am BST May 8,2009 | Reply

  9. Hi Jonathan, what’s your philosophy on designing a decision support system/data warehouse that makes heavy use of histograms but also need to guard against SQL injection as a very high priority? Without using bind variables, is it still possible to make your database bullet proof?

    Comment by peter — May 8, 2009 @ 5:50 am BST May 8,2009 | Reply

  10. One thing that I’m constantly reminding client developers is to not use bind variables for partition key values. Greg Rahn pointed out the problem to me on oracle-l over a year ago, and it’s something I see a lot now. Not only does it generate the KEY:KEY plan, but in Oracle < 10.2.0.3, there’s a bug that results in an explosion of cursors that could choke a system.

    Comment by Don Seiler — May 16, 2009 @ 8:28 pm BST May 16,2009 | Reply

  11. Hi Jonathan,

    What do you suggest in a hybrid system?
    I mean there are instances where a huge RAC database serves both OLTP as well as reporting needs.
    The source tables remain the same but the nature of queries will be different.

    Regards,
    Ashish

    Comment by Ashish — May 18, 2009 @ 4:34 am BST May 18,2009 | Reply

    • I think you’d use bind variables for OLTP queries of course, and rely on dynamic sampling for the reporting queries (which is very often more flexible and accurate than histograms anyway, particularly when data changes quickly or you have multicolumn predicates).

      Comment by David Aldridge — June 2, 2009 @ 10:11 am BST Jun 2,2009 | Reply

  12. Seems like the start of an Excellent Series Jonathan.

    Simple rules that can be explained, understood and applied by Everyone.
    Fundamentals, by any other name…

    Looking forward to the Next Philosophy.

    Regards,
    PdV
    (off looking for a printer and a frame somewhere)

    Comment by PdV — May 18, 2009 @ 5:53 am BST May 18,2009 | Reply

  13. Ha! This is the best (simplest) explantion of an extremely complicated problem that I’ve seen. I’ve tried to explain it numerous times with less than satisfactory results, and I think I have a decent command of the English language (even if it’s not proper English). Well done!

    Kerry

    Comment by Kerry Osborne — May 19, 2009 @ 2:54 am BST May 19,2009 | Reply

  14. This is why bind variable peeking is, in my opinion, one of the worst features Oracle has ever added into the rdbms. It attempts to make a compromise on two things that are diametrically opposed to one another. I’ve spent more time in the last few years dealing with system stability / performance problems because of bind variable peeking and cursor_sharing = force than I care to think about.
    The first time I discovered bind peeking was causing plans to flip seemingly at random it was cool; now it’s just annoying.

    Comment by John Darrah — May 21, 2009 @ 3:07 am BST May 21,2009 | Reply

    • Binds peeking is alright, if you don’t have histograms then the only difference it makes is mostly in range predicates and in a little more scenarios. And you want some intervention in choosing execution plan for ranges because otherwise you’ll end with very bad performance. I agree with you that binds peeking may sometimes cause troubles and I think that this feature should be improved as soon as possible into some “smarter” peeking that will peek at the binds at each execution and if it sees that many times the “important” values for peeking are different it’ll reparse the query…

      Comment by Alon Principal — July 30, 2009 @ 9:03 pm BST Jul 30,2009 | Reply


RSS feed for comments on this post.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,909 other followers