Oracle Scratchpad

July 30, 2010

Scalability Conflict

Filed under: Indexing,Infrastructure,Performance,Troubleshooting — Jonathan Lewis @ 5:51 pm BST Jul 30,2010

Here’s an example of how you have to think about conflicts of interest when dealing with problems of scalability. It starts with a request from a client (one that I won’t give in detail, and that was a little different in shape from what I describe below) for advice on how to make a query go faster.

Basic problem: the query runs about 20 times per second, returning a very small number of rows; it’s basically a very simple “union all” of three query blocks that access the same table in slightly different ways.
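The original statement is confidential, but as a purely illustrative sketch (the table name, column names and literals below are all invented) the shape of the query is roughly this:

    select  id, colA, padding
    from    t1
    where   colX = 'P' and colA = :b1
    union all
    select  id, colB, padding
    from    t1
    where   colY = 'Q' and colB = :b1
    union all
    select  id, colC, padding
    from    t1
    where   colZ = 'R' and colC = :b1
    ;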

Step one: check the query text, check the index definitions, and run the query on a recent backup of production with plan statistics enabled. It turns out that a very large fraction of the work goes into one of the three parts of the query – the results from the run-time execution plan show this, and the index definitions show that it’s inevitable.
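If you want to do the same sort of check yourself, one common approach (not necessarily the one used here) is the gather_plan_statistics hint followed by a call to dbms_xplan.display_cursor, which reports the actual row counts and buffer visits for each line of the plan. Using the hypothetical first branch from the sketch above:

    select  /*+ gather_plan_statistics */
            id, colA, padding
    from    t1
    where   colX = 'P' and colA = :b1
    ;

    select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));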

Step two: build a function-based index that gives the one critical part of the query a very precise access path to the required data – and modify the query to use this index. The query runs about 20 times faster, doing about one sixtieth of the buffer visits. The index is a safe change because it isn’t particularly large (doesn’t introduce a large maintenance overhead) and is defined in a way that means it (almost certainly) won’t get hijacked by other queries and cause surprises.
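The real definition isn’t mine to publish, but the usual technique (sketched here with the same invented names) is to wrap the critical column in a case expression so that only the interesting rows appear in the index at all, and then to repeat exactly the same expression in the query:

    create index t1_fbi on t1 (
        case when colZ = 'R' then colC end
    );

    -- the critical query block has to reference the identical expression
    select  id, colC, padding
    from    t1
    where   case when colZ = 'R' then colC end = :b1
    ;

Every row that fails the colZ = 'R' test maps to null and therefore doesn’t appear in the index, which is what keeps the index small; and because no other query is likely to use that exact expression, the optimizer won’t pick the index up by accident.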

But the client wants more – why execute three query blocks when you might be able to run just one?

Step three: work out that it would be possible to create a function-based index engineered in such a way that it would identify the entire result set with extreme precision, be the smallest possible index, and be totally unusable for any other task (and therefore safe to add). The index definition would be a little complex (or, as I put it at the time, totally incomprehensible to the next person to read the code) and the select list would need a messy rewrite.
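Purely as a hypothetical sketch of the idea (again with invented names, not the client’s definition), such an index might fold all three predicates into a single expression so that the entire result set sits in one very small index and the query collapses to a single block:

    -- assumes colA, colB, colC are character columns
    create index t1_fbi_all on t1 (
        case
            when colX = 'P' then 'A:' || colA
            when colY = 'Q' then 'B:' || colB
            when colZ = 'R' then 'C:' || colC
        end
    );

    select  id, colA, colB, colC, padding
    from    t1
    where   case
                when colX = 'P' then 'A:' || colA
                when colY = 'Q' then 'B:' || colB
                when colZ = 'R' then 'C:' || colC
            end in ('A:' || :b1, 'B:' || :b1, 'C:' || :b1)
    ;

Note that a case expression stops at the first branch that matches, so this sketch isn’t exactly equivalent to the original union all, and the select list still has to be untangled, which gives you some idea of why the real definition would be hard for the next person to read.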

But a simple solution that’s fast enough may be a lot better than a complex solution that seems to be the fastest possible.

Step four: explain the generic threats in the “fastest” solution (I’ve put “fastest” in quotation marks because it’s only the fastest approach we can get in our tests on the backup database) and ask a few questions about the processing that goes on behind this solution to see if any of the generic threats are likely to appear as specific threats in production; discover that the “fastest” solution when the data isn’t changing won’t be the fastest solution in the production system.

In the production system, data will be inserted and deleted at a rate of about 100 rows per second by several concurrent processes. The simple solution (step 2) will visit three spots in three separate, well-buffered indexes which are each being updated by some (but not all) of the concurrent processes; the complex solution (step 3) would visit one spot in one very small index while it is constantly updated by every single concurrent process.

The processes inserting and deleting the data will slow down because they will constantly be waiting on “buffer busy waits” on the very small number of index blocks. The process running the query will do a lot of work reading undo blocks to create read-consistent copies of the blocks it needs to see.
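If you wanted to watch for that effect in a concurrent test run, a crude but quick check is to look at the relevant wait events and at the session statistics that count undo records applied for read consistency, for example:

    -- contention on hot index blocks shows up here (instance-wide summary)
    select  event, total_waits, time_waited
    from    v$system_event
    where   event in ('buffer busy waits', 'read by other session')
    order by time_waited desc;

    -- read-consistency work done by the current session
    select  sn.name, ms.value
    from    v$statname sn, v$mystat ms
    where   ms.statistic# = sn.statistic#
    and     sn.name in (
                'consistent gets',
                'data blocks consistent reads - undo records applied'
            );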

This is a classic example demonstrating the fact that when you try to minimise the work done by a query by concentrating the relevant data into a small area of the database, you are simultaneously focusing and increasing the contention between the processes that create, modify and destroy that data.

2 Comments »

  1. This would be a fun one to put through the paces in DB Optimizer’s load editor which is good for easily creating multiple concurrent connections running code in loops. Been meaning to demo code in DB Optimizer that runs well and fast with one user, like for the developer, but hits the wall in production.
    Nice example as always.

    Best
    Kyle

    Comment by Kyle Hailey — July 31, 2010 @ 3:08 am BST Jul 31,2010 | Reply

  2. I must admit I often cringe when I hear the word scalability, as people often talk about it from the point of view of gross volume of data in the database. The number of transactions and the volume of data in those transactions rarely get a look in, and concurrency issues between active transactions seem to sit at the bottom. Thanks for a clear description of a real scalability issue.

    Comment by Gary — July 31, 2010 @ 5:12 am BST Jul 31,2010 | Reply

