Here’s an example of how you have to think about conflicts of interest when dealing with problems of scalability. It starts with a request (that I won’t give in detail, and was a little different from the shape I describe below) from a client for advice on how to make a query go faster.
Basic problem: the query runs about 20 times per second, returning a very small number of rows; it’s basically a very simple “union all” of three query blocks that access the same table in slightly different ways.
Step one: check the query text, check the index definitions, run the query on a recent backup of production with plan statistics enabled. It turns out that a very large fraction of the work goes into one of the three parts of the query – the results from the run-time execution plan show this, and the index definitions show that it’s inevitable.
Step two: build a function-based index that gives the one critical part of the query a very precise access path to the required data – and modify the query to use this index. The query runs about 20 times faster, doing about one sixtieth of the buffer visits. The index is a safe change because it isn’t particularly large (doesn’t introduce a large maintenance overhead) and is defined in a way that means it (almost certainly) won’t get hijacked by other queries and cause surprises.
But the client wants more – why execute three query blocks when you might be able to run just one query block.
Step 3: work out that it would be possible to create a function-based index engineered in such a way that it would identify the entire result set with extreme precision, be the smallest possible index, and be totally unusable for any other task (and therefore safe to add). The index definition would be a little complex (or, as I put it at the time, totally incomprehensible to the next person to read the code) and the select list would need a messy rewrite.
But a simple solution that’s fast enough may be a lot better than a complex solution that seems to be the fastest possible.
Step 4: Explain the generic threats in the “fastest” solution (I’ve put “fastest” in quotation marks because it’s only the fastest approach we can get in our tests on the backup database) and ask a few questions about the processing that goes on behind this solution to see if any of the generic threats are likely to appear as specific threats in production; discover that the “fastest” solution when the data isn’t changing won’t be the fastest solution in the production system.
In the production system, data will be inserted and deleted at a rate of about 100 rows per second by several concurrent processes. The simple solution (step 2) will visit three spots in three separate, well-buffered indexes which are each being updated by some (but not all) of the concurrent processes; the complex solution (step 3) would visit one spot in one very small index while it is constantly updated by every single concurrent process.
The processes inserting and deleting the data will slow down because they will constantly be waiting on “buffer busy wait” on the very small number of index blocks. The process running the query will do a lot of work reading undo blocks to create read consistent copies of the blocks it needs to see.
This is a classic example demonstrating the fact that when you try to minimise the work done by a query by concentrating the relevant data into a small area of the database you are simultaneously focusing and increasing the contention between processes that create, modify and destroy that data.