From time to time someone publishes a query on the OTN database forum and asks how to make it go faster, and you look at it and think it’s a nice example to explain a couple of principles because it’s short, easy to understand, obvious what sort of things might be wrong, and easy to fix. Then, after you’ve made a couple of suggestions and explained a couple of ideas, the provider simply fades into the distance and doesn’t tell you any more about the query, whether they’ve taken advantage of your advice, or whether they’ve found some other way to address the problem.
Such a query, with its execution plan, appeared a couple of weeks ago:
UPDATE FACETS_CUSTOM.MMR_DTL
SET     CAPITN_PRCS_IND = 2,
        FIL_RUN_DT = Current_fil_run_dt,
        ROW_UPDT_DT = dta_cltn_end_dttm
WHERE   CAPITN_PRCS_IND = 5
AND     HSPC_IND ='Y'
AND     EXISTS (
                SELECT  1
                FROM    FACETS_STAGE.CRME_FUND_DTL_STG STG_CRME
                WHERE   STG_CRME.MBR_CK = MMR_DTL.MBRSHP_CK
                AND     MMR_DTL.PMT_MSA_STRT_DT BETWEEN STG_CRME.ERN_FROM_DT AND STG_CRME.ERN_THRU_DT
                AND     STG_CRME.FUND_ID IN ('AAB1', '1AA2', '1BA2', 'AAB2', '1AA3', '1BA3', '1B80', '1A80')
        )
AND     EXISTS (
                SELECT  1
                FROM    FACETS_CUSTOM.FCTS_TMS_MBRID_XWLK XWLK
                WHERE   XWLK.MBR_CK = MMR_DTL.MBRSHP_CK
                AND     MMR_DTL.PMT_MSA_STRT_DT BETWEEN XWLK.HSPC_EVNT_EFF_DT AND XWLK.HSPC_EVNT_TERM_DT
        )
;

-------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |                        |     1 |   148 | 12431   (2)| 00:02:30 |
|   1 |  UPDATE                       | MMR_DTL                |       |       |            |          |
|   2 |   NESTED LOOPS SEMI           |                        |     1 |   148 | 12431   (2)| 00:02:30 |
|*  3 |    HASH JOIN RIGHT SEMI       |                        |    49 |  5488 | 12375   (2)| 00:02:29 |
|   4 |     TABLE ACCESS FULL         | FCTS_TMS_MBRID_XWLK    |  6494 | 64940 |    24   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL         | MMR_DTL                |   304K|    29M| 12347   (2)| 00:02:29 |
|*  6 |    TABLE ACCESS BY INDEX ROWID| CRME_FUND_DTL_STG      |     1 |    36 |     5   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN          | IE1_CRME_FUND_DTL_STG  |     8 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("XWLK"."MBR_CK"="MMR_DTL"."MBRSHP_CK")
       filter("XWLK"."HSPC_EVNT_EFF_DT"<=INTERNAL_FUNCTION("MMR_DTL"."PMT_MSA_STRT_DT") AND
              "XWLK"."HSPC_EVNT_TERM_DT">=INTERNAL_FUNCTION("MMR_DTL"."PMT_MSA_STRT_DT"))
   5 - filter("CAPITN_PRCS_IND"=5 AND "HSPC_IND"='Y')
   6 - filter(("STG_CRME"."FUND_ID"='1A80' OR "STG_CRME"."FUND_ID"='1AA2' OR
              "STG_CRME"."FUND_ID"='1AA3' OR "STG_CRME"."FUND_ID"='1B80' OR "STG_CRME"."FUND_ID"='1BA2' OR
              "STG_CRME"."FUND_ID"='1BA3' OR "STG_CRME"."FUND_ID"='AAB1' OR "STG_CRME"."FUND_ID"='AAB2') AND
              "STG_CRME"."ERN_FROM_DT"<=INTERNAL_FUNCTION("MMR_DTL"."PMT_MSA_STRT_DT") AND
              "STG_CRME"."ERN_THRU_DT">=INTERNAL_FUNCTION("MMR_DTL"."PMT_MSA_STRT_DT"))
   7 - access("STG_CRME"."MBR_CK"="MMR_DTL"."MBRSHP_CK")
The most informative bit of narrative that went with this query said:
“The table MMR_DTL doesnt have index on these columns CAPITN_PRCS_IND , HSPC_IND .. Since this is an update stmt which will update 85k records, In a dilema whether to index these columns or not .. And this table MMR_DTL is an ever growing table. Worried about the update performance. “
This was in response to an observation that there was a full tablescan on MMR_DTL at operation 5 despite the predicate "CAPITN_PRCS_IND"=5 AND "HSPC_IND"='Y'. You’ll note that the predicted cardinality for that scan is 304K and the update statement is going to change CAPITN_PRCS_IND from the value 5 to the value 2 – so it’s not entirely unreasonable to be worried about the impact of creating an index that included the column capitn_prcs_ind.
What more can we say about this query, given the limited information? Quite a lot – unfortunately the owner of the query isn’t giving anything else away.
I’m going to leave this note unfinished to give people a little chance to think about the clues in the request, the questions they might ask, reasons why there might be a performance problem, and strategies they might investigate, then I’ll update the posting with a few ideas some time in the next 24 hours.
Update 1 – 24th May
There are so many ideas that spring up from a small amount of information that it’s very hard to write a concise and coherent description of what you’ve noticed, when and how far you pursued it, and how relevant the ideas might be to the problem in hand – especially when most of the thoughts require you to ask for more information. Unfortunately the time I had available to write this note has just disappeared, so I’m just going to have to complete it in rapidly written installments. The first bit is an outline of the immediate response I had to the initial presentation of the problem and the execution plan that went with it.
The only comment from the OP on this statement and plan was: “I couldn’t optimize this query for better performance and optimized cost.. Can some one guide me on this.”
We have no idea how many rows would be updated, how long it took, or how long the OP thinks it ought to take; it’s not until a subsequent post that we learn that the number of rows targeted for update is 85,000 – which tells us that the optimizer has run into some problems with its cardinality estimates. This suggests that IF there’s a serious performance problem then POSSIBLY there’s a better execution plan and we might get the optimizer to find it automatically if we could tell it how to adjust its cardinality estimates. It would be nice, however, to know where the query spent its time (i.e. can we re-run it with rowsource execution stats or monitoring enabled, and see the actual run-time work in the plan).
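For example – a minimal sketch, not anything the OP has supplied – we could re-run the statement with rowsource execution statistics enabled and then report the actual rows, time and buffer visits against each operation in the plan:

alter session set statistics_level = all;

-- re-run the original update here (or add the /*+ gather_plan_statistics */ hint to it
-- instead of changing statistics_level for the whole session), then:

select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

rollback;          -- only if this was purely a test run

If the diagnostic and tuning packs are licensed, a real-time SQL Monitor report (dbms_sqltune.report_sql_monitor) would give much the same information with less overhead.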
If it took a couple of minutes to update that 85,000 rows, I probably wouldn’t want to spend time making it go faster; if it took 2 hours, of which 1 hour 50 minutes was spent waiting for a transaction (row) lock then I’d want to look at why the update collision could happen and see if that problem could be avoided – it might then be the case that the last 10 minutes was spent rolling back and restarting an update that ought to have taken 2 minutes “in vacuo”. Underlying all this, I would want to be sure (as I’ve implicitly, and I think reasonably, assumed) that it’s an update that runs only occasionally, perhaps once per day or once per week.
In the absence of hard information – let’s resort to a few hypotheticals; looking at the plan itself (and knowing the target 85,000 rows) I am prepared to make a few guesses about the run-time activity.
- We build an in-memory hash table from the whole of FCTS_TMS_MBRID_XWLK, a step for which the optimizer ought to be able to give a reasonable cost and cardinality – assuming (as I will from now on) that the basic stats are reasonably accurate.
- We scan the (fairly large) MMR_DTL table, applying a couple of filters; again the optimizer ought to do a reasonable job of estimating the cost of such a tablescan, and we might expect a significant fraction of the time to be spent on multiblock (possibly direct path) reads of the table. The cardinality reported is 304,000 but we note there are two predicates and both are for columns which we might guess have a small number of distinct values – one of which we are changing. Perhaps there’s a bad cardinality error there and maybe a couple of single column histograms would help, but maybe column group stats with a histogram on the pair would be even better. I also wonder when (if) HSPC_IND ever changes from Y to N, and contemplate the possibility of creating a function-based index that records ONLY the rows that match this predicate pair (see the note on indexing that will appear some time over the next week further down the page). It’s at this point that we might ask whether the number of rows returned by this scan should be very similar to the number of rows updated, or whether the scan identifies far too many rows and the two existence tests do a lot of work to eliminate the excess – and, if the latter, which test should we apply first and how should we apply it.
- Having scanned the MMR_DTL we probe the in-memory hash table copy of FCTS_TMS_MBRID_XWLK for the first match, using an equality predicate (which will be the access predicate) and a range-based (filter) predicate which looks as if it is checking that some “start date” is between an “effective date” and a “termination date”. The estimated size of the result set is FAR too small at 49 rows when we know we have to have at least 85,000 rows survive this test; moreover, this tiny estimate comes from inputs of 6,500 and 304,000 rows being joined, so we ought to wonder how such a tiny estimate could appear. A possible explanation is that the application has used some extreme dates to represent NULL values. If that’s the case then it’s possible that suitable histograms might help the optimizer recognise the extreme distribution; alternatively virtual columns that change the extreme values back to NULL, and a predicate referencing the virtual columns, may help (there’s a sketch of this idea just after this list).
- After estimating the cardinality of the intermediate result so badly, the optimizer decides that the second existence test can be performed as a semi-join using a nested loop. The thing to note here is that the optimizer “knows” that this is an expensive step – the cost of each table access operation is 5 (4 + 1) – but it’s a step that the optimizer “thinks” shouldn’t happen very frequently so the cost is considered acceptable. We know, however, that this step has to execute at least 85,000 times, so the optimizer’s prediction of visiting 4 blocks in the table to identify (on average) 8 rows and discard (on average) 7 of them looks nasty. Again we note that one of the predicates is range-based on a pair of dates – and in this case we might wonder whether or not most of the rows we discard are outside the date range, and whether we ought to consider (as a general point, and not just for this query) adding one, the other, or both of ERN_FROM_DT and ERN_THRU_DT to the IE1_CRME_FUND_DTL_STG index. It’s at this point in the query that we are most ignorant of time spent with the present data set and how that time will change in the future as the data in the MMR_DTL table grows – on one hand it’s possible that the CRME_FUND_DTL_STG rows for each MMR_DTL row are widely scattered and this step could do a lot of random I/O; on the other hand the (assumed) time-dependent nature of the data may mean that the only MMR_DTL rows we look at are recently entered and the associated CRME_FUND_DTL_STG rows are therefore also recently entered and closely clustered – leading to a beneficial “self-caching” effect at the “high” end of the table as the query runs, which introduces an efficiency that the optimizer won’t notice. There is one numerical highlight in this join – we have a cost of 5 for each probe and 49 rows to test, so we might expect the incremental cost of the query to be around 250 (i.e. 49 * 5), but the difference between operations 3 and 2 is only 56 – suggesting that the optimizer does have some “self-caching” information, possibly based on there being a significant difference between the two tables for the number of distinct values of the join column. (See, for example: http://oracle-randolf.blogspot.co.uk/2012/05/nested-loop-join-costing.html )
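As a concrete illustration of the virtual column idea mentioned in the hash join bullet above – and this is purely a hedged sketch, since the “extreme” date of 31st December 9999 is just a guess at how the application might represent a missing termination date – we could add a virtual column that maps the extreme value back to NULL and gather a histogram on it:

-- assumption: the application uses 31-Dec-9999 to mean "no termination date"
alter table facets_custom.fcts_tms_mbrid_xwlk add (
        hspc_evnt_term_dt_real generated always as (
                case
                        when hspc_evnt_term_dt < date '9999-12-31'
                        then hspc_evnt_term_dt
                end
        ) virtual
);

begin
        dbms_stats.gather_table_stats(
                ownname    => 'FACETS_CUSTOM',
                tabname    => 'FCTS_TMS_MBRID_XWLK',
                method_opt => 'for columns hspc_evnt_term_dt_real size 254'
        );
end;
/

The subquery would then need to reference hspc_evnt_term_dt_real rather than hspc_evnt_term_dt (handling the genuinely NULL values appropriately) so that the statistics on the virtual column could be used.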
Update 2 – 25th May
Scalability is a question that should always be considered – and there’s a scalability threat in the information we have so far. The plan shows a full tablescan of the MMR_DTL table, and while tablescans are not necessarily a bad thing we’ve been told that: “this table MMR_DTL is an ever growing table“. It’s possible that Oracle can be very quick and efficient when doing the existence tests on the rows it selects from the table – but it is inevitable that the tablescan will take longer to complete as time passes. Whether or not this is likely to matter is something we can’t decide from the information given: we don’t know how long the tablescan takes at present, we don’t know what fraction of the total run time is due to the tablescan, and we don’t know how much larger the table will grow each day (although, taking a wild guess, if we assume that the query runs once per day to manipulate recently arrived data the table would appear to be growing fairly rapidly).
Another scalability detail we ought to ask about is the volume of data that we expect to update each time we run this statement. As time passes do we expect to see the same number of rows waiting to be updated, or are we expecting the business (whatever that may be) to grow steadily each month, with an increase of a few percent in the number of rows to be updated on each execution? Our coding strategy may vary depending on the answer to that question – we might, for example, try to pre-empt a future problem by introducing some partitioning now.
The final scalability issue is one I’ve raised already and comes from the CRME_FUND_DTL_STG table. According to the plan there are about 8 rows (the number of rowids reported by the index range scan in operation 7) in this table for each distinct value of MMR_DTL.MBRSHP_CK; if MMR_DTL is large and growing, is CRME_FUND_DTL_STG very large and growing at 8 times the speed – or worse, as time passes will there be more rows for each distinct value of MMR_DTL.MBRSHP_CK? Answers to these questions will help us decide whether we should use a hash join or a nested loop in the join to this table, and how to index the table to minimise random I/O.
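A quick sanity check of that ratio – repeated from time to time to see how it changes – might look something like this (a purely hypothetical query, since we can’t see the data):

select
        count(*)                                                total_rows,
        count(distinct mbr_ck)                                  distinct_mbr_ck,
        round(count(*) / nullif(count(distinct mbr_ck), 0), 1)  avg_rows_per_mbr_ck
from
        facets_stage.crme_fund_dtl_stg
;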
Update 3 – 1st June
Since we know that the optimizer has come up with some very bad estimates, and given that the statement is short and simple, I’d be happy to pass the statement to the Tuning Advisor to see what suggestions it made (there’s a minimal sketch of the call after the warnings below). At the least it ought to come up with a suggestion for an SQL profile (i.e. a set of opt_estimate() hints) to address the extremely poor cardinality estimates; it’s also likely to make some suggestions about potentially helpful indexes. I can think of three basic warnings, though:
- Make sure you are licensed to use the advisor
- Don’t run the advisor right after you’ve just done the big update – wait until the next big batch of data is ready for update
- Be cautious about the indexes – the indexing suggestions may point you at the problem, but you may do better by using function-based indexes and modified SQL
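For reference, a minimal invocation might look like the following – the task name is invented, and the sql_id would have to be picked out of v$sql (and, as above, check the licensing before you run it):

declare
        l_task_name varchar2(64);
begin
        l_task_name := dbms_sqltune.create_tuning_task(
                sql_id    => '&m_sql_id',              -- sql_id of the update, from v$sql
                task_name => 'mmr_dtl_update_task'     -- invented name
        );
        dbms_sqltune.execute_tuning_task(task_name => l_task_name);
end;
/

set long 1000000 pagesize 0
select dbms_sqltune.report_tuning_task('mmr_dtl_update_task') from dual;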
The most obvious “simple” index to create (assuming a lot of time goes on the tablescan) is one of:
create index dtl_cpi on mmr_dtl(capitn_prcs_ind) compress;

create index dtl_cpi on mmr_dtl(capitn_prcs_ind, hspc_ind) compress;
If almost all the rows with capitn_prcs_ind = 5 also had hspc_ind = ‘Y’ then I’d probably choose the former; if a large number of rows (actually index rowids) could be eliminated before visiting the table I’d choose the latter. I would also make sure I had a frequency – or Top-N frequency in 12c – histogram on capitn_prcs_ind, and may even “fake” it to ensure that an automatic call to gather table stats didn’t do something that made the histogram a threat instead of a benefit. If I created the two-column index I would also create a frequency histogram on hspc_ind. The benefit of the histograms is based on my assumption that both columns have a very small number of values and a highly skewed distribution. I might go one step further and create a column group on the pair of columns (keeping an eye open for an “out of range” anomaly) if there was a strong degree of correlation between the two columns – in particular if there were relatively few rows with hspc_ind = ‘Y’, but most of those ‘Y’ rows also had capitn_prcs_ind = 5.
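A sketch of the relevant stats calls (the schema name and bucket counts are assumptions; a frequency histogram appears automatically when the number of distinct values is no greater than the number of buckets requested):

begin
        -- column group ("extended stats") on the correlated pair
        dbms_output.put_line(
                dbms_stats.create_extended_stats(
                        ownname   => 'FACETS_CUSTOM',
                        tabname   => 'MMR_DTL',
                        extension => '(capitn_prcs_ind, hspc_ind)'
                )
        );

        -- histograms on the two individual columns
        dbms_stats.gather_table_stats(
                ownname    => 'FACETS_CUSTOM',
                tabname    => 'MMR_DTL',
                method_opt => 'for columns size 254 capitn_prcs_ind hspc_ind'
        );
end;
/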
A simple index, though, might get used for other statements where it was not appropriate; there’s also the extra overhead of maintenance as the capitn_prcs_ind value changes from 5 to 2 – to update the index we delete the old entry and insert the new entry. Just to make matters a little worse, when an index has a large number of rows for the same key value it’s fairly common to find that its space utilisation averages about 50% per leaf block. On top of everything else, if 2 is a “final state” value for most of the rows in this very large table then a very large fraction of the index we’re building is probably a total waste of space. So we might look at maximising efficiency and minimising space (and undo and redo wastage) with a function-based index – again with one or two columns as appropriate.
create index dtl_cpi_1 on mmr_dtl(case when capitn_prcs_ind = 5 then 0 end) compress;

create index dtl_cpi_2 on mmr_dtl(case when capitn_prcs_ind = 5 and hspc_ind = 'Y' then 0 end) compress;

execute dbms_stats.gather_table_stats(user, 'mmr_dtl', method_opt=>'for all hidden columns size 1')
Again my choice of index might be affected by the extra benefit of having two columns to eliminate the maximum number of rows from the table, but since the index would be very small either way I’d probably go for the two-column index. Note that once you’ve created a function-based index you have to create stats on the underlying hidden column, which I’ve done a little lazily in this case by simply gathering “for all hidden columns”. I don’t need a histogram since there’s only a single value in the index. Apart from the size benefit, the other efficiency benefit is that when I update the table I only have to worry about deleting a row from the index, I don’t have to insert a replacement row – getting rid of the third undo record and redo vector is likely to save about 30% on the index maintenance background costs.
I now have to change the original SQL to match the index definition, of course, so my driving where clause would become one of:
WHERE   (case when CAPITN_PRCS_IND = 5 then 0 end) = 0
AND     HSPC_IND = 'Y'
AND     EXISTS (...

WHERE   (case when CAPITN_PRCS_IND = 5 AND HSPC_IND = 'Y' then 0 end) = 0
AND     EXISTS (...
Update 4 – 2nd June 2015
At this point we really do need better information to make sensible decisions about what to do next; however, as I indicated earlier on, the hash semi-join to the small table FCTS_TMS_MBRID_XWLK does look like a sensible choice: the table is small, so it’s a good target for a build table, and small tables joined to big tables tend to be the ones used to eliminate rows rapidly (at least, in the case of existence tests).
The final big question is what to do about the table CRME_FUND_DTL_STG which looks as if it might be quite big (several rows for each MMR_DTL row) and therefore be contributing a lot of random I/O. We don’t really have many options here – and we’ve seen what they are in a previous post about “NOT EXISTS” subqueries. We could try to make the nested loop semi-join more efficient by adding columns to the index so that we can do more filtering in the index; we could consider forcing Oracle to stick with a filter subquery and see if we can introduce some indexing that could make the subquery run a very small number of times, or we could consider the impact of switching to a Hash semi-join (possibly switch the build and probe data sets to make it a “hash join right semi”).
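The first of those options might be as simple as the following – the index name and the column order are guesses; the aim is just to let the date-range and fund_id checks happen in the index before we visit the table:

create index ie2_crme_fund_dtl_stg on facets_stage.crme_fund_dtl_stg(
        mbr_ck, ern_from_dt, ern_thru_dt, fund_id
) compress 1;

(Compressing on the leading column makes sense if, as operation 7 suggests, each mbr_ck value appears several times.)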
A full tablescan on the CRME_FUND_DTL_STG to do the hash join might be rather expensive, though, so we might look at the significance of the predicate on STG_CRME.FUND_ID which lists 8 different literal values. Perhaps there’s some scope for creating an index on that column if those fund_ids represent a small fraction of the total data; maybe even consider list-partitioning the table (not necessarily one fund per partition) on the FUND_ID.
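Before doing either, a quick (hypothetical) look at the data distribution would tell us whether those eight fund_ids account for a small enough fraction of the table to make the idea worthwhile:

select
        fund_id,
        count(*)
from
        facets_stage.crme_fund_dtl_stg
group by
        fund_id
order by
        count(*) desc
;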
The only other significant “pre-processing” predicate on the CRME_FUND_DTL_STG table is the one that requires the MMR_DTL.PMT_MSA_STRT_DT to be between STG_CRME.ERN_FROM_DT and STG_CRME.ERN_THRU_DT; so we might look at how many rows in the table could be eliminated by finding the maximum and minimum values of MMR_DTL.PMT_MSA_STRT_DT and eliminating any rows where STG_CRME.ERN_FROM_DT was greater than the maximum PMT_MSA_STRT_DT or STG_CRME.ERN_THRU_DT was less than the minimum PMT_MSA_STRT_DT.
We might play around with CTEs (subquery factoring / common table expressions) to make this happen – I’m not going to work out the exact SQL, but it might start something like:
with starting_view as (
        select
        from    mmr_detail
        where   ...
        and     exists (
                select
                from    FACETS_CUSTOM.FCTS_TMS_MBRID_XWLK XWLK
                where   ...
        )
),
minmax_view as (
        select  /*+ materialize */
                min(PMT_MSA_STRT_DT),
                max(PMT_MSA_STRT_DT) max_date
        from    starting_view
),
crme_fund_view as (
        select
        from    minmax_view,
                CRME_FUND_DTL_STG
        where   ...
)
select
from    starting_view sv,
        crme_fund_view cv
where   ...
An alternative strategy to get the minimum and maximum dates more efficiently might be to include them in the function-based index, and then include TWO inline views, or maybe scalar subqueries in the where clause, that depend on the min/max range scan to find the values that can be used to eliminate as much data from CRME_FUND_DTL_STG as early as possible.
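Very much a sketch under assumptions – the index name is invented and the expressions would have to match whichever function-based index was finally chosen – the idea might look like this: carry the start date in the function-based index for exactly the rows we care about, then use two cheap min/max queries against it to bracket the staging data:

create index dtl_cpi_3 on mmr_dtl(
        case when capitn_prcs_ind = 5 and hspc_ind = 'Y' then 0 end,
        case when capitn_prcs_ind = 5 and hspc_ind = 'Y' then pmt_msa_strt_dt end
) compress 1;

-- each of the following can use an index range scan (min/max) on dtl_cpi_3, and could be
-- embedded as scalar subqueries (or inline views) in the main statement to supply extra,
-- logically redundant, predicates on STG_CRME.ERN_THRU_DT and STG_CRME.ERN_FROM_DT respectively
select  min(case when capitn_prcs_ind = 5 and hspc_ind = 'Y' then pmt_msa_strt_dt end) min_strt_dt
from    mmr_dtl
where   (case when capitn_prcs_ind = 5 and hspc_ind = 'Y' then 0 end) = 0;

select  max(case when capitn_prcs_ind = 5 and hspc_ind = 'Y' then pmt_msa_strt_dt end) max_strt_dt
from    mmr_dtl
where   (case when capitn_prcs_ind = 5 and hspc_ind = 'Y' then 0 end) = 0;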
I think it took me about 10 minutes looking at the original posting to come up with a few thoughts that might be appropriate – but it’s taken five sessions spread over nearly two weeks to write those ideas down. The same difference can appear between theoretical strategy and final action – everything I’ve said revolves around the two simple ideas of “how much data” (absolute and relative) and “how hard do we work to collect it”, so it’s easy to come up with ideas; but it may take time to learn what the data looks like, and to find a SAFE way of making it possible to collect it efficiently.