Oracle Scratchpad

May 21, 2015

Understanding SQL

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 6:12 pm BST May 21,2015

From time to time someone publishes a query on the OTN database forum and asks how to make it go faster, and you look at it and think it’s a nice example to explain a couple of principles because it’s short, easy to understand, obvious what sort of things might be wrong, and easy to fix. Then, after you’ve made a couple of suggestions and explained a couple of ideas the provider simply fades into the distance and doesn’t tell you any more about the query, or whether they’ve taken advantage of your advice, or found some other way to address the problem.

Such a query, with its execution plan, appeared a couple of weeks ago:

UPDATE FACETS_CUSTOM.MMR_DTL
SET
	CAPITN_PRCS_IND = 2,
	FIL_RUN_DT = Current_fil_run_dt,
	ROW_UPDT_DT = dta_cltn_end_dttm
WHERE
	CAPITN_PRCS_IND = 5
AND	HSPC_IND ='Y'
AND	EXISTS (
		SELECT	1
		FROM	FACETS_STAGE.CRME_FUND_DTL_STG STG_CRME
		WHERE	STG_CRME.MBR_CK = MMR_DTL.MBRSHP_CK
		AND	MMR_DTL.PMT_MSA_STRT_DT BETWEEN STG_CRME.ERN_FROM_DT AND STG_CRME.ERN_THRU_DT
		AND	STG_CRME.FUND_ID IN ('AAB1', '1AA2', '1BA2', 'AAB2', '1AA3', '1BA3', '1B80', '1A80')
	)
AND	EXISTS (
		SELECT	1
		FROM	FACETS_CUSTOM.FCTS_TMS_MBRID_XWLK XWLK
		WHERE	XWLK.MBR_CK = MMR_DTL.MBRSHP_CK
		AND	MMR_DTL.PMT_MSA_STRT_DT BETWEEN XWLK.HSPC_EVNT_EFF_DT AND XWLK.HSPC_EVNT_TERM_DT
	)
;

-------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT              |                       |     1 |   148 | 12431   (2)| 00:02:30 |
|   1 |  UPDATE                       | MMR_DTL               |       |       |            |          |
|   2 |   NESTED LOOPS SEMI           |                       |     1 |   148 | 12431   (2)| 00:02:30 |
|*  3 |    HASH JOIN RIGHT SEMI       |                       |    49 |  5488 | 12375   (2)| 00:02:29 |
|   4 |     TABLE ACCESS FULL         | FCTS_TMS_MBRID_XWLK   |  6494 | 64940 |    24   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL         | MMR_DTL               |   304K|    29M| 12347   (2)| 00:02:29 |
|*  6 |    TABLE ACCESS BY INDEX ROWID| CRME_FUND_DTL_STG     |     1 |    36 |     5   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN          | IE1_CRME_FUND_DTL_STG |     8 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("XWLK"."MBR_CK"="MMR_DTL"."MBRSHP_CK")
       filter("XWLK"."HSPC_EVNT_EFF_DT"<=INTERNAL_FUNCTION("MMR_DTL"."PMT_MSA_STRT_DT") AND
              "XWLK"."HSPC_EVNT_TERM_DT">=INTERNAL_FUNCTION("MMR_DTL"."PMT_MSA_STRT_DT"))
   5 - filter("CAPITN_PRCS_IND"=5 AND "HSPC_IND"='Y')
   6 - filter(("STG_CRME"."FUND_ID"='1A80' OR "STG_CRME"."FUND_ID"='1AA2' OR
              "STG_CRME"."FUND_ID"='1AA3' OR "STG_CRME"."FUND_ID"='1B80' OR "STG_CRME"."FUND_ID"='1BA2' OR
              "STG_CRME"."FUND_ID"='1BA3' OR "STG_CRME"."FUND_ID"='AAB1' OR "STG_CRME"."FUND_ID"='AAB2') AND
              "STG_CRME"."ERN_FROM_DT"<=INTERNAL_FUNCTION("MMR_DTL"."PMT_MSA_STRT_DT") AND
              "STG_CRME"."ERN_THRU_DT">=INTERNAL_FUNCTION("MMR_DTL"."PMT_MSA_STRT_DT"))
   7 - access("STG_CRME"."MBR_CK"="MMR_DTL"."MBRSHP_CK")

The most informative bit of narrative that went with this query said:

“The table MMR_DTL doesnt have index on these columns CAPITN_PRCS_IND , HSPC_IND .. Since this is an update stmt which will update 85k records, In a dilema whether to index these columns or not .. And this table MMR_DTL is an ever growing table. Worried about the update performance. “

This was in response to an observation that there was a full tablescan on MMR_DTL at operation 5 despite the predicate “CAPITN_PRCS_IND”=5 AND “HSPC_IND”=’Y’. You’ll note that the predicted cardinality for that scan is 304K and the update statement is going to change CAPITN_PRCS_IND from the value 5 to the value 2 – so it’s not entirely unreasonable to be worried about the impact of creating an index that included the column capitn_prcs_ind.

What more can we say about this query, given the limited information? Lots – unfortunately the owner of the query isn’t giving anything else away.

I’m going to leave this note unfinished to give people a little chance to think about the clues in the request, the questions they might ask, reasons why there might be a performance problem, and strategies they might investigate, then I’ll update the posting with a few ideas some time in the next 24 hours.

Update 1 – 24th May

There are so many ideas that spring up from a small amount of information that it’s very hard to write a concise and coherent description of what you’ve noticed, when and how far you pursued it, and how relevant the ideas might be to the problem in hand – especially when most of the thoughts require you to ask for more information. Something has come up that has taken up the free time I had to write this note, so I’m just going to have to complete it in rapidly written installments. The first bit is an outline of the immediate response I had to the initial presentation of the problem and the execution plan that went with it.

The only comment from the OP on this statement and plan was: “I couldnt optimize this query for better performance and optimized cost.. Can some one guide me on this.”

We have no idea how many rows would be updated, how long it took, or how long the OP thinks it ought to take; it’s not until a subsequent post that we learn that the number of rows targeted for update is 85,000 – which tells us that the optimizer has run into some problems with its cardinality estimates. This suggests that IF there’s a serious performance problem then POSSIBLY there’s a better execution plan and we might get the optimizer to find it automatically if we could tell it how to adjust its cardinality estimates. It would be nice, however, to know where the query spent its time (i.e. can we re-run it with rowsource execution stats or monitoring enabled, and see the actual run-time work in the plan).

If it took a couple of minutes to update those 85,000 rows, I probably wouldn’t want to spend time making it go faster; if it took 2 hours, of which 1 hour 50 minutes was spent waiting for a transaction (row) lock then I’d want to look at why the update collision could happen and see if that problem could be avoided – it might then be the case that the last 10 minutes was spent rolling back and restarting an update that ought to have taken 2 minutes “in vacuo”. Underlying all this, I would want to be sure (as I’ve implicitly, and I think reasonably, assumed) that it’s an update that runs only occasionally, perhaps once per day or once per week.

In the absence of hard information – let’s resort to a few hypotheticals; looking at the plan itself (and knowing the target 85,000 rows) I am prepared to make a few guesses about the run-time activity.

  1. We build an in-memory hash table from the whole of FCTS_TMS_MBRID_XWLK, a step for which the optimizer ought to be able to give a reasonable cost and cardinality – assuming (as I will from now on) that the basic stats are reasonably accurate.
  2. We scan the (fairly large) MMR_DTL table, applying a couple of filters; again the optimizer ought to do a reasonable job of estimating the cost of such a scan, and we might expect a significant fraction of the time to be spent on multiblock (possibly direct path) reads of the table. The cardinality reported is 304,000 but we note there are two predicates and both are for columns which we might guess have a small number of distinct values – one of which we are changing. Perhaps there’s a bad cardinality error there and maybe a couple of single column histograms would help, but maybe column group stats with a histogram on the pair would be even better. I also wonder when (if) HSPC_IND ever changes from Y to N, and contemplate the possibility of creating a function-based index that records ONLY the rows that match this predicate pair (see the note on indexing that will appear some time over the next week further down the page). It’s at this point that we might ask whether the number of rows returned by this scan should be very similar to the number of rows updated, or whether the scan identifies far too many rows and the two existence tests do a lot of work to eliminate the excess and, if the latter, which test should we apply first and how should we apply it.
  3. Having scanned the MMR_DTL we probe the in-memory hash table copy of FCTS_TMS_MBRID_XWLK for the first match, using an equality predicate (which will be the access predicate) and a range-based (filter) predicate which looks as if it is checking that some “start date” is between an “effective date” and a “termination date”. The estimated size of the result set is FAR too small at 49 rows when we know we have to have at least 85,000 rows survive this test; moreover, this tiny estimate comes from inputs of 6,500 and 304,000 rows being joined so we ought to wonder how such a tiny estimate could appear. A possible explanation is that the application has used some extreme dates to represent NULL values. If that’s the case then it’s possible that suitable histograms might help the optimizer recognise the extreme distribution; alternatively virtual columns that change the extreme values back to NULL and a predicate referencing the virtual columns may help.
  4. After estimating the cardinality of the intermediate result so badly, the optimizer decides that the second existence test can be performed as a semi-join using a nested loop. The thing to note here is that the optimizer “knows” that this is an expensive step – the cost of each table access operation is 5 (4 + 1) – but it’s a step that shouldn’t happen very frequently so the cost is considered acceptable. We know, however, that this step has to execute at least 85,000 times, so the optimizer’s prediction of visiting 4 blocks in the table to identify (on average) 8 rows and discard (on average) 7 of them looks nasty. Again we note that one of the predicates is range-based on a pair of dates – and in this case we might wonder whether or not most of the rows we discard are outside the date range, and whether we ought to consider (as a general point, and not just for this query) whether or not we should add one, the other, or both of the ERN_FROM_DT and ERN_THRU_DT columns to the IE1_CRME_FUND_DTL_STG index. It’s at this point in the query that we are most ignorant of time spent at present (and in the future as the MMR_DTL table grows) – on one hand it’s possible that the rows for each MMR_DTL are widely scattered across the CRME_FUND_DTL_STG and this step could do a lot of random I/O, on the other hand the (assumed) time-dependent nature of the data may mean that the only MMR_DTL rows we look at are recently entered and the associated CRME_FUND_DTL_STG rows are therefore also recently entered and closely clustered – leading to a beneficial “self-caching” effect at the “high” end of the table as the query runs, which introduces an efficiency that the optimizer won’t notice.
There is one numerical highlight in this join – we have a cost of 5 for each probe and 49 rows to test, so we might expect the incremental cost of the query to be around 250, but the difference between operations 3 and 2 is only 56 – suggesting that the optimizer does have some “self-caching” information, possibly based on there being a significant difference between the two tables for the number of distinct values of the join column. (See, for example: http://oracle-randolf.blogspot.co.uk/2012/05/nested-loop-join-costing.html )
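As a rough illustration of the two ideas floated in step 2 (column group stats on the predicate pair, and a function-based index that holds only the rows waiting to be processed), here is a minimal sketch. It assumes a version that supports extended statistics (11g or later, which we don’t know to be true of the OP’s system); the index name is invented for the illustration; and the update would have to be rewritten to use the same CASE expression before the index could be considered.

```sql
-- Sketch only: the version (11g+) and the index name are assumptions.

-- (a) Column group on the two filter columns, with a histogram on the pair.
declare
        l_ext_name      varchar2(128);
begin
        l_ext_name := dbms_stats.create_extended_stats(
                ownname         => 'FACETS_CUSTOM',
                tabname         => 'MMR_DTL',
                extension       => '(CAPITN_PRCS_IND, HSPC_IND)'
        );

        dbms_stats.gather_table_stats(
                ownname         => 'FACETS_CUSTOM',
                tabname         => 'MMR_DTL',
                method_opt      => 'for columns (CAPITN_PRCS_IND, HSPC_IND) size 254'
        );
end;
/

-- (b) Function-based index: the expression is NULL for every row that
--     doesn't match the predicate pair, so only the matching rows
--     appear in the index.
create index facets_custom.mmr_dtl_todo_fbi on facets_custom.mmr_dtl (
        case when capitn_prcs_ind = 5 and hspc_ind = 'Y' then 1 end
);

-- A statement has to use the same expression to be able to use the index.
select  count(*)
from    facets_custom.mmr_dtl
where   case when capitn_prcs_ind = 5 and hspc_ind = 'Y' then 1 end = 1
;
```

The index stays small however large the table grows, but every row that the update changes from 5 to 2 has to be deleted from it, so the index maintenance cost that worried the OP is reduced rather than eliminated.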

Update 2 – 25th May

Scalability is a question that should always be considered – and there’s a scalability threat in the information we have so far. The plan shows a full tablescan of the MMR_DTL table, and while tablescans are not necessarily a bad thing we’ve been told that: “this table MMR_DTL is an ever growing table”. It’s possible that Oracle can be very quick and efficient when doing the existence tests on the rows it selects from the table – but it is inevitable that the tablescan will take longer to complete as time passes; whether or not this is likely to matter is something we can’t decide from the information given: we don’t know how long the tablescan takes, we don’t know what fraction of the total time is due to the tablescan, and we don’t know how much larger the table will grow each day.

Another scalability detail we ought to ask about is the volume of data that we expect to update each time we run this statement. As time passes do we expect to see the same number of rows waiting to be updated, or are we expecting the business (whatever that may be) to grow steadily each month with an increase of a few percent in the number of rows to be updated on each execution? Our coding strategy may vary depending on the answer to that question – we might, for example, try to pre-empt a future problem by introducing some partitioning now.

The final scalability issue is one I’ve raised already and comes from the CRME_FUND_DTL_STG table. According to the plan there are about 8 rows in this table for each distinct value of MMR_DTL.MBRSHP_CK; if MMR_DTL is large and growing, is CRME_FUND_DTL_STG very large and growing even faster – as time passes will there be more rows for each distinct value of MMR_DTL.MBRSHP_CK? Answers to these questions will help us decide whether we should use a hash join or a nested loop in the join to this table, and how to index the table to minimise random I/O.
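To make the indexing question a little more concrete, here is one shape an index on the staging table might take. This is a sketch under several assumptions: the index name is invented, we don’t know the current definition of IE1_CRME_FUND_DTL_STG beyond the fact that it supports the access predicate on MBR_CK, and including FUND_ID goes a step beyond the suggestion of adding the two date columns.

```sql
-- Sketch only: hypothetical index name, guessed column order.
-- MBR_CK leads because it is the equality (access) predicate;
-- the date columns allow out-of-range rows to be discarded in
-- the index; FUND_ID (optional) means the existence test could
-- be completed without visiting the table at all.
create index facets_stage.crme_fund_dtl_stg_ix2
on facets_stage.crme_fund_dtl_stg (
        mbr_ck,
        ern_from_dt,
        ern_thru_dt,
        fund_id
);
```

Whether this should be a new index or a change to the existing one depends on what else uses IE1_CRME_FUND_DTL_STG, which is another question for the OP.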

 

To be continued …

 

May 11, 2015

Parallel Execution

Filed under: Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 10:16 am BST May 11,2015

This is another little reference list I should have created some time ago. It covers a series of posts on interpreting parallel execution plans and understanding where the work happens.

I may add further links to this page in the future relating to other aspects of parallel execution.

 

April 13, 2015

Not Exists

Filed under: CBO,Execution plans,Oracle,Performance — Jonathan Lewis @ 12:51 pm BST Apr 13,2015

The following requirement appeared recently on OTN:


=========================================================================================================
I have a following query and want to get rid of the "NOT EXISTS' clause without changing the end results.

SELECT   A.c,
         A.d,
         A.e,
         A.f
  FROM   A
WHERE   NOT EXISTS (SELECT   1
                       FROM   B
                      WHERE   B.c = A.c AND B.d = A.d AND B.e = A.e);
===========================================================================================================

Inevitably this wasn’t the problem query, and almost inevitably the OP was asking us how to implement a solution which wasn’t appropriate for a problem that shouldn’t have existed. Despite this it’s worth spending a little time to take the request at its face value and look at the sort of thing that could be going on.

First, of course, you cannot get rid of the “not exists” clause, although you may be able to make it look different. If you want “all the rows in A that are not referenced in B” then you HAVE to examine all the rows in A, and you have to do some sort of check for each row to see whether or not it exists in B. The only option you’ve got for doing something about the “not exists” clause is to find a way of making it as cheap as possible to implement.

A couple of people came up with suggestions for rewriting the query to make it more efficient. One suggested writing it as a “NOT IN” subquery, but it’s worth remembering that the optimizer may cheerfully transform a “NOT IN” subquery to a “NOT EXISTS” subquery if it’s legal and a manual rewrite may overlook the problem of NULLs; another suggested rewriting the query as an outer join, but again it’s worth remembering that the optimizer may transform a “NOT EXISTS” subquery to an “ANTI-JOIN” – which is a bit like an outer join with filter, only more efficient. So, before suggesting a rewrite, it’s worth looking at the execution plan to see what the optimizer is doing just in case it’s doing something silly. There are two options – anti-join or filter subquery.
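The problem of NULLs mentioned above is easy to demonstrate with a couple of throwaway tables (a_demo and b_demo are names invented for this sketch, they are not part of the OP’s schema):

```sql
create table a_demo (c number);
create table b_demo (c number);

insert into a_demo values (1);
insert into a_demo values (2);
insert into b_demo values (1);
insert into b_demo values (null);
commit;

-- NOT EXISTS: the row with c = 2 has no match in b_demo,
-- so it is returned.
select  a.c
from    a_demo a
where   not exists (
                select  null
                from    b_demo b
                where   b.c = a.c
        )
;

-- NOT IN: the NULL in b_demo means "c <> NULL" is never true,
-- so the query returns no rows at all.
select  a.c
from    a_demo a
where   a.c not in (
                select  b.c
                from    b_demo b
        )
;

-- Outer join with filter: same result as NOT EXISTS (the row with c = 2).
select  a.c
from    a_demo a
left join
        b_demo b
on      b.c = a.c
where   b.rowid is null
;
```

A manual rewrite from “NOT EXISTS” to “NOT IN” is only safe when the relevant columns are declared NOT NULL or the query carries suitable “is not null” predicates.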

Here, with code I’ve run under 10.2.0.5 to match the OP, is a demonstration data set, with the two plans you might expect to see – first, the data:


execute dbms_random.seed(0)

create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(dbms_random.value(0,4))           c,
        trunc(dbms_random.value(0,5))           d,
        trunc(dbms_random.value(0,300))         e,
        rownum                                  f,
        rpad('x',100)                   padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create table t2
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(dbms_random.value(0,4))           c,
        trunc(dbms_random.value(0,5))           d,
        trunc(dbms_random.value(0,300))         e,
        rownum                                  f,
        rpad('x',100)                   padding
from
        generator       v1,
        generator       v2
where
        rownum <= 24000
;

create index t1_i1 on t1(c,d,e);
create index t2_i1 on t2(c,d,e);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

The OP had followed up their original query with a claim that “Table A holds 100 million rows and table B holds 24,000” – that’s a lot of checks (if true) and you ought to be asking how quickly the OP expects the query to run and how many of the 100 M rows are going to survive the check. I’ve set up just 1M rows with 6,000 distinct values for the column combination (c,d,e), and a reference table with 24,000 rows which are likely to include most, but not all, of those 6,000 combinations.

Rather than generate a very large output, I’ve written a query that generates the required data set, then counts it:


select
        max(f), count(*)
from (
        SELECT   /*+ no_merge */
                 A.c,
                 A.d,
                 A.e,
                 A.f
          FROM   t1 A
        WHERE   NOT EXISTS (SELECT   /* no_unnest */
                                      1
                               FROM   t2 B
                              WHERE   B.c = A.c AND B.d = A.d AND B.e = A.e)
)
;

This took about 0.35 seconds to run – aggregating roughly 14,500 rows from 1M. The plan was (as I had expected) based on a (right) hash anti join:


---------------------------------------------------------------------------------
| Id  | Operation               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |     1 |    13 |  2183   (5)| 00:00:11 |
|   1 |  SORT AGGREGATE         |       |     1 |    13 |            |          |
|   2 |   VIEW                  |       |   999K|    12M|  2183   (5)| 00:00:11 |
|*  3 |    HASH JOIN RIGHT ANTI |       |   999K|    23M|  2183   (5)| 00:00:11 |
|   4 |     INDEX FAST FULL SCAN| T2_I1 | 24000 |   234K|    11  (10)| 00:00:01 |
|   5 |     TABLE ACCESS FULL   | T1    |  1000K|    14M|  2151   (4)| 00:00:11 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("B"."C"="A"."C" AND "B"."D"="A"."D" AND "B"."E"="A"."E")

Oracle has built an in-memory hash table from the 24,000 rows in t2, then scanned the t1 table, probing the hash table with each row in turn. That’s 1M probes in less than 0.35 seconds. You ought to infer from this that most of the time spent in the original query should have been spent scanning the 100M rows, and only a relatively small increment should appear due to the “not exists” clause.

You’ll notice, though that there was a comment in my subquery with the /* no_unnest */ hint embedded – if I change this from a comment to a hint (/*+ */) I should get a plan with a filter subquery, and maybe that’s what’s happening to the OP for some odd reason. Here’s the plan:


------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |    13 | 15166   (1)| 00:01:16 |
|   1 |  SORT AGGREGATE      |       |     1 |    13 |            |          |
|   2 |   VIEW               |       |   999K|    12M| 15166   (1)| 00:01:16 |
|*  3 |    FILTER            |       |       |       |            |          |
|   4 |     TABLE ACCESS FULL| T1    |  1000K|    14M|  2155   (4)| 00:00:11 |
|*  5 |     INDEX RANGE SCAN | T2_I1 |     4 |    40 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter( NOT EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T2" "B"
              WHERE "B"."E"=:B1 AND "B"."D"=:B2 AND "B"."C"=:B3))
   5 - access("B"."C"=:B1 AND "B"."D"=:B2 AND "B"."E"=:B3)

The query took 1.65 seconds to complete. (And re-running with rowsource execution statistics enabled, I found that the subquery had executed roughly 914,000 times in that 1.65 seconds). Even if the original query had used the filter subquery plan the subquery shouldn’t have made much difference to the overall performance. Of course if T2 didn’t have that index on (c,d,e) then the filter subquery plan would have been much more expensive – but then, we would really have expected to see the hash anti-join.
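For anyone who wants to repeat the check on the number of subquery executions, this is roughly what “re-running with rowsource execution statistics enabled” might look like from SQL*Plus. It’s a sketch that assumes the t1 and t2 tables created above; the gather_plan_statistics hint is one of a couple of ways of enabling the statistics (setting statistics_level to all for the session is another).

```sql
set serveroutput off

select
        /*+ gather_plan_statistics */
        max(f), count(*)
from (
        select  /*+ no_merge */
                a.c, a.d, a.e, a.f
        from    t1 a
        where   not exists (
                        select  /*+ no_unnest */
                                1
                        from    t2 b
                        where   b.c = a.c and b.d = a.d and b.e = a.e
                )
)
;

-- Report the plan of the last statement run by this session,
-- with the actual run-time figures from its most recent execution.
select  *
from    table(dbms_xplan.display_cursor(null, null, 'allstats last'))
;
```

The Starts column against the index range scan of T2_I1 reports how many times the subquery ran; the “set serveroutput off” is there because otherwise the “last statement” that display_cursor() reports would be SQL*Plus’s own call to dbms_output.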

If you’re wondering why the subquery ran 914,000 times instead of 1M times, you’ve forgotten “scalar subquery caching”.  The session caches a limited number of results from subquery execution as a query runs and may be able to use cached results (or simply a special “previous-execution” result) to minimise the number of executions of the subquery.

Did you notice the index I created on t1(c,d,e) ? If I drive the query through this index I’ll access all the rows for a given combination of (c,d,e) one after the other and only have to run the subquery once for the set. To make this happen, though, I’ll have to declare one of the columns to be NOT NULL, or add a suitable “column is not null” predicate to the query; and then I’ll probably have to hint the query anyway:


select
        max(f)
from (
        SELECT   /*+ no_merge index(a) */
                 A.c,
                 A.d,
                 A.e,
                 A.f
          FROM   t1 A
        WHERE   NOT EXISTS (SELECT   /*+ no_unnest */
                                      1
                               FROM   t2 B
                              WHERE   B.c = A.c AND B.d = A.d AND B.e = A.e)
        and     c is not null
)
;

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    13 | 65706   (1)| 00:05:29 |
|   1 |  SORT AGGREGATE               |       |     1 |    13 |            |          |
|   2 |   VIEW                        |       |   999K|    12M| 65706   (1)| 00:05:29 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1    | 50000 |   732K| 52694   (1)| 00:04:24 |
|*  4 |     INDEX FULL SCAN           | T1_I1 | 50000 |       |  2869   (2)| 00:00:15 |
|*  5 |      INDEX RANGE SCAN         | T2_I1 |     4 |    40 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("C" IS NOT NULL AND  NOT EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM
              "T2" "B" WHERE "B"."E"=:B1 AND "B"."D"=:B2 AND "B"."C"=:B3))
   5 - access("B"."C"=:B1 AND "B"."D"=:B2 AND "B"."E"=:B3)

Re-running this code with rowsource execution statistics enabled showed that the subquery ran just 6,000 times (as expected) – for a total run time that was slightly faster than the hash anti-join method (0.17 seconds – but I do have a new laptop using SSD only, with a 3.5GHz CPU and lots of memory).

Every which way, if we can get reasonable performance from the underlying table access there’s no way that introducing a “NOT EXISTS” ought to be a disaster. The worst case scenario is that for some reason Oracle chooses to run a filter subquery plan and the appropriate index hasn’t been created to support it.

Footnote:

Of course, table A didn’t really exist, it was a three table join; and it didn’t produce 100M rows, it produced anything between zero and 5 million rows, and the effect of the subquery (which correlated back to two of the joined tables) was to leave anything between 0 and 5 million rows. And (apparently) the query was quick enough in the absence of the subquery (producing, for example, 1 million rows in only 5 minutes), but too slow with the subquery in place.

But that’s okay. Because of our tests we know that once we’ve produced a few million rows it takes fractions of a second more to pass them through a hash table with an anti-join to deal with the “not exists” subquery; and I doubt if we have to play silly games to push the data through a filter subquery plan in the right order to squeeze a few extra hundredths of a second from the query.

If the OP is happy with the basic select statement before the “not exists” subquery, all he has to do is take advantage of a no_merge hint:


select  {list of columns}
from
        (
        select /*+ no_merge */ .... rest of original query
        )    v1
where
        not exists (
                select  null
                from    b
                where   b.c = v1.c and b.d = v1.d and b.e = v1.e
        )
;

You’re probably wondering why the OP currently sees a performance problem as the subquery is added. The best guess is that the subquery has introduced a “magic 5% fudge factor” to the arithmetic (did you notice the cardinality of t1 dropping to 50,000 from 1M in the plan above) and made it pick a worse execution plan for the rest of the query. We can’t tell, though, since the OP hasn’t yet given us the information that would allow us to see what’s going wrong.

April 10, 2015

Counting

Filed under: Execution plans,Indexing,Oracle,Performance — Jonathan Lewis @ 5:27 pm BST Apr 10,2015

There’s a live example on OTN at the moment of an interesting class of problem that can require some imaginative thinking. It revolves around a design that uses a row in one table to hold the low and high values for a range of values in another table. The problem is then simply to count the number of rows in the second table that fall into the range given by the first table. There’s an obvious query you can write (a join with inequality) but if you have to join each row in the first table to several million rows in the second table, then aggregate to count them, that’s an expensive strategy.  Here’s the query (with numbers of rows involved) that showed up on OTN; it’s an insert statement, and the problem is that it takes 7 hours to insert 37,600 rows:


    INSERT INTO SA_REPORT_DATA
    (REPORT_ID, CUTOFF_DATE, COL_1, COL_2, COL_3)
    (
    SELECT 'ISRP-734', to_date('&DateTo', 'YYYY-MM-DD'),
           SNE.ID AS HLR
    ,      SNR.FROM_NUMBER||' - '||SNR.TO_NUMBER AS NUMBER_RANGE
    ,      COUNT(M.MSISDN) AS AVAILABLE_MSISDNS
    FROM
           SA_NUMBER_RANGES SNR          -- 10,000 rows
    ,      SA_SERVICE_SYSTEMS SSS        --  1,643 rows
    ,      SA_NETWORK_ELEMENTS SNE       --    200 rows
    ,      SA_MSISDNS M                  --    72M rows
    WHERE
           SSS.SEQ = SNR.SRVSYS_SEQ
    AND    SSS.SYSTYP_ID = 'OMC HLR'
    AND    SNE.SEQ = SSS.NE_SEQ
    AND    SNR.ID_TYPE = 'M'
    AND    M.MSISDN  >= SNR.FROM_NUMBER
    AND    M.MSISDN  <= SNR.TO_NUMBER
    AND    M.STATE  = 'AVL'
    GROUP BY
           SNE.ID,SNR.FROM_NUMBER||' - '||SNR.TO_NUMBER
    )  

The feature here is that we are counting ranges of MSISDN: we take 10,000 number ranges (SNR) and join with inequality to a 72M row table. It’s perfectly conceivable that at some point the data set expands (not necessarily all at once) to literally tens of billions of rows that are then aggregated down to the 37,600 that are finally inserted.

The execution plan shows the optimizer joining the first three tables before doing a merge join between that result set and the relevant subset of the MSISDNs table – which means the MSISDNs have to be sorted and buffered (with a probable spill to disc) before they can be used. It would be interesting to see the rowsource execution stats for the query – partly to see how large the generated set became, but also to see if the ranges involved were so large that most of the time went in constantly re-reading the sorted MSISDNs from the temporary tablespace.

As far as optimisation is concerned, there are a couple of trivial things around the edges we can examine: we have 10,000 number ranges but insert 37,600 results, and the last stages of the plan generated those results so we’ve scanned and aggregated the sorted MSISDNs 37,600 times. Clearly we could look for a better table ordering that eliminated any number ranges early, then did the minimal number of joins to MSISDN, aggregated, then scaled up to 37,600: with the best join order we might reduce the run time by a factor of 3 or more. (But that’s still a couple of hours run time.)

What we really need to do to make a difference is change the infrastructure in some way – preferably invisibly to the rest of the application. There are a number of specific details relating to workload, read-consistency, timing, concurrency, etc. that will need to be considered, but broadly speaking, we need to take advantage of a table that effectively holds the “interesting” MSISDNs in sorted order. I’ve kept the approach simple here, it needs a few modifications for a production system. The important bit of the report is the bit that produces the count, so I’m only going to worry about a two-table join – number ranges and MSISDNs; here’s some model data:


execute dbms_random.seed(0)

create table msisdns
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(dbms_random.value(1e9,1e10))      msisdn
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create table number_ranges
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(dbms_random.value(1e9,1e10))      from_number,
        trunc(dbms_random.value(1e9,1e10))      to_number
from
        generator       v1
where
        rownum  <= 1000
;

update number_ranges set
        from_number = to_number,
        to_number = from_number
where
        to_number < from_number
;

commit;

I’ve created a table of numbers with values between 1e9 and 1e10 to represent 1 million MSISDNs, and a list of 1,000 number ranges – making sure that the FROM number is not greater than the TO number. Now I need a “summary” table of the MSISDNs, which I’m going to create as an index-organized table:


create table tmp_msisdns (
        msisdn,
        counter,
        constraint tmp_pk primary key (msisdn, counter)
)
organization index
as
select
        msisdn,
        row_number() over(order by msisdn)      counter
from
        msisdns
;

This is only a demonstration so I haven’t bothered with production-like code to check that the MSISDNs I had generated were unique (they were); and I’ve casually included the row_number() as part of the primary key as a performance fiddle even though it’s something that could, technically, allow some other program to introduce bad data if I made the table available for public use rather than task specific.
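
A uniqueness check of that kind need not be elaborate. As a minimal sketch against the model msisdns table (an illustration only, not part of the original test script), a group-by with a having clause would report any value that appears more than once:

```sql
-- Report any MSISDN value that appears more than once in the model data.
-- "no rows selected" means the generated values are unique.
select
        msisdn, count(*)
from
        msisdns
group by
        msisdn
having
        count(*) > 1
;
```

If this returned rows, the counter arithmetic used in the report would no longer be a simple count of distinct MSISDNs, so a production version would need to decide how duplicates should be treated before building the summary table.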

Finally we get down to the report. To find out how many MSISDN values there are between the FROM and TO number in a range I just have to find the lowest and highest MSISDNs from tmp_msisdns in that range, find the difference between their counter values, and add 1. And there’s a very fast way to find the lowest or highest values when you have the appropriate index – the min/max range scan – but you have to access the table twice, once for the low, once for the high. Here’s the necessary SQL, with execution plan from 12.1.0.2:


select
        nr.from_number, nr.to_number,
--      fr1.msisdn, fr1.counter,
--      to1.msisdn, to1.counter,
        1 + to1.counter - fr1.counter range_count
from
        number_ranges   nr,
        tmp_msisdns     fr1,
        tmp_msisdns     to1
where
        fr1.msisdn = (
                select min(msisdn) from tmp_msisdns where tmp_msisdns.msisdn >= nr.from_number
        )
and     to1.msisdn = (
                select max(msisdn) from tmp_msisdns where tmp_msisdns.msisdn <= nr.to_number
        )
;

-------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |               |       |       |  4008 (100)|          |
|   1 |  NESTED LOOPS                   |               |  1000 | 38000 |  4008   (1)| 00:00:01 |
|   2 |   NESTED LOOPS                  |               |  1000 | 26000 |  2005   (1)| 00:00:01 |
|   3 |    TABLE ACCESS FULL            | NUMBER_RANGES |  1000 | 14000 |     2   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN             | TMP_PK        |     1 |    12 |     2   (0)| 00:00:01 |
|   5 |     SORT AGGREGATE              |               |     1 |     7 |            |          |
|   6 |      FIRST ROW                  |               |     1 |     7 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| TMP_PK        |     1 |     7 |     3   (0)| 00:00:01 |
|*  8 |   INDEX RANGE SCAN              | TMP_PK        |     1 |    12 |     2   (0)| 00:00:01 |
|   9 |    SORT AGGREGATE               |               |     1 |     7 |            |          |
|  10 |     FIRST ROW                   |               |     1 |     7 |     3   (0)| 00:00:01 |
|* 11 |      INDEX RANGE SCAN (MIN/MAX) | TMP_PK        |     1 |     7 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("FR1"."MSISDN"=)
   7 - access("TMP_MSISDNS"."MSISDN">=:B1)
   8 - access("TO1"."MSISDN"=)
  11 - access("TMP_MSISDNS"."MSISDN"<=:B1)

Execution time – with 1 million MSISDNs and 1,000 ranges: 0.11 seconds.

For comparative purposes, and to check that the code is producing the right answers, here’s the basic inequality join method:


select
        nr.from_number, nr.to_number, count(*) range_count
from
        number_ranges   nr,
        msisdns         ms
where
        ms.msisdn >= nr.from_number
and     ms.msisdn <= nr.to_number
group by
        nr.from_number, nr.to_number
order by
        nr.from_number
;

-----------------------------------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |               |       |       |       |   472K(100)|          |
|   1 |  HASH GROUP BY        |               |   707K|    14M|  6847M|   472K (17)| 00:00:19 |
|   2 |   MERGE JOIN          |               |   255M|  5107M|       | 13492  (77)| 00:00:01 |
|   3 |    SORT JOIN          |               |  1000 | 14000 |       |     3  (34)| 00:00:01 |
|   4 |     TABLE ACCESS FULL | NUMBER_RANGES |  1000 | 14000 |       |     2   (0)| 00:00:01 |
|*  5 |    FILTER             |               |       |       |       |            |          |
|*  6 |     SORT JOIN         |               |  1000K|  6835K|    30M|  3451   (7)| 00:00:01 |
|   7 |      TABLE ACCESS FULL| MSISDNS       |  1000K|  6835K|       |   245  (14)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - filter("MS"."MSISDN"<="NR"."TO_NUMBER")
   6 - access("MS"."MSISDN">="NR"."FROM_NUMBER")
       filter("MS"."MSISDN">="NR"."FROM_NUMBER")

The two queries produced the same results (apart from ordering); but the second query took 2 minutes 19.4 seconds to complete.
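
One way to see where the time goes in a query like this is to rerun it with rowsource execution statistics enabled and then pull the plan from memory. The following is a sketch only, assuming SQL*Plus, a session with access to the relevant v$ views, and the model tables created above; the figures reported will depend entirely on the system it runs on:

```sql
-- Stop SQL*Plus from issuing its own dbms_output call after each statement,
-- so that display_cursor() reports the query rather than that call.
set serveroutput off

select  /*+ gather_plan_statistics */
        nr.from_number, nr.to_number, count(*) range_count
from
        number_ranges   nr,
        msisdns         ms
where
        ms.msisdn >= nr.from_number
and     ms.msisdn <= nr.to_number
group by
        nr.from_number, nr.to_number
;

-- Null sql_id and child number mean "the last statement run by this session";
-- 'allstats last' adds Starts, A-Rows, A-Time, Buffers and workarea figures.
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
```

The A-Rows figure on the merge join line shows how large the intermediate result really was, and the workarea columns for the sort operations indicate whether they spilled to the temporary tablespace.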

 

Update:

In a moment of idle curiosity I recreated the data with 40 Million rows in the MSISDNs table to get some idea of how fast the entire report process could go when re-engineered (remember the OP has 72M rows, but selects only the subset flagged as ‘AVL’). It took 1 minute 46 seconds to create the IOT – after which the report for 1,000 number ranges still took less than 0.2 seconds.

 

 

 

 

 

February 12, 2015

Parallel rownum

Filed under: Execution plans,Oracle,Parallel Execution — Jonathan Lewis @ 7:27 am BST Feb 12,2015

It’s easy to make mistakes, or overlook defects, when constructing parallel queries – especially if you’re a developer who hasn’t been given the right tools to make it easy to test your code. Here’s a little trap I came across recently that’s probably documented somewhere, which could be spotted easily if you had access to the OEM SQL Monitoring screen, but would be very easy to miss if you didn’t check the execution plan very carefully. I’ll start with a little script to generate some data:


create table t1 nologging
as
select * from all_objects where rownum <= 50000
;

insert /*+ append */ into t1 select * from t1;
commit;
insert /*+ append */ into t1 select * from t1;
commit;
insert /*+ append */ into t1 select * from t1;
commit;
insert /*+ append */ into t1 select * from t1;
commit;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create table t2 as select * from t1;
alter table t2 add id number(10,0);

All I’ve done is create some data – 800,000 rows – and then create a table to copy it to; and while I copy it I’m going to add a temporary id to the rows, which I’ll do with a call to rownum; and since there’s a lot of data I’ll use parallel execution:


alter session enable parallel dml;

insert /*+ parallel(t2 3) */ into t2
select /*+ parallel(t1 4) */ t1.* , rownum from t1;

For the purposes of experiment and entertainment I’ve done something a little odd by supplying two possible degrees of parallelism, but this lets me ask the question: will this statement run parallel 3, parallel 4, both of the above, or neither ? (You may assume that I have parallel execution slaves available when the statement runs.)

The answer is both – because that rownum does something nasty to the execution plan (I didn’t include the 50,000 limit in my first test, which is why the plan reports 993K rows instead of 800,000):


--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT           |          |   993K|    92M|  1076   (1)| 00:00:13 |        |      |            |
|   1 |  PX COORDINATOR            |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)      | :TQ20001 |   993K|    92M|  1076   (1)| 00:00:13 |  Q2,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT          | T2       |       |       |            |          |  Q2,01 | PCWP |            |
|   4 |     PX RECEIVE             |          |   993K|    92M|  1076   (1)| 00:00:13 |  Q2,01 | PCWP |            |
|   5 |      PX SEND ROUND-ROBIN   | :TQ20000 |   993K|    92M|  1076   (1)| 00:00:13 |        | S->P | RND-ROBIN  |
|   6 |       COUNT                |          |       |       |            |          |        |      |            |
|   7 |        PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   8 |         PX SEND QC (RANDOM)| :TQ10000 |   993K|    92M|  1076   (1)| 00:00:13 |  Q1,00 | P->S | QC (RAND)  |
|   9 |          PX BLOCK ITERATOR |          |   993K|    92M|  1076   (1)| 00:00:13 |  Q1,00 | PCWC |            |
|  10 |           TABLE ACCESS FULL| T1       |   993K|    92M|  1076   (1)| 00:00:13 |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------

See that “P->S” (parallel to serial) at operation 8. The select statement runs in parallel (degree 4) to scan the data, and then sends it all to the query co-ordinator to supply the rownum; then the query co-ordinator re-distributes the data (including rownum) to another set of slaves (S->P) to do the parallel (degree 3) insert. The P->S at operation 2 shows the parallel execution slaves passing details to the query co-ordinator of the private segments that they have created so that the query co-ordinator can stitch the segments together into a single data segment for the table. (If you watch closely you’ll see the query co-ordinator doing a few local writes as it tidies up the header blocks in those segment blocks.)

There are two threats to this rownum detail. The first, of course, is that the operation essentially serialises through the query co-ordinator so it’s going to take longer than you might expect; secondly an accident of this type is typically going to allocate twice as many parallel execution slaves as you might have expected – the select and the insert are two separate data flow operations (note how the Name column shows TQ1xxxx and TQ2xxxx), each gets its own slave sets, and both sets of slaves are held for the duration of the statement. If this statement is demanding twice the slaves it should be using, then you may find that other statements that start running at the same time get their degree of parallelism downgraded because you’ve run out of PX slaves. Although the rownum solution is nice and clean – it requires no further infrastructure – you probably need to introduce a sequence (with a large cache) to get the same effect without losing parallelism.
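
As a sketch of the sequence-based alternative (the sequence name t2_id_seq and the cache size of 10,000 are hypothetical choices, and a single degree of parallelism is used for both hints to keep things simple), the insert might be rewritten along these lines:

```sql
-- Hypothetical sequence; a large cache reduces contention on the sequence
-- when several parallel execution slaves are calling nextval concurrently.
create sequence t2_id_seq cache 10000;

alter session enable parallel dml;

insert /*+ parallel(t2 3) */ into t2
select /*+ parallel(t1 3) */ t1.*, t2_id_seq.nextval from t1;

commit;
```

Unlike rownum, the generated ids are not guaranteed to be contiguous and will not follow any particular row order, so this only suits cases where the id merely has to be unique. It would be worth checking the resulting plan to confirm that the serialising P->S step in the middle has disappeared.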

If you look at v$pq_tqstat after running this statement the results are a little disappointing – there are a few problems connecting lines from the plan with rows in the view – here’s my original output (and you’ll now see why I chose to have two different degrees of parallelism):


DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- ---------- ---------- -----------
         1          0 Consumer               1 P000                331330   39834186         74         71           0
                                             1 P001                331331   39844094         75         72           0
                                             1 P002                330653   39749806         74         71           0

                    1 Producer               1 P000                     1        131       2263        396           0
                                             1 P001                     1        131       2238        417           0
                                             1 P002                     1        131       2182        463           0

         2          0 Producer               1 P003                247652   28380762         13          0           0
                                             1 P004                228857   26200574         13          1           0
                                             1 P005                267348   30496182         14          0           0
                                             1 P006                249457   28401982         13          0           0
                                             1 QC                  993314  119428086 4294967269 4294967286           0
                      Consumer               1 QC                  993314  113479500        125         65           0

                    1 Consumer               1 QC                       3        393          2          1           0

The first problem is that the DFO_number reported in the view doesn’t match with the :TQ1xxxx and :TQ2xxxx reported in the plan – the parallel 4 bit is the select, which is covered by :TQ10000, but it’s listed under DFO_Number = 2 in the view, and the insert is the parallel 3 bit, which is covered by :TQ20000 and :TQ20001 but listed under DFO_Number = 1.

More confusing, potentially, is that all appearances of the query coordinator have been assigned to DFO_Number = 2. Ignoring the fact that the DFO_Number column switches the 1 and 2 from the plan, what we should see is as follows:

  • The consumer at line 16 is consuming from the 4 producers at lines 11 – 14.
  • The producer at line 15 is producing FOR the 3 consumers at lines 3 – 5
  • The consumer at line 18 is consuming from the producers at lines 7 – 9

Ideally (including the correction for the DFO_Number) I think the view content should be as follows:


DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P003                247652   28380762         13          0           0
                                             1 P004                228857   26200574         13          1           0
                                             1 P005                267348   30496182         14          0           0
                                             1 P006                249457   28401982         13          0           0
                      Consumer               1 QC                  993314  113479500        125         65           0

         2          0 Producer               1 QC                  993314  119428086 4294967269 4294967286           0
                      Consumer               1 P000                331330   39834186         74         71           0
                                             1 P001                331331   39844094         75         72           0
                                             1 P002                330653   39749806         74         71           0

                    1 Producer               1 P000                     1        131       2263        396           0
                                             1 P001                     1        131       2238        417           0
                                             1 P002                     1        131       2182        463           0
                      Consumer               1 QC                       3        393          2          1           0

Just don’t ask me why the waits and timeouts for the QC as producer seem to be counting backwards from 2^32.
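
For anyone wanting to reproduce output of this shape, a query along the following lines could be used. This is a sketch that assumes SQL*Plus; v$pq_tqstat only reports on the most recent parallel statement run by the current session, so it has to be queried from the same session straight after the insert:

```sql
-- SQL*Plus formatting: suppress repeated values and add a blank line between groups
break on dfo_number skip 1 on tq_id skip 1 on server_type

select
        dfo_number, tq_id, server_type, instance, process,
        num_rows, bytes, waits, timeouts, avg_latency
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, instance, process
;
```

The ordering of producers and consumers within each table queue may not match the listing above exactly; the point is simply to group the rows by data flow operation and table queue.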

January 15, 2015

Execution Plans

Filed under: Execution plans,Oracle — Jonathan Lewis @ 11:00 am BST Jan 15,2015

This is the index to a series of articles I’ve been writing for redgate, published on their AllThingsOracle site, about generating and interpreting execution plans.

When I started I didn’t really know how many parts it was going to end up as; I had thought maybe 5 or 6, but that was a wildly inaccurate estimate. It has ended up at 14, and I’ve finally finished the last article.

 

 

January 14, 2015

Bind Effects

Filed under: Execution plans,Oracle,Troubleshooting — Jonathan Lewis @ 1:24 pm BST Jan 14,2015

A couple of days ago I highlighted an optimizer anomaly caused by the presence of an index with a descending column. This was a minor (unrelated) detail that appeared in a problem on OTN where the optimizer was using an index FULL scan when someone was expecting to see an index RANGE scan. My earlier posting supplies the SQL to create the table and indexes I used to model the problem – and in this posting I’ll explain the problem and answer the central question.

Here’s the query and execution plan (from 11.2.0.x) as supplied by the OP – the odd appearance of the sys_op_descend() function calls is the minor detail that I explained in the previous post, but that’s not really relevant to the question of why Oracle is using an index full scan rather than an index range scan. The /*+ first_rows */ hint isn’t something you should be using but it was in the OP’s query, so I’ve included it in my model:


select /*+ FIRST_ROWS gather_plan_statistics scanned */ count(1)      FROM  XXX
where  (((((COL1 = '003' and COL2 >= '20150120') and COL3 >= '00000000') and COL4>= '000000000000' )
or ((COL1 = '003' and COL2 >= '20150120') and COL3> '00000000' )) or (COL1= '003' and COL2> '20150120'))
order by COL1,COL2,COL3,COL4  

Plan hash value: 919851669  

---------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |      1 |        |  18533 |00:01:47.04 |    156K |  70286 |
|   1 |  TABLE ACCESS BY INDEX ROWID| XXX    |      1 |  7886K |  18533 |00:01:47.04 |    156K |  70286 |
|*  2 |   INDEX FULL SCAN           | XXXXPK |      1 |  7886K |  18533 |00:01:30.36 |    131K |  61153 |
---------------------------------------------------------------------------------------------------------  

Predicate Information (identified by operation id):
---------------------------------------------------
  2 - filter((("COL2">:B2 AND "COL1"=:B1 AND
              SYS_OP_DESCEND("COL2")<SYS_OP_DESCEND(:B2)) OR ("COL1"=:B1 AND "COL2">=:B2
              AND "COL3">:B3 AND SYS_OP_DESCEND("COL2")<=SYS_OP_DESCEND(:B2)) OR
              ("COL1"=:B1 AND "COL2">=:B2 AND "COL3">=:B3 AND "COL4">=:B4 AND
              SYS_OP_DESCEND("COL2")<=SYS_OP_DESCEND(:B2))))  

If you look closely you’ll see that the OP has NOT supplied the output from a call to dbms_xplan.display_cursor(). The column and table names are highly suspect (but that’s an allowable cosmetic change for confidentiality reasons); the real giveaway is that the SQL statement uses literals but the execution plan is using bind variables (which are of the form :B{number}, suggesting that the real SQL is embedded in PL/SQL with PL/SQL variables being used to supply values). The bind variables make a difference.
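
For comparison, a plan pulled directly from memory carries the SQL text exactly as the optimizer saw it, bind variables included. A sketch, assuming SQL*Plus and that the sql_id of the statement has already been found in v$sql (the substitution variable m_sql_id is a placeholder, not something taken from the OP’s system):

```sql
-- '&m_sql_id' is a placeholder; SQL*Plus prompts for the value.
-- A null child number reports every child cursor for the sql_id;
-- '+peeked_binds' asks for the bind values used when the plan was optimised.
select  *
from    table(
                dbms_xplan.display_cursor('&m_sql_id', null, 'allstats last +peeked_binds')
        )
;
```

The runtime columns (Starts, A-Rows, Buffers and so on) only appear if rowsource execution statistics were being collected when the statement ran.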

Let’s go back to my model to demonstrate the problem. Here’s a query with the same predicate structure as the problem query (with several pairs of brackets eliminated to improve readability) showing the actual run-time plan (from 11.2.0.4) when using literals:


select
        /*+ first_rows */
        *
from t1
where
        (C1 = 'DE' and C2 >  'AB')
or      (C1 = 'DE' and C2 >= 'AB' and C3 > 'AA' )
or      (C1 = 'DE' and C2 >= 'AB' and C3 >= 'AA' and C4 >= 'BB')
order by
        C1, C2, C3, C4
;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |    21 |  2478 |     4  (25)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |    21 |       |     3  (34)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter(((SYS_OP_DESCEND("C2")<SYS_OP_DESCEND('AB') AND "C2">'AB') OR
              (SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C3">'AA' AND "C2">='AB') OR
              (SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C4">='BB' AND "C2">='AB' AND
              "C3">='AA')))

As you can see, the optimizer has managed to “factor out” the predicate C1 = ‘DE’ from the three disjuncts and has then used it as an access() predicate for an index range scan. Now let’s see what the code and plan look like if we replace the four values by four bind variables:


variable B1 char(2)
variable B2 char(2)
variable B3 char(2)
variable B4 char(2)

begin
        :b1 := 'DE';
        :b2 := 'AB';
        :b3 := 'AA';
        :b4 := 'BB';
end;
/

select
        /*+ first_rows */
        *
from t1
where
        (C1 = :B1  and C2 >  :B2 )
or      (C1 = :B1  and C2 >= :B2 and C3 >  :B3 )
or      (C1 = :B1  and C2 >= :B2 and C3 >= :B3 and C4 >= :B4)
order by C1, C2, C3, C4
;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |    31 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   437 | 51566 |    31   (4)| 00:00:01 |
|*  2 |   INDEX FULL SCAN           | T1_IASC |   437 |       |    27   (4)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((("C1"=:B1 AND "C2">:B2) OR ("C1"=:B1 AND "C3">:B3 AND
              "C2">=:B2) OR ("C1"=:B1 AND "C4">=:B4 AND "C2">=:B2 AND "C3">=:B3)))

The optimizer hasn’t factored out the common expression C1 = :B1. The reason for this, I think, is that though WE know that :B1 is supposed to be the same thing in all three occurrences the optimizer isn’t able to assume that that’s the case; in principle :B1 could be the place holder for 3 different values – so the optimizer plays safe and optimizes for that case. This leaves it with three options: full tablescan with filter predicates, index full scan with filter predicates, or three-part concatenation with index range scans in all three parts. The combination of the /*+ first_rows */ hint and the “order by” clause which matches the t1_iasc index has left the optimizer choosing the index full scan path – presumably to avoid the need to collect all the rows and sort them before returning the first row.

Given our understanding of the cause of the problem we now have a clue about how we might make the query more efficient – we have to eliminate the repetition of (at least) the :B1 bind variable. In fact we can get some extra mileage by modifying the repetition of the :B2 bind variable. Here’s a rewrite that may help:


select
        /*+ first_rows */
        *
from t1
where
        (C1 = :B1 and C2 >= :B2)
and     (
             C2 > :B2
         or  C3 > :B3
         or (C3 >= :B3 and C4 > :B4)
        )
order by C1, C2, C3, C4
;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   148 | 17464 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |   148 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"=:B1 AND "C2">=:B2 AND "C2" IS NOT NULL)
       filter(("C2">:B2 OR "C3">:B3 OR ("C4">:B4 AND "C3">=:B3)))

I’ve factored out as much of the C1 and C2 predicates as I can – and the optimizer has used the resulting conditions as the access() predicate on the index (adding in a “not null” predicate on C2 that looks redundant to me – in fact the index was on the primary key in the original, but I hadn’t included that constraint in my model). You’ll notice, by the way, that the cardinality is now 148; compare this with the previous cardinality of 437 and you might (without bothering to look closely at the 10053 trace) do some hand-waving around the fact that 437 = (approximately) 148 * 3, which fits the idea that the optimizer was treating the three :B1 appearances as if they were three different possible values accessing three sets of data.

Miscellaneous.

This isn’t the end of the story; there are always more complications and surprises in store as you look further into the detail. For example, on the upgrade to 12c the execution plan for the query with bind variables was the same (ignoring the sys_op_descend() functions) as the query using literals – the optimizer managed to factor out the C1 predicate: does this mean SQL*Plus got smarter about telling the optimizer about the bind variables, or does it mean the optimizer got smarter about something that SQL*Plus has been doing all along ?

This change might make you think that the optimizer is supposed to assume that bind variables of the same name represent the same thing – but that’s not correct, and it’s easy to show; here’s a trivial example (accessing the same table with a query that, for my data, identifies the first row):


declare
        m_id number := 1;
        m_c1 char(2) := 'BV';
        m_c2 char(2) := 'GF';
        m_n number := 0;
begin
        execute immediate
                'SELECT /*+ FIND THIS */ COUNT(*) FROM T1 WHERE ID = :B1 AND C1 = :B1 AND C2 = :B1'
                into m_n
                using m_id, m_c1, m_c2
        ;
end;
/

select sql_id, sql_text from V$sql where sql_text like 'SELECT%FIND THIS%';

SQL_ID        SQL_TEXT
------------- ----------------------------------------------------------------------------------
9px3nuv54maxp SELECT /*+ FIND THIS */ COUNT(*) FROM T1 WHERE ID = :B1 AND C1 = :B1 AND C2 = :B1

If you were looking at the contents of v$sql, or a trace file, or an AWR report, you might easily be fooled into thinking that this was a query where the same value had been used three times – when we know that it wasn’t.

So, as we upgrade from 11g to 12c my model of the original problem suggests that the problem is going to go away – but, actually, I don’t really know why that’s the case (yet). On the other hand, I have at least recognised a pattern that the 11g optimizer currently has a problem with, and I have a method for helping the optimizer to be a little more efficient.

 

January 12, 2015

FBI Bug reprise

Filed under: Execution plans,Oracle,Troubleshooting,Uncategorized — Jonathan Lewis @ 12:52 pm BST Jan 12,2015

I’ve just had cause to resurrect a blog note I wrote three years ago. The note says that an anomaly I discovered in 9.2.0.8 was fixed in 10.2.0.3 – and this is true for the simple example in the posting; but a recent question on the OTN database forum has shown that the bug still appears in more complex cases.  Here’s some code to create a table and two indexes:


create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	cast(dbms_random.string('U',2) as char(2))	c1,
	cast(dbms_random.string('U',2) as char(2))	c2,
	cast(dbms_random.string('U',2) as char(2))	c3,
	cast(dbms_random.string('U',2) as char(2))	c4,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t1_iasc  on t1(c1, c2,      c3, c4) nologging;
create index t1_idesc on t1(c1, c2 desc, c3, c4) nologging;

I’ve designed the table to model the way a problem was presented on OTN; it’s possible that the anomaly would appear in simpler circumstances. Note that I have two indexes on the same four columns, but the second column of one of the indexes is declared as descending. To identify the indexes easily in execution plans the latter index has the text “desc” in its name. So here’s a query, with execution plan, that should use one of these indexes:


select
        *
from t1
where
        (C1 = 'DE' and C2 >  'AB')
or      (C1 = 'DE' and C2 >= 'AB' and C3 > 'AA' )
or      (C1 = 'DE' and C2 >= 'AB' and C3 >= 'AA' and C4 >= 'BB')
order by
        C1, C2, C3, C4
;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |    21 |  2478 |     4  (25)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |    21 |       |     3  (34)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter(((SYS_OP_DESCEND("C2")<SYS_OP_DESCEND('AB') AND "C2">'AB') OR
              (SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C3">'AA' AND "C2">='AB') OR
              (SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C4">='BB' AND "C2">='AB' AND
              "C3">='AA')))

Sure enough – the query has used the t1_iasc index – but why has the optimizer introduced all those predicates with the sys_op_descend() function calls in them when we’re not using an index with a descending column ? Somewhere in the code path the optimizer has picked up the other index, and decided to use some information from it that is totally redundant. One side effect of this is that the cardinality prediction is 21 – if I drop the index t1_idesc the sys_op_descend() calls disappear and the cardinality changes to 148.

Oracle 12c behaves differently – it uses concatenation to turn the query into three separate index probes unless I add the hint /*+ first_rows */ (which I shouldn’t be using, but it’s the hint supplied by the OP on OTN). With the hint in place we get an example of the actual execution plan differing from the prediction made through explain plan:


12c execution plan unhinted (concatenation, and lots of sys_op_descend()):

--------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |       |       |    12 (100)|          |
|   1 |  SORT ORDER BY                        |          |   149 | 17582 |    12   (9)| 00:00:01 |
|   2 |   CONCATENATION                       |          |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |   142 | 16756 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | T1_IDESC |     1 |       |     2   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |     6 |   708 |     4   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                  | T1_IDESC |     1 |       |     2   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |     1 |   118 |     3   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN                  | T1_IASC  |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C1"='DE' AND "C3">='AA' AND "C4">='BB' AND
              SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB'))
       filter("C4">='BB' AND SYS_OP_UNDESCEND(SYS_OP_DESCEND("C2"))>='AB' AND "C3">='AA')
   6 - access("C1"='DE' AND "C3">'AA' AND SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB'))
       filter("C3">'AA' AND SYS_OP_UNDESCEND(SYS_OP_DESCEND("C2"))>='AB' AND
              (LNNVL("C4">='BB') OR LNNVL(SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB')) OR
              LNNVL(SYS_OP_UNDESCEND(SYS_OP_DESCEND("C2"))>='AB') OR LNNVL("C3">='AA')))
   8 - access("C1"='DE' AND "C2">'AB' AND "C2" IS NOT NULL)
       filter((LNNVL("C3">'AA') OR LNNVL(SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB')) OR
              LNNVL("C2">='AB')) AND (LNNVL("C4">='BB') OR
              LNNVL(SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB')) OR LNNVL("C2">='AB') OR
              LNNVL("C3">='AA')))

12c Execution plan with first_rows hint (and the sys_op_descend() calls have gone)
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |   150 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   148 | 17464 |   150   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |   148 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter(("C2">'AB' OR ("C3">'AA' AND "C2">='AB') OR ("C4">='BB' AND
              "C2">='AB' AND "C3">='AA')))

12c Execution plan with first_rows according to Explain Plan (and they're back again)
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |   148 | 17464 |   150   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   148 | 17464 |   150   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |   148 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter("C2">'AB' AND SYS_OP_DESCEND("C2")<SYS_OP_DESCEND('AB') OR
              "C3">'AA' AND SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C2">='AB' OR
              "C4">='BB' AND SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C2">='AB' AND
              "C3">='AA')

On the plus side the upgrade to 12c has removed some of the sys_op_descend() appearances and seems to have fixed the cardinality anomaly when the sys_op_descend() calls do appear – but things are still going wrong in odd ways. (And this looks like another case where you could drop an index that you’re not using and see an execution plan change.)
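The comparison between the plan a statement actually ran with and the plan predicted by explain plan can be reproduced with dbms_xplan. What follows is a minimal sketch, assuming the t1 table and its two indexes as described earlier in this note and a SQL*Plus session; serveroutput is switched off so that display_cursor() reports the query rather than a later dbms_output call.

```sql
set serveroutput off

-- Run the statement, then report the plan the cursor actually used.
select  /*+ first_rows */
        *
from    t1
where
        (C1 = 'DE' and C2 >  'AB')
or      (C1 = 'DE' and C2 >= 'AB' and C3 > 'AA' )
or      (C1 = 'DE' and C2 >= 'AB' and C3 >= 'AA' and C4 >= 'BB')
order by
        C1, C2, C3, C4
;

select * from table(dbms_xplan.display_cursor);

-- Ask for the optimizer's prediction for the same statement text.
explain plan for
select  /*+ first_rows */
        *
from    t1
where
        (C1 = 'DE' and C2 >  'AB')
or      (C1 = 'DE' and C2 >= 'AB' and C3 > 'AA' )
or      (C1 = 'DE' and C2 >= 'AB' and C3 >= 'AA' and C4 >= 'BB')
order by
        C1, C2, C3, C4
;

select * from table(dbms_xplan.display);
```

Comparing the two Predicate Information sections is the quickest way to see whether the sys_op_descend() calls appear in one report and not the other. Repeating both steps on a scratch copy of the table after dropping t1_idesc would show whether the unused index is influencing the result.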

Footnote:

When playing around with 12c, I had to be very careful to avoid confusing the issue as SQL Plan Directives were appearing from time to time, and giving me unexpected results as I repeated some of the experiments.

Footnote 2:

I haven’t actually addressed the question originally posed on OTN, but that can wait for another day.

January 7, 2015

Most Recent

Filed under: Execution plans,Oracle,Performance,subqueries — Jonathan Lewis @ 6:21 pm BST Jan 7,2015

There’s a thread on the OTN database forum at present asking for advice on optimising a query that’s trying to find “the most recent price” for a transaction given that each transaction is for a stock item on a given date, and each item has a history of prices where each historic price has an effective start date. This means the price for a transaction is the price as at the most recent date prior to the transaction date.

There is an absolutely standard way of expressing “the most recent occurrence” in SQL. Assume we have a table of (item_code, effective_date, price) with the obvious primary key of (item_code, effective_date), then a requirement to find “the most recent price for item XXXX as at 25th Dec 2014” would give us code like the following (note – all the examples in this note were run against Oracle 11.2.0.4):


select  *
from    prices  pri1
where   item_code = 'XXXX'
and     effective_date = (
                select  max(effective_date)
                from    prices  pri2
                where   pri2.item_code = 'XXXX'
                and     pri2.effective_date <= date'2014-12-25'
        )
/

The ideal execution plan that we should expect to see for this query is as follows (with a small variation if you had created the prices table as an index-organized table – which would probably be sensible in many cases):


-----------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |     1 |    52 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID   | PRICES |     1 |    52 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN            | PRI_PK |     1 |       |     1   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE              |        |     1 |    32 |            |          |
|   4 |     FIRST ROW                  |        |     1 |    32 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN (MIN/MAX)| PRI_PK |     1 |    32 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ITEM_CODE"='XXXX' AND "EFFECTIVE_DATE"= (SELECT
              MAX("EFFECTIVE_DATE") FROM "PRICES" "PRI2" WHERE
              "PRI2"."EFFECTIVE_DATE"<=TO_DATE(' 2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "PRI2"."ITEM_CODE"='XXXX'))

   5 - access("PRI2"."ITEM_CODE"='XXXX' AND "PRI2"."EFFECTIVE_DATE"<=
             TO_DATE('2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

As you can see, this plan is using the “driving subquery” approach – the order of operation is 5, 4, 3, 2, 1, 0: we do an index min/max range scan in line 5 to find the maximum effective date for the item, then pass that up through the (essentially redundant) First Row and Sort Aggregate operations to use as an input to the index unique scan at operation 2 which passes the rowid up to operation 1 to find the specific row. In my case this was 2 consistent gets for the range scan, 2 more for the unique scan, and one for the table access.
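For anyone wanting to try the query, here is a minimal sketch of a table that fits the description. Only the column names, the table name and the index name PRI_PK come from the text and the plan above; the datatypes, sizes and sample rows are assumptions made purely for illustration.

```sql
-- Hypothetical definition: datatypes and sizes are assumptions.
create table prices (
        item_code       varchar2(10)    not null,
        effective_date  date            not null,
        price           number(10,2)    not null,
        constraint pri_pk primary key (item_code, effective_date)
);

-- For the index-organized variation mentioned in the text, the same
-- definition would end with ") organization index;" instead of ");".

-- Three illustrative prices for one item.
insert into prices values ('XXXX', date '2014-11-01', 10.00);
insert into prices values ('XXXX', date '2014-12-20', 11.50);
insert into prices values ('XXXX', date '2014-12-30', 12.00);
commit;
```

With these three rows the “most recent price as at 25th Dec 2014” query returns the row dated 20th December 2014. Whether the optimizer picks the min/max plan shown above on such a tiny data set will depend on statistics and version.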

You might point out that my example uses the item_code ‘XXXX’ twice, once in the main query, once in the subquery; and you might decide that this was in very poor taste since we should clearly be using a correlated subquery – the correlating predicate ought to be: pri2.item_code = pri1.item_code. Here’s the execution plan I got when I made that change:


----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     1 |    78 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |         |     1 |    78 |     3   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |         |     1 |    78 |     3   (0)| 00:00:01 |
|   3 |    VIEW                      | VW_SQ_1 |     1 |    26 |     2   (0)| 00:00:01 |
|*  4 |     FILTER                   |         |       |       |            |          |
|   5 |      HASH GROUP BY           |         |     1 |    32 |     2   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN       | PRI_PK  |     1 |    32 |     2   (0)| 00:00:01 |
|*  7 |    INDEX UNIQUE SCAN         | PRI_PK  |     1 |       |     0   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| PRICES  |     1 |    52 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("PRI2"."ITEM_CODE"='XXXX')
   6 - access("PRI2"."ITEM_CODE"='XXXX' AND "PRI2"."EFFECTIVE_DATE"<=
              TO_DATE('2014-12-25 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("ITEM_CODE"='XXXX' AND "EFFECTIVE_DATE"="MAX(EFFECTIVE_DATE)")

The plan changes dramatically: the optimizer has unnested the subquery. In my case this didn’t make any difference to the overall performance as my data set was small, I only had one or two prices per item code, and the query was very basic; but in most other cases the change could be catastrophic.
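The text of the modified query isn’t shown above; reconstructed from the description, the only change is the correlating predicate in the subquery, so it would look something like this:

```sql
select  *
from    prices  pri1
where   item_code = 'XXXX'
and     effective_date = (
                select  max(effective_date)
                from    prices  pri2
                where   pri2.item_code = pri1.item_code     -- correlated to the outer query
                and     pri2.effective_date <= date'2014-12-25'
        )
/
```

The result set is the same as for the literal version; it is only the optimizer’s choice of transformation that differs.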

The Problem Query

The requirement on OTN had a stock transactions (xo_stock_trans) table and a prices (xo_prices) table, and the OP had supplied some code to create and populate these tables with 6.4 million and 4.5 million rows respectively. Unfortunately the xo_prices table didn’t have a suitable unique constraint on it and ended up with lots of items having multiple prices for the same date.  The OP had created a function to return a price for an item given a driving date and price_type, and had a query that called that function three times per row (once for each of three price types); but this did not perform very well and the OP wanted to know if there was a way of addressing the requirement efficiently using pure SQL; (s)he had already tried the following:


select tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from xo_stock_trans tr
join xo_prices pr on pr.item = tr.item
                and pr.price_date = (select max(pr2.price_date)
                                     from xo_prices pr2
                                     where pr2.item = pr.item
                                       and pr2.price_date <= tr.trans_date
                                     )
where tr.trans_date between '01-AUG-2014' and '31-AUG-2014';  

That was SO close – it’s clearly implementing the right sort of strategy: but it didn’t perform well, so let’s check the execution plan:

------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name               | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                    |     1 |    70 |       |   168M(100)|234:06:13 |
|   1 |  NESTED LOOPS                 |                    |     1 |    70 |       |   168M(100)|234:06:13 |
|   2 |   NESTED LOOPS                |                    |     9 |    70 |       |   168M(100)|234:06:13 |
|   3 |    NESTED LOOPS               |                    |     9 |   450 |       |   168M(100)|234:06:13 |
|   4 |     VIEW                      | VW_SQ_1            |   286 | 10010 |       |   168M(100)|234:06:11 |
|   5 |      HASH GROUP BY            |                    |   286 |  7722 |       |   168M(100)|234:06:11 |
|   6 |       MERGE JOIN              |                    |   456G|    11T|       |  9153K(100)| 12:42:50 |
|   7 |        SORT JOIN              |                    |   202K|  2960K|       |   548   (2)| 00:00:03 |
|*  8 |         INDEX RANGE SCAN      | XO_STOCK_TRANS_IX2 |   202K|  2960K|       |   548   (2)| 00:00:03 |
|*  9 |        SORT JOIN              |                    |  4045K|    46M|   154M| 19043   (6)| 00:01:36 |
|* 10 |         INDEX FAST FULL SCAN  | XO_PRICES_IX1      |  4045K|    46M|       |  1936  (10)| 00:00:10 |
|* 11 |     TABLE ACCESS BY USER ROWID| XO_STOCK_TRANS     |     1 |    15 |       |     1   (0)| 00:00:01 |
|* 12 |    INDEX RANGE SCAN           | XO_PRICES_IX1      |     1 |       |       |     2   (0)| 00:00:01 |
|  13 |   TABLE ACCESS BY INDEX ROWID | XO_PRICES          |     1 |    20 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   8 - access("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   9 - access(INTERNAL_FUNCTION("PR2"."PRICE_DATE")<=INTERNAL_FUNCTION("TR"."TRANS_DATE"))
       filter(INTERNAL_FUNCTION("PR2"."PRICE_DATE")<=INTERNAL_FUNCTION("TR"."TRANS_DATE"))
  10 - filter("PR2"."PRICE_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  11 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  12 - access("ITEM_1"="PR"."ITEM" AND "PR"."PRICE_DATE"="MAX(PR2.PRICE_DATE)")
       filter("PR"."ITEM"="TR"."ITEM")

The query was limited to August 2014, which was about 198,000 rows in my table, so we might expect some signs of a brute-force approach (tablescans and hash joins rather than indexes and nested loops) – but what we get ends up with a high-precision approach with a very bad cardinality estimate after a brute-force unnesting of the “max(price_date)” subquery. The unnesting has done a range scan over 200,000 stock_trans rows, and an index fast full scan on 4.5 million prices to do a merge join and hash aggregation to find the maximum price_date for each target row in the xo_stock_trans table. (See my earlier posting on table duplication for a variation and explanation of what Oracle has done here). This step is a lot of work, but the optimizer thinks it’s going to produce only 286 rows in the aggregated result, so the next steps in the plan are indexed nested loops – which actually operate 198,000 times.

With the clue from my initial description, we need to aim for a strategy where Oracle doesn’t unnest that subquery – so let’s experiment with a basic /*+ no_unnest */ hint in the subquery and see what happens. Here’s the resulting execution plan:


--------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |   527 | 18445 |       |  6602M  (1)|999:59:59 |
|*  1 |  FILTER                       |                |       |       |       |            |          |
|*  2 |   HASH JOIN                   |                |  3423M|   111G|  5336K| 76973  (90)| 00:06:25 |
|*  3 |    TABLE ACCESS FULL          | XO_STOCK_TRANS |   202K|  2960K|       |  2531  (13)| 00:00:13 |
|   4 |    TABLE ACCESS FULL          | XO_PRICES      |  4571K|    87M|       |  2275  (11)| 00:00:12 |
|   5 |   SORT AGGREGATE              |                |     1 |    12 |       |            |          |
|   6 |    FIRST ROW                  |                |     1 |    12 |       |     3   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */ MAX("PR2"."PRICE_DATE") FROM
              "XO_PRICES" "PR2" WHERE "PR2"."PRICE_DATE"<=:B1 AND "PR2"."ITEM"=:B2))
   2 - access("PR"."ITEM"="TR"."ITEM")
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)

The subquery now survives, and we can see a min/max range scan in the plan – but the subquery is a filter() subquery and is applied to the result of joining the 200,000 transactions to every price that applies for the item in each transaction. The optimizer thinks that this join will produce roughly 3.4 billion rows (the 3423M reported at operation 2), and with the sample data set (which had many prices per item) the join actually resulted in 4.4 billion rows. The min/max subquery is as efficient as it can be, but it’s running far too often; ideally we would like it to run at most once per transaction, so why is it running late ? We could try adding the /*+ push_subq */ hint to the subquery but if we do the plan doesn’t change.
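For reference, the hint placement described above would look like the following sketch. It is the OP’s query with the two hints added inside the subquery and nothing else changed, so it still relies on the session’s nls_date_format to convert the date strings, as the original did.

```sql
select tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from xo_stock_trans tr
join xo_prices pr on pr.item = tr.item
                and pr.price_date = (select /*+ no_unnest push_subq */  -- hints go in the subquery block
                                         max(pr2.price_date)
                                     from xo_prices pr2
                                     where pr2.item = pr.item
                                       and pr2.price_date <= tr.trans_date
                                     )
where tr.trans_date between '01-AUG-2014' and '31-AUG-2014';
```

Removing push_subq from the comment gives the statement behind the plan shown above, since adding it made no difference to that plan.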

Our rapid “most recent occurrence” revolved around accessing the prices table by index while “pre-querying” for the date using a min/max subquery that knew the relevant item code already. In this case, though, we’re doing a full tablescan of the xo_prices table so the method doesn’t apply. So let’s manipulate the query to force an indexed access path for the join to the xo_prices table by adding the hints /*+ leading(tr pr) use_nl(pr) index(pr) */ to the main body of the query. This is the resulting plan:


--------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                |   527 | 18445 |  6614M  (1)|999:59:59 |
|   1 |  NESTED LOOPS                   |                |  3413K|   113M|    11M  (2)| 16:29:13 |
|   2 |   NESTED LOOPS                  |                |  3413K|   113M|    11M  (2)| 16:29:13 |
|*  3 |    TABLE ACCESS FULL            | XO_STOCK_TRANS |   202K|  2960K|  2531  (13)| 00:00:13 |
|*  4 |    INDEX RANGE SCAN             | XO_PRICES_IX1  |    16 |       |    52   (2)| 00:00:01 |
|   5 |     SORT AGGREGATE              |                |     1 |    12 |            |          |
|   6 |      FIRST ROW                  |                |     1 |    12 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |     3   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID   | XO_PRICES      |    17 |   340 |    59   (2)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   4 - access("PR"."ITEM"="TR"."ITEM")
       filter("PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */ MAX("PR2"."PRICE_DATE") FROM
              "XO_PRICES" "PR2" WHERE "PR2"."PRICE_DATE"<=:B1 AND "PR2"."ITEM"=:B2))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)

We’re nearly there. The shape of the execution plan – lines 4 to 7, at any rate – matches the shape of the very simple example at the start of this article, and we seem to be driving from the min/max subquery at line 7. Unfortunately, when we look at the predicate section for line 4 of the plan we can see that the subquery is still a filter() subquery, not an access() subquery – it’s (nominally) being performed for every index entry in the range scan of the xo_prices index that we do for each xo_stock_trans row. What we want to see is an access() subquery – and checking the SQL we can see how to get there: the subquery currently correlates the item back to the xo_prices table, not to the xo_stock_trans table, so let’s correct that correlation. Here’s our final query (though not formatted to my preference) with execution plan:


select /*+ leading(tr pr) use_nl(pr) index(pr) */  -- hint added
       tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from xo_stock_trans tr
join xo_prices pr on pr.item = tr.item
                and pr.price_date = (select /*+ no_unnest */  -- hint added
                                         max(pr2.price_date)
                                     from xo_prices pr2
                                     where pr2.item = tr.item  -- correlate to tr, not pr
                                       and pr2.price_date <= tr.trans_date
                                     )
where tr.trans_date between '01-AUG-2014' and '31-AUG-2014'
;

--------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                |  3423M|   111G|  1824K  (1)| 02:32:02 |
|   1 |  NESTED LOOPS                   |                |  3413K|   113M|  1824K  (1)| 02:32:02 |
|   2 |   NESTED LOOPS                  |                |  3413K|   113M|  1824K  (1)| 02:32:02 |
|*  3 |    TABLE ACCESS FULL            | XO_STOCK_TRANS |   202K|  2960K|  2531  (13)| 00:00:13 |
|*  4 |    INDEX RANGE SCAN             | XO_PRICES_IX1  |    16 |       |     2   (0)| 00:00:01 |
|   5 |     SORT AGGREGATE              |                |     1 |    12 |            |          |
|   6 |      FIRST ROW                  |                |     1 |    12 |     3   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN (MIN/MAX)| XO_PRICES_IX1  |     1 |    12 |     3   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID   | XO_PRICES      |    17 |   340 |     9   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-31 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss'))
   4 - access("PR"."ITEM"="TR"."ITEM" AND "PR"."PRICE_DATE"= (SELECT /*+ NO_UNNEST */
              MAX("PR2"."PRICE_DATE") FROM "XO_PRICES" "PR2" WHERE "PR2"."PRICE_DATE"<=:B1 AND
              "PR2"."ITEM"=:B2))
   7 - access("PR2"."ITEM"=:B1 AND "PR2"."PRICE_DATE"<=:B2)

Finally we can see (from the predicate for line 4) that we run the subquery at most once for each row from xo_stock_trans and we use the result of each subquery execution to drive the index range scan to pick up the matching rows from xo_prices with no further filtering. The order of operation is: 3, 7, 6, 5, 4, 2, 8, 1, 0

The only thing we can do now is decide whether the strategy for indexing into the xo_prices table 200,000 times (for our 30 day requirement) is better than a brute force approach that does a massive join and sort, or a data duplication approach that puts a “price end date” on each xo_prices row to avoid the need to check all prices for an item to find the appropriate one. Ultimately the choice may depend on trading off the human development resources against the machine run-time resources, with an eye on the number of times the query runs and the size of the date range typically involved.
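To make the “price end date” alternative concrete, here is a sketch of the idea. It assumes at most one xo_prices row per (item, price_date), which the OP’s data did not guarantee, and it uses a view with the hypothetical name xo_prices_ranged only to show the shape of the data; the approach described above would store the end date as a real column so that it could be indexed.

```sql
-- Derive an end date for each price from the next price_date of the same item.
create or replace view xo_prices_ranged as      -- hypothetical name
select
        item,
        price_date,
        lead(price_date) over (
                partition by item order by price_date
        )       price_end_date,                 -- null for the latest price
        gross_price,
        net_price,
        special_price
from
        xo_prices
;

-- The "most recent price" then becomes a simple range check, no max() subquery.
select
        tr.item, tr.trans_date, tr.quantity,
        pr.gross_price, pr.net_price, pr.special_price
from
        xo_stock_trans          tr,
        xo_prices_ranged        pr
where
        tr.trans_date between date '2014-08-01' and date '2014-08-31'
and     pr.item = tr.item
and     pr.price_date <= tr.trans_date
and     (pr.price_end_date > tr.trans_date or pr.price_end_date is null)
;
```

The trade-off is the one named above: the query gets simpler, but something has to maintain the end date whenever a new price arrives.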

Footnote:

There’s plenty more I could say about this query and how to handle it – but there are too many questions about the correctness of the data definition and content to make it worth pursuing in detail. You will note, however, that the various execution plans which logically should be returning the same data report dramatically different cardinalities for the final row source; if nothing else this should warn you that maybe the optimizer is going to have trouble producing a good plan because its model produced a bad cardinality estimate at some point in a series of transformations.

In fact, when I first saw this query I converted it to traditional Oracle syntax (anticipating, incorrectly, a need to do something messy with hints), corrected the subquery correlation to the “obvious” choice, and put in a cardinality hint /*+ cardinality(tr 100) */ for the xo_stock_trans table, and got the execution plan that I’ve managed to produce as the final plan above.
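The statement itself isn’t reproduced in this note, so the following is a reconstruction from that description rather than the original text: traditional join syntax, the subquery correlated to tr, and the cardinality() hint (which is not a formally documented hint) telling the optimizer to assume 100 rows from xo_stock_trans.

```sql
select  /*+ cardinality(tr 100) */
        tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from
        xo_stock_trans tr,
        xo_prices pr
where
        tr.trans_date between '01-AUG-2014' and '31-AUG-2014'
and     pr.item = tr.item
and     pr.price_date = (
                select
                        max(pr2.price_date)
                from
                        xo_prices pr2
                where   pr2.item = tr.item      -- correlated to tr, not pr
                and     pr2.price_date <= tr.trans_date
        )
;
```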

Tactically the correlation column is the really important bit – if that can be set up suitably we just have to work around the optimizer’s arithmetic assumptions.
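On the data-quality point raised in the footnote, a quick way to see how many items have more than one price for the same date is a simple aggregate. This is a sketch that assumes (item, price_date) is the intended key of xo_prices, which the text suggests but the OP’s definition did not enforce.

```sql
select
        item, price_date, count(*)
from
        xo_prices
group by
        item, price_date
having
        count(*) > 1
order by
        count(*) desc
;
```

Any rows returned are cases where “the most recent price” is ambiguous, and where the join on max(price_date) will return more than one price row per transaction.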

 

 

January 3, 2015

Table Duplication

Filed under: Execution plans,Oracle,subqueries — Jonathan Lewis @ 11:54 am BST Jan 3,2015

I’ve probably seen a transformation like the following before and I may even have written about it (though if I have I can’t find the article), but since it surprised me when I was experimenting with a little problem a few days ago I thought I’d pass it on as an example of how sophisticated the optimizer can be with query transformation. I’ll be talking about the actual problem that I was working on in a later post so I won’t give you the table and data definitions in this post, I’ll just show some SQL and its plan:


select
        tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from
        xo_stock_trans tr,
        xo_prices pr
where
        tr.trans_date between '01-AUG-2014' and '3-AUG-2014'
and     pr.item = tr.item
and     pr.price_date = (
                select
                        max(pr2.price_date)
                from
                        xo_prices pr2
                where   pr2.item = tr.item
                and     pr2.price_date <= tr.trans_date
        )
;

The code is a fairly standard expression of “find me the most recent price available for each stock item as at the stock date of that item”. As you can see I’ve referenced the stock table once and the pricing table twice – the second appearance being in a “max()” correlated subquery. Oracle has decided to unnest the subquery – but spot the interesting detail in the plan:


-----------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name           | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                |    14M|   748M|       | 77955  (84)| 00:06:30 |
|*  1 |  HASH JOIN                 |                |    14M|   748M|    37M| 77955  (84)| 00:06:30 |
|*  2 |   HASH JOIN                |                |   829K|    28M|       | 70867  (92)| 00:05:55 |
|   3 |    JOIN FILTER CREATE      | :BF0000        | 25274 |   370K|       |  2530  (13)| 00:00:13 |
|*  4 |     TABLE ACCESS FULL      | XO_STOCK_TRANS | 25274 |   370K|       |  2530  (13)| 00:00:13 |
|   5 |    VIEW                    | VW_SQ_1        |   210M|  4206M|       | 64135  (94)| 00:05:21 |
|   6 |     HASH GROUP BY          |                |   210M|  5408M|       | 64135  (94)| 00:05:21 |
|   7 |      JOIN FILTER USE       | :BF0000        |   210M|  5408M|       | 11807  (67)| 00:01:00 |
|*  8 |       HASH JOIN            |                |   210M|  5408M|       | 11807  (67)| 00:01:00 |
|*  9 |        TABLE ACCESS FULL   | XO_STOCK_TRANS | 25274 |   370K|       |  2530  (13)| 00:00:13 |
|* 10 |        INDEX FAST FULL SCAN| XO_PRICES_IX1  |  3918K|    44M|       |  1936  (10)| 00:00:10 |
|  11 |   TABLE ACCESS FULL        | XO_PRICES      |  4571K|    87M|       |  2275  (11)| 00:00:12 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("PR"."ITEM"="TR"."ITEM" AND "PR"."PRICE_DATE"="MAX(PR2.PRICE_DATE)")
   2 - access("ITEM_1"=ROWID)
   4 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - access("PR2"."ITEM"="TR"."ITEM")
       filter("PR2"."PRICE_DATE"<="TR"."TRANS_DATE")
   9 - filter("TR"."TRANS_DATE">=TO_DATE(' 2014-08-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
              AND "TR"."TRANS_DATE"<=TO_DATE(' 2014-08-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
  10 - filter("PR2"."PRICE_DATE"<=TO_DATE(' 2014-08-03 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

I was running 11.2.0.4 at the time, which is why we can get a serial Bloom filter on the hash join – and it’s interesting to see that the filter has been pushed inside the view operator; but the really interesting part of the plan is the second appearance of the XO_STOCK_TRANS table.

My correlated subquery returns a value that is used in a comparison with a column in the XO_PRICES table, but the correlation predicates referred back to the XO_STOCK_TRANS table, so the optimizer has added the XO_STOCK_TRANS to the subquery as it unnested it.
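Written out by hand, the transformed query would look roughly like the sketch below. This is an approximation inferred from the plan (the join on rowid at operation 2, the aggregate view VW_SQ_1), not the optimizer’s actual internal text; the aliases tr2, vw_sq, tr_rowid and max_price_date are invented for the illustration.

```sql
select
        tr.item, tr.trans_date, tr.quantity
    , pr.gross_price
    , pr.net_price
    , pr.special_price
from
        xo_stock_trans  tr,
        xo_prices       pr,
        (
        select
                tr2.rowid               tr_rowid,
                max(pr2.price_date)     max_price_date
        from
                xo_stock_trans  tr2,    -- the duplicated table
                xo_prices       pr2
        where
                tr2.trans_date between '01-AUG-2014' and '3-AUG-2014'
        and     pr2.item = tr2.item
        and     pr2.price_date <= tr2.trans_date
        group by
                tr2.rowid
        )               vw_sq
where
        tr.trans_date between '01-AUG-2014' and '3-AUG-2014'
and     vw_sq.tr_rowid = tr.rowid
and     pr.item = tr.item
and     pr.price_date = vw_sq.max_price_date
;
```

The inline view finds the latest applicable price date for each transaction row, identified by its rowid, and the outer query joins that result back to the transaction and to the matching price.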

I’ve written several examples of how we can optimise SQL manually by rewriting it to introduce extra copies of some of the tables (typically in a fashion analogous to the optimizer’s mechanism for star transformations), so it’s nice to see another variation on the theme of the optimizer using table duplication to optimise a statement.

Footnote:

The execution plan in 10.2.0.5 is slightly different, but it still unnests the subquery, introducing a second occurrence of XO_STOCK_TRANS as it does so.

November 12, 2014

Parallel Fun

Filed under: Execution plans,Oracle,Parallel Execution,subqueries — Jonathan Lewis @ 4:42 pm BST Nov 12,2014

As I write, there’s an ongoing thread on Oracle-L that started with the (paraphrased) question: “I’ve got this query that returns 7 million rows; when I change it to ‘select count(*)’ it returns in 4 seconds but when we display the full result set on screen it takes hours, and every second or two the screen pauses; how do I make it go faster.”

The general rapid response was: “You shouldn’t be running 7M rows to a screen – the time is the time for the network traffic and display.”

The first part of the statement is right – the second part is quite likely to be wrong, and there’s a very strong hint in the question that makes me say that: the “pauses every second or two”. Of course we don’t know what the OP isn’t telling us, and we don’t know how accurate he is in what he is telling us, so any ideas we have may be completely wrong. For example, we haven’t been given any idea of how long a “pause” is, we don’t really know how accurate that “second or two” might be and whether “every” is an exaggeration, and maybe the query is returning CLOB columns (and that could make a big difference to what you can do to improve performance).

If we take the statement at face value, though, there is one very obvious inference: although some of the time will be due to network traffic time, most of the time is probably due to Oracle doing something expensive for a significant fraction of the rows returned. The pattern of activity probably looks like this:

  • client: call server to fetch next array of rows
  • server: spend some time populating array – this is where the client sees a pause
  • client: display result array
  • client: call server to fetch next array of rows
  •  etc…

Here’s a trivial example:

connect / as sysdba
set arraysize 500
set pagesize 40

select
        o1.spare1 ,
        (
        select  max((ctime))
        from    obj$    o2
        where   o2.owner# = o1.owner#
        and     o2.obj# < o1.obj#
        ) ct
from obj$ o1
;

On my laptop, running an instance of 11.2.0.4 with about 80,000 rows in obj$ (and a lot of them owned by SYS), I can count seconds and find that (approximately) I alternate between one second watching results scrolling up the screen and one second waiting as the server generates the next 500 rows.

Of course it’s possible to argue that the problem really is the network and nothing but the network struggling to cope with the never-ending stream of little packets produced by 7M rows. Could there be a choke point that causes the data to stop and start with great regularity? Maybe – but previous experience says probably not. I have experienced bad network problems in the past, but when they’ve occurred I’ve always observed extremely random stop/go behaviour. The regularity implied in the question makes the Oracle-based problem seem far more likely.

Conveniently a couple of people asked for more clues – like the query text and the execution plan; even more conveniently the OP supplied the answers in this response. Since the email format makes them a little hard to read I’ve copied them here:


SELECT  bunch of stuff.....,

        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr1.RELATED_SID
                        ||
                        ',')
                ORDER BY sr1.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr1
                WHERE   sr1.SID                    = slv.SID
                        AND sr1.RELATIONSHIP_LEVEL = '1'
                GROUP BY sr1.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL1,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr2.RELATED_SID
                        ||
                        ',')
                ORDER BY sr2.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr2
                WHERE   sr2.SID                    = slv.SID
                        AND sr2.RELATIONSHIP_LEVEL = '2'
                GROUP BY sr2.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL2,
        (
               SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr3.RELATED_SID
                        ||
                        ',')
                ORDER BY sr3.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr3
                WHERE   sr3.SID                    = slv.SID
                        AND sr3.RELATIONSHIP_LEVEL = '3'
                GROUP BY sr3.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL3,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr4.RELATED_SID
                        ||
                        ',')
                ORDER BY sr4.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr4
                WHERE   sr4.SID                    = slv.SID
                        AND sr4.RELATIONSHIP_LEVEL = '4'
                GROUP BY sr4.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL4,
        (
                SELECT  RTRIM(XMLSERIALIZE(CONTENT EXTRACT( XMLAGG(XMLELEMENT("e", sr5.RELATED_SID
                        ||
                        ',')
                ORDER BY sr5.RELATED_SID), '//text()' ) ) , ',' )
                FROM    service_relationship sr5
                WHERE   sr5.SID                    = slv.SID
                        AND sr5.RELATIONSHIP_LEVEL = '5'
                GROUP BY sr5.SID
        ) AS RELATEDSERVICEINSTANCEIDLEVEL5
FROM    service_lookup slv
        LEFT JOIN service_location sl
        ON      sl.service_location_id = slv.service_location_id;

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1570133209

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                      |  7331K|  5593M|  1877   (5)| 00:00:01 |        |      |            |
|   1 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|   2 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)           | :TQ10000             |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | PCWC |            |
|*  5 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q1,00 | PCWP |            |
|   6 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|   7 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|   8 |    PX SEND QC (RANDOM)           | :TQ20000             |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | P->S | QC (RAND)  |
|   9 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | PCWC |            |
|* 10 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q2,00 | PCWP |            |
|  11 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  12 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  13 |    PX SEND QC (RANDOM)           | :TQ30000             |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | P->S | QC (RAND)  |
|  14 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | PCWC |            |
|* 15 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q3,00 | PCWP |            |
|  16 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  17 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  18 |    PX SEND QC (RANDOM)           | :TQ40000             |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | P->S | QC (RAND)  |
|  19 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | PCWC |            |
|* 20 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q4,00 | PCWP |            |
|  21 |  SORT GROUP BY                   |                      |     1 |    22 |   368   (6)| 00:00:01 |        |      |            |
|  22 |   PX COORDINATOR                 |                      |       |       |            |          |        |      |            |
|  23 |    PX SEND QC (RANDOM)           | :TQ50000             |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | P->S | QC (RAND)  |
|  24 |     PX BLOCK ITERATOR            |                      |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | PCWC |            |
|* 25 |      TABLE ACCESS STORAGE FULL   | SERVICE_RELATIONSHIP |    25 |   550 |   368   (6)| 00:00:01 |  Q5,00 | PCWP |            |
|  26 |  PX COORDINATOR                  |                      |       |       |            |          |        |      |            |
|  27 |   PX SEND QC (RANDOM)            | :TQ60002             |  7331K|  5593M|  1877   (5)| 00:00:01 |  Q6,02 | P->S | QC (RAND)  |
|* 28 |    HASH JOIN RIGHT OUTER BUFFERED|                      |  7331K|  5593M|  1877   (5)| 00:00:01 |  Q6,02 | PCWP |            |
|  29 |     PX RECEIVE                   |                      |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,02 | PCWP |            |
|  30 |      PX SEND HASH                | :TQ60000             |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | P->P | HASH       |
|  31 |       PX BLOCK ITERATOR          |                      |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | PCWC |            |
|  32 |        TABLE ACCESS STORAGE FULL | SERVICE_LOCATION     |  3175K|   920M|   366   (3)| 00:00:01 |  Q6,00 | PCWP |            |
|  33 |     PX RECEIVE                   |                      |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,02 | PCWP |            |
|  34 |      PX SEND HASH                | :TQ60001             |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | P->P | HASH       |
|  35 |       PX BLOCK ITERATOR          |                      |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWC |            |
|  36 |        TABLE ACCESS STORAGE FULL | SERVICE_LOOKUP       |  7331K|  3467M|  1507   (5)| 00:00:01 |  Q6,01 | PCWP |            |
--------------------------------------------------------------------------------------------------------------------------------------

We have a simple two-table outer join, and five scalar subqueries in the select list. (Not being very familiar with the various XML calls I had no idea of what the scalar subqueries were doing, or how they produced a result, beyond the fact that they were querying and aggregating multiple rows. In fact the combination of calls does much the same as listagg(), though it allows for a CLOB result (which could be part of the performance problem, of course) rather than being limited to a varchar2() result).

Would you like to guess at this point why I constructed my demonstration query against obj$ the way I did when presenting the idea of high-cost per row queries as a reason for regular pauses in the output ? The execution plan matched one of my two initial guesses about what the query was going to look like. When you “select count(*) from {this query}”, the optimizer will factor out the scalar subqueries and only have to count the result set from the hash join – and it might even manage to use a couple of parallel index fast full scans to get that result rather than doing the tablescans. When you run the query you have to run the scalar subqueries.

If we trust the statistics, we have 5 subqueries to run for each row of the hash join – and the hash join is predicted to return 7.3 million rows. Given that the subqueries are all going to run parallel tablescans against a fairly large table (note – the cost of the tablescans on SERVICE_RELATIONSHIP is 368, compared to the cost of the tablescan on SERVICE_LOCATION which is 366 to return 3.1M rows) that’s an awful lot of work for each row returned – unless we benefit from an enormous amount of scalar subquery caching.

Here’s another performance threat that the plan shows, though: notice where the PX SEND QC operation appears – that means the PX slaves send their (7M) rows to the Query Co-ordinator and the QC is responsible for doing all the work of running the scalar subqueries. Another interesting little threat visible in the plan shows up in the TQ column – the plan uses six “data flow operations” (DFOs, to use the original naming convention, which changed some time ago but survives in the column names of v$pq_tqstat). In principle each DFO could allocate two sets of PX slaves (and every DFO could have a different degree of parallelism); in this example DFO number 6 (the driving hash join) uses two sets of slaves, and the other five DFOs (the scalar subqueries) use a single set each. The upshot of this is that if the default degree of parallelism in play is N this query will allocate 2N + 5N = 7N parallel query slaves. It gets a little nastier than that, though (based on checking the output from v$sql_plan_monitor), because each time one of the scalar subqueries runs Oracle seems to allocate and deallocate the slaves that are supposed to run it – which is probably going to cause some contention if there are other parallel queries trying to run at the same time.
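
As a rough way of checking how rows moved through the table queues of each DFO, v$pq_tqstat can be queried from the same session immediately after the parallel statement completes. This is only a minimal sketch: the view is populated for the session's most recent parallel statement, and how faithfully it reports a plan with several DFOs (as here) may vary with version, so treat the output as a starting point rather than a definitive diagnostic.

```sql
-- Run in the session that has just executed the parallel query.
-- One row per process per table queue; the producers of each queue
-- sort ahead of its consumers because of the descending server_type.
select
        dfo_number,
        tq_id,
        server_type,
        process,
        num_rows,
        bytes
from
        v$pq_tqstat
order by
        dfo_number,
        tq_id,
        server_type desc,
        process
;
```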

Optimisation

So what could you do with this query ? It depends on how much change you want to make to the code.

It’s possible that an index on service_relationship(relationship_level, sid) – with compress 1 – might help if it’s very precise, and if the target table stays in the buffer cache for the duration of the query – but, in the absence of scalar subquery caching that could still leave the query co-ordinator executing 35 million (5 queries x 7 million rows) subqueries in a serialised process.
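
For reference, the index described above might be created as follows. The index name sr_lvl_sid is purely illustrative (it does not come from the original thread), and whether the optimizer would choose the index for the scalar subqueries depends on statistics and data patterns that we have not been shown.

```sql
-- Hypothetical index name; the column order and "compress 1" follow the text.
-- Compressing on the first column pays off because relationship_level
-- has very few distinct values.
create index sr_lvl_sid
on service_relationship(relationship_level, sid)
compress 1
;
```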

A better bet may be to convert from subqueries to joins – remembering that the listagg() / xmlserialize() calls will require you to aggregate (which means sorting in this case) an estimated 25 rows per driving row per relationship_level; in other words you may need to sort 7M * 125 = 875M rows – but at least you could do that in parallel, and there’s always the possibility that the estimated 25 drops off as you work through the different levels. You could choose to do 5 outer hash joins or (as Iggy Fernandez outlined in the thread) you could do a single outer join with a decode on the relationship_level. Another variation on this theme (which would probably have a plan showing ‘join then aggregate’) would be to ‘aggregate then join’. It’s possible that creating a non-mergeable inline view for the 5 values of relationship_level from a single table access, aggregating it to produce the five required columns, then using the result in an outer join, would be the most efficient option. In the absence of a detailed understanding of the data volume and patterns it’s hard to make any prediction of which strategy would work best.
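
The ‘aggregate then join’ variation could be sketched along the following lines. This is an untested outline under several assumptions: it uses listagg() in place of the XML calls, so each list is limited to the varchar2() maximum rather than allowing a CLOB; the column aliases (related_level1 and so on) are invented for the illustration; and the rest of the select list is omitted because the OP did not show it.

```sql
select
        slv.sid,                -- remaining select-list columns omitted
        agg.related_level1,
        agg.related_level2,
        agg.related_level3,
        agg.related_level4,
        agg.related_level5
from
        service_lookup slv
        left join service_location sl
        on      sl.service_location_id = slv.service_location_id
        left join (
                -- One pass over service_relationship, one row per sid.
                -- decode() returns null for the other levels and
                -- listagg() ignores nulls, so each column holds one level.
                select  /*+ no_merge */
                        sr.sid,
                        listagg(decode(sr.relationship_level, '1', sr.related_sid), ',')
                                within group (order by sr.related_sid)  related_level1,
                        listagg(decode(sr.relationship_level, '2', sr.related_sid), ',')
                                within group (order by sr.related_sid)  related_level2,
                        listagg(decode(sr.relationship_level, '3', sr.related_sid), ',')
                                within group (order by sr.related_sid)  related_level3,
                        listagg(decode(sr.relationship_level, '4', sr.related_sid), ',')
                                within group (order by sr.related_sid)  related_level4,
                        listagg(decode(sr.relationship_level, '5', sr.related_sid), ',')
                                within group (order by sr.related_sid)  related_level5
                from    service_relationship sr
                where   sr.relationship_level in ('1', '2', '3', '4', '5')
                group by
                        sr.sid
        ) agg
        on      agg.sid = slv.sid
;
```

Because the inline view produces at most one row per sid, the outer join cannot change the number of rows returned by the driving join, which is what makes this form a candidate replacement for the five scalar subqueries.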

Footnote:

I may be wrong in my analysis of this problem. When I first saw the question the reason for the performance pattern suggested an “obvious” design error in either the SQL or the infrastructure, and when I saw that the query and execution plan matched my prediction it became very hard for me to think that there might be some other significant cause.

There were a couple of interesting details in the execution plan that made me pursue the problem a little more. In the first case I built a very simple model to get an estimate of the time needed to display 7M rows of a reasonable width in SQL*Plus running across a typical LAN (my estimate was in the order of 45 minutes – not hours); then I spent a little more time (about 10 minutes) to build a model that reproduced the key features of the execution plan shown.

I then spent two or three hours playing with the model, and I’ll be writing a further blog with some of the results later on. One detail to carry away today, though, is that in 12c Oracle can do a new form of subquery unnesting which transformed the query from its 5 scalar subquery form into the seven table join form that was one of the suggestions made on the thread; even more interestingly, if I blocked the unnesting (to force the subquery execution) Oracle 12.1.0.2 came up with a new operator (EXPRESSION EVALUATION) that allowed it to run the subqueries from the PX slaves before passing the results to the query co-ordinator – in other words eliminating the serialisation point.

To be continued …

November 3, 2014

Upgrades

Filed under: Execution plans,Oracle,Upgrades — Jonathan Lewis @ 6:31 pm BST Nov 3,2014

One of the worst problems with upgrades is that things sometimes stop working. A particular nuisance is the execution plan that suddenly stops appearing, to be replaced by an alternative plan that is much less efficient.

Apart from the nuisance of the time spent trying to force the old plan to re-appear, plus the time spent working out a way of rewriting the query when you finally decide the old plan simply isn’t going to re-appear, there’s also the worry about WHY the old plan won’t appear. Is it some sort of bug, is it that some new optimizer feature has disabled some older optimizer feature, or is it that someone in the optimizer group realised that the old plan was capable of producing the wrong results in some circumstances … it’s that last possibility that I find most worrying.

Here’s an example that appeared recently on OTN that’s still got me wondering about the possibility of wrong results (in the general case). We start with a couple of tables, a view, and a pipelined function. This example is a simple model of the problem that showed up on OTN; it’s based on generated data so that anyone who wants to can play around with it to see if they can bypass the problem without making any significant changes to the shape of the code:


create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	rownum			n1,
	mod(rownum,100)		n_100,
	rpad('x',100)		padding
from
	generator	v1
;

create table t2
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	rownum			n1,
	mod(rownum,100)		n_100,
	rpad('x',100)		padding
from
	generator	v1
;

alter table t2 add constraint t2_pk primary key(id);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt	 => 'for all columns size 1'
	);

end;
/

create or replace type myScalarType as object (
        x int,
        y varchar2(15),
        d date
)
/

create or replace type myArrayType as table of myScalarType
/

create or replace function t_fun1(i_in number)
return myArrayType
pipelined
as
begin
	pipe row (myscalartype(i_in,     lpad(i_in,15),     trunc(sysdate) + i_in    ));
	pipe row (myscalartype(i_in + 1, lpad(i_in + 1,15), trunc(sysdate) + i_in + 1));
	return;
end;
/

create or replace view v1
as
select
	--+ leading(t2 x) index(t2)
	x.x, x.y, x.d,
	t2.id, t2.n1
from
	t2,
	table(t_fun1(t2.n_100)) x
where
	mod(t2.n1,3) = 1
union all
select
	--+ leading(t2 x) index(t2)
	x.x, x.y, x.d,
	t2.id, t2.n1
from
	t2,
	table(t_fun1(t2.n_100)) x
where
	mod(t2.n1,3) = 2
;

A key part of the problem is the UNION ALL view, where each subquery holds a join to a pipeline function. We’re about to write a query that joins to this view, and wants to push a join predicate into the view. Here’s the SQL:


select
	/*+ leading(t1 v1) use_nl(v1) */
	v1.x, v1.y, v1.d,
	v1.n1,
	t1.n1
from
	t1,
	v1
where
	t1.n_100 = 0
and	v1.id = t1.n1
;

You’ll notice that the join v1.id = t1.n1 could (in principle) be pushed inside the view to become t2.id = t1.n1 in the two branches of the UNION ALL; this would make it possible for the nested loop that I’ve hinted between t1 and v1 to operate efficiently – and in 11.1.0.7 this is exactly what happens:


------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |        | 16336 |   733K|   123   (1)| 00:00:01 |
|   1 |  NESTED LOOPS                         |        | 16336 |   733K|   123   (1)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                   | T1     |   100 |   700 |    23   (5)| 00:00:01 |
|   3 |   VIEW                                | V1     |   163 |  6357 |     1   (0)| 00:00:01 |
|   4 |    UNION-ALL PARTITION                |        |       |       |            |          |
|   5 |     NESTED LOOPS                      |        |  8168 |   103K|    16   (0)| 00:00:01 |
|*  6 |      TABLE ACCESS BY INDEX ROWID      | T2     |     1 |    11 |     2   (0)| 00:00:01 |
|*  7 |       INDEX UNIQUE SCAN               | T2_PK  |     1 |       |     1   (0)| 00:00:01 |
|   8 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |       |       |            |          |
|   9 |     NESTED LOOPS                      |        |  8168 |   103K|    16   (0)| 00:00:01 |
|* 10 |      TABLE ACCESS BY INDEX ROWID      | T2     |     1 |    11 |     2   (0)| 00:00:01 |
|* 11 |       INDEX UNIQUE SCAN               | T2_PK  |     1 |       |     1   (0)| 00:00:01 |
|  12 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |       |       |            |          |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N_100"=0)
   6 - filter(MOD("T2"."N1",3)=1)
   7 - access("T2"."ID"="T1"."N1")
  10 - filter(MOD("T2"."N1",3)=2)
  11 - access("T2"."ID"="T1"."N1")

For each row returned by the tablescan at line 2 we call the view operator at line 3 to generate a rowsource, but we can see in the predicate sections for lines 7 and 11 that the join value has been pushed inside the view, allowing us to access t2 through its primary key index. Depending on the data definitions, constraints, view definition, and version of Oracle, you might see the UNION ALL operator displaying the PARTITION option or the PUSHED PREDICATE option in cases of this type.

So now we upgrade to 11.2.0.4 (probably any 11.2.x.x version) and get the following plan:


------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |        |  1633K|    99M|   296K  (4)| 00:24:43 |
|   1 |  NESTED LOOPS                         |        |  1633K|    99M|   296K  (4)| 00:24:43 |
|*  2 |   TABLE ACCESS FULL                   | T1     |   100 |   700 |    23   (5)| 00:00:01 |
|*  3 |   VIEW                                | V1     | 16336 |   909K|  2966   (4)| 00:00:15 |
|   4 |    UNION-ALL                          |        |       |       |            |          |
|   5 |     NESTED LOOPS                      |        |   816K|    10M|  1483   (4)| 00:00:08 |
|*  6 |      TABLE ACCESS BY INDEX ROWID      | T2     |   100 |  1100 |   187   (2)| 00:00:01 |
|   7 |       INDEX FULL SCAN                 | T2_PK  | 10000 |       |    21   (0)| 00:00:01 |
|   8 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |  8168 | 16336 |    13   (0)| 00:00:01 |
|   9 |     NESTED LOOPS                      |        |   816K|    10M|  1483   (4)| 00:00:08 |
|* 10 |      TABLE ACCESS BY INDEX ROWID      | T2     |   100 |  1100 |   187   (2)| 00:00:01 |
|  11 |       INDEX FULL SCAN                 | T2_PK  | 10000 |       |    21   (0)| 00:00:01 |
|  12 |      COLLECTION ITERATOR PICKLER FETCH| T_FUN1 |  8168 | 16336 |    13   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N_100"=0)
   3 - filter("V1"."ID"="T1"."N1")
   6 - filter(MOD("T2"."N1",3)=1)
  10 - filter(MOD("T2"."N1",3)=2)

In this plan the critical join predicate appears at line 3; the predicate hasn’t been pushed. On the other hand the index() hints in the view have, inevitably, been obeyed (resulting in index full scans), as has the use_nl() hint in the main query – leading to a rather more expensive and time-consuming execution plan.

The first, quick, debugging step is simply to set the optimizer_features_enable back to 11.1.0.7 – with no effect; the second is to try adding the push_pred() hint to the query – with no effect; the third is to generate the outline section of the execution plans and copy the entire set of hints from the good plan into the bad plan, noting as we do so that the good plan actually uses the hint OLD_PUSH_PRED(@"SEL$1" "V1"@"SEL$1" ("T2"."ID")) – still no effect.
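
The first two steps, and the generation of the outline section, might look something like the following sketch. It assumes the t1/v1 test case above and uses explain plan for convenience; the push_pred(v1) hint is simply the obvious form of the hint and is not guaranteed to be the exact text used in the original test.

```sql
-- Step 1: fall back to the older optimizer feature set for this session.
alter session set optimizer_features_enable = '11.1.0.7';

-- Step 2: ask explicitly for join predicate pushdown into the view.
explain plan for
select
        /*+ leading(t1 v1) use_nl(v1) push_pred(v1) */
        v1.x, v1.y, v1.d,
        v1.n1,
        t1.n1
from
        t1,
        v1
where
        t1.n_100 = 0
and     v1.id = t1.n1
;

-- Step 3: report the plan with its outline section, so that the full
-- set of hints can be compared between the two versions.
select * from table(dbms_xplan.display(null, null, 'outline'));
```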

Since I happen to know a few things about what is likely to appear in the 10053 (optimizer) trace file, my next step would be to flush the shared pool, enable the trace, and then check the trace file (using grep or find depending on whether I was running UNIX or Windows) for the phrase “JPPD bypassed”; this is what I got:
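
A minimal sketch of that sequence is shown below. The tracefile_identifier value is an arbitrary label chosen for the illustration, and flushing the shared pool is only sensible on a test system, since it forces every statement in the instance to be re-optimised.

```sql
-- Make sure the statement is re-optimised so the trace is written.
alter system flush shared_pool;

-- Tag the trace file name to make it easy to find (label is arbitrary).
alter session set tracefile_identifier = 'jppd';

alter session set events '10053 trace name context forever, level 1';

select
        /*+ leading(t1 v1) use_nl(v1) */
        v1.x, v1.y, v1.d,
        v1.n1,
        t1.n1
from
        t1,
        v1
where
        t1.n_100 = 0
and     v1.id = t1.n1
;

alter session set events '10053 trace name context off';

-- Then, from the trace directory on UNIX, something like:
--      grep "JPPD bypassed" *jppd*.trc
```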


test_ora_9897.trc:OJPPD:     OJPPD bypassed: View contains TABLE expression.
test_ora_9897.trc:JPPD:     JPPD bypassed: View not on right-side of outer-join.
test_ora_9897.trc:JPPD:     JPPD bypassed: View not on right-side of outer-join.

So 11.1.0.7 had a plan that used the old_push_pred() hint, but 11.2.0.4 explicitly bypassed the option (the rubric near the top of the trace file translates OJPPD to “old-style (non-cost-based) JPPD”, where JPPD translates to “join predicate push-down”). It looks like the plan we got from 11.1.0.7 has been deliberately blocked in 11.2.0.4. So now it’s time to worry whether or not that means I could have been getting wrong results from 11.1.0.7.

In my test case, of course, I can bypass the problem by explicitly rewriting the query – but I’ll have to move the join with t1 inside the view for both subqueries; alternatively, given the trivial nature of the pipeline function, I could replace the table() operator with a join to another union all view. In real life such changes are not always so easy to implement.
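
The first of those rewrites could be sketched as follows, with the join to t1 written into each branch so that no predicate pushing is needed. The hints and the column aliases (v1_n1, t1_n1) are assumptions made for the illustration, and the resulting plan has not been checked against any particular version.

```sql
select
        /*+ leading(t1 t2 x) use_nl(t2) use_nl(x) index(t2) */
        x.x, x.y, x.d,
        t2.n1   v1_n1,
        t1.n1   t1_n1
from
        t1,
        t2,
        table(t_fun1(t2.n_100)) x
where
        t1.n_100 = 0
and     t2.id = t1.n1           -- the join predicate, now inside the branch
and     mod(t2.n1,3) = 1
union all
select
        /*+ leading(t1 t2 x) use_nl(t2) use_nl(x) index(t2) */
        x.x, x.y, x.d,
        t2.n1   v1_n1,
        t1.n1   t1_n1
from
        t1,
        t2,
        table(t_fun1(t2.n_100)) x
where
        t1.n_100 = 0
and     t2.id = t1.n1
and     mod(t2.n1,3) = 2
;
```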

Footnote: the restriction is still in place on 12.1.0.2.

Footnote 2: somewhere I’ve probably published a short note explaining that one of my standard pre-emptive strikes on an upgrade is to run the following command to extract useful information from the executable: “strings -a oracle | grep -i bypass”: it can be very helpful to have a list of situations in which some query transformation is bypassed.

 

October 31, 2014

first_rows(10)

Filed under: Bugs,CBO,Execution plans,Oracle — Jonathan Lewis @ 5:31 pm BST Oct 31,2014

No, not the 10th posting about first_rows() this week – whatever it may seem like – just an example that happens to use the “calculate costs for fetching the first 10 rows” optimizer strategy and does it badly. I think it’s a bug, but it’s certainly a defect that is a poster case for the inherent risk of using anything other than all_rows optimisation.  Here’s some code to build a couple of sample tables:


begin
	dbms_stats.set_system_stats('MBRC',16);
	dbms_stats.set_system_stats('MREADTIM',12);
	dbms_stats.set_system_stats('SREADTIM',5);
	dbms_stats.set_system_stats('CPUSPEED',1000);
end;
/

create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(dbms_random.value(1,1000))	n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create index t1_n1 on t1(id, n1);

create table t2
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(dbms_random.value(10001,20001))	x1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create index t2_i1 on t2(x1);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);

end;
/


create or replace view  v1
as
select 
	id, n1, small_vc, padding
from	t1 
where	n1 between 101 and 300
union all
select 
	id, n1, small_vc, padding
from	t1 
where	n1 between 501 and 700
;

The key feature of this demonstration is the UNION ALL view and what the optimizer does with it when we have first_rows_N optimisation – this is a simplified model of a production problem I was shown a couple of years ago, so nothing special, nothing invented. Here’s a query that behaves badly:


select
	/*+ gather_plan_statistics */
	v1.small_vc,
	v1.n1
from
	v1,
	t2
where
	t2.id = v1.id
and	t2.x1 = 15000
;

I’m going to execute this query in three different ways – as is, using all_rows optimisation; as is, using first_rows_10 optimisation, then using all_rows optimisation but with the necessary hints to make it follow the first_rows_10 execution path. Here are the resulting plans from an instance of 12.1.0.2 (the same thing happens in 11.2.0.4):


first_rows_10 plan
----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |     1 |    35 |   107   (0)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |     1 |    35 |   107   (0)| 00:00:01 |
|   2 |   VIEW                               | V1    |    24 |   600 |     4   (0)| 00:00:01 |
|   3 |    UNION-ALL                         |       |       |       |            |          |
|*  4 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

all_rows plan
------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   1 |  NESTED LOOPS                          |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
|   4 |   VIEW                                 | V1    |     1 |    25 |     8   (0)| 00:00:01 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

first_rows_10 plan hinted under all_rows optimisation
---------------------------------------------------------------------------------------------- 
| Id  | Operation                    | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | 
---------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT             |       |   200 |  8600 |       |  6124   (3)| 00:00:01 | 
|*  1 |  HASH JOIN                   |       |   200 |  8600 |    17M|  6124   (3)| 00:00:01 |
|   2 |   VIEW                       | V1    |   402K|    12M|       |  5464   (3)| 00:00:01 | 
|   3 |    UNION-ALL                 |       |       |       |       |            |          | 
|*  4 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2731   (3)| 00:00:01 | 
|*  5 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2733   (3)| 00:00:01 | 
|   6 |   TABLE ACCESS BY INDEX ROWID| T2    |   100 |  1000 |       |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |   100 |       |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
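
For reference, the three runs could be set up along the following lines. This is only a sketch: the post does not list the hints that were actually used, so the hint set below is an assumed combination that asks for the same shape (hash join with v1 as the build table and no join predicate pushdown). It may also need query-block-qualified full() hints for the two t1 scans inside the view, and those depend on the view definition.

```sql
-- Run 1: the query as written, under all_rows
alter session set optimizer_mode = all_rows;
-- (execute the query shown above)

-- Run 2: the query as written, under first_rows_10
alter session set optimizer_mode = first_rows_10;
-- (execute the query shown above)

-- Run 3: all_rows again, hinted towards the first_rows_10 path.
-- The hints are an assumption, not the set used in the original test.
alter session set optimizer_mode = all_rows;

select
	/*+
		gather_plan_statistics
		leading(v1 t2)
		use_hash(t2)
		no_swap_join_inputs(t2)
		no_push_pred(v1)
	*/
	v1.small_vc,
	v1.n1
from
	v1,
	t2
where
	t2.id = v1.id
and	t2.x1 = 15000
;
```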

I’m not sure why the first_rows_10 plan uses “table access by index rowid batched” where the hinted plan uses a plain “table access by index rowid”, but I’d guess it’s because the optimiser calculates that sorting the index rowids before visiting the table may have a small benefit on the speed of getting the first 10 rows – eventually I’ll get around to checking the 10053 trace file. The important thing, though, is the big mistake in the strategy, not the little difference in table access.

In the first_rows_10 plan the optimizer has decided building an in-memory hash table from the UNION ALL of the rows fetched from the two copies of the t1 table will be fast and efficient; but it’s made that decision based on the assumption that it will only get 10 rows from each copy of the table – and at run-time it HAS to get all the relevant t1 rows to build the hash table before it can get any t2 rows. We can get some idea of the scale of this error when we look at the hinted plan under all_rows optimisation – it’s a lot of redundant data and a very expensive hash table build.

In contrast the all_rows plan does an efficient indexed access into the t2 table then, for each row, does a join predicate pushdown into the union all view using an indexed access path. If we only wanted to fetch 10 rows we could stop after doing a minimum amount of work. To demonstrate the error more clearly I’ve re-run the experiment for the first two plans from SQL*Plus, setting the arraysize to 11, the pagesize to 5, and stopping after the first 10 rows. Here are the plans showing the rowsource execution stats:


first_rows_10 plan
------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   107 (100)|     12 |00:00:00.43 |   35150 |       |       |          |
|*  1 |  HASH JOIN                           |       |      1 |      1 |   107   (0)|     12 |00:00:00.43 |   35150 |    24M|  3582K|   23M (0)|
|   2 |   VIEW                               | V1    |      1 |     24 |     4   (0)|    400K|00:00:06.48 |   35118 |       |       |          |
|   3 |    UNION-ALL                         |       |      1 |        |            |    400K|00:00:04.20 |   35118 |       |       |          |
|*  4 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.71 |   17559 |       |       |          |
|*  5 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.63 |   17559 |       |       |          |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |       |       |          |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

all_rows plan
-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |      1 |        |   904 (100)|     12 |00:00:00.01 |     213 |
|   1 |  NESTED LOOPS                          |       |      1 |     43 |   904   (1)|     12 |00:00:00.01 |     213 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |
|   4 |   VIEW                                 | V1    |     28 |      1 |     8   (0)|     12 |00:00:00.01 |     181 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |     28 |        |            |     12 |00:00:00.01 |     181 |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    212K|     4   (0)|      8 |00:00:00.01 |      93 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      8 |00:00:00.01 |      85 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    213K|     4   (0)|      4 |00:00:00.01 |      88 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      4 |00:00:00.01 |      84 |
-----------------------------------------------------------------------------------------------------------------------

If I had set the optimizer_mode to first_rows_10 because I really only wanted to fetch (about) 10 rows then I’ve managed to pay a huge overhead in buffer visits, memory and CPU for the privilege – the all_rows plan was much more efficient.
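
A SQL*Plus session along the following lines is one way to reproduce that kind of test. It is a sketch under assumptions: the post gives the arraysize and pagesize but not the rest of the script, so the use of "set pause on" to stop the fetch early and the exact format string passed to dbms_xplan.display_cursor() are guesses.

```sql
set serveroutput off	-- so that display_cursor() reports the query, not a dbms_output call
set arraysize 11
set pagesize 5
set pause on		-- assumption: pause at each page, then interrupt once 10 rows have appeared

alter session set optimizer_mode = first_rows_10;

select
	/*+ gather_plan_statistics */
	v1.small_vc,
	v1.n1
from
	v1,
	t2
where
	t2.id = v1.id
and	t2.x1 = 15000
;

set pause off
set pagesize 60

-- Rowsource execution stats for the last statement executed by this session
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last cost'));
```

The A-Rows figure of 12 in both plans is consistent with SQL*Plus fetching a single row on its first fetch call and then one array of 11.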

Remember – we often see cases where the first_rows(n) plan will do more work to get the whole data set in order to be able to get the first few rows more quickly (the simplest example is when the optimizer uses a particular index to get the first few rows of a result set in order without sorting rather than doing a (faster) full tablescan with sort). This case, though, is different: the optimizer is choosing to build a hash table as if it only has to put 10 rows into that hash table when it actually HAS to build the whole hash table before it can take any further steps – we don’t get 10 rows quicker and the rest more slowly; we just get 10 very slow rows.

Footnote:

It’s possible that this is an example of bug 9633142: (FIRST_ROWS OPTIMIZER DOES NOT PUSH PREDICATES INTO UNION INLINE VIEW) but that’s reported as fixed in 12c, with a couple of patches for 11.2.0.2/3. However, setting “_fix_control”=’4887636:off’ does bypass the problem. (The fix control, introduced in 11.1.0.6, has the description: “remove restriction from first K row optimization”)
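
As a sketch of how the fix control might be checked and switched off for a test session (the statements below are an illustration rather than the script used for the post, and they assume the session has select privilege on the v$ view):

```sql
-- Current setting and description of the fix control for this session
select
	bugno, value, optimizer_feature_enable, description
from
	v$session_fix_control
where
	bugno = 4887636
and	session_id = sys_context('userenv', 'sid')
;

-- Disable the fix for this session only, then re-run the first_rows_10 test
alter session set "_fix_control" = '4887636:off';

-- Restore the fix afterwards
alter session set "_fix_control" = '4887636:on';
```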

October 19, 2014

Plan depth

Filed under: 12c,Bugs,Execution plans,Oracle,subqueries — Jonathan Lewis @ 6:20 pm BST Oct 19,2014

A recent posting on OTN reminded me that I haven’t been poking Oracle 12c very hard to see which defects in reporting execution plans have been fixed. The last time I wrote something about the problem was about 20 months ago referencing 11.2.0.3; but there are still oddities and irritations that make the nice easy “first child first” algorithm fail because the depth calculated by Oracle doesn’t match the level that you would get from a connect-by query on the underlying plan table. Here’s a simple fail in 12c:


create table t1
as
select
	rownum 			id,
	lpad(rownum,200)	padding
from	all_objects
where	rownum <= 2500
;

create table t2
as
select	* from t1
;

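-- call dbms_stats to gather stats
-- (default settings assumed; the original call is not shown)
begin
	dbms_stats.gather_table_stats(user, 't1');
	dbms_stats.gather_table_stats(user, 't2');
end;
/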

explain plan for
select
	case mod(id,2)
		when 1 then (select max(t1.id) from t1 where t1.id <= t2.id)
		when 0 then (select max(t1.id) from t1 where t1.id >= t2.id)
	end id
from	t2
;

select * from table(dbms_xplan.display);

It ought to be fairly clear that the two inline scalar subqueries against t1 should be presented at the same level in the execution hierarchy; but here’s the execution plan you get from Oracle:

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  2500 | 10000 | 28039   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE      |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  4 |     TABLE ACCESS FULL| T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   5 |  TABLE ACCESS FULL   | T2   |  2500 | 10000 |    11   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"<=:B1)
   4 - filter("T1"."ID">=:B1)

As you can see, the immediate (default?) visual impression you get from the plan is that one of the subqueries is subordinate to the other. On the other hand if you check the id and parent_id columns from the plan_table you’ll find that lines 1 and 3 are both direct descendants of line 0 – so they ought to have the same depth. The listing below shows those two columns, followed by the plan you get if you run the 8i query from utlxpls.sql against the plan_table.


SQL> select id, parent_id from plan_table;

        ID  PARENT_ID
---------- ----------
         0
         1          0
         2          1
         3          0
         4          3
         5          0

--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |     2K|    9K|  28039 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  TABLE ACCESS FULL        |T2        |     2K|    9K|     11 |       |       |
--------------------------------------------------------------------------------

So next time you see a plan and the indentation doesn’t quite seem to make sense, perhaps a quick query to select the id and parent_id will let you check whether you’ve found an example where the depth calculation produces a misleading result.
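
A check along those lines could look like the following sketch, which compares the depth recorded by Oracle with the level derived from the id/parent_id links. It assumes the plan_table holds only the one plan (delete from plan_table before running explain plan), and the column aliases are made up for the illustration.

```sql
select
	id,
	parent_id,
	depth,					-- as recorded by Oracle
	level - 1		derived_depth,	-- from the parent/child links
	lpad(' ', level - 1) || operation ||
		nvl2(options, ' ' || options, null)	plan_line
from
	plan_table
start with
	id = 0
connect by
	parent_id = prior id
order siblings by
	id
;
```

Any line where depth and derived_depth differ is a line that dbms_xplan will indent misleadingly; for the plan above I would expect that to be lines 3 and 4.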

 

Update 20th Oct 2014

A question via Twitter – does the error also show up with dbms_xplan.display_cursor(), SQL tuning sets, AWR, etc., or is it just a defect of explain plan? Since the depth is (probably) a derived value for display purposes that Oracle doesn’t use internally for executing the plan I would be inclined to assume that the defect is universal, but I’ve only checked it through explain plan/display, and through execution/display_cursor().

September 29, 2014

12c Fixed Subquery

Filed under: Execution plans,Oracle,Tuning — Jonathan Lewis @ 4:18 pm BST Sep 29,2014

It’s been about 8 months since I posted a little note about a “notable change in behaviour” of the optimizer when dealing with subqueries in the where clause that could be used to return a constant, e.g.:


select
	*
from	t1
where	id between (select 10001 from dual)
	   and     (select 90000 from dual)
;

There’s been a note at the start of the script ever since saying: “Check if this is also true for any table with ‘select fixed_value from table where primary = constant’”. I finally had a few minutes this morning (San Francisco time) to check – and it is, in both 11.2.0.4 and 12.1.0.2. With the t1 table from the previous article run the following:


drop table t2 purge;

create table t2 (
        n1 number(6) not null,
        n2 number(6) not null
);

alter table t2 add constraint t2_pk primary key(n1);

insert into t2 values(1,10000);
insert into t2 values(2,90000);

set autotrace traceonly explain

select * from t1
where   id between (select 10000 from t2 where n1 = 1)
           and     (select 90000 from t2 where n1 = 1)
;

set autotrace off

Instead of the historic 5% of 5% selectivity, the plan shows the optimizer predicting (approximately) the 80,000 rows that it will actually get:

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       | 80003 |  8828K|   218   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL | T1    | 80003 |  8828K|   218   (4)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN| T2_PK |     1 |    13 |     0   (0)| 00:00:01 |
|*  3 |   INDEX UNIQUE SCAN| T2_PK |     1 |    13 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"<= (SELECT 90000 FROM "T2" "T2" WHERE "N1"=1) AND
              "ID">= (SELECT 10000 FROM "T2" "T2" WHERE "N1"=1))
   2 - access("N1"=1)
   3 - access("N1"=1)
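
For comparison, the “5% of 5%” guess corresponds to a selectivity of 0.05 * 0.05 = 0.0025. The following sketch shows the cardinality that arithmetic would give for t1; it assumes statistics have been gathered on the table, and the column alias is made up for the illustration.

```sql
select
	num_rows,
	round(num_rows * 0.05 * 0.05)	old_style_estimate
from
	user_tables
where
	table_name = 'T1'
;
```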

I can’t think it’s very likely that anyone has written SQL that looks like this – but I’m often surprised by what I see in the field, so if this style looks familiar and you’re still on 11.2.0.3 or lower, watch out for changes in execution plan on the upgrade to 11.2.0.4 or 12c.
