Oracle Scratchpad

June 20, 2016

Plan Shapes

Filed under: Uncategorized — Jonathan Lewis @ 12:58 pm BST Jun 20,2016

There are a number of articles, webinars, and blogs online about how to read execution plans, but many of them seem to stop after the minimum description of the simplest type of plan. So I thought I’d throw out a brief comment on a couple of the slightly more complicated things that are likely to appear fairly commonly, because you sometimes find plans with very similar shapes but extremely different interpretations.

First: select with scalar subqueries in the select list (there’s no need to worry about what the table definitions look like):


rem
rem     Script:         plan_shapes.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2016
rem

select
        id, n1,
        (select /*+  no_unnest */ max(n1) from t2 where t2.id = t1.n1)  t2_max,
        (select /*+  no_unnest */ max(n1) from t3 where t3.id = t1.n1)  t3_max
from
        t1
where
        id between 100 and 119
;


--------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |     20 |00:00:00.01 |       8 |
|   1 |  SORT AGGREGATE                      |       |     20 |      1 |     20 |00:00:00.01 |      63 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     20 |      2 |     40 |00:00:00.01 |      63 |
|*  3 |    INDEX RANGE SCAN                  | T2_I1 |     20 |      2 |     40 |00:00:00.01 |      23 |
|   4 |  SORT AGGREGATE                      |       |     20 |      1 |     20 |00:00:00.01 |      83 |
|   5 |   TABLE ACCESS BY INDEX ROWID BATCHED| T3    |     20 |      3 |     60 |00:00:00.01 |      83 |
|*  6 |    INDEX RANGE SCAN                  | T3_I1 |     20 |      3 |     60 |00:00:00.01 |      23 |
|   7 |  TABLE ACCESS BY INDEX ROWID BATCHED | T1    |      1 |     21 |     20 |00:00:00.01 |       8 |
|*  8 |   INDEX RANGE SCAN                   | T1_I1 |      1 |     21 |     20 |00:00:00.01 |       4 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."ID"=:B1)
   6 - access("T3"."ID"=:B1)
   8 - access("ID">=100 AND "ID"<=119)

We have a select statement, with two columns in the select list generated by (correlated) scalar subqueries.

The overall shape of the plan shows the driving query as the last child plan for the SELECT (operations 7-8). The first and second child plans are the plans for the two scalar subqueries in turn (and the order the sub-plans appear is the order of the scalar subqueries in the select list). In this case the main query returned 20 rows (A-Rows=20), and the scalar subqueries executed 20 times each. There are a few other details we could mention, but the key feature of the plan is that the driver is the last sub-plan.
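As a side note, plans showing the Starts, A-Rows and Buffers columns come from rowsource execution statistics; one common recipe for capturing them (a sketch, not necessarily how these examples were generated) is:

```sql
-- Enable rowsource execution statistics for the session
-- (the gather_plan_statistics hint in the query is an alternative):
alter session set statistics_level = all;

-- ... run the query ...

-- Report the last execution of the most recent statement in this session,
-- including the Starts / A-Rows / Buffers columns:
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
```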

Second: update with scalar subqueries:


update t1
set
        n1 = (select max(n1) from t2 where t2.id = t1.n1),
        v1 = (select max(v1) from t3 where t3.id = t1.n1)
where
        id between 1000 and 1999
;

---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                      |       |      1 |        |      0 |00:00:00.13 |   10361 |
|   1 |  UPDATE                               | T1    |      1 |        |      0 |00:00:00.13 |   10361 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED | T1    |      1 |   1001 |   1000 |00:00:00.01 |      22 |
|*  3 |    INDEX RANGE SCAN                   | T1_I1 |      1 |   1001 |   1000 |00:00:00.01 |       4 |
|   4 |   SORT AGGREGATE                      |       |    916 |      1 |    916 |00:00:00.04 |    3672 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    916 |      2 |   1832 |00:00:00.03 |    3672 |
|*  6 |     INDEX RANGE SCAN                  | T2_I1 |    916 |      2 |   1832 |00:00:00.01 |    1840 |
|   7 |   SORT AGGREGATE                      |       |    916 |      1 |    916 |00:00:00.05 |    4588 |
|   8 |    TABLE ACCESS BY INDEX ROWID BATCHED| T3    |    916 |      3 |   2748 |00:00:00.04 |    4588 |
|*  9 |     INDEX RANGE SCAN                  | T3_I1 |    916 |      3 |   2748 |00:00:00.01 |    1840 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">=1000 AND "ID"<=1999)
   6 - access("T2"."ID"=:B1)
   9 - access("T3"."ID"=:B1)


In this statement we update two columns by executing (correlated) scalar subqueries. The most important feature of interpreting this plan is that it is the exact opposite of the earlier select statement. In this plan the first subplan is the driving part of the statement – it’s the part of the plan that tells us how to find rows to be updated (and we find 1,000 of them); the 2nd and 3rd sub-plans correspond to the subqueries in the two columns whose value we set. In this case we happen to get some benefit from scalar subquery caching so the two subqueries each run 916 times. (The fact that both subqueries run the same number of times is not a coincidence – the caching benefit is dependent on the value(s) used for driving the subqueries, and that’s the t1.n1 column in both cases.)

Finally: because people do all sorts of complicated things, and it’s easy to get deceived if you read the plan without seeing the SQL, one more example (starting with the plan):


---------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                      |       |      1 |        |      0 |00:00:00.14 |   11257 |
|   1 |  UPDATE                               | T1    |      1 |        |      0 |00:00:00.14 |   11257 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED | T1    |      1 |   1001 |   1000 |00:00:00.01 |      22 |
|*  3 |    INDEX RANGE SCAN                   | T1_I1 |      1 |   1001 |   1000 |00:00:00.01 |       4 |
|   4 |   SORT AGGREGATE                      |       |   1103 |      1 |   1103 |00:00:00.06 |    5519 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T3    |   1103 |      3 |   3306 |00:00:00.04 |    5519 |
|*  6 |     INDEX RANGE SCAN                  | T3_I1 |   1103 |      3 |   3306 |00:00:00.01 |    2213 |
|   7 |   SORT AGGREGATE                      |       |    916 |      1 |    916 |00:00:00.11 |    9191 |
|   8 |    TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    916 |      2 |   1832 |00:00:00.03 |    3672 |
|*  9 |     INDEX RANGE SCAN                  | T2_I1 |    916 |      2 |   1832 |00:00:00.01 |    1840 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">=1000 AND "ID"<=1999)
   6 - access("T3"."ID"=:B1)
   9 - access("T2"."ID"=:B1)


This plan looks very like the previous plan – it’s an update, so the first sub-plan will be identifying the rows to be updated – but then what?

The easiest assumption – always one worth considering, providing you remember that there are other possibilities – is that we have two subqueries executing for each row we update – but in this example there’s a little clue in the numbers that this isn’t what the statement does. The driving query (operations 2 – 3) identifies 1,000 rows, so how can operations 4 – 6 start 1,103 times each? Something more complicated is going on.

I’ve combined the features of the first two examples. I have a single subquery in the set clause, but it contains a scalar subquery in its select list – so operations 4 – 9 are the sub-plan for a single select statement – and a select statement with a scalar subquery in the select list puts the main plan last (operations 7 – 9) and its scalar subquery sub-plan first (operations 4 – 6). Here’s the SQL:


update t1
set
        (n1,v1) = (
                select
                        max(n1),
                        max((select /*+  no_unnest */ max(v1) from t3 where t3.id = t2.n1))
                from    t2
                where   t2.id = t1.n1
        )
where
        id between 1000 and 1999
;

What happens here is that the driving body of the update statement identifies 1,000 rows, so the scalar subquery against t2 should execute 1,000 times; thanks to scalar subquery caching, though, it only executes 916 times. Each time it executes it finds 2 rows, and for each of those rows it executes the scalar subquery against t3 which, fortunately, also benefits from its own scalar subquery caching and so runs only 1,103 times in total.

I could go on, of course, with increasingly complex examples – for example scalar subqueries that contain decode() calls with scalar subqueries as their inputs; where clauses which compare the results of scalar subqueries, and so on. It can get quite difficult to see, even with the SQL in front of you, what the plan is trying to tell you so I’ll leave you with one thought: always use the qb_name() hint to name every single subquery so that, if interpretation of the plan gets a bit difficult, you can use the ‘alias’ formatting option in the call to dbms_xplan to connect each table reference in the plan with the query block it came from in the query.
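As a sketch of that advice applied to the first example (the query block names are ones I’ve invented):

```sql
select
        /*+ qb_name(main) */
        id, n1,
        (select /*+ qb_name(subq_t2) no_unnest */ max(n1) from t2 where t2.id = t1.n1)  t2_max,
        (select /*+ qb_name(subq_t3) no_unnest */ max(n1) from t3 where t3.id = t1.n1)  t3_max
from    t1
where   id between 100 and 119
;

-- The 'alias' option adds a "Query Block Name / Object Alias" section
-- connecting each operation in the plan to MAIN, SUBQ_T2 or SUBQ_T3:
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last alias'));
```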

May 3, 2016

Debugging

Filed under: CBO,compression,Execution plans,Infrastructure,Oracle,Uncategorized — Jonathan Lewis @ 8:11 am BST May 3,2016

The OTN database forum supplied a little puzzle a few days ago – starting with the old, old, question: “Why is the plan with the higher cost taking less time to run?”

The standard (usually correct) answer to this question is that the optimizer doesn’t know all it needs to know to predict what’s going to happen, and even if it had perfect information about your data the model used isn’t perfect anyway. This was the correct answer in this case, but with a little twist in the tail that made it a little more entertaining. Here’s the query, with the two execution plans and the execution statistics from autotrace:


SELECT  /* INDEX(D XPKCLIENT_ACCOUNT) */ 
        E.ECID,A.acct_nb
FROM    
        client_account d, 
        client         e, 
        account        a
where
        A.acct_nb ='00000000000000722616216'
AND     D.CLNT_ID = E.CLNT_ID
AND     D.ACCT_ID=A.ACCT_ID;

Plan (A) with a full tablescan of client_account – cost 808, runtime 1.38 seconds, buffer gets 17,955


-------------------------------------------------------------------------------------------------
| Id | Operation                      | Name           | Rows  | Bytes  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |                |     1 |    59  |   808 (14) | 00:00:10 |
|  1 |  NESTED LOOPS                  |                |     1 |    59  |   808 (14) | 00:00:10 |
|  2 |   NESTED LOOPS                 |                |     1 |    59  |   808 (14) | 00:00:10 |
|* 3 |    HASH JOIN                   |                |     1 |    42  |   806 (14) | 00:00:10 |
|  4 |     TABLE ACCESS BY INDEX ROWID| ACCOUNT        |     1 |    30  |     5  (0) | 00:00:01 |
|* 5 |      INDEX RANGE SCAN          | XAK1ACCOUNT    |     1 |        |     4  (0) | 00:00:01 |
|  6 |     TABLE ACCESS FULL          | CLIENT_ACCOUNT |  9479K|   108M |   763 (10) | 00:00:09 |
|* 7 |    INDEX UNIQUE SCAN           | XPKCLIENT      |     1 |        |     1  (0) | 00:00:01 |
|  8 |   TABLE ACCESS BY INDEX ROWID  | CLIENT         |     1 |    17  |     2  (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
     0  recursive calls
     0  db block gets
 17955  consistent gets
     0  physical reads
     0  redo size
   623  bytes sent via SQL*Net to client
   524  bytes received via SQL*Net from client
     2  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
     1  rows processed

Plan (B) with an index fast full scan on a client_account index – cost 1,190, runtime 0.86 seconds, buffer gets 28,696


----------------------------------------------------------------------------------------------------
| Id | Operation                      | Name              | Rows  | Bytes  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|  1 |  NESTED LOOPS                  |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|  2 |   NESTED LOOPS                 |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|* 3 |    HASH JOIN                   |                   |     1 |    42  |  1188  (8) | 00:00:14 |
|  4 |     TABLE ACCESS BY INDEX ROWID| ACCOUNT           |     1 |    30  |     5  (0) | 00:00:01 |
|* 5 |      INDEX RANGE SCAN          | XAK1ACCOUNT       |     1 |        |     4  (0) | 00:00:01 |
|  6 |     INDEX FAST FULL SCAN       | XPKCLIENT_ACCOUNT | 9479K |   108M |  1145  (5) | 00:00:13 |
|* 7 |    INDEX UNIQUE SCAN           | XPKCLIENT         |     1 |        |     1  (0) | 00:00:01 |
|  8 |   TABLE ACCESS BY INDEX ROWID  | CLIENT            |     1 |    17  |     2  (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
     0  recursive calls
     0  db block gets
 28696  consistent gets
     0  physical reads
     0  redo size
   623  bytes sent via SQL*Net to client
   524  bytes received via SQL*Net from client
     2  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
     1  rows processed

Note, particularly, that the two plans are the same apart from operation 6, where a full tablescan changes to an index fast full scan, predicting the same number of rows but with an increase of 50% in the cost; the increase in cost is matched by an increase in the reported workload – a 60% increase in the number of consistent gets, with no disk reads or recursive SQL in either case. Yet the execution time (on multiple repeated executions) dropped by nearly 40%.

So what’s interesting and informative about the plan?

The cost of a tablescan or an index fast full scan is easy to calculate; broadly speaking it’s “size of object” / “multiblock read count” * k, where k is some constant relating to the hardware capability. The costs in these plans and the autotrace statistics seem to be telling us that the index is bigger than the table, while the actual run times seem to be telling us that the index has to be smaller than the table.

It’s easy for an index to be bigger than its underlying table, of course; for example, if this table consisted of nothing but two short columns the index could easily be bigger (even after a rebuild) because it would be two short columns plus a rowid. If that were the case here, though, we would expect the time to fast full scan the index to be higher than the time to scan the table.

So two thoughts crossed my mind as I looked at operation 6:

  • Mixing block sizes in a database really messes up the optimizer costing, particularly for tablescans and index fast full scans. Maybe the table had been built in a tablespace using 32KB  blocks while the index had been built in a tablespace using the more common 8KB blocksize – I didn’t want to start working out the arithmetic but that might be just enough to produce the contradiction.
  • Maybe the table was both bigger AND smaller than the index – bigger because it held more data, smaller because it had been compressed. If so then the difference in run-time would be the overhead of decompressing the rows before projecting and comparing the data.

Conveniently the OP has included an extract from the 10053 trace:


Table Stats::
  Table: CLIENT_ACCOUNT  Alias:  D
    #Rows: 9479811  #Blks:  18110  AvgRowLen:  71.00  ChainCnt:  0.00
  Column (#1): CLNT_ID(
    AvgLen: 6 NDV: 1261035 Nulls: 0 Density: 0.000001 Min: 0 Max: 4244786
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 239
  Column (#2): ACCT_ID(
    AvgLen: 6 NDV: 9479811 Nulls: 0 Density: 0.000000 Min: 1 Max: 22028568
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255

Index Stats::
  Index: XPKCLIENT_ACCOUNT  Col#: 1 2
    LVLS: 2  #LB: 28543  #DK: 9479811  LB/K: 1.00  DB/K: 1.00  CLUF: 1809449.00

Note that the index is called xpkclient_account – which suggests “primary key” – and the number of distinct keys in the index (#DK) matches the number of rows in the table (#Rows). The index and table stats seem to be consistent, so we’re not looking at a problem of bad statistics.

Now to do some simple (ballpark) arithmetic: for the table we can check whether “rows * average row length / 8K = blocks”. We can read the numbers directly from the trace file: 9,500,000 * 71 / 8,000 = 84,000. That’s wrong by a factor of about 4 (so maybe it’s a 32K block – and maybe I could rule out that possibility by including more detail in the arithmetic, like allowing properly for the block header, row overheads, pctfree etc.).

For the index – we believe it’s the primary key, so we know the number of rows in the index – it’s the same as the number of distinct keys. As for the length of an index entry, we have the index definition (col#: 1 2) and we happen to have the column stats for those columns so we know their average length. Allowing for the rowid and length bytes we can say that the average index entry is (6 + 1) + (6 + 1) + 6 = 20 bytes. So the number of leaf blocks should be roughly 9,500,000 * 20 / 8,000 = 23,750. That’s close enough given the reported 28,543 and the fact that I haven’t bothered to worry about row overheads, block overheads and pctfree.
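If you want to repeat the ballpark arithmetic, it reduces to nothing more than this (the figures come straight from the trace extract above):

```sql
-- Ballpark estimates vs the reported figures (18,110 table blocks, 28,543 leaf blocks).
-- Deliberately ignores block/row overheads and pctfree.
select
        round(9500000 * 71 / 8000)  estimated_table_blocks,  -- ~84,000: about 4x the reported 18,110
        round(9500000 * 20 / 8000)  estimated_leaf_blocks    -- ~23,750: close to the reported 28,543
from
        dual
;
```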

The arithmetic provides an obvious guess – which turned out to be correct: the table is compressed, the index isn’t. The optimizer hasn’t allowed for the CPU cost of decompressing the compressed rows, so the time required to decompress 9.5M rows doesn’t appear in the execution plan.
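With access to the system the guess is easy to confirm from the data dictionary; assuming the objects are in your own schema, something like:

```sql
-- Is the table compressed (and for what operations), and is the index not?
select  table_name, compression, compress_for
from    user_tables
where   table_name = 'CLIENT_ACCOUNT';

select  index_name, compression
from    user_indexes
where   index_name = 'XPKCLIENT_ACCOUNT';
```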

Footnote.

Looking at the column stats, it looks like there are roughly 8 acct_ids for each clnt_id, so it would probably be sensible to compress the primary key index (clnt_id, acct_id) on the first column as this would probably reduce the size of the index by about 20%.
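A sketch of how you might check and apply that suggestion (note that validate structure takes a lock on the table, so it’s not something for a busy production system):

```sql
-- Ask Oracle how many leading columns it would compress,
-- and the percentage of space that would be saved:
analyze index xpkclient_account validate structure;
select  opt_cmpr_count, opt_cmpr_pctsave
from    index_stats;

-- Compress on the first column only (clnt_id):
alter index xpkclient_account rebuild compress 1;
```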

Better still – the client_account table has very short rows; it looks like a typical intersection table with a little extra data carried, so perhaps this is a table that should be an index-organized table with no overflow. It looks like there should also be an index (acct_id, clnt_id) on this table to optimise the path from account to client, and this would become a secondary index – interestingly one of those rare cases where the secondary index on an IOT might actually be a tiny bit smaller than the equivalent index on a heap table because (in recent versions of Oracle) primary key columns that are included in the secondary key are not repeated in the index structure. (It’s a little strange that this index doesn’t seem to exist already – you might have expected it to be there given the OP’s query, and given that it’s an “obvious” requirement as an index to protect the foreign key.)
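A minimal sketch of that alternative structure; the padding column is invented to stand in for the “little extra data” the real table carries:

```sql
-- Hypothetical IOT version of the intersection table:
create table client_account (
        clnt_id         number          not null,
        acct_id         number          not null,
        padding         varchar2(30),
        constraint ca_pk primary key (clnt_id, acct_id)
)
organization index
;

-- Secondary index to optimise the path from account to client;
-- since both columns are primary key columns Oracle doesn't need
-- to repeat them in the "logical rowid" part of the index entry.
create index ca_acct_clnt on client_account(acct_id, clnt_id);
```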

The only argument against the IOT strategy is that the table clearly compresses very well as a heap table, so a compressed heap table plus two B-tree indexes might be more cost-effective than an IOT with a single secondary index.


December 15, 2015

Indexing

Filed under: Uncategorized — Jonathan Lewis @ 11:22 am BST Dec 15,2015

A recent question on the OTN database forum asked:

I have a table with a,b,c,d,e,f,g,h,i,j,k. columns and I have an index on (a,b) columns. There is a sql statement now with “where a= ?” and we are wondering if it could also be good to add a single index on just (a).

Does it help at all? Does it help in some cases?

This is one of those questions where the answer for a perfectly designed and managed system could easily contradict the pragmatic answer for a live system in its current state. That may mean you have to do the wrong thing in the short term while working (possibly very slowly) towards the right thing.  I gave the following (slightly edited) answer on the forum:

The basic answer is that you do NOT need the single column index if you have the two-column index.

The complex answer is that you may have to spend some time and effort ensuring that the two-column index is used in all cases where it would have been appropriate to use the single-column index. This may simply mean ensuring that the clustering_factor of the index is adjusted suitably so that the optimizer “likes” the index enough; it may mean you (also) have to modify some code to include the cluster_by_rowid hint (when you’re at 12c) so that you don’t suffer a performance impact at run-time.
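For reference, the 12c hint mentioned above might appear as follows (the index name and bind variable are invented for the illustration):

```sql
-- Sort the rowids collected from the index range scan before visiting
-- the table, so the table blocks are visited in physical order:
select  /*+ index(t1 t1_ab_idx) cluster_by_rowid(t1) */
        a, b, c
from    t1
where   a = :bind
;
```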

Key factors to consider: the two-column index will be physically larger than the single column index – this will increase the (optimizer’s estimated) cost of using it; the clustering_factor of the two-column index will almost certainly be larger than the clustering_factor of the single column index – this will also increase the (optimizer’s estimated) cost of using it.

These two points are echoed at run-time: the two-column index will be bigger, so you will have to do more work (though probably not very much more work) to read the relevant rowids; and if you walk the two-column index in order for a given value of the first column you will visit the table blocks in a different order compared to the order of visits from the single-column index – this may result in the query actually doing noticeably more work at run-time.

The change in the index leaf_block count is often insignificant (especially if, as per your example, the number of rows required – hence blocks visited in the table – is large); the impact of the clustering_factor can make a dramatic difference to the cost calculations, but you can often work around this. In 11.2.0.4, particularly, you can use the dbms_stats.set_table_prefs() call to set the ‘table_cached_blocks’ preference for a table so that all its indexes look more desirable to the optimizer.
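A sketch of the mechanism (the value 16 is just an example; the appropriate value depends on your data and buffer cache behaviour):

```sql
begin
        -- Allow the clustering_factor calculation to assume that a
        -- recently visited table block may still be cached:
        dbms_stats.set_table_prefs(
                ownname => user,
                tabname => 'T1',
                pname   => 'TABLE_CACHED_BLOCKS',
                pvalue  => '16'
        );

        -- Re-gather so the indexes pick up the adjusted clustering_factor:
        dbms_stats.gather_table_stats(
                ownname => user,
                tabname => 'T1',
                cascade => true
        );
end;
/
```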

Bottom line: you don’t need the single column index but if you currently have it and want to drop it the human effort required to ensure that it can be dropped without side effects may make you decide to keep it anyway, especially if it doesn’t seem to be causing any concurrency or other performance overheads.  If you don’t have it yet, then you shouldn’t need to create it – though you might have to do some work to make sure that the optimizer takes full advantage of the two-column index.

Since I’m on the topic, I’ll add that the same arguments apply to a pair of indexes like (a, b, c) and (a, b); if you’ve got the longer index you shouldn’t need the shorter one; however, because the shorter index is a multi-column index, you might find that it’s beneficial to create a column group on that column combination so that the optimizer doesn’t lose information about the number of distinct values for the combination when you drop the index.
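A sketch of creating such a column group before dropping the (a, b) index:

```sql
-- Create an extension (column group) on the combination (a, b) so the
-- optimizer keeps the number of distinct values for the pair:
select  dbms_stats.create_extended_stats(user, 'T1', '(A,B)')
from    dual;

-- Gather stats on the new (hidden) virtual column:
begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for all hidden columns size 1'
        );
end;
/
```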

December 3, 2015

Five Hints

Filed under: Hints,Oracle,Uncategorized — Jonathan Lewis @ 7:40 am BST Dec 3,2015

This is the content of a “whitepaper” I wrote for my presentation “Five Hints for Optimising SQL” at the recent DOAG conference.

Introduction

Adding hints to production code is a practice to be avoided if possible, though it’s easy to make the case for emergency patching, and hinting is also useful as the basis of a method of generating SQL Plan Baselines. However, notwithstanding (and sometimes because of) the continuing enhancements to the optimizer, there are cases where the only sensible option for dealing with a problem statement is to constrain the broad brush strategy that the optimizer can take in a way that allows it to find a reasonable execution plan in a reasonable time.

This note describes in some detail the use and effects of five of the “classic” hints that I believe are reasonable strategic options to redirect the optimizer when it doesn’t choose a path that you consider to be the most appropriate choice.

The Big Five

At the time of writing, a query against the view v$sql_hint on Oracle 12.1.0.2 reports 332 hints – but there are very few which we should really consider as safe for production code, and it’s best to view even those as nothing more than a medium-term tool to stabilise performance until the optimizer is able to do a better job with our SQL.

The handful of hints that I tend to rely on for solving problems is basically a set of what I call “structural” hints, though in recent years it has become appropriate to label them as “query block” hints. These are hints that give the optimizer some idea of the shape of the best plan without trying to enforce every detail of how it should finalize the plan. The hints (with their negatives where appropriate) are:

  • Unnest / no_unnest — Whether or not to unnest subqueries
  • Push_subq / no_push_subq — When to handle a subquery that has not been unnested
  • Merge / no_merge — Whether to use complex view merging
  • Push_pred / no_push_pred — What to do with join predicates to non-merged views
  • Driving_site — Where to execute a distributed query

Inevitably there are a few other hints that can be very helpful, but a key point I want to stress is that for production code I avoid what I call “micro-management” hints (such as use_nl(), index_rs_asc()) – attempts to control the optimizer’s behaviour to the last little detail; it is very easy to produce massive instability in performance once you start down the path of micro-managing your execution plans, so it’s better not to try.

The rest of this document will be devoted to describing and giving examples of these hints.

The Optimizer’s Strategy

You can think of the optimizer as working on a “unit of optimization” which consists of nothing more than a simple statement of the form:

select  list of columns
from    list of tables
where   list of simple predicates

To deal with a more complex query the optimizer stitches together a small number (reduced, if it had its way, to just one) of such simple blocks. So one of the first steps taken by the optimizer aims to transform your initial query into this simple form. Consider this example:


select
        t1.*,v1.*,t4.*
from
        t1,
        (
        select
                t2.n1, t3.n2, count(*)
        from    t2, t3
        where exists (
                select
                        null
                from    t5
                where   t5.id = t2.n1
                )
        and     t3.n1 = t2.n2
        group by t2.n1, t3.n2
        )       v1,
        t4
where
        v1.n1 = t1.n1
and     t4.n1(+) = v1.n1
;

We have an inline view consisting of a two-table join with a subquery correlated to the first table, and from our perspective we have a “simple join” of three objects – t1, v1, and t4. Before it does anything else the optimizer will try to transform this into a straight-line five-table join so that it can join all the tables in order, one after the other. As part of that process it will generally attempt to eliminate subqueries in a process known as unnesting.

Looking at the query as it has been presented, the author of the code may have been thinking (symbolically) of the underlying problem as:

  • ( ( t1, ( ( t2, subquery t5 ), t3 ) ), t4 )

Take t1, join to it the result of applying the subquery to t2 and joining t3, then join t4.

The optimizer may decide to transform to produce the following:

  • ( ( ( ( t1, t2 ), t3 ), {unnested t5} ), t4 )

Join t2 to t1, join t3 to the result, join the transformed t5 to the result, then join t4 to the result.

If I decide that the original layout demonstrates the appropriate mechanism, my target is to supply the optimizer with just enough hints to lock it into the order and strategy shown, without trying to dictate every little detail of the plan. My hints would look like this:

select
        /*+
            qb_name(main) push_pred(v1@main)
            no_merge(@inline)
            no_unnest(@subq1) push_subq(@subq1)
        */
        t1.*,v1.*,t4.*
from
        t1,
        (
        select  /*+ qb_name(inline) */
                t2.n1, t3.n2, count(*)
        from    t2, t3
        where exists (
                select  /*+ qb_name(subq1) */
                        null
                from    t5
                where   t5.id = t2.n1
                )
        and     t3.n1 = t2.n2
        group by t2.n1, t3.n2
        )       v1,
        t4
where
        v1.n1 = t1.n1
and     t4.n1(+) = v1.n1
;

I’ve labelled the three separate select clauses with a query block name (qb_name() hint), told the optimizer that the query block named “inline” should be considered as a separately optimized block (no_merge(@inline)), and the subquery inside that block called “subq1” should be treated as a filter subquery (no_unnest(@subq1)) and applied as early as possible (push_subq(@subq1)).

In some circumstances I might use one more hint to tell the optimizer to consider a single join order – t1, v1, t4 – using the hint /*+ leading(t1 v1 t4) */; but in this case I’ve told the optimizer to push the join predicate v1.n1 = t1.n1 inside the view (push_pred(v1@main)) – which will make the optimizer do a nested loop from table t1 to view v1, resolving the view for each row it selects from t1.

Having captured four of the “big five” hints in one sample statement, I’ll now comment on each of them in turn, leaving the driving_site() hint to a separate section at the end.

Merge / No_merge

This pair of hints applies particularly to “complex view merging”, but they can also be used to “isolate” sections of a query, forcing the optimizer to break one large query into a number of smaller (hence easier) sections. I see two main uses for the hints (and particularly the no_merge option) – one is to help the optimizer get started when handling a query with a large number of tables, the other is simply to block a strategy that the optimizer sometimes chooses when it is a bad move.

Consider, in the first case, a query involving 20 tables, with several subqueries. With such a long list it is very easy for the optimizer to pick a very bad starting join order and never reach a good join order; moreover, because of the multiplicative way in which the optimizer estimates selectivity, it’s very easy for the optimizer to decide after a few tables that the cardinality of the join so far is so small that it doesn’t really matter which table to access next. In cases like this we might start by writing a simpler query joining the four or five tables that we know to be the key to the whole query – once we have got the core of the query working efficiently we can “wrap” it into an inline view with a no_merge hint, and then join the rest of the tables to it, with some confidence that the optimizer will start well and can’t go far wrong with the remainder of the tables. So, for example:

select  ...
from    t1, t2, t3, ..., t20
where   {various predicates}
and     exists {correlated subquery1}
and     exists {correlated subquery2}
and     column in {non-correlated subquery}

Might become

with v1 as (
        select  /*+ no_merge cardinality(2000) */ ...
        from    t1, t2, t3, t4, t5
        where   {various predicates}
        and     exists {correlated subquery1}
)
select  ...
from    v1, t6, t7, ..., t20
where   {join conditions to v1}
and     {other join conditions}
and     exists {correlated subquery2}
and     column in {non-correlated subquery}
;

I’ve written the example up using subquery factoring; in earlier versions of Oracle the relevant piece of code would have been written as an inline view, but the “with” clause can help to tidy the SQL up and make it easier to see the logic of what’s being done – provided the practice isn’t taken to such extremes that the final query consists of a large number of very small factored subqueries.

I’ve included a cardinality() hint in the factored subquery – it’s not fully documented, and it’s not commonly realised that it can be applied to a query block rather than to a table or list of tables. This query block usage is probably the safest example of using the hint – the table-related usage is badly understood and prone to mis-use.

As an example of blocking a badly selected transformation, consider the following query (where I’ve already included qb_name() hints to name the two separate query blocks):

select  /*+ qb_name(main) */
        t1.vc1, avg_val_t1
from    t1,
        (
        Select  /*+ qb_name(inline) */
                id_parent, avg(val) avg_val_t1 
        from	t2
        group by
                id_parent
        ) v1
where	
        t1.vc2 = 'XYZ'
and     v1.id_parent = t1.id_parent
;

There are two basic strategies the optimizer could use to optimize this query, and the choice would depend on its estimate of how much data it had to handle. Whichever choice it makes we might, depending on our better understanding of the data, want it to choose the alternative (without rewriting the query beyond hinting it).

One option is for Oracle to execute the inline view to generate the aggregate data v1 then join the result to t1; the other is to join t2 (the view’s underlying table) to t1 and then work out an aggregation of the join that would give the same result.

If I wanted to “join then aggregate” I would use the merge hint; if I wanted to “aggregate then join” I would use the no_merge hint. There are three different ways in which I could introduce the hint:

  • In the inline view itself I could simply add the hint “merge”
  • In the main query I could reference the view by view name “no_merge(v1)”
  • In the main query I could reference the inline query block name “no_merge(@inline)”

Note particularly the “@” symbol that I use to point a hint at a query block; and note that it was not needed when I referenced the view name. (The reference by query block name is the more modern, preferred strategy.)
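Side by side, the three placements might look like this – a sketch based on the query above, where only one form would be used at a time:

```sql
-- Form 1: hint the inline view directly
select  t1.vc1, avg_val_t1
from    t1,
        (
        select  /*+ no_merge */         -- or merge
                id_parent, avg(val) avg_val_t1
        from    t2
        group by
                id_parent
        ) v1
where   t1.vc2 = 'XYZ'
and     v1.id_parent = t1.id_parent
;

-- Form 2: reference the view by name from the main query block
select  /*+ no_merge(v1) */ t1.vc1, avg_val_t1
from    t1, ( select id_parent, avg(val) avg_val_t1 from t2 group by id_parent ) v1
where   t1.vc2 = 'XYZ'
and     v1.id_parent = t1.id_parent
;

-- Form 3: reference the query block name (note the "@")
select  /*+ qb_name(main) no_merge(@inline) */ t1.vc1, avg_val_t1
from    t1, ( select /*+ qb_name(inline) */ id_parent, avg(val) avg_val_t1 from t2 group by id_parent ) v1
where   t1.vc2 = 'XYZ'
and     v1.id_parent = t1.id_parent
;
```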

Push_pred / No_push_pred

Once we start dealing with non-mergeable views and have to join to them there are two strategies we could use for the join. The first is (nominally) to create the entire data set for the view and then use that in a merge join or hash join based on the join predicate; the second is to “push a join predicate” into the view definition – in other words, for each join value we add a simple filter predicate to the view definition and derive the view result based on that predicate. For example, if we create a database view called avg_val_view with a definition matching the inline view we used in the previous example, we might see one of two possible execution plans for the following query:

select  t1.vc1, avg_val_t1
from    t1, avg_val_view
where   t1.vc2 = 'XYZ'
and     avg_val_view.id_parent = t1.id_parent
;

First – if the view is non-mergeable and we don’t push the predicate, we can see the join predicate appearing at operation 1, as we do a hash join between table t1 and the entire result set from aggregating t2. This may be sensible, but it may be very expensive to create the entire aggregate:

-------------------------------------------------------------------
| Id | Operation            | Name         | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|  0 | SELECT STATEMENT     |              |     1 |    95 |    27 |
|* 1 |  HASH JOIN           |              |     1 |    95 |    27 |
|* 2 |   TABLE ACCESS FULL  | T1           |     1 |    69 |     2 |
|  3 |   VIEW               | AVG_VAL_VIEW |    32 |   832 |    24 |
|  4 |    HASH GROUP BY     |              |    32 |   224 |    24 |
|  5 |     TABLE ACCESS FULL| T2           |  1024 |  7168 |     5 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("AVG_VAL_VIEW"."ID_PARENT"="T1"."ID_PARENT")
   2 - filter("T1"."VC2"='XYZ')

So we may decide to add the hint /*+ push_pred(avg_val_view) */ to the query – we have to use the view-name method since we don’t have a query block containing the view; if we were using the inline view from the previous query we could have used the “query block” format /*+ push_pred(@inline) */. The plan from pushing predicates is:

--------------------------------------------------------------------
| Id | Operation               | Name        | Rows | Bytes | Cost |
--------------------------------------------------------------------
|  0 | SELECT STATEMENT        |             |    1 |    82 |    7 |
|  1 |  NESTED LOOPS           |             |    1 |    82 |    7 |
|* 2 |   TABLE ACCESS FULL     | T1          |    1 |    69 |    2 |
|  3 |   VIEW PUSHED PREDICATE | AVG_VAL_VIEW|    1 |    13 |    5 |
|* 4 |    FILTER               |             |      |       |      |
|  5 |     SORT AGGREGATE      |             |    1 |     7 |      |
|* 6 |      TABLE ACCESS FULL  | T2          |   32 |   224 |    5 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."VC2"='XYZ')
   4 - filter(COUNT(*)>0)
   6 - filter("ID_PARENT"="T1"."ID_PARENT")

It would actually be a bad idea in this particular case, but if we could access the rows for a given id_parent in t2 efficiently this query could be much more efficient than the previous plan because it would only aggregate the small number of rows that it was going to need at each point, with the smallest row size.

You might note that Oracle has cleverly introduced a filter as operation 4 so that the view returns no row (rather than a row of nulls) for a t1 row with no matching data in t2, preserving the join semantics. It’s details like this that typical programmers tend to forget when trying to transform SQL by hand.

Unnest / No_unnest

The optimizer prefers joins to subqueries, and will generally try to transform a query to turn a subquery into a join – which often means a semi-join for existence/in, or an anti-join for not exists/not in. As the optimizer has improved from version to version many such transformations (or decisions not to transform) have changed from being driven by rules to being driven by cost – and sometimes we want to override the optimizer because we know its cost calculation is bad. Most commonly we might want to write a query with a subquery – to show our intentions – but tell the optimizer to unnest the subquery: it’s much safer to take this approach than to rewrite the query in unnested form ourselves – I’ve seen people do the rewrite incorrectly too many times to trust a user-created rewrite. For example:

select
        /*+ qb_name(main) unnest(@subq) */
        outer.* 
from 
        emp outer
where   outer.sal > (
                select 
                        /*+ qb_name(subq) unnest */
                        avg(inner.sal) 
                from    emp inner 
                where 
                inner.dept_no = outer.dept_no
        )
;

I’ve shown the unnest hint here, and demonstrated the two possible forms – you can either use it in the main query block to point it at a given query block name (@subq), or you can use it without a “parameter” in the query block you want unnested. In effect the unnest hint causes Oracle to rewrite the query as:

select
        outer.* 
from
        (
        select 
                dept_no, avg(sal) av_sal
        from    emp 
        group by 
                dept_no
        )               inner,
        emp             outer
where 
        outer.dept_no = inner.dept_no
and     outer.sal > inner.av_sal
;

You’ll notice that this gives us an in-line aggregate view, so the optimizer could take (or be pushed) one more step into doing complex view merging as well, joining emp to itself before aggregating on a very messy set of columns.

Here’s the plan if we unnest:

----------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost  |
----------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |  1000 | 98000 |   114 |
|*  1 |  HASH JOIN           |         |  1000 | 98000 |   114 |
|   2 |   VIEW               | VW_SQ_1 |     6 |   156 |    77 |
|   3 |    HASH GROUP BY     |         |     6 |    48 |    77 |
|   4 |     TABLE ACCESS FULL| EMP     | 20000 |   156K|    36 |
|   5 |   TABLE ACCESS FULL  | EMP     | 20000 |  1406K|    36 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("ITEM_1"="OUTER"."DEPT_NO")
       filter("OUTER"."SAL">"AVG(INNER.SAL)")

Notice the appearance at operation 2 of a “view” named VW_SQ_1: there are a number of internal view names that appear in Oracle as it transforms queries – the fact that a view name starts with VW_ is a good clue that it’s an internal one. Note, in this particular case, that the main work done in the query is the two tablescans of EMP.

Here’s the plan if we don’t unnest:

------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost  |
------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   167 | 12024 |   252 |
|*  1 |  FILTER             |      |       |       |       |
|   2 |   TABLE ACCESS FULL | EMP  | 20000 |  1406K|    36 |
|   3 |   SORT AGGREGATE    |      |     1 |     8 |       |
|*  4 |    TABLE ACCESS FULL| EMP  |  3333 | 26664 |    36 |
------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OUTER"."SAL"> (SELECT /*+ NO_UNNEST */
              AVG("INNER"."SAL") FROM "EMP" "INNER" 
              WHERE "INNER"."DEPT_NO"=:B1))
   4 - filter("INNER"."DEPT_NO"=:B1)

The FILTER at operation 1 tells us that nominally the optimizer will run the subquery once for every row in the emp table, but the optimizer costing (252) tells us that it thinks that really it will execute the table scan only 7 times in total (7 * 36 = 252): once for the driving scan and six more times because there are only six departments in my emp table.  (This “how many executions” type of estimate appeared in the costing calculations relatively recently.)

Push_subq / No_push_subq

Once we can control whether or not Oracle will unnest a subquery or run it as a filter we can then choose whether the subquery should run early or late. Historically the optimizer would always leave subqueries to the very end of query operation – but more recently the choice of timing acquired a costing component. “Pushing” a subquery means pushing it down the execution tree – i.e. running it earlier in the plan. To demonstrate this we need, at a minimum, a two-table join with a subquery:

select
        /*+ leading(t1 t2) push_subq(@subq) */
        t1.v1
from    t1, t3
where   t1.n2 = 15
and     exists (
                select  --+ qb_name(subq) no_unnest push_subq 
                        null
                from    t2
                where   t2.n1 = 15
                and     t2.id = t1.id
        )
and     t3.n1 = t1.n1
and     t3.n2 = 15
;

In this query I have a subquery where I’ve blocked unnesting, so it has to run as a filter subquery. (In passing, I’ve used the alternative, less commonly known, format for hinting: the single-line comment that starts with “--” for a comment and “--+” for a hint.)

I’ve shown the push_subq hint (run the subquery early) in two different ways – first at the top of the query referencing the query block that I want pushed, and then in the subquery itself where it doesn’t need a parameter.

As you can see, the subquery is correlated to table t1 and I’ve told Oracle to examine only the join order t1 -> t3. The effect of the push_subq hint, therefore, is to tell Oracle to run the subquery for each row of t1 that it examines and join any survivors to t3. The alternative is for Oracle to join t1 to t3 and then run the subquery for every row in the result. Depending on the data and indexes available either option might be the more efficient.

Here are the two plans – first if I don’t push the subquery (note the FILTER operation):

--------------------------------------------------------------------
| Id | Operation                    | Name  | Rows | Bytes | Cost  |
--------------------------------------------------------------------
|  0 | SELECT STATEMENT             |       |    1 |    28 |   289 |
|* 1 |  FILTER                      |       |      |       |       |
|* 2 |   HASH JOIN                  |       |  173 |  4844 |   116 |
|* 3 |    TABLE ACCESS FULL         | T1    |  157 |  3140 |    57 |
|* 4 |    TABLE ACCESS FULL         | T3    |  157 |  1256 |    57 |
|* 5 |   TABLE ACCESS BY INDEX ROWID| T2    |    1 |     8 |     2 |
|* 6 |    INDEX UNIQUE SCAN         | T2_PK |    1 |       |     1 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ QB_NAME ("SUBQ2") NO_UNNEST */ 0
              FROM "T2" "T2" WHERE "T2"."ID"=:B1 AND "T2"."N1"=15))
   2 - access("T3"."N1"="T1"."N1")
   3 - filter("T1"."N2"=15)
   4 - filter("T3"."N2"=15)
   5 - filter("T2"."N1"=15)
   6 - access("T2"."ID"=:B1)

Then if I push the subquery

--------------------------------------------------------------------
| Id |Operation                     | Name  | Rows | Bytes | Cost  |
--------------------------------------------------------------------
|  0 |SELECT STATEMENT              |       |    9 |   252 |   117 |
|* 1 | HASH JOIN                    |       |    9 |   252 |   115 |
|* 2 |  TABLE ACCESS FULL           | T1    |    8 |   160 |    57 |
|* 3 |   TABLE ACCESS BY INDEX ROWID| T2    |    1 |     8 |     2 |
|* 4 |    INDEX UNIQUE SCAN         | T2_PK |    1 |       |     1 |
|* 5 |  TABLE ACCESS FULL           | T3    |  157 |  1256 |    57 |
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3"."N1"="T1"."N1")
   2 - filter("T1"."N2"=15 AND  EXISTS (SELECT /*+ QB_NAME ("SUBQ2")
              PUSH_SUBQ NO_UNNEST */ 0 FROM "T2" "T2" 
              WHERE "T2"."ID"=:B1 AND "T2"."N1"=15))
   3 - filter("T2"."N1"=15)
   4 - access("T2"."ID"=:B1)
   5 - filter("T3"."N2"=15)

Notice how the access to t2 has squeezed itself between t1 and t3 and is also indented one place as a clue that it is a subordinate action on t1, but the FILTER operation visible in the previous plan has disappeared. This plan is an example of a plan that doesn’t follow the well-known “first child first / recursive descent” guideline – Oracle has hidden the FILTER operation and twisted the plan slightly out of its “traditional” shape as a consequence.

Driving_site

The final hint is for distributed queries, and has no “negative” version. Sometimes the only way you can “tune” a distributed query is to minimise the time spent on network traffic, and this means dictating WHERE the query executes. The driving_site hint lets you make that choice. (Sometimes, having made that choice you also have to include a leading() hint to tell Oracle about the single join order you want it to consider – it’s possible for the optimizer to do some very strange things with distributed queries, especially if the instances have different NLS settings).

Consider the following query (I’ll fill in the XXXX in the hint shortly):

select  /*+ driving_site (XXXX) */
        dh.small_vc,
        da.large_vc
from
        dist_home               dh,
        dist_away@remote_db     da
where
        dh.small_vc like '1%'
and     da.id = dh.id;

This query extracts a small amount of data from a table called DIST_HOME in the local database, and joins it to some data in a table called DIST_AWAY in a remote database, producing a reasonably large number of medium-sized rows. There are basically two obvious plans:

  • nested loop – for each row in dist_home, query dist_away for matching data
  • hash join – create an in-memory hash table from the dist_home data, and then probe it with data from all the rows in dist_away.

The first plan will produce a large number of network round trips – so that’s not very good; the second plan will pull a very large amount of data from the remote database if the query operates at the local database (it’s only the columns we need, but it will be ALL the rows from the remote database).

Choosing the second plan but executing it at the remote database means we’ll send a small parcel of data to the remote database, do the join there to produce (we hope) a reasonable result set, then send it back to the local database. The network traffic will be minimised without causing an undesirable increase in other resource usage. To make this plan happen all I needed to do in the query was change the XXXX in the driving_site() hint to reference a table alias from a table in the remote database, in this case driving_site(da).

Here’s the execution plan:

-----------------------------------------------------------------------
| Id | Operation              | Name     | Rows | Bytes | Inst |IN-OUT|
-----------------------------------------------------------------------
|  0 | SELECT STATEMENT REMOTE|          |  216 | 48600 |      |      |
|* 1 |  HASH JOIN             |          |  216 | 48600 |      |      |
|  2 |   REMOTE               | DIST_HOME|  216 |  4320 |    ! | R->S |
|  3 |   TABLE ACCESS FULL    | DIST_AWAY| 2000 |   400K| TEST |      |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  1 - access("A1"."ID"="A2"."ID")

Remote SQL Information (identified by operation id):
----------------------------------------------------
  2 - SELECT "ID","SMALL_VC" FROM "DIST_HOME" "A2" WHERE "SMALL_VC" 
      LIKE '1%' (accessing '!')

Notice how the top line (id 0) includes the keyword REMOTE – this tells you that this is the plan from the viewpoint of the remote database/instance that will be executing it. Remember that from its viewpoint the database that we think is the local database is one that it thinks is remote – hence the REMOTE operation 2 which is addressing (our) local table DIST_HOME.

Other key points to note are the appearance of the Inst (instance) and IN-OUT columns. These tell you where each table is located – when a query executes remotely “our” database is tagged only by the name “!”.

A nice feature of the execution plan for a distributed query is that you can see how the query has been decomposed for execution at the remote site. In this case the other database will be sending our database the query at operation 2 to pull the rows it wants from dist_home so that it can do the join at its site and send the result back to us.

The thing you generally don’t want to see in more complex distributed queries is a separate query being generated for each remote table involved in the join – tables that live remotely should be joined remotely with just the join result being pulled back to the local database.

There is a special warning that goes with this hint – it isn’t valid for the select statements in “create as select” and “insert as select”. There seems to be no good reason for this limitation, but for CTAS and “insert as select” the query has to operate at the site of the table that is receiving the data. This means that you may be able to tune a naked SELECT to perform very well and then find that you can’t get the CTAS to use the same execution plan. A typical workaround to this problem is to wrap the select statement into a pipelined function and do a select from table(pipelined_function).
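A sketch of that workaround, using the dist_home/dist_away query from above – the package name, type names, and column sizes here are hypothetical, and the details would need to match your actual row shape:

```sql
create or replace package remote_fetch as
        type t_row is record (small_vc varchar2(10), large_vc varchar2(200));
        type t_tab is table of t_row;
        function rows_from_remote return t_tab pipelined;
end;
/

create or replace package body remote_fetch as
        function rows_from_remote return t_tab pipelined is
        begin
                -- the hinted select runs at the remote site, as tuned
                for r in (
                        select  /*+ driving_site(da) */
                                dh.small_vc, da.large_vc
                        from
                                dist_home               dh,
                                dist_away@remote_db     da
                        where
                                dh.small_vc like '1%'
                        and     da.id = dh.id
                ) loop
                        pipe row(r);
                end loop;
                return;
        end;
end;
/

rem     the CTAS can now use the hinted plan indirectly
create table join_result as
select * from table(remote_fetch.rows_from_remote);
```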

Conclusion

There are literally hundreds of hints available but, as a general guideline, there are only a few that are particularly useful and strategically sound. In this article I’ve listed the five hints that I’ve long considered to be the ones that are of most help and least risk. I have mentioned a couple of other hints in passing, and know that there are a couple of hints in the newer versions of Oracle that should eventually be added to the list; but the five I’ve mentioned give a sound basis to work from in understanding the benefits of using hints that shape the optimizer’s strategy for a query without trying to micro-manage it.

November 30, 2015

Trouble-shooting

Filed under: Oracle,Troubleshooting,Uncategorized — Jonathan Lewis @ 6:03 am BST Nov 30,2015

This is the text of the “whitepaper” I submitted to DOAG for my presentation on “Core Strategies for Troubleshooting”.

Introduction

In an ideal world, everyone who had to handle performance problems would have access to ASH and the AWR through a graphic interface – but even with these tools you still have to pick the right approach, recognise the correct targets, and acquire information at the boundary that tells you why you have a performance problem and the ways in which you should be addressing it.

There are only three ways you can waste resources on an Oracle system, and only three different types of activity that need to be investigated. If you don’t appreciate that this is the case then you can waste a lot of time following the wrong strategy and attempting to solve the wrong problems. Once you have a clear statement of what you are trying to achieve it becomes much easier to achieve those aims.

Three ways to cause problems

There really are only three symptoms you need to look out for in the database

  • You’re doing it the hard way
  • You’re doing it too often
  • You’re queueing / locking

Inevitably you can see that there is going to be some overlap between the three (and it would be easy to argue that the third is simply a side effect of the other two). If you are executing a single statement “the hard way” you’re likely to be doing single block reads or buffer gets too often. If you’re executing a very lightweight statement too often it’s probably a step in a process that is attempting to get a job done “the hard way”, and it’s probably causing (and getting stuck in) queues for latches and mutexes. I’ve included queueing as the third option because the simple action of locking data (deliberately, or by accident) doesn’t fall neatly into the other two.

Another way of looking at this list is to reduce it to just two items with a slightly different flavour: when you spend too much time on a task it’s because you’re either doing too much work or you’re not being allowed to work.

Three classes of problems to solve

An important aspect of performance problems is the high-level classification. Labelling the class of problem properly points you to the correct strategy for investigating the problem. Again there are only three possibilities in the list, which I’ll express as typical complaints:

  • My report is taking too long to run / screen is taking too long to refresh
  • The batch job over-ran the SLA last night
  • The “system” is slow

What’s the most significant difference between the three classes of complaint?

  • “My XXX takes too long”: XXX is a repeatable event that can be isolated and watched – just do it again, and again, and again, and again while I watch every step of what’s going on.
  • The batch job over-ran last night: It’s not a repeatable process, so you’ve got to infer what the problem was from historical evidence; it’s (usually) made up of a number of concurrent processes, which may interfere with each other to varying degrees depending on their relative start and finish times.
  • The “system” is slow: possibly all the time, possibly intermittently – if there’s no specific complaint then the only option is to keep an eye open for resource-intensive activity to see if you can reduce the resource usage of specific individual tasks (for which read SQL or PL/SQL statements) or reduce the number of times that those tasks are executed.

The common source

Despite the fact that we can classify performance problems in three ways, it’s worth remembering that ALL the information we might use to drive our diagnosis comes from one place – the Oracle database. At any moment we have sessions that are active, operating statements (cursors), and using a resource such as a file, a latch, a buffer, and so on. We could almost represent each moment by a cube with sessions along one side, cursors along another, and resources along the third – the cube changes moment by moment, allowing us to visualise time as the fourth dimension in a hypercube of activity.

Instant by instant the Oracle kernel code knows which session is using what resource to operate which cursor and, although the total volume of all that information is far more than could reasonably be recorded, Oracle has many different ways of slicing, dicing and capturing parts of that hypercube – running totals, rolling aggregates, snapshots by session, by cursor, by resource and so on – that are made visible as the dynamic performance views (v$ objects). Trouble-shooting is largely a case of deciding which dynamic performance views are the most appropriate to use for our three classes of task.

Active Session History

Before reviewing the three classes, it’s worth breaking off for a moment to say a few things about one of the most important and useful views that we have of the wealth of information available; this is the active session history (v$active_session_history / ASH) which Oracle uses to capture a snapshot once every second of what each currently active session is doing; every 10th snapshot is then echoed down into the Automatic Workload Repository (AWR) by a process that runs every hour to copy the dynamic view to a table that can be accessed through the database view dba_hist_active_sess_history.

The capture rates can all be adjusted: I have never seen anyone change from one snapshot per second, or every 10th snapshot in a production system, but I’ve often seen the dump to the AWR taken every 30 minutes, occasionally 20 or even 15 minutes. On occasion I have asked clients to do a CTAS (create table as select – nologging) to capture the whole of the v$active_session_history to a table that can be exported and reviewed at a later date. Oracle tries to keep a minimum of at least an hour’s ASH in memory but, with a large enough SGA, you may find that this will stretch out to 3 or 4 hours.
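The capture itself needs nothing more elaborate than the following (assuming suitable privileges on the v$ view):

```sql
create table ash_copy nologging as
select  *
from    v$active_session_history
;
```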

Unfortunately, although (or perhaps because) ASH and its AWR history are extremely helpful, you have to pay extra licence fees to use the information, and the technology can only be licensed with the Enterprise Edition of Oracle. (Which is why there are some ASH emulators on the Internet.)

My report is slow

The special feature of someone complaining about a specific task is that it’s likely to be repeatable – so we can run it again and again and watch every single detail to see where the time goes. Our slice through the hypercube could take a single session over a period of time and report every action along that path. This, of course, is the 10046 – a.k.a. extended SQL trace – event. We can enable it in many ways, perhaps through a logon trigger, perhaps through a call to dbms_monitor:


begin
        dbms_monitor.session_trace_enable( 
                session_id => &m_sid, 
                serial_num => &m_serial,
                waits      => true, 
                bind       => true, 
                plan_stat  => 'all_executions'
        );
end; 
/

In this example I’ve requested all wait states and bind variables to be dumped into the trace file; I’ve also requested that the execution plan (with rowsource execution stats) be dumped for every single execution of every single statement. Sometimes a problem arises because a particular set of bind variables represents a special case that causes a “reasonable” plan to behave very badly. If we’re going to look closely we may as well get as much detail as possible.
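For completeness – and because it’s easy to forget – the matching call to switch the trace off again is:

```sql
begin
        dbms_monitor.session_trace_disable(
                session_id => &m_sid,
                serial_num => &m_serial
        );
end;
/
```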

The entire “trace” interface was upgraded dramatically in 11g, and one of the useful variants on this theme is particularly relevant to a commonly used Web-based implementation. If you know that a specific screen task corresponds to a particular PL/SQL package you can enable tracing of a cursor (across the system, if necessary) by SQL_ID. So, for example, you might issue the following two commands, with a couple of minutes gap between the two:


alter system
        set events '
                sql_trace[SQL:1wthpj7as7urp]
                plan_stat=all_executions,
                wait=true, 
                bind=true
        '
;

-- wait a few minutes

alter system
        set events 'sql_trace[SQL:1wthpj7as7urp] off'
;

Every time the statement with SQL_ID =‘1wthpj7as7urp’ is executed, the session executing it will start adding information to the session trace file, and when the statement ends the tracing will end. This is particularly nice if the “statement” is a top-level call to a PL/SQL procedure because all the SQL inside the procedure will be traced as the package executes.

For a highly focused, highly repeatable task, the 10046 trace event is almost always all you need to do.

The batch over-ran

The big difference between this case and the previous one is that “the batch” is not something you can simply repeat and watch. Moreover, “the batch” is likely to be a large number of separate sections of code that are scheduled to run with a fairly fluid timetable that can result in changes from day to day (or, more likely, night to night) in the set of jobs that might be running concurrently. This means that even if you could re-run the batch job (perhaps on the previous night’s backup) you might not see the same problem appear, because a small change in timing could result in a large change in contention.

One of the most important steps of dealing with the batch is pre-emptive: instrument your code and make it possible to compare the run on one night with the run on another. At the very least you need to have something capturing the start and end times of each “significant component” of the batch so you can quickly answer questions like: “which jobs took much longer than usual”, “which job was the first job that took longer than usual”, “which jobs were running concurrently with job X last night when they never usually overlap?”

Ideally you should have much more information than this about each job – basically a report from Oracle which says “how much work did I do, how much time did I spend”: for a session this is simply a report of v$mystat or v$sesstat (joined to v$statname) and v$session_event for the session (v$mystat is a less well-known view that is v$sesstat restricted to “my” session). If you classify each job as “connect to the database, do something, disconnect” then this critical log is simply a pair of select statements spooled out somewhere convenient, or written to the database; if you want to break a single connection into significant pieces then a simple pl/sql procedure could read the statistics into a pl/sql array as the piece starts, then re-read the stats and calculate the differences as the piece ends.
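As a minimal sketch, the “how much work did I do, how much time did I spend” report for the current session is just a pair of queries like these:

```sql
rem     work done by "my" session, non-zero statistics only
select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     ms.value != 0
order by
        sn.name
;

rem     where "my" session spent its wait time
select
        event, total_waits, time_waited_micro
from
        v$session_event
where
        sid = sys_context('userenv','sid')
order by
        time_waited_micro desc
;
```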

Knowing where the time went, and knowing how tasks have behaved differently from previous runs is a big step forward to identifying the problem.

If you don’t have the instrumentation you need then the AWR (if you’re licensed) or Statspack (if you’re not licensed) is a step in the right direction. Apart from the typical hourly snapshots and reports you can take a snapshot as the first and last steps of the batch so that you’ve got “the whole batch” in a single AWR/Statspack report. If you’ve got that you can then do comparisons for things like:

  • Which event consumed much more time than usual
  • Which SQL took much more time than usual
  • Which segment(s) saw much more activity than usual
  • Was there some unusual enqueue activity
  • Can we see some unusual outliers in the event histograms
  • Can we see some unusual memory demands in the pga histogram
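Bracketing the batch with snapshots is a one-liner in either case – these are the standard calls:

```sql
--  AWR (requires the Diagnostic Pack licence):
execute dbms_workload_repository.create_snapshot

--  Statspack (run as the perfstat user):
execute statspack.snap
```

You can then run $ORACLE_HOME/rdbms/admin/awrrpt.sql (or spreport.sql for Statspack) across the pair of snapshot ids to get a report covering exactly the batch window.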

Although system-wide summaries rarely point us at exact causes, they can often give us strong clues about the areas (and times) where the problem originated.

In this respect the “Top Activity” screen from Enterprise Manager (Grid Control / Cloud Control) can be very helpful as it produces a nice graphical presentation of “working”. Where, in the picture of last night’s activity, does the graph start to ramp up, what colour is the bit that’s growing, and how does that picture compare to the same picture the previous night? (Having two windows open with two different nights makes it wonderfully easy to switch between displays and spot the differences.) Since the “historical” top activity screen is created from dba_hist_active_sess_history, which contains about 100 different details per session of each captured moment, it’s very easy to drill through the spikes to answer questions like: “which object”, “what event”, “which SQL”, “what was the execution plan”, “how much work did that take”, to follow the chain of time back to the cause.

The system is slow

If no-one is going to tell you about specific tasks, and if you don’t have any sort of boundary that allows you to focus on tasks or time-ranges, then the simplest thing to do is look for anything expensive (i.e. time-consuming) and see if you can make it cheaper.

Again, the graphic “Top Activity” screen is very helpful, and I often tell people to arrange to have a system that shows the top activity screens for the 2 or 3 most important databases on a large screen on the wall where any passing DBA might notice a brief spike in workload. There are systems that can be improved by constant monitoring – so long as the monitoring doesn’t take up 100% of an individual’s time but is driven as an informal glance at a picture.

If you’re not licensed to take advantage of the AWR then Statspack can help – but use the smallest time interval between snapshots that you can manage (though 15 minutes is as low as I’ve ever gone) so that short-lived “anomalies” don’t fall out of memory before they can be captured.

An important feature of reading Statspack is that you need to check for missing information – if the headline figure for physical reads is 25M but the “SQL ordered by reads” section accounts for only 12M then you know that there must be 13M reads that didn’t get captured in the report, and that might be the 13M that is causing the problem. Similarly, if the “Segments by physical reads” section reports 16M reads that’s 4M more than the SQL – but is the 12M a subset of the 16M, or is there only a 3M overlap between the two figures so that between them the 12M and 16M cover the entire 25M? There’s more information in the Statspack report than immediately meets the eye, and a careful drilldown into other areas of the report (typically the Instance Activity) may be needed to clarify the meaning of what you’re seeing as a headline figure.

The other thing you can do with “the slow system” when you don’t have ASH to help is take snapshots (or get some freeware to do the same sort of thing). If the system is slow “right now” you could, for example, take a snapshot of v$sess_io (session I/O), wait 30 seconds then take another snapshot, find the difference and see who is doing most of the I/O work – then chase that session; or take snapshots of v$sesstat limited to (say) statistics like “%redo%” and find out who is generating lots of redo.
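A minimal version of the v$sess_io snapshot-and-diff might look like the following – the temporary table name is my own invention, and in practice you’d probably capture all the columns rather than just physical_reads:

```sql
--  Hypothetical sketch: sess_io_snap is an invented name.
create global temporary table sess_io_snap
on commit preserve rows
as
select  sid, physical_reads
from    v$sess_io
;

--  ... wait 30 seconds or so, then:

select
        io.sid,
        io.physical_reads - s.physical_reads    delta_reads
from
        v$sess_io       io,
        sess_io_snap    s
where
        s.sid = io.sid
order by
        delta_reads desc
;
```

The session at the top of the list is the one to chase; the same pattern works for v$sesstat restricted to whichever statistics you’re interested in.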

Oracle allows you to take this approach back into recent history – there are a number of “metric” views which give you things like the rolling average, min, and max I/O volumes for the last few intervals of 5 minutes or 1 minute each – telling you, in other words, whether there were any interesting bursts of extreme activity in the recent past. For example, a query against v$sysmetric_summary might give you an output like the following:

METRIC_NAME                     MAXVAL     AVERAGE STANDARD_DEV METRIC_UNIT
------------------------ ------------- ----------- ------------ -----------------------
Physical Reads Per Sec        1,618.95      105.92       358.16 Reads Per Second
Physical Reads Per Txn       97,202.00    5,539.19    20,811.56 Reads Per Txn
Redo Generated Per Sec    6,773,108.94  218,132.86 1,023,458.57 Bytes Per Second
User Calls Per Txn              395.00       43.39        79.85 Calls Per Txn
Total Parse Count Per Sec        31.14        1.88         4.25 Parses Per Second
Host CPU Utilization (%)         64.51        3.93         9.07 % Busy/(Idle+Busy)
Database Time Per Sec            82.96        6.65        15.37 CentiSeconds Per Second
I/O Megabytes per Second         35.58        2.62         5.73 Megabtyes per Second
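For reference, a query along the following lines (the column names are as they appear in recent versions of the view) can generate that sort of output:

```sql
select
        metric_name,
        maxval,
        average,
        standard_deviation,
        metric_unit
from
        v$sysmetric_summary
order by
        metric_name
;
```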

This summarises the last 12 intervals of 5 minutes. If we look at “Physical Reads per Txn” we can see that there were some extreme swings in activity over that period, so we could drill down into v$sysmetric_history for “Physical Reads per txn”, looking at the 1 minute granularity and see:

METRIC_UNIT                    BEGIN_TIME                VALUE
Physical Reads Per Txn         05-feb 12:45:55          421.00
                               05-feb 12:44:55          477.00
                               05-feb 12:43:55          351.00
                               05-feb 12:42:55          406.84
                               05-feb 12:41:55        1,550.00
                               05-feb 12:40:55       93,984.00
                               05-feb 12:39:55       97,202.00
                               05-feb 12:38:55       93,323.00
                               05-feb 12:37:55          391.00
                               05-feb 12:36:55          504.00
                               05-feb 12:35:55          504.00
                               05-feb 12:34:55          252.00
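The listing above could come from a query something like this – I’m assuming the group_id that identifies the 1-minute system metrics (typically 2 for the “long duration” set, but check v$metricgroup on your version):

```sql
select
        metric_unit, begin_time, value
from
        v$sysmetric_history
where
        metric_name = 'Physical Reads Per Txn'
and     group_id    = 2   -- assumed: 1-minute system metrics; verify in v$metricgroup
order by
        begin_time desc
;
```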

Yes, a few minutes ago something exploded onto the system, doing a huge amount of I/O for about 3 minutes. If we’re lucky we might now drill into v$sesstat, v$sess_io, or v$session_event to see if we can find a session that is responsible for a large amount of I/O; and then check v$open_cursor to see if it still has some open cursors that might (if we check v$sql) show us what caused the I/O.
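Once a suspect session has been identified, that final drill-down might look something like this (with :sid holding the suspect’s sid – and note the crude restriction to child 0, since v$sql has one row per child cursor):

```sql
select
        sql.sql_id,
        sql.disk_reads,
        sql.executions,
        sql.sql_text
from
        v$open_cursor   oc,
        v$sql           sql
where
        oc.sid           = :sid
and     sql.sql_id       = oc.sql_id
and     sql.child_number = 0    -- crude: report just one child per sql_id
order by
        sql.disk_reads desc
;
```

For any interesting sql_id, dbms_xplan.display_cursor() can then report the execution plan that was actually used.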

When there are no specific complaints, we just keep an eye open for spikes in activity and try to track them down as quickly and cheaply as possible to see if they’re worth addressing.

Summary

Oracle gives you a huge amount of information about the work that’s going on and the time that’s being used in the database. Unfortunately the most useful repository of that information is in a dynamic performance view that can only be viewed in the Enterprise Edition after purchasing additional licences. However, the information is summarised, in many different ways, in literally hundreds of other dynamic performance views, and it’s easy to pick out helpful information from those views in a variety of ways.

Key to making the best use of those views, though, is recognising that different classes of performance problems require different strategies – and there are only three different classes of problems to worry about.

November 25, 2015

Tech 15

Filed under: Uncategorized — Jonathan Lewis @ 1:00 pm BST Nov 25,2015

Updated 27th Nov 2015:

Thanks for all the questions so far – there are plenty of useful topics coming out. At this rate I may have to pass some of these on to Tony Hasler’s round-table.

I’ve made a couple of comments in response, but generally I plan to avoid making any comments until after the panel.

Monday 7th Dec: 11:20 – 12:05

I’ve arranged a panel session on the Cost Based Optimizer for UKOUG Tech 2015, with Maria Colgan, Nigel Bayliss, and Chris Antognini joining me to answer (and maybe argue about) questions from the audience.

To keep things moving along, we aim to have a few questions available before the day and collect a few questions at the door as well as accepting questions from the floor. Martin Widlake and Neil Chandler will be acting as MCs, wielding the microphones, and collecting questions at the door.

So far we have a handful of questions – if you have a question you’d like to raise, add it to the comments below. Even if you can’t attend we may get to your question, and I may be able to write up a summary of questions and answers after the event.

Questions so far:

  • Why do queries that have been working fine for months suddenly change to a very slow plan ?
  • Are smaller, simpler queries less likely to have a bad plan or change from a good plan to a bad one ?
  • What is the down-side of using SQL Profiles to lock in plans ?
  • What’s the best way to collect stats on a partitioned table where you add a daily / weekly / monthly partition ?
  • Is there a good way to make sure you’ve collected stats for interval partitions when you don’t know they’ve been added ?
  • Why do the manuals talk about “prefixed” local indexes (on partitioned tables) as if they have some special effect ?
  • Why does a query with literals run faster than one with bind variables? Stats are up to date on all tables concerned.
  • For a single table an SQL Plan directive will create extended statistics, but how does the Optimizer resolve SPD for the group by or join queries ?
  • I’ve heard that for the group by and join queries, SPD always forces dynamic sampling, is this true ?
  • Will SPD with dynamic sampling take precedence over e.g. columns with histograms?
  • What is the order of precedence the optimizer uses when resolving table cardinality?
  • Will 12.2 allow us to combine column groups and expression as a single extended statistic?

May 8, 2015

Cost

Filed under: Uncategorized — Jonathan Lewis @ 8:21 am BST May 8,2015

I’ve just been checking “Cost Based Oracle – Fundamentals” (Apress 2005) to see what I said on a particular topic, and I couldn’t resist quoting the following from the opening page of Chapter 1:

One of the commonest questions about the CBO on the Internet is: “What does the cost represent?” This is usually followed by comments like: “According to explain plan the cost of doing a hash join for this query is seven million and the cost of a nested loop is forty-two – but the hash join completes in three seconds and the nested loop takes 14 hours.”

The answer is simple: the cost represents (and has always represented) the optimizer’s best estimate of the time it will take to execute the statement. But how can this be true when people can see oddities like the hash join / nested loop join example above? The answer can usually be found in that good old acronym GIGO: Garbage In, Garbage Out.

The CBO makes errors for six main reasons:

  • There are some inappropriate assumptions built into the cost model
  • The relevant statistics about the data distribution are available, but misleading
  • The relevant statistics about the data distribution are not available
  • The performance characteristics of the hardware are not known
  • The current workload is not known
  • There are bugs in the code

Still true – although there are more features and cunning bits where inappropriate assumptions and bugs can appear.

January 12, 2015

FBI Bug reprise

Filed under: Execution plans,Oracle,Troubleshooting,Uncategorized — Jonathan Lewis @ 12:52 pm BST Jan 12,2015

I’ve just had cause to resurrect a blog note I wrote three years ago. The note says that an anomaly I discovered in 9.2.0.8 was fixed in 10.2.0.3 – and this is true for the simple example in the posting; but a recent question on the OTN database forum has shown that the bug still appears in more complex cases. Here’s some code to create a table and two indexes:


create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum			id,
	cast(dbms_random.string('U',2) as char(2))	c1,
	cast(dbms_random.string('U',2) as char(2))	c2,
	cast(dbms_random.string('U',2) as char(2))	c3,
	cast(dbms_random.string('U',2) as char(2))	c4,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e5
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create index t1_iasc  on t1(c1, c2,      c3, c4) nologging;
create index t1_idesc on t1(c1, c2 desc, c3, c4) nologging;

I’ve designed the table to model the way a problem was presented on OTN; it’s possible that the anomaly would appear in simpler circumstances. Note that I have two indexes on the same four columns, but the second column of one of the indexes is declared as descending. To identify the indexes easily in execution plans the latter index has the text “desc” in its name. So here’s a query, with execution plan, that should use one of these indexes:


select
        *
from t1
where
        (C1 = 'DE' and C2 >  'AB')
or      (C1 = 'DE' and C2 >= 'AB' and C3 > 'AA' )
or      (C1 = 'DE' and C2 >= 'AB' and C3 >= 'AA' and C4 >= 'BB')
order by
        C1, C2, C3, C4
;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |    21 |  2478 |     4  (25)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |    21 |       |     3  (34)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter(((SYS_OP_DESCEND("C2")<SYS_OP_DESCEND('AB') AND "C2">'AB') OR
              (SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C3">'AA' AND "C2">='AB') OR
              (SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C4">='BB' AND "C2">='AB' AND
              "C3">='AA')))

Sure enough – the query has used the t1_iasc index – but why has the optimizer introduced all those predicates with the sys_op_descend() function calls in them when we’re not using an index with a descending column ? Somewhere in the code path the optimizer has picked up the other index, and decided to use some information from it that is totally redundant. One side effect of this is that the cardinality prediction is 21 – if I drop the index t1_idesc the sys_op_descend() calls disappear and the cardinality changes to 148.

Oracle 12c behaves differently – it uses concatenation to turn the query into three separate index probes unless I add the hint /*+ first_rows */ (which I shouldn’t be using, but it’s the hint supplied by the OP on OTN). With the hint in place we get an example of the actual execution plan differing from the prediction made through explain plan:


12c execution plan unhinted (concatenation, and lots of sys_op_descend()):

--------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |          |       |       |    12 (100)|          |
|   1 |  SORT ORDER BY                        |          |   149 | 17582 |    12   (9)| 00:00:01 |
|   2 |   CONCATENATION                       |          |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |   142 | 16756 |     4   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | T1_IDESC |     1 |       |     2   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |     6 |   708 |     4   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN                  | T1_IDESC |     1 |       |     2   (0)| 00:00:01 |
|   7 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1       |     1 |   118 |     3   (0)| 00:00:01 |
|*  8 |     INDEX RANGE SCAN                  | T1_IASC  |     1 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("C1"='DE' AND "C3">='AA' AND "C4">='BB' AND
              SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB'))
       filter("C4">='BB' AND SYS_OP_UNDESCEND(SYS_OP_DESCEND("C2"))>='AB' AND "C3">='AA')
   6 - access("C1"='DE' AND "C3">'AA' AND SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB'))
       filter("C3">'AA' AND SYS_OP_UNDESCEND(SYS_OP_DESCEND("C2"))>='AB' AND
              (LNNVL("C4">='BB') OR LNNVL(SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB')) OR
              LNNVL(SYS_OP_UNDESCEND(SYS_OP_DESCEND("C2"))>='AB') OR LNNVL("C3">='AA')))
   8 - access("C1"='DE' AND "C2">'AB' AND "C2" IS NOT NULL)
       filter((LNNVL("C3">'AA') OR LNNVL(SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB')) OR
              LNNVL("C2">='AB')) AND (LNNVL("C4">='BB') OR
              LNNVL(SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB')) OR LNNVL("C2">='AB') OR
              LNNVL("C3">='AA')))

12c Execution plan with first_rows hint (and the sys_op_descend() calls have gone)
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |   150 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   148 | 17464 |   150   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |   148 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter(("C2">'AB' OR ("C3">'AA' AND "C2">='AB') OR ("C4">='BB' AND
              "C2">='AB' AND "C3">='AA')))

12c Execution plan with first_rows according to Explain Plan (and they're back again)
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |   148 | 17464 |   150   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |   148 | 17464 |   150   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IASC |   148 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C1"='DE')
       filter("C2">'AB' AND SYS_OP_DESCEND("C2")<SYS_OP_DESCEND('AB') OR
              "C3">'AA' AND SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C2">='AB' OR
              "C4">='BB' AND SYS_OP_DESCEND("C2")<=SYS_OP_DESCEND('AB') AND "C2">='AB' AND
              "C3">='AA')

On the plus side the upgrade to 12c has removed some of the sys_op_descend() appearances and seems to have fixed the cardinality anomaly when the sys_op_descend() calls do appear – but things are still going wrong in odd ways. (And this looks like another case where you could drop an index that you’re not using and see an execution plan change.)

Footnote:

When playing around with 12c, I had to be very careful to avoid confusing the issue as SQL Plan Directives were appearing from time to time, and giving me unexpected results as I repeated some of the experiments.

Footnote 2:

I haven’t actually addressed the question originally posed on OTN, but that can wait for another day.
