Oracle Scratchpad

June 8, 2011

How to hint – 1

Filed under: dbms_xplan,Execution plans,Hints,Oracle,subqueries,Troubleshooting — Jonathan Lewis @ 3:00 pm BST Jun 8,2011

Here’s a quick tutorial in hinting, promped by a question on the OTN database forum.
The OP has a hash semi-join and Oracle appears to be ignoring a hint to use a nested loop:

> I tried forcing the optimizer to not use hash join by adding NO_USE_HASH, USE_NL to my sql but it doesn’t seem to work.
> Can anyone please help check what I have done wrong.

> select /*+ NO_USE_HASH(C2)  USE_NL(C2) */
>         SC.SID, SC.MID, SC.INDATE, SC.EXDATE, SC.AUDATE
> FROM    SSCLASS SC
> WHERE   SC.SID = 0
> AND     SC.CID = 0
> AND     SC.MID = 1
> AND     SC.INDATE <= SC.EXDATE
> AND     EXISTS (
>                 SELECT  SSCID FROM SSCLASS C2
>                 WHERE   C2.SSCID = SC.SSCID
>                 AND     C2.AUDATE >= to_date('2009-01-01','yyyy-MM-dd')
>         )
> ORDER BY
>        SSCID, INDATE, EXDATE
>
> PLAN_TABLE_OUTPUT
> Plan hash value: 1476588646
>
> ------------------------------------------------------------------------------------------------------
> | Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
> ------------------------------------------------------------------------------------------------------
> |   0 | SELECT STATEMENT              |              |   204K|    10M|       | 35799   (1)| 00:07:10 |
> |   1 |  SORT ORDER BY                |              |   204K|    10M|    25M| 35799   (1)| 00:07:10 |
> |*  2 |   HASH JOIN SEMI              |              |   204K|    10M|    10M| 33077   (1)| 00:06:37 |
> |*  3 |    TABLE ACCESS BY INDEX ROWID| SSCLASS      |   204K|  7983K|       |  9110   (1)| 00:01:50 |
> |*  4 |     INDEX RANGE SCAN          | X5_SSCLASS   |   204K|       |       |   582   (1)| 00:00:07 |
> |*  5 |    INDEX RANGE SCAN           | X6_SSCLASS   |  4955K|    66M|       | 17276   (1)| 00:03:28 |
> ------------------------------------------------------------------------------------------------------

I’m not going to argue about what plans might be good or bad, and I’m going to assume the OP simply wants a nested loop semi join using a “good” index into the table aliased as C2; so I’m just going to demonstrate on this simple example how to approach that specific problem. The critical error the OP has made is that the join he’s trying to affect doesn’t exist in the query block where he’s put his hint – so he needs to find out what query will exist after the subquery has been nested and the optimizer is looking at the semi-join.

Here’s initial query, with default execution plan, I’ll point out that there is an index on the n1 column that I’m using in the existence test:

select
	*
from
	t2
where	t2.n2 = 15
and	exists (
		select
			null
		from	t1
		where	t1.n1 = t2.n1
	)
;

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |    15 |  2865 |    26   (4)| 00:00:01 |
|*  1 |  HASH JOIN SEMI       |       |    15 |  2865 |    26   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL   | T2    |    15 |  2805 |    22   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| T1_I1 |  3000 | 12000 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N1"="T2"."N1")
   2 - filter("T2"."N2"=15)

So I’ve emulated the hash semi-join into the second table that the OP wants to get rid of, and I’m not using the target index in a “precision” fashion.

I happen to know that there is a hint that I can use to make the subquery operate as a nested loop semijoin. It’s /*+ nl_sj */ and it has to go in the subquery. Unfortunately it’s a hint that’s deprecated in 10g, but never mind that for the moment. I’m also going to adopt “sensible practice” and give each of my query blocks a name. Let’s see what we get from dbms_xplan with the hint.

explain plan
set statement_id = 'sj_hinted'
for
select
	/*+
		qb_name(main)
	*/
	*
from
	t2
where	t2.n2 = 15
and	exists (
		select
			/*+
				qb_name(subq) nl_sj
			*/
			null
		from	t1
		where	t1.n1 = t2.n1
	)
;

select * from table(dbms_xplan.display(null,'sj_hinted','outline'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 635111780

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    15 |  2865 |    37   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |       |    15 |  2865 |    37   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2    |    15 |  2805 |    22   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T1_I1 |  3000 | 12000 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      USE_NL(@"SEL$A93AFAED" "T1"@"SUBQ")
      LEADING(@"SEL$A93AFAED" "T2"@"MAIN" "T1"@"SUBQ")
      INDEX(@"SEL$A93AFAED" "T1"@"SUBQ" ("T1"."N1"))
      FULL(@"SEL$A93AFAED" "T2"@"MAIN")
      OUTLINE(@"SUBQ")
      OUTLINE(@"MAIN")
      UNNEST(@"SUBQ")
      OUTLINE_LEAF(@"SEL$A93AFAED")
      ALL_ROWS
      OPTIMIZER_FEATURES_ENABLE('10.2.0.3')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."N2"=15)
   3 - access("T1"."N1"="T2"."N1")

Note how I’ve used a statement_id to label my plan, and I’ve added the extra predicate ‘outline’ to the call to dbms_xplan. The outline shows me the complete set of hints I need to reproduce the execution plan; technically it’s the information that would be stored by Oracle as an outline or an SQL Baseline.

There are a few session-level parameter settings I don’t really need included, and a couple of things which can’t qualify as “legal” SQL hints, though, and I’m going to ignore those. (Don’t you love the “ignore the hints” hint, though!)

So let’s take the minimum set of hints back into the SQL:

explain plan
set statement_id = 'full_hints'
for
select
	/*+
		qb_name(main)
		unnest(@subq)
		leading(@sel$a93afaed t2@main t1@subq)
		use_nl(@sel$a93afaed t1@subq)
		full(@sel$a93afaed t2@main)
		index(@sel$a93afaed t1@subq(t1.n1))
	*/
	*
from
	t2
where	t2.n2 = 15
and	exists (
		select
			/*+
				qb_name(subq)
			*/
			null
		from	t1
		where	t1.n1 = t2.n1
	)
;

select * from table(dbms_xplan.display(null,'full_hints','outline'));

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    15 |  2865 |    37   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |       |    15 |  2865 |    37   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2    |    15 |  2805 |    22   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T1_I1 |  3000 | 12000 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      USE_NL(@"SEL$A93AFAED" "T1"@"SUBQ")
      LEADING(@"SEL$A93AFAED" "T2"@"MAIN" "T1"@"SUBQ")
      INDEX(@"SEL$A93AFAED" "T1"@"SUBQ" ("T1"."N1"))
      FULL(@"SEL$A93AFAED" "T2"@"MAIN")
      OUTLINE(@"SUBQ")
      OUTLINE(@"MAIN")
      UNNEST(@"SUBQ")
      OUTLINE_LEAF(@"SEL$A93AFAED")
      ALL_ROWS
      OPTIMIZER_FEATURES_ENABLE('10.2.0.3')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."N2"=15)
   3 - access("T1"."N1"="T2"."N1")

Job done – we used a bit of hackery to get the plan we wanted, then used the legal hints to reproduce the plan.

It is important to name your query blocks as this helps you to identify what transformations apply when, and how to label your tables correctly in your code; and you have to remember that the “strange” query block names that appear (such as @”SEL$A93AFAED”) are dependent on the query block names you originally supplied.

The method isn’t perfect since (a) sometimes hints that are needed don’t get into the outline, and (b) sometimes the outline actually doesn’t reproduce the plan if all you use are the “legal” hints – but it may help you in most cases.

29 Comments »

  1. Excellent.
    And if this demonstration – that to properly hint this two table statement to use a nested loop semi join you need seven hints – is not enough to really, really put people off from manually hinting (especially complex statements with one or two hints), nothing ever will be :)

    Comment by Dom Brooks — June 8, 2011 @ 3:22 pm BST Jun 8,2011 | Reply

  2. Jonathan,

    in your example I would prefer to use following hints (just to prevent strange query block names as @”SEL$A93AFAED”). It looks really awful in code

     
    select
       /*+ LEADING(t2) USE_NL(@subq t1) INDEX(@subq t1 t1_i) UNNEST(@subq) */ *
    from
        t2
    where   t2.n2 = 15
    and exists (
            select
                /*+ QB_NAME(subq) */ null
            from    t1
            where   t1.n1 = t2.n1
        )
    ;
    Plan hash value: 1451647935
    
    ---------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |     1 |    91 |     4   (0)| 00:00:01 |
    |   1 |  NESTED LOOPS SEMI |      |     1 |    91 |     4   (0)| 00:00:01 |
    |*  2 |   TABLE ACCESS FULL| T2   |     1 |    78 |     2   (0)| 00:00:01 |
    |*  3 |   INDEX RANGE SCAN | T1_I |     1 |    13 |     2   (0)| 00:00:01 |
    ---------------------------------------------------------------------------
    
    Query Block Name / Object Alias (identified by operation id):
    -------------------------------------------------------------
    
       1 - SEL$7FF3919B
       2 - SEL$7FF3919B / T2@SEL$1
       3 - SEL$7FF3919B / T1@SUBQ
    
    Outline Data
    -------------
    
      /*+
          BEGIN_OUTLINE_DATA
          USE_NL(@"SEL$7FF3919B" "T1"@"SUBQ")
          LEADING(@"SEL$7FF3919B" "T2"@"SEL$1" "T1"@"SUBQ")
          INDEX(@"SEL$7FF3919B" "T1"@"SUBQ" ("T1"."N1"))
          FULL(@"SEL$7FF3919B" "T2"@"SEL$1")
          OUTLINE(@"SUBQ")
          OUTLINE(@"SEL$1")
          UNNEST(@"SUBQ")
          OUTLINE_LEAF(@"SEL$7FF3919B")
          ALL_ROWS
          OPT_PARAM('optimizer_index_caching' 50)
          OPT_PARAM('_gby_hash_aggregation_enabled' 'false')
          OPT_PARAM('_optim_peek_user_binds' 'false')
          OPTIMIZER_FEATURES_ENABLE('10.2.0.5')
          IGNORE_OPTIM_EMBEDDED_HINTS
          END_OUTLINE_DATA
      */
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("T2"."N2"=15)
       3 - access("T1"."N1"="T2"."N1")
    
    
    
    

    It will not work in all situations, of course. Generally speaking, system generated query block names are needed in LEADING hints (when using global hints referencing two different query blocks).
    You also mentioned deprecating of %_SJ and %_AJ hints. Interestingly, undocumented hint syntax table@query_block was not allowed to use in 10g. It appeared in 11g R2 documentation several months after my service request, but I think we have already discussed it here: http://jonathanlewis.wordpress.com/2010/11/26/index-join-2/#comment-37930

    Comment by Pavol Babel — June 8, 2011 @ 10:20 pm BST Jun 8,2011 | Reply

    • Pavol,

      Thanks for the comment. I keep forgetting that the syntax alias@query_block doesn’t appear in the 10g manuals.
      On the other hand, I wouldn’t want to hint the query the way you have done because I would be more worried that you are getting the right plan for the wrong reasons. Your use_nl() hint is directed at query block subq which doesn’t have a join in it so, logically, the hint ought to be ignored. Oracle isn’t entirely consistent with it’s treatment of hints like this, and I would worry that one day they’ll get the treatment sorted out and your hint would therefore stop working.

      An interesting point about the use of internal query block names like sel$A93AFAED is that the 10g manuals do sanction it. The performance tuning guide for 10gR2 page 16-6 explains how to find a generated query block name for use in a hint. In their case the name they find is sel$4, but the principle is no different.

      I have found a note on MOS (metalink) by the way which sanctions the use of hints with the alias@queryblock format in 10g2: 956462.1 explains how you can cut and paste the entire OUTLINE content into your original query!

      Comment by Jonathan Lewis — June 9, 2011 @ 8:13 am BST Jun 9,2011 | Reply

      • > On the other hand, I wouldn’t want to hint the query the way you have done because I would be more worried that you are getting the right plan for the wrong reasons

        Jonathan,

        if you are worried about this, why do you then suggest the second step above in your post – copying back the hints from the outline in the original query? This would be subject to the same worries then if that version of the query was used in a production code since it contains similar “logically to be ignored” hints.

        Randolf

        Comment by Randolf Geist — June 9, 2011 @ 10:24 am BST Jun 9,2011 | Reply

        • Randolf,

          The hint use_nl(@subq t1) is internally inconsistent – it directs the optimizer to use the NL join method in query block subq; but there is no join in that query block, so the hint cannot be obeyed. (But Oracle transfers it to a different query block, and that behviour it not consistent with the general behaviour of hints).

          I don’t think you can make a similar inconsistency argument about any of the hints I have used.

          The whole area of hinting, though, is like quicksand – you just don’t know how fluid things can be. That’s why I advise people quite strongly to minimise their use of hints, anyway, so that the whole problem of “this might change” is only a problem that applies to a small number of code fragments that can be checked relatively easily on an upgrade.

          Comment by Jonathan Lewis — June 9, 2011 @ 11:37 am BST Jun 9,2011

      • Jonathan,

        I don’t think USE_NL(@subq t1) is inconsistent. I think it is telling optimizer to use nested loops on table t1, which is situated in query block subq. But is not telling that the nested loops join has to happen in query block subq. However, I have to agree it is more safe to copy hint from outline (it just looks terrible, but works perfectly at this situation).
        The problem of MOS is, that advices of analytics in Service Request are not consistent. My analytic said I shouldn’t use table@qb syntax and you found note, which allows it. Crazy, isn’t it? But I have use table@qb syntax many times in 10g, anyway. I’m also using SWAP_JOIN_INPUTS hint and it does not take place in documentation even in 11gR2.

        Internal qb names are nott sanctioned, of course (hopefully), but I have one more remark. As you can see, my database is 10.2.0.5 (64bit, HP-UX 11.31). Why is my system generated hint sel$A93AFAED different from your sel$a93afaed? We have same name of table in subquery with same queryblock name, the query unnestion transformation happened in both cases. I think system generated qb names should have consistent names across all platforms

        Comment by Pavol Babel — June 9, 2011 @ 7:06 pm BST Jun 9,2011 | Reply

        • Pavol,
          I agree that it’s very difficult when different notes on Metalink/MOS contradict each other; and it’s a situation not helped by the way that the notes often have an “last updated” or “last reviewed” date that conceal how badly out of date they might be.

          I never assume that the dialogues with customers that appear on some of the SR or bug notes are to be taken as generally approved – but I do take the view that a document that has been constructed in one of the standard information templates should be reasonably trustworthy. But even the things that have an official, definitive look are sometimes clearly inappropriate.

          We’ll have to agree to differ on our opinions of use_nl(@subq t1) – since there is no definitive statement in the manuals I can only cite the case that Oracle can generate hints of the form: operation(@qb_name1 alias@qb_name2) to support my suggestion that the @qb_name1 is telling you WHEN to apply the hint and the @qb_name2 that tells you where the object came from.

          Comment by Jonathan Lewis — June 10, 2011 @ 11:25 am BST Jun 10,2011

        • Jonathan,

          “We’ll have to agree to differ on our opinions of use_nl(@subq t1) – since there is no definitive statement in the manuals” – I absolutely agree.

          Comment by Pavol Babel — June 10, 2011 @ 3:28 pm BST Jun 10,2011

  3. Isn’t it interesting when your own customer’s turn to the web for help rather than turning to you the application software supplier that sold them the software? And in doing so publish sections of the application code itself? Yes, I do work for the software vendor that makes this application, as I have spent the past 3 days looking at this SQL. And have provided them with a solution.

    My recommendation for a short term solution were to use the following hints:

    INDEX (SC (SID CID SSCID))
    INDEX (C2 (AUDDATE SSCID))
    NO_USE_HASH (C2)
    USE_NL (C2)
    

    I got these by a similar route of examining the hints corresponding to the outline of the execution plan, and picking a minimal number that were sufficient. I know that one of the troubles with hints is when are you using too many, and how many is enough? This seemed enough by specifying both NL and the 2 INDEXes, while also disabling HASH (I see someone elses comment about it being a SEMI JOIN not a full JOIN, but it worked). I did not bother with things like LEADING or UNNEST because these seemed to be picked automatically by the optimizer.

    The difference in elapsed time is from 15 seconds down to 0.15 seconds or so (simple ‘set timing on’ in SQL*Plus) when using the other execution plan.

    By the way, the *actual* problem is that the data in the table is really skewed. There are millions of rows in the table, and for most audit date values there are thousands of matching rows – hence Oracle chooses the HASH JOIN based on cost estimates. But for this particular audit date value there are about ten matching rows only, and a NESTED LOOP JOIN is much faster. There is a height balanced histogram on the audit date column, but it has about a thousand distinct values in it, and because of the way the histograms work such small row counts for specific values are not recorded and it has to use the overall Density value which is an average of all unpopular values.

    Furthermore, the other viable solution is to alter the density value for the audit date column and lower it by a factor of 100, and the optimizer will then choose the NESTED LOOP JOIN in the execution plan produced. This avoids hints completely. I ended up with this other solution after re-reading Tuning by Cardinality Feedback by Wolfgang Breitling. The benefits to this solution are that no SQL changes are required, it avoids hints, and it impacts all SQL statements referencing this table in this way. And there are more than one such SQL statement doing this kind of self join against this table.

    To repeat – the problem is due to highly skewed data in the table. Essentially it wants to find rows that have recently been updated (audited), and there are very few of these as a percentage of the rows in the table. Given the average number of rows per date value across all the data values in the table, the optimizer estimates a high matching row count (1000+) and chooses a HASH JOIN. If you can lie to the optimizer somehow about the row count then you can get it to choose a NESTED LOOP JOIN, otherwise you resort to things like hints.

    John

    Comment by John Brady — June 9, 2011 @ 4:08 pm BST Jun 9,2011 | Reply

    • If you know that the cardinality for that query will be about 10 rows and can use hints, wouldn’t one of the “nudge” hints (opt_estimate) have the same effect (lower the density value)? E.g.:

      ...
      > AND     EXISTS (
      >                 SELECT /*+ cardinality(10) */ SSCID FROM SSCLASS C2
      ...
      

      By the way,
      [quote]
      … it impacts all SQL statements referencing this table in this way.
      [/quote]
      Isn’t that a problem? Do all such statements look for unpopular dates? If not, it might still be better to take the (nudge) hint route.

      Cheers,
      Flado

      Comment by Flado — June 9, 2011 @ 5:06 pm BST Jun 9,2011 | Reply

      • Flado,

        Putting the /*+ cardinality(10) */ probably wouldn’t make any difference.

        If the query transforms to a join, the cardinality hint ceases to matter. If the subquery runs as a filter subquery (for example if we put a /*+ no_unnest */ hint in the subquery, or a /*+ no_unnest(@subq) */ hint in the main query) then the subquery would be optimized as a separate query block under first_rows_1 optimization.

        Comment by Jonathan Lewis — June 10, 2011 @ 11:52 am BST Jun 10,2011 | Reply

    • John,

      I don’t have any idea, how you can suggest hints USE_NL(c2) NO_USE_HASH(c2) could work. If you run 10053 trace, you would definitely see they will be discarded at the parse time by CBO. The only hint accepted by optimizer would be INDEX (SC (SID CID SSCID)). So you can feel really happy, it it is really working.
      The next remark is when you add hints to your code, you should at least two hints to each table. One for join order and one for access path. It is the safest way to prevent changing of execution plan in the future (after applying patchset or upgrading).
      Setting modified statistics is good technique, but it can help to this particular query and spoil others.

      Comment by Pavol Babel — June 9, 2011 @ 7:54 pm BST Jun 9,2011 | Reply

      • A couple more points to try and clarify some things I did not explicitly mention before. First Pavol, hints are not ignored by the optimizer and are always obeyed – at least as far as I understand them. They are more like directives rather than pure hints. USE_NL is documented by Oracle in their Performance Tuning Guide, as is NO_USE_HASH, so I don’t see how you can claim that USE_NL will be ignored. I’ll agree with whatever Jonathan says about how hints work, but I’m pretty much sure this is how I’ve heard him describe them before.

        In general terms, either something is wrong with the way the SQL is written or something is wrong with the data in the table (it does not conform to the statistics Oracle has on it). The SQL involved is simple enough that there is nothing wrong with it. The problem is really that the data in the table does not agree with the statistics Oracle has on it, and the problem is that the optimizer is calculating the wrong value for the cardinality of the sub-query.

        I actually did a 10053 trace to see how close the alternative access methods were in cost when considered. At the end it repeats the SQL statement and an Outline of the equivalent hints to achieve that execution plan. Within these were references to “USE_HASH (@QBNAME C2@QB2)” and INDEX, as well as a bunch of others such as ALL_ROWS, UNNEST, OUTLINE, LEADING, SWAP_JOIN_INPUTS.

        Hence I knew that USE_HASH was being used, and that NO_USE_HASH would be one way of stopping it, as well as USE_NL.

        I decided not to qualify my suggested hints with query block names, even though I do know about them (from Jonathan) and fully agree with the general principle of being clear and unambiguous. In this particular case I did not believe that omitting the query block names would stop the hints working. There are only 2 tables, each with unique aliases, so the lack of query block names could not affect the meaning of each hint in this case.

        I also decided not to include hints that I suspected the optimizer would automatically be considering anyway i.e. UNNEST, LEADING and SWAP_JOIN_INPUTS. While you can argue that this makes the result somehow less guaranteed, I did not feel that the query was complex enough to need yet more hints in it. I was working on the Keep It Simple principle.

        As I said before, the suggested hints do work, as the execution plan did change as a result of them, though I did not do a 10053 trace to see what the optimizer reported. But with subsequent further analysis I would now equally consider the solutions of changing the density statistic or specifying the cardinality via a hint, as both are more representative of the nature of the problem. And the advantage of changing the density is that no SQL needs to change. I agree with all comments by everyone about the use of hints being awkward, and the potential need to use of a lot of hints to guarantee you always get the execution plan you want.

        Comment by John Brady — June 10, 2011 @ 8:45 am BST Jun 10,2011 | Reply

        • John,

          you definetely posted just following hints.

          INDEX (SC (SID CID SSCID))
          INDEX (C2 (AUDDATE SSCID))
          NO_USE_HASH (C2)
          USE_NL (C2)

          I wonder if you ever had read Jonathan’s note at the beginnig: “I’m not going to argue about what plans might be good or bad, and I’m going to assume the OP simply wants a nested loop semi join using a “good” index into the table aliased as C2; so I’m just going to demonstrate on this simple example how to approach that specific problem. The critical error the OP has made is that the join he’s trying to affect doesn’t exist in the query block”. Could you explain me how cold optimizer obey USE_NL(c2) or NO_USE_HASH(C2) hints when you are referencing table, which does not exist in main query block? Let’s take a test:

          
          select
             /*+ LEADING(t2) USE_NL(t1) */ *
          from
              t2
          where   t2.n2 = 15
          and exists (
                  select
                      null
                  from    t1
                  where   t1.n1 = t2.n1
              )
          ;
          
          ---------------------------------------------------------------------------
          | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
          ---------------------------------------------------------------------------
          |   0 | SELECT STATEMENT   |      |     1 |    91 |     3   (0)| 00:00:01 |
          |   1 |  NESTED LOOPS SEMI |      |     1 |    91 |     3   (0)| 00:00:01 |
          |*  2 |   TABLE ACCESS FULL| T2   |     1 |    78 |     3   (0)| 00:00:01 |
          |*  3 |   INDEX RANGE SCAN | T1_I |     1 |    13 |     0   (0)| 00:00:01 |
          ---------------------------------------------------------------------------
          

          OK you might thing, then optimizer OBEYED USE_NL(t1). But it is tricky, because CBO choosed NESTED LOOPS SEMI itself, not beacuse of hint. Lets change USE_NL(t1) to USE_HASH(t1). Oops, hint is ignored!

          
          select
             /*+ LEADING(t2) USE_HASH(t1) */ *
          from
              t2
          where   t2.n2 = 15
          and exists (
                  select
                      null
                  from    t1
                  where   t1.n1 = t2.n1
              )
          ;
          
          ---------------------------------------------------------------------------
          | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
          ---------------------------------------------------------------------------
          |   0 | SELECT STATEMENT   |      |     1 |    91 |     3   (0)| 00:00:01 |
          |   1 |  NESTED LOOPS SEMI |      |     1 |    91 |     3   (0)| 00:00:01 |
          |*  2 |   TABLE ACCESS FULL| T2   |     1 |    78 |     3   (0)| 00:00:01 |
          |*  3 |   INDEX RANGE SCAN | T1_I |     1 |    13 |     0   (0)| 00:00:01 |
          ---------------------------------------------------------------------------
          

          So I’m pretty sure 3 of your 4 suggested hints will be discarded by optimizer. If your hints had changed the execution plan, it would have been just becuase of INDEX (SC (SID CID SSCID)) hint (since it is the only valid one). This hint will be accepted by CBO

          
          select
             /*+ LEADING(t2) USE_HASH(@SEL$7FF3919B t1@subq)*/ *
          from
              t2
          where   t2.n2 = 15
          and exists (
                  select /*+ QB_NAME(subq) */ 
                      null
                  from    t1
                  where   t1.n1 = t2.n1
              )
          ;
          
          ---------------------------------------------------------------------------
          | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
          ---------------------------------------------------------------------------
          |   0 | SELECT STATEMENT   |      |     1 |    91 |     4  (25)| 00:00:01 |
          |*  1 |  HASH JOIN SEMI    |      |     1 |    91 |     4  (25)| 00:00:01 |
          |*  2 |   TABLE ACCESS FULL| T2   |     1 |    78 |     3   (0)| 00:00:01 |
          |   3 |   INDEX FULL SCAN  | T1_I |     1 |    13 |     0   (0)| 00:00:01 |
          ---------------------------------------------------------------------------
          
          

          And one more remark. It could be that suboptimal execution plan occured becuase of skewd data. But I think your index (ADATE, SSCID) is wrong. Join in your query behaves like non-equi join when nested loops is used (hash join is using equi jion) and it can work really ony in case of skewed data. At least for this query you it should have swapped fileds (SSCID, ADATE)

          Comment by Pavol Babel — June 10, 2011 @ 9:57 am BST Jun 10,2011

        • Pavol – I cannot reply to your reply, so I’m replying to mine instead, if you see what I mean.

          I also think I may have misunderstood a bit your original reply – its clear from re-reading your replies that you do know about hints and how they work.

          I think the USE_NL(C2) should be used by the optimizer for the following reasons:

          1. The 10053 trace file showed that the query was being rewritten as a join between the two aliases of the table.

          *****************************
          Cost-Based Subquery Unnesting
          *****************************
          SU: No subqueries to consider in query block SEL$2 (#2).
          SU: Considering subquery unnesting in query block SEL$1 (#1)
          SU: Performing unnesting that does not require costing.
          SU: Considering subquery unnest on SEL$1 (#1).
          SU:   Checking validity of unnesting subquery SEL$2 (#2)
          SU:   Passed validity checks.
          SU:   Transforming EXISTS subquery to a join.
          Registered qb: SEL$5DA710D3 0x7dc108e0 (SUBQUERY UNNEST SEL$1; SEL$2)
            signature (): qb_name=SEL$5DA710D3 nbfros=2 flg=0
              fro(0): flg=0 objn=xxx hint_alias="SC"@"SEL$1"
              fro(1): flg=0 objn=xxx hint_alias="C2"@"SEL$2"
          

          The last line of comments prefixed “SU” states that it has transformed the EXISTS to a join instead.

          Later on in the 10053 file it has:

          SQL:******* UNPARSED QUERY IS *******
          SELECT ... FROM "SCHEMA"."SSCLASS" "C2", "SCHEMA"."SSCLASS" "SC" 
          WHERE ... AND "C2"."SSCID" = "SC"."SSCID" AND "C2"."AUDATE" >= TO_DATE ...
          

          So again I read this as saying that the sub-query had been unnested into a join.

          2. The reported bad execution plan has a JOIN in it, though it is a SEMI JOIN
          3. The hints reported in the outline in the 10053 trace file referred to “USE_HASH (@SEL$5DA710D3 C2@SEL$2)”, from which I infer that a USE_HASH hint was controlling the join between the two data sets, so a join was occurring
          4. All of the references to the 2 table aliases in the hints in the outline are prefixed by @SEL$5DA710D3 as the first argument. These include hints of OUTLINE_LEAF, INDEX, LEADING, USE_HASH and SWAP_JOIN_INPUTS. So all references were within the same single top level query block.

          This lead me to believe that I did not need to qualify the C2 reference in the USE_NL hint by a query block name. I’m not saying that “I know the USE_NL hint is correct”, but instead that “I believe it would work”. Generally I do avoid hints because of the dangers of them not quite working the way you want them to, for one reason or another. In this case I am saying that this was the set of evidence I considered when deciding that a hint of “USE_NL (C2)” and others would be right and would be applied by the optimizer. But I have no direct evidence that that specific hint was recognised and applied by the optimizer, only that the execution plan used was that desired after the 4 hints were added to the SQL.

          I do concede that the reference to C2 in the optimizer produced USE_HASH has @SEL$2 after it, which I take to be the nested sub-query, because the SC references are qualified by @SEL$1. I don’t know how this affects the meaning of the reference to C2 in either the USE_HASH or USE_NL hints.

          Comment by John Brady — June 10, 2011 @ 11:53 am BST Jun 10,2011

    • John,

      I have to say that I am a little surprised that your hints did work – which (exact) version of Oracle are you using, by the way, because the “unparse” line in your trace file means it’s not the same as the one I was testing on.

      In principle a hint like operation(object) ought to be a short hand for operation(@current_query_block object@current_query_block), so your use_nl(t2) should have been interpreted as use_nl(@sel$1 t2@sel$1) because the location of the hint is in the query block with the default name of sel$1.

      I have emphasised the “ought to be” because I don’t know if that claim is true – it’s my working hypothesis, but I haven’t seen anything in the manuals or on Metalink that makes this statement.

      If I am correct, two of your hints ought to be invalid because they demand a join method in a query block with only one table, and one ought to be invalid because it references an object that does not exist in that block.

      You were correct to ignore the observation someone had made about the hash join being a semi-join, the hints dictate the join method whether the specific case is “normal”, semi or anti.

      I wasn’t planning to say anything about getting the best plan for this query – it’s always down to volume, distribution, and available indexes – but there’s always the generality that when the optimizer gives you a hash join when you want a nested loop, it’s probably about the cardinality estimates or the precision of the available indexes, so adjusting some statistics (in a truthful type of way) to tell Oracle that it’s a small volume of data or that there’s a really precise good can be a very good strategy.

      In fact, in one of the models I created for this statement, all I had to do was put the correct index hint into the subquery (although I could have done the same by hinting from the main query) to make Oracle use that index. The optimizer decided that the cheapest path using that index was the nested loop semi-join. (If anyone did that – and no more – on a production system, though, I’d point out that they had emporarily got the right plan by accident.)

      Comment by Jonathan Lewis — June 10, 2011 @ 12:26 pm BST Jun 10,2011 | Reply

      • I was using 10.2.0.4 on a Sun Solaris system. The top line of the 10053 trace file says:

        Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
        

        One of the other hints in the outline output of the 10053 trace file was an optimizer features hint:

        OPTIMIZER_FEATURES_ENABLE('10.2.0.4')
        

        All of this feedback from everyone is just reinforcing my view that hints should be avoided because it is so difficult to know for certain that each hint is individually doing what you want it to. As Pavol has said, it could be one of the other hints that ended up producing the NESTED LOOP join, rather than the USE_NL (C2) itself – either the NO_USE_HASH or the INDEX on SC. And as I said before, the second alternative I came up with after some further analysis of reducing the column density value seems safer from the perspective of it more directly controlling the desired impact on the execution plan produced. “Safe” is a relative term here, as one way or another we are trying to force the Oracle optimizer to pick a different execution plan to the one it naturally believes is “best”.

        Comment by John Brady — June 10, 2011 @ 12:51 pm BST Jun 10,2011 | Reply

  4. [...] Jonathan Lewis writes a quick tutorial about how to use the hints. [...]

    Pingback by Log Buffer #224, A Carnival of the Vanities for DBAs | The Pythian Blog — June 10, 2011 @ 7:08 am BST Jun 10,2011 | Reply

  5. Jonathan,

    I would like to repeat this question:

    Internal qb names are nott sanctioned, of course (hopefully), but I have one more remark. As you can see, my database is 10.2.0.5 (64bit, HP-UX 11.31). Why is my system generated hint [b] SEL$7FF3919B [/b] different from your sel$a93afaed? We have same name of table in subquery with same queryblock name, the query unnestion transformation happened in both cases. I think system generated qb names should have consistent names across all platforms

    Comment by Pavol Babel — June 10, 2011 @ 10:03 am BST Jun 10,2011 | Reply

    • Pavol,

      If you look closely you’ll see that I have a qb_name(main) in my example that you don’t have in yours. The query block name for a query block created by a transformation is the result of a function applied to the query block names that went into producing that query block. So my internalised name comes from fn(main, subq), yours comes from fn(subq).

      I think it was Randolf Geist that first pointed out how the generated names were derived from the input names.

      Comment by Jonathan Lewis — June 10, 2011 @ 11:18 am BST Jun 10,2011 | Reply

      • Argh, what an easy explanation :) I have to blame myself, I didn’t notice your QB_NAME(main) hint in main queryblock. Now it makes sense to me.

        Comment by Pavol Babel — June 10, 2011 @ 11:26 am BST Jun 10,2011 | Reply

  6. John,

    answering to some of your notes:

    1. Of course EXISTS has been transformed to SEMI JOIN. BTW, I think, this the reason why oracle depricated %_SJ and %_AJ hints (you can always achive semi or anti join by using different hints). But the name od default queryblcock name SEL$1 has been transformed and that’s the problem as Jonathan wrote, too
    3. You have seen USE_HASH in the final outline, but you don’t know, weather because of your hint, or not. CBO always generates an outline, also in cases, where no hints were used in original query

    I have tried my examples also on 10.2.0.5 Solaris 32 bit and I have same results. Let’s run explain plan on following queries and please, paste us here execution plans
    generated on your solaris 64bit box.

    
    CREATE TABLE t1 (n1 NUMBER, n2 NUMBER, v1 VARCHAR2(100))
    /
    CREATE INDEX t1_i ON t1(n1)
    /
    CREATE TABLE t2 (n1 NUMBER, n2 NUMBER, v2 VARCHAR2(100))
    /
    
    select
       /*+ LEADING(t2) USE_NL(t1) */ *
    from
        t2
    where   t2.n2 = 15
    and exists (
            select
                null
            from    t1
            where   t1.n1 = t2.n1
        )
    ;
    
    select
       /*+ LEADING(t2) USE_HASH(t1) */ *
    from
        t2
    where   t2.n2 = 15
    and exists (
            select
                null
            from    t1
            where   t1.n1 = t2.n1
        )
    ;
    
    
    

    If the first plan will contain NESTED LOOPS SEMI and the second HASH_JOIN_SEMI, you are right. But as you can see my exampeles abov, optimizer didn’t follow hints in one of them (I have tried LINUX 64bit, HP 64 bit and Solaris 32bit, all DB were 10.2.0.5 ). I think it will be the same story at your DB and will be the final contradiction of your theory.

    Johnathan also suggested that two of your hints are invalid. And yes, OPTIMIZER IGNORES hints when they are not valid (referencing nonexisting tables, indexes etc.).

    Comment by Pavol Babel — June 10, 2011 @ 3:59 pm BST Jun 10,2011 | Reply

    • Pavol,

      I’m with you now on this one – I don’t think the USE_NL (C2) hint I added will have any effect on the optimizer. Another lesson learned for me about the right ways of using hints, and why it is probably better to just avoid them. I also think the NO_USE_HASH (C2) would be ignored by the optimizer for the same reason – the reference to C2 is not qualified by the inner query block name.

      The hints I have mentioned from the outline section in the 10053 trace file were only from the first execution of the query with no hints at all. So anything listed in that outline section was put there by the optimizer itself. I’ve not done a 10053 trace since, so I have no idea what hints were in the outline that resulted from the hinted query. So USE_HASH was produced by the optimizer itself. I think I also said that all of the hints from the optimizer outline section were fully qualified – each had a leading ‘@’ top level query block name argument, followed by table aliases each qualified by the query block name within the query e.g. SC@SEL1, C2@SEL2.

      If I wanted a set of “useful” hints then I would now more closely follow Jonathan’s recommendations – use query block names and qualify all table references, and add in as many hints as possible to be totally unambiguous e.g. LEADING, UNNEST and others.

      Thanks, John

      Comment by John Brady — June 13, 2011 @ 9:06 am BST Jun 13,2011 | Reply

      • I agree, it is very reasonable to avoid hints, if possible. But it depends of complexity of application. Unfortunately, we have to still hint complex queries. But we definitely try to use different techniques (setting statistics, adjust system statistics etc) and use hints only when really needed.

        Comment by Pavol Babel — June 13, 2011 @ 6:33 pm BST Jun 13,2011 | Reply

  7. [...] For more information on what you have to do to properly hint, see this excellent article on by Jonathan Lewis. [...]

    Pingback by Hints of Acceptability « OraStory — December 12, 2011 @ 9:49 pm BST Dec 12,2011 | Reply

  8. [...] a possible hint set that works 10.2.0.3 – I identified the necessary query block name by using the ‘outline’ option of dbms_xplan to check the execution [...]

    Pingback by Geek Stuff « Oracle Scratchpad — February 27, 2012 @ 5:52 pm BST Feb 27,2012 | Reply

  9. [...] the complete hint specification becomes ever more important. See Jonathan Lewis’s How to Hint post for a good [...]

    Pingback by SQL Patch IV (or Part II Addendum) – Why is the SQL Patch applied but not effective? « OraStory — March 29, 2012 @ 12:28 pm BST Mar 29,2012 | Reply

  10. […] hard to create a full set of hints by hand – and I often see hinted SQL in production systems where the plan […]

    Pingback by How to hint | Oracle Scratchpad — May 28, 2013 @ 5:26 pm BST May 28,2013 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,161 other followers