Oracle Scratchpad

December 3, 2010

ANSI – argh

Filed under: ANSI Standard,CBO,Execution plans,Hints,Ignoring Hints — Jonathan Lewis @ 7:30 pm GMT Dec 3,2010

I’m not keen on ANSI standard SQL – even though it is, technically, the strategic option and even though you have to use it for full outer joins and partitioned outer joins.

One reason for disliking it is that it “separates join predicates from filter predicates” – a reason often given in praise of the syntax which, to my mind, claims a spurious distinction and introduces a mechanism that makes it harder to keep mental track of what’s going to happen as you walk  through the join order. (I have to admit that I was temporarily fooled into thinking it was quite a nice idea – in an abstract sort of way.)

The other reason for disliking ANSI SQL in Oracle databases is that sometimes it really is necessary to add hints to the SQL to make the optimizer do what needs to be done – and ANSI makes it so much harder and messier to add hints to code. Here’s a wonderful example that Tony Hasler presented in our recent debate “Does Oracle Ignore Hints” at the UKOUG annual conference:

WITH q1 as (
	SELECT /*+ qb_name(q1block) */
		*
	FROM	t1
	JOIN	t2
	ON	t1_i1 = t2_i1
	AND	t1_i1 < 10
),
q2 AS (
	SELECT
		/*+ qb_name(q2block) */
		*
	FROM
		t3
	JOIN	t4
	ON	t3_i1 = t4_i1
	AND	t3_i1 < 10
)
SELECT
	/*+
		no_merge(@q1block)
		no_merge(@q2block)
		leading (@q1block t2)
		use_nl  (@q1block t1)
	*/
	*
FROM
	q1
JOIN
	q2
ON	t1_i1 + t2_i1 = t3_i1 + t4_i1
;

Just to make life really hard, he’s included a couple of “factored subqueries” – and there are a few outstanding optimizer defects with handling subquery factoring – so when he claimed that this was an example of Oracle ignoring hints I had two different directions of investigation to worry about.

Here’s the execution plan (from my 10.2.0.3 system with the data generation, constraints and indexing that Tony supplied):

------------------------------------------------------------------------
| Id  | Operation                      | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |       |   250K|    33M|    54 |
|*  1 |  HASH JOIN                     |       |   250K|    33M|    54 |
|   2 |   VIEW                         |       |  5000 |   327K|    11 |
|*  3 |    HASH JOIN                   |       |  5000 |   581K|    11 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T4    |  5000 |   253K|     3 |
|*  5 |      INDEX RANGE SCAN          | T4_I1 |   900 |       |     2 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T3    |  5000 |   327K|     3 |
|*  7 |      INDEX RANGE SCAN          | T3_I1 |   900 |       |     2 |
|   8 |   VIEW                         |       |  5000 |   361K|    12 |
|*  9 |    HASH JOIN                   |       |  5000 |   615K|    12 |
|  10 |     TABLE ACCESS BY INDEX ROWID| T1    |  5000 |   297K|     3 |
|* 11 |      INDEX RANGE SCAN          | T1_I1 |   900 |       |     2 |
|  12 |     TABLE ACCESS BY INDEX ROWID| T2    |  5000 |   317K|     3 |
|* 13 |      INDEX RANGE SCAN          | T2_I1 |   900 |       |     2 |
------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$16C51A37
   2 - SEL$4C69CCA2 / Q2@SEL$1
   3 - SEL$4C69CCA2
   4 - SEL$4C69CCA2 / T4@SEL$2
   5 - SEL$4C69CCA2 / T4@SEL$2
   6 - SEL$4C69CCA2 / T3@SEL$2
   7 - SEL$4C69CCA2 / T3@SEL$2
   8 - SEL$7939585E / Q1@SEL$1
   9 - SEL$7939585E
  10 - SEL$7939585E / T1@SEL$3
  11 - SEL$7939585E / T1@SEL$3
  12 - SEL$7939585E / T2@SEL$3
  13 - SEL$7939585E / T2@SEL$3

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1_I1"+"T2_I1"="T3_I1"+"T4_I1")

As you can see, Oracle has copied the two factored subqueries inline (they appear just once each in the body of the query so this is – probably – inevitable). Then Oracle has obeyed the no_merge() hints – which I could check by deleting the hints and watching the plan change. So why, in lines 10 through 13, has Oracle not obeyed the leading() and use_nl() hints ?

By changing the ANSI syntax to traditional Oracle syntax, I got a different plan:

-------------------------------------------------------------------------
| Id  | Operation                       | Name  | Rows  | Bytes | Cost  |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |       |   250K|    33M| 10045 |
|*  1 |  HASH JOIN                      |       |   250K|    33M| 10045 |
|   2 |   VIEW                          |       |  5000 |   327K|    11 |
|*  3 |    HASH JOIN                    |       |  5000 |   581K|    11 |
|   4 |     TABLE ACCESS BY INDEX ROWID | T4    |  5000 |   253K|     3 |
|*  5 |      INDEX RANGE SCAN           | T4_I1 |   900 |       |     2 |
|   6 |     TABLE ACCESS BY INDEX ROWID | T3    |  5000 |   327K|     3 |
|*  7 |      INDEX RANGE SCAN           | T3_I1 |   900 |       |     2 |
|   8 |   VIEW                          |       |  5000 |   361K| 10003 |
|   9 |    TABLE ACCESS BY INDEX ROWID  | T1    |     1 |    61 |     2 |
|  10 |     NESTED LOOPS                |       |  5000 |   615K| 10003 |
|  11 |      TABLE ACCESS BY INDEX ROWID| T2    |  5000 |   317K|     3 |
|* 12 |       INDEX RANGE SCAN          | T2_I1 |   900 |       |     2 |
|* 13 |      INDEX RANGE SCAN           | T1_I1 |     1 |       |     1 |
-------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1
   2 - Q2BLOCK / Q2@SEL$1
   3 - Q2BLOCK
   4 - Q2BLOCK / T4@Q2BLOCK
   5 - Q2BLOCK / T4@Q2BLOCK
   6 - Q2BLOCK / T3@Q2BLOCK
   7 - Q2BLOCK / T3@Q2BLOCK
   8 - Q1BLOCK / Q1@SEL$1
   9 - Q1BLOCK / T1@Q1BLOCK
  11 - Q1BLOCK / T2@Q1BLOCK
  12 - Q1BLOCK / T2@Q1BLOCK
  13 - Q1BLOCK / T1@Q1BLOCK

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1_I1"+"T2_I1"="T3_I1"+"T4_I1")
   3 - access("T3_I1"="T4_I1")
   5 - access("T4_I1"<10)
   7 - access("T3_I1"<10)
  12 - access("T2_I1"<10)
  13 - access("T1_I1"="T2_I1")
       filter("T1_I1"<10)

Notice how the optimizer is now obeying the leading() and use_nl() hints.

The problem is this: Oracle doesn’t optimise ANSI SQL, it transforms it then optimises it. Transformation can change query blocks, and Tony’s hints apply to specific query blocks. After a little testing and checking I worked out what the SQL looked like AFTER transformation and BEFORE optimisation; and it’s this:

select
	/*+ qb_name(sel$4) */
	*
from
	(
	SELECT
		/*+ qb_name(sel$1) */
		Q1.T1_I1 T1_I1, Q1.T1_I2 T1_I2, Q1.T1_D1 T1_D1, Q1.T2_I1 T2_I1, Q1.T2_I2 T2_I2, Q1.T2_TS T2_TS,
		Q2.T3_I1 T3_I1, Q2.T3_I2 T3_I2, Q2.T3_TSTZ T3_TSTZ, Q2.T4_I1 T4_I1, Q2.T4_I2 T4_I2
	FROM
		(
		SELECT
			/*+ NO_MERGE QB_NAME (Q1BLOCK) */
			from$_subquery$_003.T1_I1_0 T1_I1, from$_subquery$_003.T1_I2_1 T1_I2,
			from$_subquery$_003.T1_D1_2 T1_D1, from$_subquery$_003.T2_I1_3 T2_I1,
			from$_subquery$_003.T2_I2_4 T2_I2, from$_subquery$_003.T2_TS_5 T2_TS
		FROM	(
			SELECT
				/*+ qb_name(sel$3) */
				T1.T1_I1 T1_I1_0, T1.T1_I2 T1_I2_1, T1.T1_D1 T1_D1_2,
				T2.T2_I1 T2_I1_3, T2.T2_I2 T2_I2_4, T2.T2_TS T2_TS_5
			FROM
				TEST_USER.T1 T1,
				TEST_USER.T2 T2
			WHERE
				T1.T1_I1 = T2.T2_I1
			AND	T1.T1_I1 < 10
			)	from$_subquery$_003
		)	Q1,
		(
		SELECT
			/*+ NO_MERGE QB_NAME (Q2BLOCK) */
			from$_subquery$_006.T3_I1_0 T3_I1, from$_subquery$_006.T3_I2_1 T3_I2,
			from$_subquery$_006.T3_TSTZ_2 T3_TSTZ, from$_subquery$_006.T4_I1_3 T4_I1,
			from$_subquery$_006.T4_I2_4 T4_I2
		FROM
			(
			SELECT
				/*+ qb_name(sel$2) */
				T3.T3_I1 T3_I1_0, T3.T3_I2 T3_I2_1, T3.T3_TSTZ T3_TSTZ_2,
				T4.T4_I1 T4_I1_3, T4.T4_I2 T4_I2_4 FROM TEST_USER.T3 T3, TEST_USER.T4 T4
			WHERE
				T3.T3_I1 = T4.T4_I1
			AND	T3.T3_I1 < 10
			)	from$_subquery$_006
		)	Q2
	WHERE
		Q1.T1_I1 + Q1.T2_I1 = Q2.T3_I1 + Q2.T4_I1
	)
;

I got most of this from the “Query Block Name / Object Alias” section of the ANSI execution plan (there are some important clues there, like ‘T1@SEL$3′) and the “unparsed” SQL from the 10053 trace.

Notice how the query blocks q1block and q2block still exist – that’s why the no_merge() hints can survive the transformation. Notice, though, that the transformation engines has introduced a layer of inline views inside q1block and q2block - which is why the leading(@q1block t2) and use_nl(@q1block t1) hints are no longer valid: they reference objects which are not in q1block. To get his hints to work at the global level, Tony would have to change the last two hints to reference sel$3 rather than q1block.

So, next time you write a complicated piece of ANSI, make sure you think carefully about what you’re going to have to do if you subsequently have to add hints to force a particular execution plan.  (And bear in mind that one day the transformation engine might be modified to transform the query differently.)

[Further reading on "ignoring hints"]

20 Comments »

  1. Hi Jonathan. Just regarding the statement ‘Oracle doesn’t optimise ANSI SQL, it transforms it then optimises it’.

    Does that still apply to 11g?

    I have been under the impression – though I can’t locate the documents that gave me that impression – that 11g no longer transforms the ANSI SQL to Oracle SQL before running it through the Optimizer.

    Comment by Kirk Brocas — December 3, 2010 @ 7:52 pm GMT Dec 3,2010 | Reply

    • Kirk,

      I hadn’t read anything to that effect – but perhaps you read something about the ANSI features that aren’t available to standard Oracle syntax and extrapolated from there. The start of the 10053 trace from my 11.1.0.6 running the original query looks like this:

      Registered qb: Q1BLOCK 0xd54a290 (HINT Q1BLOCK)
      ---------------------
      QUERY BLOCK SIGNATURE
      ---------------------
        signature (): qb_name=Q1BLOCK nbfros=1 flg=0
          fro(0): flg=5 objn=0 hint_alias="from$_subquery$_003"@"Q1BLOCK"
      
      Registered qb: Q2BLOCK 0xd54cf94 (HINT Q2BLOCK)
      ---------------------
      QUERY BLOCK SIGNATURE
      ---------------------
        signature (): qb_name=Q2BLOCK nbfros=1 flg=0
          fro(0): flg=5 objn=0 hint_alias="from$_subquery$_006"@"Q2BLOCK"
      
      Registered qb: SEL$1 0xd543a8c (PARSER)
      ---------------------
      QUERY BLOCK SIGNATURE
      ---------------------
        signature (): qb_name=SEL$1 nbfros=2 flg=0
          fro(0): flg=5 objn=0 hint_alias="Q1"@"SEL$1"
          fro(1): flg=5 objn=0 hint_alias="Q2"@"SEL$1"
      
      Registered qb: SEL$2 0xd544f50 (PARSER)
      ---------------------
      QUERY BLOCK SIGNATURE
      ---------------------
        signature (): qb_name=SEL$2 nbfros=2 flg=0
          fro(0): flg=4 objn=82668 hint_alias="T3"@"SEL$2"
          fro(1): flg=4 objn=82671 hint_alias="T4"@"SEL$2"
      
      Registered qb: SEL$3 0xd5478d8 (PARSER)
      ---------------------
      QUERY BLOCK SIGNATURE
      ---------------------
        signature (): qb_name=SEL$3 nbfros=2 flg=0
          fro(0): flg=4 objn=82662 hint_alias="T1"@"SEL$3"
          fro(1): flg=4 objn=82665 hint_alias="T2"@"SEL$3"
      
      Registered qb: SEL$4 0xd5498c4 (PARSER)
      ---------------------
      QUERY BLOCK SIGNATURE
      ---------------------
        signature (): qb_name=SEL$4 nbfros=1 flg=0
          fro(0): flg=5 objn=0 hint_alias="from$_subquery$_009"@"SEL$4"
      
      SPM: statement not found in SMB
      
      

      I think this is telling us that the parser transforms the query before anything else happens.

      Comment by Jonathan Lewis — December 4, 2010 @ 2:47 pm GMT Dec 4,2010 | Reply

    • That might be a ‘reading’ of the native full outer join implemented in 11g

      http://oracle-randolf.blogspot.com/2008/03/native-hash-full-outer-join-in-oracle.html

      As an enhancement it is the only thing in 11g that is particular to ANSI joins and is a new way of optimising/performing the query.

      Comment by Gary — December 4, 2010 @ 8:33 pm GMT Dec 4,2010 | Reply

    • Here is a quick test on 11.2.0.1 which confirms in 10053 trace files that the ANSI versions of the SQL statements are transformed before optimization:

      CREATE TABLE T5 AS
      SELECT
        ROWNUM C1,
        RPAD('A',10,'A') C2
      FROM
        DUAL
      CONNECT BY
        LEVEL<=10;
      
      CREATE TABLE T6 AS
      SELECT
        ROWNUM C1,
        RPAD('A',10,'A') C2
      FROM
        DUAL
      CONNECT BY
        LEVEL<=10;
        
      EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T5')
      EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T6')
       
      ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANSI';
      ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
       
      SET AUTOTRACE TRACEONLY STATISTICS
       
      SELECT
        *
      FROM
        T5
      NATURAL JOIN
        T6;
       
      SELECT
        *
      FROM
        T5
      JOIN
        T6
      ON
        T5.C1=T6.C1;
       
      SELECT
        *
      FROM
        T5
      LEFT OUTER JOIN
        T6
      ON
        T5.C1=T6.C1;
        
      ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';
      

      Now, taking a look in the trace file to retrieve the original SQL statement, the execution plan, and then the rewritten version of the SQL statement:

      ----- Current SQL Statement for this session (sql_id=67q3pr5wynh7r) -----
      SELECT
        *
      FROM
        T5
      NATURAL JOIN
        T6
       
      ============
      Plan Table
      ============
      --------------------------------------+-----------------------------------+
      | Id  | Operation           | Name    | Rows  | Bytes | Cost  | Time      |
      --------------------------------------+-----------------------------------+
      | 0   | SELECT STATEMENT    |         |       |       |     5 |           |
      | 1   |  HASH JOIN          |         |    10 |   280 |     5 |  00:00:01 |
      | 2   |   TABLE ACCESS FULL | T5      |    10 |   140 |     2 |  00:00:01 |
      | 3   |   TABLE ACCESS FULL | T6      |    10 |   140 |     2 |  00:00:01 |
      --------------------------------------+-----------------------------------+
      Predicate Information:
      ----------------------
      1 - access("T5"."C2"="T6"."C2" AND "T5"."C1"="T6"."C1")
      

      Earlier in the trace file:

      Final query after transformations:******* UNPARSED QUERY IS *******
      SELECT "T6"."C1" "C1","T6"."C2" "C2" FROM "TESTUSER"."T5" "T5","TESTUSER"."T6" "T6" WHERE "T5"."C2"="T6"."C2" AND "T5"."C1"="T6"."C1"
      

      ———–

      ----- Current SQL Statement for this session (sql_id=fsq8jaj33m21p) -----
      SELECT
        *
      FROM
        T5
      JOIN
        T6
      ON
        T5.C1=T6.C1
       
      ============
      Plan Table
      ============
      --------------------------------------+-----------------------------------+
      | Id  | Operation           | Name    | Rows  | Bytes | Cost  | Time      |
      --------------------------------------+-----------------------------------+
      | 0   | SELECT STATEMENT    |         |       |       |     5 |           |
      | 1   |  HASH JOIN          |         |    10 |   280 |     5 |  00:00:01 |
      | 2   |   TABLE ACCESS FULL | T5      |    10 |   140 |     2 |  00:00:01 |
      | 3   |   TABLE ACCESS FULL | T6      |    10 |   140 |     2 |  00:00:01 |
      --------------------------------------+-----------------------------------+
      Predicate Information:
      ----------------------
      1 - access("T5"."C1"="T6"."C1")
      

      Earlier in the trace file:

      Final query after transformations:******* UNPARSED QUERY IS *******
      SELECT "T5"."C1" "C1","T5"."C2" "C2","T6"."C1" "C1","T6"."C2" "C2" FROM "TESTUSER"."T5" "T5","TESTUSER"."T6" "T6" WHERE "T5"."C1"="T6"."C1"
      

      ———–

      SELECT
        *
      FROM
        T5
      LEFT OUTER JOIN
        T6
      ON
        T5.C1=T6.C1
       
      ============
      Plan Table
      ============
      --------------------------------------+-----------------------------------+
      | Id  | Operation           | Name    | Rows  | Bytes | Cost  | Time      |
      --------------------------------------+-----------------------------------+
      | 0   | SELECT STATEMENT    |         |       |       |     5 |           |
      | 1   |  HASH JOIN OUTER    |         |    10 |   280 |     5 |  00:00:01 |
      | 2   |   TABLE ACCESS FULL | T5      |    10 |   140 |     2 |  00:00:01 |
      | 3   |   TABLE ACCESS FULL | T6      |    10 |   140 |     2 |  00:00:01 |
      --------------------------------------+-----------------------------------+
      Predicate Information:
      ----------------------
      1 - access("T5"."C1"="T6"."C1")
      

      Earlier in the trace file:

      Final query after transformations:******* UNPARSED QUERY IS *******
      SELECT "T5"."C1" "C1","T5"."C2" "C2","T6"."C1" "C1","T6"."C2" "C2" FROM "TESTUSER"."T5" "T5","TESTUSER"."T6" "T6" WHERE "T5"."C1"="T6"."C1"(+)
      

      Comment by Charles Hooper — December 5, 2010 @ 1:40 am GMT Dec 5,2010 | Reply

      • I just noticed that the transformed version of this SQL statement seems to be incorrect:

        SELECT
          *
        FROM
          T5
        NATURAL JOIN
          T6;
        

        The transformed version looks like this:

        Final query after transformations:******* UNPARSED QUERY IS *******
        SELECT
          "T6"."C1" "C1",
          "T6"."C2" "C2"
        FROM
          "TESTUSER"."T5" "T5",
          "TESTUSER"."T6" "T6"
        WHERE
          "T5"."C2"="T6"."C2"
          AND "T5"."C1"="T6"."C1";
        

        I would have throught that the transformed version of the query would also return columns T5.C1 and T5.C2. The execution plan showed that 280 bytes were expected to be returned, just as was the case for the other SQL statements, so maybe it is just a display glitch. However, the actual rows returned by the query do in fact exclude columns T5.C1 and T5.C2.

        Comment by Charles Hooper — December 5, 2010 @ 2:03 am GMT Dec 5,2010 | Reply

        • Charles,

          Thanks for the notes on 11.2.

          I wonder if the behaviour of the natural join is according to the standard: in concept the two sets of columns are “the same columns” because of what a “natural join” means.

          Without reading the standard I’d be happy to present an argument for the result that we actually get, and an argument for the four-column result that we get in the other cases. But note that if you try to “select t5.*, t6.*” for the natural join you get:

            t5.*, t6.*
               *
          ERROR at line 2:
          ORA-25155: column used in NATURAL join cannot have qualifier
          

          Of course we never do “select *” in the real world, do we ;)

          Comment by Jonathan Lewis — December 5, 2010 @ 12:39 pm GMT Dec 5,2010

        • Thank you for the comment regarding natural joins. I try to avoid ANSI when possible. There were actually 3 issues that my test case was showed regarding the ANSI to Oracle SQL translation:
          1. Not all of the columns from both of the tables were retrieved. I believe that you are correct that this is by design, so we can throw out this issue.
          2. The columns from the second table were retrieved, rather than the columns from the first table. Why is this important? What happens if the datatypes are not identical in the two tables (of course this is a bad design decision, but it does happen)?
          3. The estimated number of bytes at ID 1 was doubled. This is probably just a small issue with production SQL because there are probably few columns in common between typical tables (but it might affect execution plan costing?).

          Let’s take another look at issue #2. Assume that we modify my script like this:

          DROP TABLE T6 PURGE;
           
          CREATE TABLE T6(
            C1 CHAR(2000),
            C2 VARCHAR(2000));
           
          INSERT INTO
            T6
          SELECT
            ROWNUM C1,
            RPAD('A',10,'A') C2
          FROM
            DUAL
          CONNECT BY
            LEVEL<=10;
            
          EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T6')
           
          ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANSI2';
          ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
           
          SET AUTOTRACE TRACEONLY STATISTICS
           
          SELECT
            *
          FROM
            T5
          NATURAL JOIN
            T6;
           
          SELECT
            *
          FROM
            T5
          JOIN
            T6
          ON
            T5.C1=T6.C1;
           
          SELECT
            *
          FROM
            T5
          LEFT OUTER JOIN
            T6
          ON
            T5.C1=T6.C1;
            
          ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';
          

          The execution plan for the first query looks like this:

          ============
          Plan Table
          ============
          --------------------------------------+-----------------------------------+
          | Id  | Operation           | Name    | Rows  | Bytes | Cost  | Time      |
          --------------------------------------+-----------------------------------+
          | 0   | SELECT STATEMENT    |         |       |       |     5 |           |
          | 1   |  HASH JOIN          |         |    10 |   20K |     5 |  00:00:01 |
          | 2   |   TABLE ACCESS FULL | T5      |    10 |   140 |     2 |  00:00:01 |
          | 3   |   TABLE ACCESS FULL | T6      |    10 |   20K |     2 |  00:00:01 |
          --------------------------------------+-----------------------------------+
          Predicate Information:
          ----------------------
          1 - access("T5"."C2"="T6"."C2" AND "T5"."C1"=TO_NUMBER("T6"."C1"))
          

          The original version of the query and the query after transformation:

          SELECT
            *
          FROM
            T5
          NATURAL JOIN
            T6;
           
          Final query after transformations:******* UNPARSED QUERY IS *******
          SELECT
            "T6"."C1" "C1",
            "T6"."C2" "C2"
          FROM
            "TESTUSER"."T5" "T5",
            "TESTUSER"."T6" "T6"
          WHERE
            "T5"."C2"="T6"."C2"
            AND "T5"."C1"=TO_NUMBER("T6"."C1")
          

          If the developer expected the SQL statement to “retrieve every column from T5″, he might be shocked to find that column C1 is not a number, but a 2000 byte blank padded string (CHAR). That might cause some problems for the application. But then we already knew that natural joins are evil.

          Now let’s reverse the order of T5 and T6 in the SQL statement to see what happens:

          ALTER SESSION SET TRACEFILE_IDENTIFIER = 'ANSI3';
          ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT FOREVER, LEVEL 1';
           
          SET AUTOTRACE TRACEONLY STATISTICS
           
          SELECT
            *
          FROM
            T6
          NATURAL JOIN
            T5;
           
          ALTER SESSION SET EVENTS '10053 TRACE NAME CONTEXT OFF';
          

          Taking a look in the 10053 trace file:

          ============
          Plan Table
          ============
          --------------------------------------+-----------------------------------+
          | Id  | Operation           | Name    | Rows  | Bytes | Cost  | Time      |
          --------------------------------------+-----------------------------------+
          | 0   | SELECT STATEMENT    |         |       |       |     5 |           |
          | 1   |  HASH JOIN          |         |    10 |   20K |     5 |  00:00:01 |
          | 2   |   TABLE ACCESS FULL | T5      |    10 |   140 |     2 |  00:00:01 |
          | 3   |   TABLE ACCESS FULL | T6      |    10 |   20K |     2 |  00:00:01 |
          --------------------------------------+-----------------------------------+
          Predicate Information:
          ----------------------
          1 - access("T6"."C2"="T5"."C2" AND "T5"."C1"=TO_NUMBER("T6"."C1"))
           
          SELECT
            *
          FROM
            T6
          NATURAL JOIN
            T5
           
          Final query after transformations:******* UNPARSED QUERY IS *******
          SELECT
            "T5"."C1" "C1",
            "T5"."C2" "C2"
          FROM
            "TESTUSER"."T6" "T6",
            "TESTUSER"."T5" "T5"
          WHERE
            "T6"."C2"="T5"."C2"
            AND "T5"."C1"=TO_NUMBER("T6"."C1")
          

          So, now we are retrieving columns C1 and C2 from table T5, so instead of the developer receiving a 2000 byte blank padded string, the database returns a number.

          You could further experiment with defining C2 as CHAR(2000) – the first query will return 0 rows, while the last two will return 2 rows (this is expected, because of the blank padding).

          Comment by Charles Hooper — December 5, 2010 @ 1:55 pm GMT Dec 5,2010

  2. Jonathan,

    this is another example why people should call dbms_xplan.display with parameter “format” at least ‘TYPICAL + alias’ . The “Query Block Name / Object Alias” section is very helpful in some cases.

    I would like to check some little details about Tony’s example. Could you please provide script for DDL of tables t1, t2, t3 and t4? Many thanks.

    Comment by Pavol Babel — December 3, 2010 @ 11:01 pm GMT Dec 3,2010 | Reply

    • Pavol,

      This is the code that generated the test set:

      DROP TABLE t1 CASCADE CONSTRAINTS;
      DROP TABLE t2 CASCADE CONSTRAINTS;
      DROP TABLE t3 CASCADE CONSTRAINTS;
      DROP TABLE t4 CASCADE CONSTRAINTS;
      
      CREATE TABLE t1 (t1_i1 INTEGER NOT NULL,t1_i2 INTEGER NOT NULL,t1_d1 DATE NOT NULL);
      CREATE INDEX t1_i1 ON t1(t1_i1,t1_d1);
      CREATE INDEX t1_i2 ON t1(t1_i2);
      
      CREATE TABLE t2 (t2_i1 INTEGER NOT NULL,t2_i2 INTEGER NOT NULL,t2_ts TIMESTAMP NOT NULL);
      CREATE INDEX t2_i1 ON t2(t2_i1,t2_ts);
      CREATE INDEX t2_i2 ON t2(t2_i2);
      
      CREATE TABLE t3 (t3_i1 INTEGER NOT NULL,t3_i2 INTEGER NOT NULL,t3_tstz TIMESTAMP WITH TIME ZONE NOT NULL);
      CREATE INDEX t3_i1 ON t3(t3_i1,t3_tstz);
      CREATE INDEX t3_i2 ON t3(t3_i2);
      
      CREATE TABLE t4 (t4_i1 INTEGER NOT NULL,t4_i2 INTEGER NOT NULL);
      CREATE INDEX t4_i1 ON t4(t4_i1);
      CREATE INDEX t4_i2 ON t4(t4_i2);
      
      BEGIN
         FOR i IN 1 .. 4
         LOOP
            DBMS_STATS.set_table_stats (USER,
                                        'T' || i,
                                        numrows      => 100000,
                                        numblks      => 10000);
      
            FOR r IN (SELECT index_name
                        FROM user_indexes
                       WHERE table_name = 'T' || i)
            LOOP
               DBMS_STATS.set_index_stats (USER,
                                           r.index_name,
                                           numrows       => 100000,
                                           numlblks      => 10,
                                           indlevel      => 1,
                                           clstfct       => 100);
            END LOOP;
         END LOOP;
      END;
      /
      
      

      Comment by Jonathan Lewis — December 4, 2010 @ 2:59 pm GMT Dec 4,2010 | Reply

      • Jonathan,

        I wonder how difficult is finding how SQL statement looked like after transformation (and before optimization) from 10053 trace, even in 11gR2. I’m using 10gR2 most of time, so I knew it is difficult in 10g, but didn’t know it is still so painful in 11g. You have made great job Jonathan, again.

        There is mentioned some kind of “UNPARSED QUERY”, in the trace, but not the final version

        Join Elimination (JE)
        *************************
        SQL:******* UNPARSED QUERY IS *******
        
        SELECT "Q1"."T1_I1" "T1_I1", "Q1"."T1_I2" "T1_I2", "Q1"."T1_D1" "T1_D1",
               "Q1"."T2_I1" "T2_I1", "Q1"."T2_I2" "T2_I2", "Q1"."T2_TS" "T2_TS",
               "Q2"."T3_I1" "T3_I1", "Q2"."T3_I2" "T3_I2", "Q2"."T3_TSTZ" "T3_TSTZ",
               "Q2"."T4_I1" "T4_I1", "Q2"."T4_I2" "T4_I2"
          FROM (SELECT /*+ NO_MERGE QB_NAME ("Q1BLOCK") */
                       "from$_subquery$_003"."T1_I1_0" "T1_I1",
                       "from$_subquery$_003"."T1_I2_1" "T1_I2",
                       "from$_subquery$_003"."T1_D1_2" "T1_D1",
                       "from$_subquery$_003"."T2_I1_3" "T2_I1",
                       "from$_subquery$_003"."T2_I2_4" "T2_I2",
                       "from$_subquery$_003"."T2_TS_5" "T2_TS"
                  FROM (SELECT "T1"."T1_I1" "T1_I1_0", "T1"."T1_I2" "T1_I2_1",
                               "T1"."T1_D1" "T1_D1_2", "T2"."T2_I1" "T2_I1_3",
                               "T2"."T2_I2" "T2_I2_4", "T2"."T2_TS" "T2_TS_5"
                          FROM "SYS"."T1" "T1", "SYS"."T2" "T2"
                         WHERE "T1"."T1_I1" = "T2"."T2_I1" AND "T1"."T1_I1" < 10) "from$_subquery$_003") "Q1",
               (SELECT /*+ NO_MERGE QB_NAME ("Q2BLOCK") */
                       "from$_subquery$_006"."T3_I1_0" "T3_I1",
                       "from$_subquery$_006"."T3_I2_1" "T3_I2",
                       "from$_subquery$_006"."T3_TSTZ_2" "T3_TSTZ",
                       "from$_subquery$_006"."T4_I1_3" "T4_I1",
                       "from$_subquery$_006"."T4_I2_4" "T4_I2"
                  FROM (SELECT "T3"."T3_I1" "T3_I1_0", "T3"."T3_I2" "T3_I2_1",
                               "T3"."T3_TSTZ" "T3_TSTZ_2", "T4"."T4_I1" "T4_I1_3",
                               "T4"."T4_I2" "T4_I2_4"
                          FROM "SYS"."T3" "T3", "SYS"."T4" "T4"
                         WHERE "T3"."T3_I1" = "T4"."T4_I1" AND "T3"."T3_I1" < 10) "from$_subquery$_006") "Q2"
         WHERE "Q1"."T1_I1" + "Q1"."T2_I1" = "Q2"."T3_I1" + "Q2"."T4_I1"
        
        

        BTW I suggest you have ommited one little detail, just one alias “from$_subquery$_009″. But is has no impact on using hints referencing query block names.

        .
        .
        .
        
                    AND T3.T3_I1 < 10
                    )   from$_subquery$_006
                )   Q2
            WHERE
                Q1.T1_I1 + Q1.T2_I1 = Q2.T3_I1 + Q2.T4_I1
            ) from$_subquery$_009
        ;
        
        

        Comment by Pavol Babel — December 8, 2010 @ 6:18 pm GMT Dec 8,2010 | Reply

  3. Thanks Jonathan and Charles for answering my question. Jonathan I think you are probably right – I’ve taken the increased ANSI support in 11g to extrapolate that 11g is not transforming ANSI to Oracle syntax any more – your above tests clearly show that is not the case.

    Does that mean that it would still be best practice when tuning SQL to stay with the Oracle syntax?

    Numerous articles seem to push ANSI syntax these days and I note Tom Kyte is using it a lot. It is supposedly easier to read, but for me I find multi-table joins incomprehensible…

    Comment by Kirk Brocas — December 5, 2010 @ 8:50 pm GMT Dec 5,2010 | Reply

  4. Kirk,

    I use whichever syntax is the house style when I’m on assignment; but even though ANSI is the “strategic” choice, I will suggest to people that they may need to think about rewriting complex statements in Oracle syntax if they need to hint them into execution paths the optimizer can’t find.

    Comment by Jonathan Lewis — December 5, 2010 @ 9:12 pm GMT Dec 5,2010 | Reply

  5. This is not about “oracle syntax”. Here is old discussion

    http://www.mail-archive.com/pgsql-sql@postgresql.org/msg18872.html

    where Anthony Molinaro (author of bestselling SQL Cookbook) put it that way:

    “Old style is short and sweet and perfect.
    Ansi dumbed it down, that’s the bottom line.
    And for people who’ve been developing for sometime,
    It’s wholly unnecessary.”

    Comment by Vadim Tropashko — December 7, 2010 @ 8:52 pm GMT Dec 7,2010 | Reply

    • there are still applications independed on database system. I hate such (and to be honest I hate ANSI) applications, but they are still alive. The only choice for people developing such applications is only ANSI.
      And do not forget, oracle has, let me say, quite strange mechanism of rewriting ANSI syntax into Oracle syntax. It is just like BITMAP indexes. Do have BITMAP indexes problems with DML? Every oracle guru would say “Yes”. But BITMAP Indexes suffer such problems just in Oracle!!! There were some RDBMS, where this issue didn’t exist at all. However, the main problem is, thet Oracle has wrong implementation of bitmap indexes. Just like terrible implementation of IOT

      Comment by Pavol Babel — December 8, 2010 @ 8:49 pm GMT Dec 8,2010 | Reply

    • I am also in the ANSI camp: I find ANSI join much easier to read because of the verbosity and the clear separation between conditions applied for the join and conditions applied to the result. Working with Oracle it’s probably advisable to know Oracle’s syntax but I just can’t seem to get used to it – I always have to look up the meaning of (+) placement if I did not use it for some time. Also, if I remember correctly, there are some things that can’t be done with Oracle syntax which makes it deficient in my view. And then of course there is compatibility – but that’s a difficult story anyway…

      Comment by Robert Klemme — December 14, 2010 @ 3:54 pm GMT Dec 14,2010 | Reply

  6. [...] Lewis hates ANSI style joins and tells you why. Gwen is ashamed to admit that she find ANSI joins far more readable, but after reading [...]

    Pingback by Log Buffer #209, A Carnival of the Vanities for DBAs | The Pythian Blog — December 13, 2010 @ 4:33 am GMT Dec 13,2010 | Reply

    • Now here’s a problem:

      Too many people want answers in black and white – subtle shades of grey are not allowed in the Oracle world. (Okay, tongue slightly in cheek, there – occasionally a little dramatic effect is allowable, especially when the original source is placed in front of you so that you can see the actual wording.)

      But here we are: I say “I’m not keen on…”, “.. reason for disliking … ” and this is reported as “Jonathan hates ANSI”.

      Shades of Dilbert !

      Comment by Jonathan Lewis — December 13, 2010 @ 7:07 pm GMT Dec 13,2010 | Reply

  7. [...] ANSI – argh [...]

    Pingback by Feeling ANSI About Oracle Join Syntax? « Charles Hooper's Oracle Notes — December 26, 2010 @ 3:59 pm GMT Dec 26,2010 | Reply

  8. From my experience , I see ANSI syntax used by non-Oracle developers a lot , where as Oracle developers for the most part prefer oracle syntax.

    For me , going via ANSI route is lot more text than the Oracle Syntax. Wonder , if Oracle could give out a syntax for Full Outer Join and Partition Outer Join ( in Oracle Syntax) .

    Even If I write

    Comment by Zahir — January 4, 2011 @ 8:05 pm GMT Jan 4,2011 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,422 other followers