Oracle Scratchpad

March 7, 2014

Subquery Anomaly

Filed under: Bugs,CBO,Execution plans,Oracle,Upgrades — Jonathan Lewis @ 8:57 am BST Mar 7,2014

Here’s an oddity that appeared on the OTN database forum last night:

We have this query in our application which works fine in 9i but fails in 11gR2 (on Exadata) giving an “ORA-00937: not a single-group group function” error….

… The subquery is selecting a column and it doesn’t have a group by clause at all. I am not sure how is this even working in 9i. I always thought that on a simple query using an aggregate function (without any analytic functions / clause), we cannot select a column without having that column in the group by clause. So, how 11g behaves was not a surprise but surprised to see how 9i behaves. Can someone explain this behaviour?

The poster supplied the suspect query, and it certainly looked as if it should never have worked – but I took a guess that the optimizer was doing some sort of transformation that concealed the problem before the optimizer managed to see the error. The subquery was a little odd because it was doing something it didn’t need to do, and my was guess that the optimizer had recognised the option to simplify the query and the simplification had “accidentally” removed the error. This turned out to be correct, but my guess about exactly what had happened to hide the error was wrong.

Having created a hypothesis I couldn’t resist checking it this morning, so here’s the test case (don’t pay any attention to the actual data I’ve generated, it was a cut-n-paste from a script that I had previously used for something completely different):

create table t1
as
select
	trunc((rownum-1)/15)	n1,
	trunc((rownum-1)/15)	n2,
	rpad(rownum,180)	v1
from all_objects
where rownum <= 3000
;

create table t2
as
select
	mod(rownum,200)		n1,
	mod(rownum,200)		n2,
	rpad(rownum,180)	v1
from all_objects
where rownum <= 3000
;

begin
 	dbms_stats.gather_table_stats(
 		user,
 		't1',
 		method_opt => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		user,
		't2',
		method_opt => 'for all columns size 1'
	);
end;
/

explain plan for
select
	/*+ qb_name(main) */
	*
from t1
where (n2,n1) in (
	select /*+
			qb_name(subq)
			unnest
		*/
		max(t2.n2), t2.n1
	from t2
	where t2.n1 = t1.n1
)
;

You’ll notice, of course, that I don’t have a group by clause at all, so the presence of the t2.n1 in the select list should lead to Oracle error: “ORA-00937: not a single-group group function”.

In versions from 8i to 11.1.0.7, this query could run, and its execution plan looked looked like this:


----------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost  |
----------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |   200 | 45200 |    46 |
|*  1 |  HASH JOIN           |         |   200 | 45200 |    46 |
|   2 |   VIEW               | VW_SQ_1 |   200 |  7800 |    31 |
|   3 |    HASH GROUP BY     |         |   200 |  2400 |    31 |
|   4 |     TABLE ACCESS FULL| T2      |  3000 | 36000 |    14 |
|   5 |   TABLE ACCESS FULL  | T1      |  3000 |   547K|    14 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("N2"="MAX(T2.N2)" AND "N1"="N1" AND "ITEM_1"="T1"."N1")

Notice how the optimizer has produced an inline view (VW_SQ_1) from the subquery, using it to drive a hash join; notice how that inline view has an aggregation operation (HASH GROUP BY) in it. In effect the optimizer has rewritten my query like this:

select
	t1.*
from	(
		select
			distinct max(t2.n2) max_n2, t2.n1 item_1, t2.n1
		from	t2
		group by
			t2.n1
	)	vw_sq_1,
	t1
where
	t1.n2 = vw_sq_1.max_n2
and	t1.n1 = vw_sq_1.n1
and	t1.n1 = vw_sq_1.item_1
;

There’s a clue about why this succeeded in the 10053 trace file, which includes the lines:

"Subquery Unnesting on query block SEL$1 (#1)SU: Performing unnesting that does not require costing.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU:   Checking validity of unnesting subquery SEL$2 (#2)
SU:   Passed validity checks.

Compared to the 11.2 lines:

Subquery Unnesting on query block MAIN (#1)SU: Performing unnesting that does not require costing.
SU: Considering subquery unnest on query block MAIN (#1).
SU:   Checking validity of unnesting subquery SUBQ (#2)
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.

Whatever check it was that Oracle introduced in 11.2 (maybe a check that the query block was inherently legal), unnesting failed – and if I add an /*+ no_unnest */ hint to the original subquery in the earlier versions of Oracle I get the expected ORA-00937.

The philosophical argument is left to the reader: was the original behaviour a bug, or is the new behaviour the bug ?

 

7 Comments »

  1. Jonathan,

    I tested this on 10.2.0.1 and it ran in there also but with a different execution plan, no inline view created. Execution plan is as following:

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3879820203
    
    -----------------------------------------------------------------------------
    | Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------
    |   0 | SELECT STATEMENT     |      |     1 |   214 |    54   (4)| 00:00:01 |
    |*  1 |  FILTER              |      |       |       |            |          |
    |   2 |   HASH GROUP BY      |      |     1 |   214 |    54   (4)| 00:00:01 |
    |*  3 |    HASH JOIN         |      |  1253 |   261K|    53   (2)| 00:00:01 |
    |   4 |     TABLE ACCESS FULL| T2   |  3070 | 79820 |    26   (0)| 00:00:01 |
    |   5 |     TABLE ACCESS FULL| T1   |  3000 |   550K|    26   (0)| 00:00:01 |
    -----------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter("T1"."N1"=MAX("T2"."N2"))
       3 - access("N2"="T2"."N1" AND "T1"."N1"="T2"."N1")
    

    I tried with event 10053 also but there was no similar observation as that of yours.

    Regards
    Jagdeep Sangwan

    Comment by jagdeepsangwan — March 7, 2014 @ 9:56 am BST Mar 7,2014 | Reply

    • Jagdeep,

      Thanks for the comment.

      In your case it looks like the optimizer has unnested the subquery (protecting the query from failure) and then did complex view merging – so “join and aggregate” rather than “aggregate and join”. (See http://jonathanlewis.wordpress.com/2007/03/08/transformation-and-optimisation/ )

      The absence of comments about validity checking etc. may be an indication of the gradual enhancement of the trace file – it get increasingly verbose as you go through the newer versions of Oracle. This is what I see in the 10.2.0.4 trace file when I force complex view merging:

      SU: Considering subquery unnesting in query block MAIN (#1)
      SU: Performing unnesting that does not require costing.
      SU: Considering subquery unnest on MAIN (#1).
      SU:   Checking validity of unnesting subquery SUBQ (#2)
      SU:   Passed validity checks.
      SU:   Unnesting subquery SUBQ (#2)
      Registered qb: SEL$AA0D0E02 0x5444f24 (SUBQ INTO VIEW FOR COMPLEX UNNEST SUBQ)
        signature (): qb_name=SEL$AA0D0E02 nbfros=1 flg=0
          fro(0): flg=0 objn=15801 hint_alias="T2"@"SUBQ"
      
      ...
      
      CVM: Finding query blocks in SEL$B4BE209F (#1) that are valid to merge.
      CVM:   Checking validity of merging SEL$AA0D0E02 (#2)
      CVM: CBQT Marking query block SEL$AA0D0E02 (#2)as valid for CVM.
      
      

      Comment by Jonathan Lewis — March 7, 2014 @ 10:08 am BST Mar 7,2014 | Reply

      • Jonathan,

        Thanks for the link and reply.
        I have just checked on 12.1.0.1 and 10053 shows

        SU: Considering subquery unnest on query block MAIN (#1)
        SU:   Checking validity of unnesting subquery SUBW(#2)
        SU:     SU bypassed: Failed basic validity checks.
        SU:   validity checks failed.
        SU:   Checking validity of unnesting subquery SUBQ (#2)
        SU:     SU bypassed: Failed basic validity checks.
        SU:   validity checks failed
        

        Regards,
        Jagdeep Sangwan

        Comment by jagdeepsangwan — March 7, 2014 @ 10:29 am BST Mar 7,2014 | Reply

  2. Jonathan,

    Thanks for explaining it very nicely and sharing with all of us.

    Thanks
    Aswath

    Comment by Aswath Rao — March 7, 2014 @ 9:39 pm BST Mar 7,2014 | Reply

  3. This looks familiar, you may want to have a look at this Oracle Note:
    Bug 9477688 – ORA-979 not signaled for invalid query – wrong results possible (Doc ID 9477688.8)

    This is a case of fixing a bug, and causing error laden SQL to drop its bomb.

    I saw this happen in a banking application a couple years ago, when an upgrade to 11.2.0.1 failed due to this.

    The patch instructions will say the patch can be applied online.
    It may work now, but when I tried to apply the patch online in a test environment, the query would then cause an ORA-600.

    The gist of my notes on the matter:

    “Oracle support has refused to continue work on this SR, stating that the initial problem stated has been resolved.

    And no, the initial problem was NOT resolved.
    If the patch is applied online, and the database is shutdown, the database cannot be restarted, and the patch cannot be rolled back.

    The only recovery is to restore a backup of ORACLE_HOME”.

    Comment by jkstill — March 8, 2014 @ 12:28 am BST Mar 8,2014 | Reply

    • Jared,

      Thanks for that information. The fix_control for that bug has the description: “edit check the view QB if it has a group-by” (QB I assume means “query block”) which looks like a perfect description of the change.

      Unfortunately in 11.2.0.4 setting the fix control to disable the fix doesn’t make any difference (alter session set “_fix_control” = ‘9477688:off’) – but the notes on fix controls point out that that approach may only work for specific bugs in specific versions of Oracle, so if anyone’s running 11.2.0.1 or 11.2.0.2 (which might be the significant version) it would be nice to have some test results.

      Comment by Jonathan Lewis — March 8, 2014 @ 7:19 am BST Mar 8,2014 | Reply

      • Today I discovered an 11.2.0.2 database was available for this test.
        It fails on 11.2.0.2, as expected.

        If I run into an 11.2.0.1 instance I will try it as well.

        
        SQL> l
          1  explain plan for
          2  select
          3      /*+ qb_name(main) */
          4      *
          5  from t1
          6  where (n2,n1) in (
          7      select /*+
          8              qb_name(subq)
          9              unnest
         10          */
         11          max(t2.n2), t2.n1
         12      from t2
         13      where t2.n1 = t1.n1
         14  )
         15*
        SQL> /
                max(t2.n2), t2.n1
        
                            *
        ERROR at line 11:
        ORA-00937: not a single-group group function
        
        

        Comment by jkstill — March 11, 2014 @ 5:07 pm BST Mar 11,2014 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,910 other followers