Oracle Scratchpad

June 4, 2012

Subpartition stats

Filed under: CBO,Execution plans,Oracle,Performance,Tuning — Jonathan Lewis @ 7:07 am BST Jun 4,2012

You might have expected the following query ought to run reasonably efficiently, after all it seems to be targeted very accurately at precisely the few rows of information I’m interested in:

select
	column_name,
	avg_col_len
from
	dba_subpart_col_statistics
where
	owner             = 'TEST_USER'
and 	table_name        = 'TEST_COMP'
and	subpartition_name = 'P_MAX_D'


I’m after some subpartition column stats (so that I can work out whether a subpartition of a local index on a composite partition is roughly the right size) and I’m querying the view by the only columns that seem to be there to allow me to access the data efficiently. Unfortunately the execution plan isn’t doing what I need it to do. The following plan is coming from a small 11.2.0.3 database with up to date statistics:

------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                   |     1 |  2141 |    18   (6)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER              |                   |     1 |  2141 |    18   (6)| 00:00:01 |
|*  2 |   HASH JOIN                      |                   |     1 |  2128 |    16   (7)| 00:00:01 |
|   3 |    NESTED LOOPS                  |                   |     1 |   100 |     3   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID  | USER$             |     1 |    18 |     1   (0)| 00:00:01 |
|*  5 |      INDEX UNIQUE SCAN           | I_USER1           |     1 |       |     0   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN             | I_OBJ2            |     1 |    82 |     2   (0)| 00:00:01 |
|   7 |    VIEW                          | TSP$              |   954 |  1889K|    13   (8)| 00:00:01 |
|*  8 |     HASH JOIN                    |                   |   954 | 83952 |    13   (8)| 00:00:01 |
|   9 |      NESTED LOOPS OUTER          |                   |    82 |  6396 |    10   (0)| 00:00:01 |
|  10 |       NESTED LOOPS               |                   |    82 |  2788 |     9   (0)| 00:00:01 |
|  11 |        TABLE ACCESS FULL         | TABCOMPART$       |     7 |    70 |     2   (0)| 00:00:01 |
|* 12 |        TABLE ACCESS CLUSTER      | COL$              |    12 |   288 |     1   (0)| 00:00:01 |
|* 13 |         INDEX UNIQUE SCAN        | I_OBJ#            |     1 |       |     0   (0)| 00:00:01 |
|  14 |       TABLE ACCESS BY INDEX ROWID| ATTRCOL$          |     1 |    44 |     1   (0)| 00:00:01 |
|* 15 |        INDEX UNIQUE SCAN         | I_ATTRCOL1        |     1 |       |     0   (0)| 00:00:01 |
|  16 |      TABLE ACCESS FULL           | TABSUBPART$       |    81 |   810 |     2   (0)| 00:00:01 |
|  17 |   TABLE ACCESS BY INDEX ROWID    | HIST_HEAD$        |     1 |    13 |     2   (0)| 00:00:01 |
|* 18 |    INDEX RANGE SCAN              | I_HH_OBJ#_INTCOL# |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("O"."OBJ#"="TSP"."OBJ#")
   5 - access("U"."NAME"='TEST_USER')
   6 - access("O"."OWNER#"="U"."USER#" AND "O"."NAME"='TEST_COMP' AND "O"."NAMESPACE"=1 AND
              "O"."REMOTEOWNER" IS NULL AND "O"."LINKNAME" IS NULL AND "O"."SUBNAME"='P_MAX_D' AND
              "O"."TYPE#"=34)
       filter("O"."TYPE#"=34 AND "O"."SUBNAME"='P_MAX_D' AND "O"."LINKNAME" IS NULL)
   8 - access("TSP"."POBJ#"="TCP"."OBJ#")
  12 - filter(BITAND("C"."PROPERTY",32768)<>32768)
  13 - access("TCP"."BO#"="C"."OBJ#")
  15 - access("C"."OBJ#"="A"."OBJ#"(+) AND "C"."INTCOL#"="A"."INTCOL#"(+))
  18 - access("TSP"."OBJ#"="H"."OBJ#"(+) AND "TSP"."INTCOL#"="H"."INTCOL#"(+))

Notice particularly the full tablescan of the tabcompart$ table (the thing that identifies the (logical) partition objects that make up a composite partitioned table). This is happening inside the view tsp$, which Oracle is treating as a non-mergeable view, so we have to ask why the optimizer isn’t using one of two features that would allow this query to run more efficiently:

  • view merging so that it can index into tabsubpart$ using the object id I get in line 6 of the plan
  • predicate pushing to achieve a similar effect in a different way

Let’s look at the definition of dba_subpart_col_statistics (you can find this in the dictionary script $ORACLE_HOME/rdbms/admin/cdoptim.sql if you’re running a recente version of Oracle) to see what we can discover:

create or replace view DBA_SUBPART_COL_STATISTICS 
  (OWNER, TABLE_NAME, SUBPARTITION_NAME, COLUMN_NAME, NUM_DISTINCT, LOW_VALUE,
   HIGH_VALUE, DENSITY, NUM_NULLS, NUM_BUCKETS, SAMPLE_SIZE, LAST_ANALYZED,
   GLOBAL_STATS, USER_STATS, AVG_COL_LEN, HISTOGRAM)
as
select u.name, o.name, o.subname, tsp.cname, h.distcnt, 
       case when SYS_OP_DV_CHECK(o.name, o.owner#) = 1
            then h.lowval
            else null
       end,
       case when SYS_OP_DV_CHECK(o.name, o.owner#) = 1
            then  h.hival
            else null
       end, 
       h.density, h.null_cnt,
       case when nvl(h.distcnt,0) = 0 then h.distcnt
            when h.row_cnt = 0 then 1
	    when (h.bucket_cnt > 255
                  or
                  (h.bucket_cnt > h.distcnt
                   and h.row_cnt = h.distcnt
                   and h.density*h.bucket_cnt < 1))
                then h.row_cnt
            else h.bucket_cnt
       end,       
       h.sample_size, h.timestamp#,
       decode(bitand(h.spare2, 2), 2, 'YES', 'NO'),
       decode(bitand(h.spare2, 1), 1, 'YES', 'NO'),
       h.avgcln,
       case when nvl(h.row_cnt,0) = 0 then 'NONE'
            when (h.bucket_cnt > 255
                  or
                  (h.bucket_cnt > h.distcnt and h.row_cnt = h.distcnt
                   and h.density*h.bucket_cnt < 1))
                then 'FREQUENCY'
            else 'HEIGHT BALANCED'
       end
from sys.obj$ o, sys.hist_head$ h, tsp$ tsp, user$ u
where o.obj# = tsp.obj# and tsp.obj# = h.obj#(+)
  and tsp.intcol# = h.intcol#(+)
  and o.type# = 34 /* TABLE SUBPARTITION */
  and o.owner# = u.user#
  and o.namespace = 1 and o.remoteowner IS NULL and o.linkname IS NULL
/

This doesn’t look too bad. Although the layout of the code makes it a little hard to read it looks as if we could start at obj$ with the object name, join to user$ to check we’ve got the right object owner, then join to tsp$ to get some “denormalised” details about the Table Sub Partition (like the subpartition object number and internal column numbers) and then join to hist_head$ where the interesting column statistics are kept. The join to hist_head$ will have to appear last in the join order because it’s an outer join (which probably has to be there so that the query can return rows that report NULLs for columns where the statistics haven’t been collected).

So let’s take a closer look at tsp$ (again it’s in script cdoptim.sql) – perhaps that’s where the problem is:

create or replace view TSP$ as
select tsp.obj#, tcp.bo#, c.intcol#,
      decode(bitand(c.property, 1), 1, a.name, c.name) cname
      from sys.col$ c, sys.tabsubpart$ tsp, sys.tabcompart$ tcp, attrcol$ a
      where tsp.pobj# = tcp.obj# and tcp.bo# = c.obj#
      and bitand(c.property,32768) != 32768    /* not unused columns */
      and c.obj# = a.obj#(+) and c.intcol# = a.intcol#(+)
/

Again it doesn’t look too bad; we seem to have possible access paths through tsp.obj# (the subpartition’s object id), tcp.bo# (the base object id for the partition – which means the object id of the owning table), and c.intcol# (the internal column position for the column). Again, though, we see an outer join – we have a join to table attrcol$ (the thing that defines attributes for object types). But again there’s no obvious reason why this should cause a problem.

It’s only when you look closely at the whole picture that you can spot the problem. Here’s the critical outer join from the main query:

and tsp.obj# = h.obj#(+)
and tsp.intcol# = h.intcol#(+)

But the relevant columns in the select list for view tsp$ are tsp.obj# and c.intcol# – we’re trying to do an outer join to two different tables, and there’s a limitation in the optimizer that makes this difficult. Here’s a brief demo of the basic problem:

rem
rem     Script:         outer_limitation.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2012
rem

create table t1a (
	n1a	number,
	v1a	varchar(10)
);

create table t1b (
	n1b	number,
	v1b	varchar(10)
);

create table t2 (
	n2a	number,
	n2b	number,
	v2	varchar2(10)
);

create or replace view v1
as
select
	t1a.n1a,
	t1a.v1a,
	t1b.n1b,
	t1b.v1b
from
	t1a, t1b
where
	t1b.n1b = t1a.n1a
;

set autotrace traceonly explain

select
	t1a.v1a, t1b.v1b, t2.v2
from
	t1a,
	t1b,
	t2
where
	t1b.n1b   = t1a.n1a
and	t2.n2a(+) = t1a.n1a
and	t2.n2b(+) = t1b.n1b
;

select
	v1a, v1b, t2.v2
from
	v1,
	t2
where
	t2.n2a(+) = v1.n1a
and	t2.n2b(+) = v1.n1b
;

set autotrace off

When I try to run the simple query (any version up to 11.2.0.3) Oracle raises error “ORA-01417: a table may be outer joined to at most one other table”. When I hide the join between t1a and t1b inside the view v1, Oracle can execute the query but it has to make v1 a non-mergeable view that is also “non-pushable” (i.e. the optimizer can’t push the join predicates inside the view). Here’s the resulting execution plan; note, particularly, the VIEW operator at line 2 which tells us that v1 was not merged:

-------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost  |
-------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    73 |     8 |
|*  1 |  HASH JOIN OUTER     |      |     1 |    73 |     8 |
|   2 |   VIEW               | V1   |     1 |    40 |     5 |
|*  3 |    HASH JOIN         |      |     1 |    40 |     5 |
|   4 |     TABLE ACCESS FULL| T1A  |     1 |    20 |     2 |
|   5 |     TABLE ACCESS FULL| T1B  |     1 |    20 |     2 |
|   6 |   TABLE ACCESS FULL  | T2   |     1 |    33 |     2 |
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N2A"(+)="V1"."N1A" AND "T2"."N2B"(+)="V1"."N1B")
   3 - access("T1B"."N1B"="T1A"."N1A")

In the case of my client there was no reasonable workaround (although I could, perhaps, have designed a more complex piece of PL/SQL that avoided this particular query). Fortunately the client doesn’t use objects in the database, and we always collect for stale stats, so we decided that the simplest way to avoid the problem was to create a replacement view (jpl_subpart_col_stats) that copied the original view but turned the join to hist_head$ into an inner join – this made it possible for Oracle to merge the tsp$view, and the resulting query was fast enough for the job we had to do.

Footnote:

Don’t try to compare the behaviour of this query too closely with the behaviour of view dba_subpart_col_statistics, I’m just trying to demonstrate one little point with this example – the way in which the double outer join is currently illegal, and how it may be hidden (at a price) inside a view.

This restriction on outer joins disappears in Oracle 12c

3 Comments »

  1. Maybe you could change the title of this post. “double outer join and hiding behind a view”.

    Comment by Hemant K Chitale — June 5, 2012 @ 4:00 pm BST Jun 5,2012 | Reply

  2. Why does Oracle have the restriction that “a table may be outer joined to at most one other table”? When I re-write the SQL using ANSI SQL it permits the join.

    Comment by Ashley — June 6, 2012 @ 8:02 am BST Jun 6,2012 | Reply

    • Ashley,
      You’re correct, but the reason why you can do it in ANSI is that Oracle transforms your ANSI, in a form which basically contains the necessary non-mergeable in-line view that makes it possible, e.g.

      select
      	/*+ qb_name(main) */
      	gp.small_vc_gp,
      	par.small_vc_p,
      	chi.small_vc_c
      from
      	grandparent	gp
      left outer join 
      	parent		par
      on	par.id_gp = gp.id
      left outer join 
      	child		chi
      on	chi.id_p = par.id  -- to first table
      and	chi.id_gp = gp.id  -- to second table
      ;
      
      --------------------------------------------------------------------
      | Id  | Operation            | Name        | Rows  | Bytes | Cost  |
      --------------------------------------------------------------------
      |   0 | SELECT STATEMENT     |             |  2000 |   115K|   517 |
      |*  1 |  HASH JOIN OUTER     |             |  2000 |   115K|   517 |
      |   2 |   VIEW               |             |  2000 | 80000 |   121 |
      |*  3 |    HASH JOIN OUTER   |             |  2000 | 68000 |   121 |
      |   4 |     TABLE ACCESS FULL| GRANDPARENT |  1000 | 15000 |    40 |
      |   5 |     TABLE ACCESS FULL| PARENT      |  2000 | 38000 |    78 |
      |   6 |   TABLE ACCESS FULL  | CHILD       | 10000 |   185K|   381 |
      --------------------------------------------------------------------
      
      Query Block Name / Object Alias (identified by operation id):
      -------------------------------------------------------------
         1 - SEL$A4DA5D2A
         2 - SEL$58A6D7F6 / from$_subquery$_003@SEL$4
         3 - SEL$58A6D7F6
         4 - SEL$58A6D7F6 / GP@SEL$2
         5 - SEL$58A6D7F6 / PAR@SEL$1
         6 - SEL$A4DA5D2A / CHI@SEL$3
      
      

      Note the VIEW operator in line 2 of the plan, where the non-mergeable in-line view makes it presence felt.

      Comment by Jonathan Lewis — June 7, 2012 @ 5:57 pm BST Jun 7,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.