Oracle Scratchpad

February 2, 2016

Partitioned Bitmap Join

Filed under: bitmaps,Bugs,Indexing,Infrastructure,Oracle,Partitioning,Troubleshooting — Jonathan Lewis @ 8:32 am GMT Feb 2,2016

If you don’t want to read the story, the summary for this article is:

If you create bitmap join indexes on a partitioned table and you use partition exchanges to load data into the table then make sure you create the bitmap join indexes on the loading tables in exactly the same order as you created them on the partitioned table or the exchange will fail with the (truthful not quite complete) error: ORA-14098: index mismatch for tables in ALTER TABLE EXCHANGE PARTITION.

My story starts with this OTN posting from John Hall where he found after a year of successful batch loading one of his partition exchanges was raising error 14098. After an exchange of ideas, user rp0428 came up with a query against sys.jijoin$ (one of the tables behind bitmap join indexes) that allowed John Hall to see that the indexes on the exchange table had been created in a different order from that of the partitioned table. I did a quick test to see if this might be relevant (it shouldn’t be, it isn’t with “normal” indexes or function-based indexes, or virtual columns) and didn’t manage to reproduce the problem with two dimension tables and two bitmap join indexes.

Fortunately John didn’t take my word for it and tested the idea on a clone of the production system – and found that the order of creation did matter. His system, however, had 9 dimension tables and 33 bitmap join indexes – which shouldn’t have made any difference in principle, but maybe it was something to do with having several indexes on the same table,  maybe it was something to do with have far more tables or far more indexes than I had. So I built a larger test case with 6 dimension tables and six indexes per table – and reproduced the problem.

Then I started cutting back to see where the problem appeared, and found that all it took was one dimension with two indexes, or two dimensions with one index each – whatever I had done in my “quick test” I had clearly done it too quickly and done something wrong. (Unfortunately I had overwritten most of the code from the original quick test while building the larger test, so I couldn’t go back and see where the error was.)

Here, then, is the minimal test case that I finally ran to demonstrate that switching the order of index creation on the exchange table causes the exchange to fail:


drop table pt_range purge;
drop table t1 purge;
drop table dim_1 purge;
drop table dim_2 purge;

prompt  =================
prompt  Partitioned table
prompt  =================

create table pt_range (
        id,
        grp1,
        grp2,
        padding
)
nologging
partition by range(id) (
        partition p2001 values less than (2001),
        partition p4001 values less than (4001),
        partition p6001 values less than (6001),
        partition p8001 values less than (8001)
)
as
select
        rownum                          id,
        trunc(rownum/100)               grp1,
        trunc(rownum/100)               grp2,
        rpad('x',100)                   padding
from
        all_objects
where 
        rownum <= 8000
;

prompt  ================================================
prompt  Exchange table - loaded to match partition p8001
prompt  ================================================

alter table pt_range 
add constraint pt_pk primary key (id) using index local;

create table t1 (
        id,
        grp1,
        grp2,
        padding
)
as 
select
        rownum + 6000                   id,
        trunc(rownum/100)               grp1,
        trunc(rownum/100)               grp2,
        rpad('x',100)                   padding
from
        all_objects
where 
        rownum <= 2000
;

alter table t1
add constraint t1_pk primary key (id);

execute dbms_stats.gather_table_stats(user,'pt_range')
execute dbms_stats.gather_table_stats(user,'t1')

prompt  ================
prompt  dimension tables
prompt  ================

create table dim_1 
as 
select distinct 
        grp1, 
        cast('A'||grp1 as varchar2(3)) agrp1,
        cast('B'||grp1 as varchar2(3)) bgrp1
from
        t1
;

create table dim_2 as select * from dim_1;

prompt  ===============================
prompt  Primary keys required for BMJIs
prompt  ===============================

alter table dim_1 add constraint d1_pk primary key (grp1);
alter table dim_2 add constraint d2_pk primary key (grp1);

execute dbms_stats.gather_table_stats(user,'dim_1')
execute dbms_stats.gather_table_stats(user,'dim_2')

prompt  ============================
prompt  Creating bitmap join indexes
prompt  ============================

create bitmap index pt_1a on pt_range(d1.agrp1) from pt_range pt, dim_1 d1 where d1.grp1 = pt.grp1 local ;
create bitmap index pt_2a on pt_range(d2.agrp1) from pt_range pt, dim_2 d2 where d2.grp1 = pt.grp2 local ;

prompt  ====================================================
prompt  Pick your index creation order on the exchange table
prompt  ====================================================

create bitmap index t1_1a on t1(d1.agrp1) from t1, dim_1 d1 where d1.grp1 = t1.grp1 ;
create bitmap index t1_2a on t1(d2.agrp1) from t1, dim_2 d2 where d2.grp1 = t1.grp2 ;
-- create bitmap index t1_1a on t1(d1.agrp1) from t1, dim_1 d1 where d1.grp1 = t1.grp1 ;

prompt  ==================
prompt  Exchanging (maybe)
prompt  ==================

alter table pt_range
        exchange partition p8001 with table t1
        including indexes
        without validation
;

I’ve got the same create statement twice for one of the bitmap join indexes – as it stands the indexes will be created in the right order and the exchange will work; if you comment out the first t1_1a create and uncomment the second the exchange will fail. (If you comment out the ‘including indexes’ then the exchange will succeed irrespective of the order of index creation, but that rather defeats the point of being able to exchange partitions.)

I’ve reproduced the problem in 12.1.0.2, 11.2.0.4 and 10.2.0.5

Footnote: running an extended trace didn’t help me work out how Oracle is detecting the mismatch, presumably it’s something that gets into the dictionary cache in a general “load the index definition” step; but it did show me that (in the “without validation” case) the code seems to check the correctness of the exchange table’s primary key data BEFORE checking whether the indexes match properly.

January 11, 2016

Subquery Effects

Filed under: Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 12:50 pm GMT Jan 11,2016

Towards the end of last year I used a query with a couple of “constant” subqueries as a focal point for a blog note on reading parallel execution plans. One of the comments on that note raised a question about cardinality estimates and, coincidentally, I received an email about the cost calculations for a similar query a few days later.

Unfortunately there are all sorts of anomalies, special cases, and changes that show up across versions when subqueries come into play – it’s only in recent versions of 11.2, for example, that a very simple example I’ve got of three equivalent statements that produce the same execution plan report the same costs and cardinality. (The queries are:  table with IN subquery, table with EXISTS subquery, table joined to “manually unnested” subquery – the three plans take the unnested subquery shape.)

I’m just going to pick out one particular anomaly, which is a costing error with multiple subqueries when “OR-ed”. Here’s my sample data set:


create table t1
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 20000
;


create table t2
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 25000
;

create table t3
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 30000
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t2',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t3',
                method_opt       => 'for all columns size 1'
        );
end;
/

The three tables are slightly different sizes so that it will be easy to see different costs of tablescans, and there are no indexes to everything I do in the queries will be tablescans. Here are six queries I’m going to test – they all scan t1, with “constant” subqueries against t2 and/or t3. The first pair is just to show you the basic cost of the query with a single subquery, the second pair shows you the default action with two subqueries in two different orders, the final pair shows you what happens with two subqueries when you block subquery pushing.


select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n2 > (select avg(t2.n2) from t2)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n3 > (select avg(t3.n3) from t3)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select avg(t2.n2) from t2)
         or t1.n3 > (select avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select avg(t3.n3) from t3)
         or t1.n2 > (select avg(t2.n2) from t2)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
         or t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
         or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
        )
;

Here are the first two plans, pulled from memory (which you might have guessed thanks to the “disappearing subquery predicate” in the predicate section. These examples came from 12.1.0.2, but the same happens in 11.2.0.4:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N2">))

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   123 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N3">))

As you can see, the cost of the query is the cost of the t1 tablescan plus the cost of running the t2 or t3 subquery once: 111 = 49 + 62, and 123 = 49 + 74.

(As a general guideline, recent versions of the optimizer tend to allow for subqueries by including “cost of subquery” * “number of times the optimizer thinks it will execute” – in this case the optimizer knows that the subquery will run exactly once).

But what happens when we test the query that applies BOTH subqueries to the tablescan ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N2"> OR "T1"."N3">)))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N3"> OR "T1"."N2">)))

The cost of the query in both cases is just the cost of the tablescan of t1 – the subqueries are, apparently, free. You can check from the predicate section, by the way, that the subqueries are applied in the order they appear in original statement.

Does anything change if the subqueries are not pushed ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N2"> OR "T1"."N3">))
   3 - filter("T1"."N1">10000)

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   124 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N3"> OR "T1"."N2">))
   3 - filter("T1"."N1">10000)

The two plans have different costs – and the cost is the cost of the tablescan of t1 plus the cost of just the first subquery in the filter predciate list.

The non-pushed subqueries show up another anomaly: you’ll notice that the t1 tablescan reports 10,001 rows cardinality, but the FILTER operation doesn’t have an associated cardinality so we can’t see how many rows the optimizer thinks will survive the subqueries. So let’s run a query that allows us to see the surviving row estimate:


select
        max(n1)
from
        (
        select
                /*+ no_eliminate_oby */
                t1.n1
        from
                t1
        where
                t1.n1 > 10000
        and     (
                   t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
                or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
                )
        order by
                n1
        )
;

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |       |       |   126 (100)|          |
|   1 |  SORT AGGREGATE        |      |     1 |    13 |            |          |
|   2 |   VIEW                 |      | 10001 |   126K|   126   (5)| 00:00:01 |
|   3 |    SORT ORDER BY       |      | 10001 |   146K|   126   (5)| 00:00:01 |
|*  4 |     FILTER             |      |       |       |            |          |
|*  5 |      TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   6 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |       TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   8 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   9 |       TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("T1"."N3"> OR "T1"."N2">))
   5 - filter("T1"."N1">10000)

As you can see, the SORT ORDER BY operation thinks it’s going to handle 10,0001 rows – it looks as if the optimizer arithmetic hasn’t applied the usual subquery guess of 5% for the two subqueries. (When the subqueries were automatically pushed you saw a cardinality of 975 – which is 5% for subquery t2 plus (due to OR) 5% for subquery t3 minus 5% of 5% (=25) for the overlap – this is the standard OR arithmetic)

tl;dr

Although the optimizer code has been enhanced in many places for dealing with subquery estimates, but there are still some odd errors and inconsistencies that you need to be aware of. The examples I’ve shown may not be particularly significant in terms of what they do, but the pattern is one that you may recognise in more complex queries.

 

Reference script: subq_cost_anomaly_2.sql

 

January 6, 2016

NLS Mess

Filed under: Bugs,CBO,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 1:18 pm GMT Jan 6,2016

The Oracle database has all sorts of little details built into it to help it deal with multi-national companies, but since they’re not commonly used you can find all sorts of odd “buggy” bits of behaviour when you start to look closely. I have to put “buggy” in quotes because some of the reported oddities are the inevitable consequences of (for example) how multi-byte character sets have to work; but some of the oddities look as if they simply wouldn’t be there if the programmer writing the relevant bit of code had remembered that they also had to cater for some NLS feature.

Here’s an example of the type of unexpected behaviour that can appear. There probably are some bugs in the area I’m going to demonstrate but, at first glance, I thought I was looking at an acceptable limitation imposed by a generic requirement. The example came from AskTom. which is why the data set isn’t my usual “t1” generation (and the formatting and capitalisation isn’t according to my usual standards).

The problem involves Case Insensitive indexing.


ALTER session SET nls_sort=binary_ci;
ALTER session SET nls_comp=linguistic;

CREATE TABLE log_data(
  account_id NUMBER,
  log_type NUMBER,
  sys_name VARCHAR2(30),
  log_time TIMESTAMP,
  msg varchar2(4000)
)
nologging
;

insert /*+ append */ into log_data(
  account_id,
  log_type,
  sys_name,
  log_time,
  msg
)
select
        5,
        2,
        dbms_random.string('a',1),
        sysdate + dbms_random.value,
        rpad('x',200)
from
        dual
connect by
        level <= 26000
;


create index log_date on log_data (
        account_id, 
        log_type, 
--      sys_name,
        NLSSORT(sys_name,'NLS_SORT=BINARY_CI'),
        log_time
)
nologging
;
  
rem     ======================================================================
rem     Need to gather stats AFTER index creation because of the hidden column
rem     ======================================================================
  
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'LOG_DATA',
                method_opt       => 'for all columns size 1'
        );
end;
/

And here’s the query I want to optimize:


SELECT 
        *
FROM
  (
    SELECT
        sys_name, log_time,  substr(msg,1,40) msg
    FROM log_data
    WHERE
      account_id=5
      AND log_type=2
      AND sys_name='a'
    ORDER BY
      log_time  desc
  )
WHERE
  rownum <= 10
;

The requirement of the query is that we see the ten most recent entries for a given combination of account_id, log_type and sys_name (ignoring case in sys_name). The orginal table has tens of millions of rows, of course, with many combinations, and some of the combinations have a very large number of entries hence the desire to find an access path that gets just the 10 rows we want without getting all the rows for a combination and sorting them before returning the ten.

Normally we would just create an index that started with the 3 columns used in the equality and ending with the column in the order by clause, and that would be enough for the optimizer to see the option for a “sort order by nosort” operation to get the required data through an index range scan; so that’s the index the code sample creates, except that since we’ve enabled case insensitive sorting we need to use a function-based index to hold the case-insensitive version of sys_name.

Here’s the execution plan we would get if we DIDN’T use the nlssort() function in the index – I’ve run the query in 11.2.0.4 and pulled the plan from memory with rowsource execution stats enabled:


---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |   605 (100)|     10 |00:00:00.02 |    1065 |       |       |          |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.02 |    1065 |       |       |          |
|   2 |   VIEW                         |          |      1 |    500 |   605   (1)|     10 |00:00:00.02 |    1065 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY       |          |      1 |    500 |   605   (1)|     10 |00:00:00.02 |    1065 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID| LOG_DATA |      1 |    500 |   603   (1)|    966 |00:00:00.01 |    1065 |       |       |          |
|*  5 |      INDEX RANGE SCAN          | LOG_DATE |      1 |    500 |   103   (3)|    966 |00:00:00.01 |     100 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2)
       filter(NLSSORT("SYS_NAME",'nls_sort=''BINARY_CI''')=HEXTORAW('6100') )

Notice particularly the filter predicate at operation 5: that’s the thing we need to get into the index before we can avoid picking up excess data and sorting it. Notice also in the A-Rows column that we acquired 966 rows from the table before sorting and discarding all but 10 of them at operation 3.

Notice especially how important it is to look at the predicate section of an execution plan to gain a full understanding of what’s happening.

So here’s the execution plan we get by default with the function-based index in place:


----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |      1 |        |    13 (100)|     10 |00:00:00.01 |     969 |       |       |          |
|*  1 |  COUNT STOPKEY                  |          |      1 |        |            |     10 |00:00:00.01 |     969 |       |       |          |
|   2 |   VIEW                          |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |     969 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY        |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |     969 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|    966 |00:00:00.01 |     969 |       |       |          |
|*  5 |      INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|    966 |00:00:00.01 |       5 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

It didn’t work ! (Check the A-Rows at operations 4 and 5, and the sort that we didn’t want at operation 3 where the data is finally reduced to 10 rows.

But there’s something odd going on here – look at the predicate section: our three predicates are all access predicates for the index range scan descending. We are doing exactly what we want to do with the index, but we’re not stopping after the 10 rows that we need, we’re getting all of them (in the order we want) and then doing a trivial sort and discard. Look at the Cost column – the cost at operation 4 is exactly what we might expect for the 10 rows we want to see, and the E-rows at line 5 is clearly based on our “first 10 rows” requirement.

This raises two questions:

  1. What’s gone wrong ?
  2. Can we work around the problem ?

The answer to (1) is, I think, that there’s a bug in the code. Looking at the 10053 trace file I can see the optimizer correctly handling the arithmetic of the virtual column (the sys_nc000006$) representing the function in the index and then getting to the point where it goes into a code section relating to “Recost for ORDER BY”, and brings back the original function as a filter predicate – I think that in the recosting it may be losing track of the fact that sys_nc000006$ and nlssort(sys_name, ‘nls_sort=binary_ci’) are the same thing and therefore can’t apply the rule about “Equality on 1st N columns, order by on the remainder”.

There are several answers to (2).

Workarounds

The honest hack

The first one is simply to fall back to the old (probably version 7, possibly version 8) requirement for getting the “sort order by nosort” operation – put all the index columns into the order by clause. Unfortunately the optimizer then did a tablescan rather than an index range scan because my data set was so small, so I had to hack the system stats temporarily to make the tablescan very expensive:


begin
        dbms_stats.set_system_stats('MBRC',2);
        dbms_stats.set_system_stats('MREADTIM',20); 
        dbms_stats.set_system_stats('SREADTIM',5);
        dbms_stats.set_system_stats('CPUSPEED',1000); 
end;
/

... order by account_id desc, log_type desc, sys_name desc, lot_time desc

Unfortunately the optimizer still went wrong – it did an ASCENDING index range scan sorting all the data. I actually had to hint the code to use the index in descending order to get the following execution plan:


------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |  1215 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |   1000 |  1215   (1)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |  1006   (1)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |   1000 |     5   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

The A-Rows tells us we’ve accessed the minimum data set, and the absence of the SORT ORDER BY STOPKEY operation tells us that we’ve avoided doing the sort. Notice, though that the cost is the cost that would have been appropriate if we have accessed all 1,000 rows that matched the equality predicates. This is an example of a plan that you couldn’t really trust if all you had done was an “explain plan” rather than running the query and checking the rowsource execution stats. If you ignore the A-Rows it looks as if the plan WOULD get all the data in order and only eliminate the redundant rows at operation 1.

The silly surprise

The original author of the problem came up with this one. Put in two predicates which, between them are equivalent to the original requirement:


where ...
and     sys_name >= 'a'
and     sys_name <= 'a'

Clearly this is totally silly – the optimizer can fold this pair of predicates into the single predicate “sys_name = ‘a'”, so it shouldn’t make any difference. But here’s the execution plan:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |    13 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

Yes, it’s (structurally) exactly the same plan, with exactly the same predicate section except that (a) it gets there without being hinted, (b) the Cost column looks appropriate all down the line, and (c) the E-Rows value for the VIEW operator would have helped us appreciate that the correct elimination was (probably) going to happen if all we had done was the Explain Plan.

The dirty hack

I know the name of the hidden column that’s causing the problem, and I know how to generate the value it has to be – so let’s give Oracle exactly what it needs to see rather than allowing its internal transformation to rewrite the SQL:

...
AND sys_nc00006$ = nlssort('a','nls_sort=binary_ci')
...


------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |    13 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "SYS_NC00006$"=HEXTORAW('6100') )

We get exactly the plan we need – and the silly thing about this example is that it’s a case where we get the plan we want by EXPLICITLY transforming the SQL to reproduce the transformation that Oracle had done IMPLICITLY and then messed up !

Final Choice
Of the three options – the dirty hack is definitely a no-no in production; the “double the predicate” trock is undesirable because it may depend in some unexpected way on a particular optimizer bug or on some statistical detail that could change; so I’d choose the hinted path with the (nominally) redundant columns.

One final point about this solution, we actually needed to include only the sys_name in the order by clause to use the descending range scan and early stop – which is basically another indication that it’s something about the function-based column that is breaking the normal code path.

Reference Script: nls_sort_anomaly.sql

January 4, 2016

ANSI bug

Filed under: ANSI Standard,Bugs,Oracle — Jonathan Lewis @ 1:12 pm GMT Jan 4,2016

In almost all cases the SQL you write using the ANSI standard syntax is tranformed into a statement using Oracle’s original syntax before being optimised – and there are still odd cases where the translation is not ideal.  This can result in poor performance, it can result in wrong results. The following examples arrived in my in-tray a couple of weeks ago:

with
    table1 as ( select 1 my_number from dual ),
    table2 as ( select 1 my_number from dual )
select *
    from (
        select sum(table3.table2.my_number) the_answer
            from table1
            left join table2 on table1.my_number = table2.my_number
            group by table1.my_number
        );


with
    table1 as ( select 1 my_number from dual ),
    table2 as ( select 1 my_number from dual )
select sum(table3.table2.my_number) the_answer
    from table1
    left join table2 on table1.my_number = table2.my_number
    group by table1.my_number;

Notice the reference to table3.table2.my_number in the select list of both queries – where does the “table3” bit come from ? These queries should result in Oracle error ORA-00904: “TABLE3″.”TABLE2″.”MY_NUMBER”: invalid identifier.

If you’re running 11.2.0.4 (and, probably, earlier versions) both queries produce the following result:


THE_ANSWER
----------
         1

1 row selected.

If you’re running 12.1.0.2 the first query produces the ORA-00904 error that it should do, but the second query still survives to produce the same result as 11.2.0.4.

November 24, 2015

Table Expansion

Filed under: 12c,Bugs,Oracle,Partitioning — Jonathan Lewis @ 12:13 pm GMT Nov 24,2015

I’ve often found that while I’m investigating one Oracle feature I get waylaid by noticing anomalies in other parts of the code. I was caught by one of these events a little while ago while experimenting with the new (12.1.0.2) Inmemory Columnar Store.  After reading a posting by Martin Bach I asked the question:

“If you have a partitioned table with a local index and one of the table partitions has been declared INMEMORY, would a query that could use that index be able to apply table expansion to produce a plan that did a tablescan on the in-memory partition and an indexed access path on the partitions that weren’t in-memory?”

The answer was yes, with one important caveat – the first test I built to investigate the question did something very silly and produced the wrong results. In fact the error turned out to have nothing to do with the inmemory columnar store, it also appeared when I disabled the inmemory feature off and nudged the optimizer into table expansion by making one of the index partitions unusable. Here’s the demo, followed by a few notes, if you want to test it for yourselves:


create table t1 (
        id,
        n1,
        padding
)
partition by range (id) interval (250000) (
        partition p_start values less than (250000)
        segment creation immediate
        inmemory priority critical
)
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual 
        connect by 
                level <= 1e4
)
select
        rownum - 1              id,
        mod(rownum - 1,1250)    n1,             -- 200 rows per segment
        rpad('x',400,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

begin dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(n1) local nologging;

I’ve created an interval-partitioned table holding a million rows with an interval of 250,000, and I’ve defined the first partition with an upper bound of 250,000 and the inmemory attribute, leaving Oracle to add the other three partitions which won’t have the inmemory attribute set.

I’ve created a local index on the n1 column, with 200 rows (1,250 distinct values) per partition. Because of the way I’ve defined n1 I can easily recreate the table to adjust the relative code of the index accessed path and the tablescan path by adjusting the value I use in the mod() function. The use of the mod() function also means that every partition holds the same volume of data (with exactly the same degree of scattering) for any specific value of n1.

To test the option for table expansion I’ve then checked the execution plan for (and run) four queries:


select
        /*+ full(t1) */
        id
from
        t1
where
        n1 = 1
and     id < 250000
;

select
        /*+ index(t1) */
        id
from
        t1
where
        n1 = 1
and     id < 250000 
;
 
select
        /*+ full(t1) */
        id 
from    t1
where   n1 = 1
and     id >= 250000
and     id <  500000
;

select
        id
from
        t1
where
        n1 = 1
;

The first two queries give me the cost of accessing the inmemory partition by tablescan compared to the cost of accessing it by index. The third query is there to demonstrate that the non-inmemory tablescan is a LOT more expensive than the inmemory one. The final query accesses all four partitions to see if the inmemory partition is treated differently from the other three. Here are the four plans:

Default tablescan when limited to the in-memory partition
----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |   200 |  1800 |   184   (9)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE     |      |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
|*  2 |   TABLE ACCESS INMEMORY FULL| T1   |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - inmemory("N1"=1 AND "ID"<250000)
       filter("N1"=1 AND "ID"<250000)



Index access is more expensive than the inmemory tablescan
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   200 |  1800 |   201   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE                    |       |   200 |  1800 |   201   (0)| 00:00:01 |     1 |     1 |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   200 |  1800 |   201   (0)| 00:00:01 |     1 |     1 |
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   200 |       |     1   (0)| 00:00:01 |     1 |     1 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ID"<250000)
   3 - access("N1"=1)



Tablescan on a non-inmemory partition is much higher than inmemory (and indexed access)
-----------------------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |   200 |  1800 |   1891  (2)| 00:00:01 |       |       |
|   1 | PARTITION RANGE SINGLE|       |   200 |  1800 |   1891  (2)| 00:00:01 |     2 |     2 |
|*  2 | TABLE ACCESS FULL     | T1    |   200 |  1800 |   1891  (2)| 00:00:01 |     2 |     2 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N1"=1 AND "ID">=250000 AND "ID"<500000)



Access all four partitions - no table expansion. Cost = 4 x cost of single partition indexed access path.
--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |       |   800 |  7200 |   807   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL                       |       |   800 |  7200 |   807   (1)| 00:00:01 |     1 |1048575|
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1    |   800 |  7200 |   807   (1)| 00:00:01 |     1 |1048575|
|*  3 |    INDEX RANGE SCAN                        | T1_I1 |   800 |       |     6   (0)| 00:00:01 |     1 |1048575|
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=1)

So we can see that the inmemory tablescan of a single segment is cheaper than the indexed access path, and much cheaper than the tablescan of a single non-inmemory segment. When we execute the query that accesses all four segments, though, the optimizer chooses to use the same access path for all four partitions rather then splitting the query into one tablescan and three indexed accesses.

This is just a starting point for testing though – there are transformations where the optimizer will only use a particular transformation if the transformed query has a cost that is “X%” lower than it would be without the transformation (and some of these transformations have associated parameters – usually hidden – that can be set to adjust the percentage). Perhaps if I adjusted the data so that the relative benefit of a single inmemory scan was larger; or if I created a table with more partitions and had two inmemory and three not inmemory, or 3 inmemory and 4 not inmemory; or if I had more inmemory partitions than not then maybe the optimizer would spontaneously do table expansion.

Rather than fiddle about with the data my next step was to hint the final query with /*+ expand_table(t1) */. Here’s the resulting execution plan:


------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |         |       |       |  1797 (100)|          |       |       |
|   1 |  VIEW                                        | VW_TE_1 |  1600 | 20800 |  1797   (1)| 00:00:01 |       |       |
|   2 |   UNION-ALL                                  |         |       |       |            |          |       |       |
|   3 |    PARTITION RANGE SINGLE                    |         |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
|*  4 |     TABLE ACCESS INMEMORY FULL               | T1      |   200 |  1800 |   184   (9)| 00:00:01 |     1 |     1 |
|   5 |    PARTITION RANGE ITERATOR                  |         |   600 |  5400 |   806   (1)| 00:00:01 |     2 |     4 |
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |   600 |  5400 |   806   (1)| 00:00:01 |     2 |     4 |
|*  7 |      INDEX RANGE SCAN                        | T1_I1   |   800 |       |     5   (0)| 00:00:01 |     2 |     4 |
|   8 |    PARTITION RANGE INLIST                    |         |   800 | 16800 |   807   (1)| 00:00:01 |       |       |
|   9 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |   800 | 16800 |   807   (1)| 00:00:01 |       |       |
|* 10 |      INDEX RANGE SCAN                        | T1_I1   |   800 |       |     6   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - inmemory(("N1"=1 AND "T1"."ID"<250000))
       filter(("N1"=1 AND "T1"."ID"<250000)) 6 - filter(("T1"."ID">=250000 AND "T1"."ID"<1000000))
   7 - access("N1"=1)
  10 - access("N1"=1)

Points to notice:
Table expansion has occurred – we did an inmemory full tablescan on the inmemory partition at a cost of 184, and index range scans on the other three partitions at a cost of 805 (which looks more like the cost of visiting 4 partitions), expecting 800 rowids from the local index (a failure of partition elimination) and 600 rows from the table. The partition start/stop columns do show that the optimizer expects to visit just the three correct partitions, although some of the cost and cardinality numbers seem to be appropriate to 4 partitions.

Worse still we see a third branch to the UNION ALL / table expansion – operations 8 to 10 – which don’t report a partition start and stop. What’s that doing ? Sometimes, of course, we see sections of plan that don’t actually run – usually preceded by a FILTER operation that can evaluate to FALSE – so maybe that’s what’s happening here. Re-running the query with rowsource execution stats enabled it turned out that the PARTITION RANGE INLIST started once, and operations 9 and 10 didn’t operate at all – so that’s okay.

But what happens if I execute a query that should only access the first two partitions ? Here’s the run-time plan to answer that question:


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID  5dgp982ffsph8, child number 0
-------------------------------------
select  /*+ expand_table(t1) gather_plan_statistics */         id from
       t1 where  n1 = 1 and id < 500000

Plan hash value: 2876620067

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |         |      1 |        |    800 |00:00:00.05 |     649 |
|   1 |  VIEW                                        | VW_TE_1 |      1 |   1200 |    800 |00:00:00.05 |     649 |
|   2 |   UNION-ALL                                  |         |      1 |        |    800 |00:00:00.01 |     649 |
|   3 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.01 |       3 |
|*  4 |     TABLE ACCESS INMEMORY FULL               | T1      |      1 |    200 |    200 |00:00:00.01 |       3 |
|   5 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.01 |     215 |
|*  6 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      1 |    200 |    200 |00:00:00.01 |     215 |
|*  7 |      INDEX RANGE SCAN                        | T1_I1   |      1 |    200 |    200 |00:00:00.01 |      15 |
|   8 |    PARTITION RANGE INLIST                    |         |      1 |    800 |    400 |00:00:00.01 |     431 |
|   9 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      2 |    800 |    400 |00:00:00.01 |     431 |
|* 10 |      INDEX RANGE SCAN                        | T1_I1   |      2 |    800 |    400 |00:00:00.01 |      31 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - inmemory(("N1"=1 AND "T1"."ID"<250000))
       filter(("N1"=1 AND "T1"."ID"<250000))
   6 - filter(("T1"."ID"<500000 AND "T1"."ID">=250000))
   7 - access("N1"=1)
  10 - access("N1"=1)

My query should have returned 400 rows, accessing only the first two partitions. It returned 800 rows, accessing all 4 partitions. (Notice how there are no predicates on ID for operations 9 and 10).

Since I had started my investigation with the inmemory option I assumed at this point that the problem was somehow related to the that feature; which shows how easy it is to get too focused and jump to conclusions. After raising the problem with Oracle I got a reply that the problem wasn’t about the inmemory columnar store – and here’s the next little change to test to demonstrate that point:


alter index t1_i1 indexing partial;

alter table t1 modify partition p_start indexing off;
alter table t1 modify partition p_start no inmemory;

select  partition_name, indexing, inmemory 
from    user_tab_partitions 
where   table_name = 'T1';

select  partition_name, segment_created    
from    user_ind_partitions 
where   index_name = 'T1_I1';

Enable partial indexing for the index, switch off the index on the first partition then disable the inmemory option for the partition. This is the plan I got from re-running the two-partition query:


---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |         |      1 |        |    800 |00:00:00.08 |   15370 |  14706 |
|   1 |  VIEW                                        | VW_TE_1 |      1 |   1200 |    800 |00:00:00.08 |   15370 |  14706 |
|   2 |   UNION-ALL                                  |         |      1 |        |    800 |00:00:00.07 |   15370 |  14706 |
|   3 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.01 |     216 |      0 |
|*  4 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      1 |    200 |    200 |00:00:00.01 |     216 |      0 |
|*  5 |      INDEX RANGE SCAN                        | T1_I1   |      1 |    200 |    200 |00:00:00.01 |      16 |      0 |
|   6 |    PARTITION RANGE SINGLE                    |         |      1 |    200 |    200 |00:00:00.04 |   14723 |  14706 |
|*  7 |     TABLE ACCESS FULL                        | T1      |      1 |    200 |    200 |00:00:00.04 |   14723 |  14706 |
|   8 |    PARTITION RANGE INLIST                    |         |      1 |    800 |    400 |00:00:00.01 |     431 |      0 |
|   9 |     TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| T1      |      2 |    800 |    400 |00:00:00.01 |     431 |      0 |
|* 10 |      INDEX RANGE SCAN                        | T1_I1   |      2 |    800 |    400 |00:00:00.01 |      31 |      0 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("T1"."ID"<500000 AND "T1"."ID">=250000))
   5 - access("N1"=1)
   7 - filter(("N1"=1 AND "T1"."ID"<250000))
  10 - access("N1"=1)


Again I accessed all 4 partitions and returned 800 rows. As an odd little detail the plan reversed the order of access of partitions 1 and 2. The problem isn’t about the inmemory option, it’s a problem with generic table expansion.

Continuing Tests
I won’t go into all the details of what I did next – once you get started it’s hard to stop, and easy to lose track of what you’ve done. So here’s a quick summary.

Given the problem appeared without tthe inmemory columnar store enabled, I switched back to 11.2.0.4 (where table expansion is also possible) and emulated the problem by setting the first index partition unusable (since partial indexing is a 12c feature). Table expansion did not occur even when hinted: the plan was a full tablescan on both partitions.

So I switched the table from being interval partitioned to simple range partitioned, creating all 4 partitions as I created the data. In 11g I got table expansion and the correct answer without the extra branch to the union all; so I went back to 12c and did the same – pure range partitioning, partial indexing, and got table expansion with the correct result and no spare branches to the plan.

Tentative Conclusion

Clearly my testing is by no means exhaustive – but 12c seems to allow table expansion for interval partitioning in cases where 11g does not; unfortunately it is possible for the table expansion code to go wrong for interval partitioning in cases where simple range partitioning does not. In my case this led to wrong results.

 

Reference Script: in_memory_table_expansion_*.sql

 

 

November 9, 2015

Wrong Results

Filed under: 12c,Bugs,Oracle — Jonathan Lewis @ 6:23 am GMT Nov 9,2015

Here’s a little gem in 12c that arrived in my email a few days ago: a query where the result depends on the SQL*Plus arraysize!

The email had a short description, and a script to create a small data set that would demonstrate the problem. I’m not going to show you the query, or the result set, but here’s a sample of the output from an SQL*Plus session after creating the data. This is, by the way, on a “single-user” system – there is no way that some other session is changing the data – especially after the opening “set transaction”:

SQL> set transaction read only;

Transaction set.

SQL> set arraysize 1
SQL> select ...

...

541 rows selected.

SQL> set arraysize 4
SQL> select ...

...

599 rows selected.

SQL> set arraysize 10
SQL> select ...

...

620 rows selected.

SQL> set arraysize 32
SQL> select ...

...

616 rows selected.

The correct result set should have had the 616 rows reported when the arraysize was set to 32 (of, course, it’s possible with an arraysize of 32 the 616 rows returned weren’t the correct 616 rows – rows seemed to get multiplied or deleted fairly arbitrarily as the arraysize changed).

The execution plan was a little unusual in that it forced a nested loop join with a tablescan on the inner table; and when I ran the query with rowsource execution statistics enabled the number of starts of the inner tablescan was 1,597 but the number of rows actually returned varied. My first thought was that some new mechanical optimisation of the tablescan code was losing track of where it had got to in the repeated tablescans – but it turned out I was wrong.

Here’s the execution plan (with camouflage) – the key detail is in a section I didn’t look at intially, the column projection:


select * from table(dbms_xplan.display_cursor('0dh0kh9qa88mz',1,'-note +projection'));

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       | 14118 (100)|          |
|   1 |  NESTED LOOPS       |      |    29 |  2958 | 14118   (2)| 00:00:01 |
|*  2 |   HASH JOIN         |      |   892 | 57088 |    35   (3)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| ABC  |   549 | 21411 |    17   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| DEF  |   892 | 22300 |    17   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS FULL | XYZ  |     1 |    38 |    16   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("DEF"."ABC_FK"="ABC"."ABC_ID")
   5 - filter(("DEF"."COL0"="XYZ"."COL0" AND "XYZ"."COL1"="ABC"."COL1"
              AND "XYZ"."COL2"="ABC"."COL2"))

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "ABC"."ABC_ID"[NUMBER,22], "DEF"."ABC_FK"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "DEF"."COL0"[VARCHAR2,20], "XYZ"."COL1"[NUMBER,22],
       "XYZ"."COL2"[NUMBER,22], "XYZ"."COL0"[VARCHAR2,20]
   2 - (#keys=1) "ABC"."ABC_ID"[NUMBER,22], "DEF"."ABC_FK"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "DEF"."COL0"[VARCHAR2,20]
   3 - (rowset=200) "ABC"."ABC_ID"[NUMBER,22], "ABC"."COL1"[NUMBER,22],
       "ABC"."COL2"[NUMBER,22]
   4 - (rowset=200) "DEF"."ABC_FK"[NUMBER,22], "DEF"."COL0"[VARCHAR2,20]
   5 - "XYZ"."COL1"[NUMBER,22], "XYZ"."COL2"[NUMBER,22],
       "XYZ"."COL0"[VARCHAR2,20]

The predicate section is irrelevant in this case, and I’ve camouflaged the names of the tables and columns – the only interesting bit is the appearance of the (rowset=200) in the projection information. This is reporting a feature new in 12c (and not to be confused with Oracle Java Rowsets) that should improve the performance of some queries.

I didn’t actually look at the projection information until after I’d asked the Oak Table members if they had ever seen this type of anomaly before – and Stefan Koehler emailed back to suggest that the problem might be related to rowsets (there are a couple of similar bugs on MoS, e.g: 17016479 and 20960570) – so I checked the projection, then repeated my tests after disabling the feature with a call to: ‘alter session set “_rowsets_enabled”=false;’

Problem solved – although I’ve told the person who emailed me to report this discovery and workaround to Oracle support and see what they supply as the approved solution.

It is possible to affect the feature through event 10055 – different levels disable it at different locations in the code; the list of options is given in the $ORACLE_HOME/rdbms/mesg/oraus.msg file (if you’re not running Windows):


//            Level:
//            0x00000001 - turn off for table scan
//            0x00000002 - turn off for hash join consume
//            0x00000004 - turn off for hash join produce
//            0x00000008 - turn off for group by
//            0x00000010 - turn off for sort
//            0x00000020 - turn off for table-queue out
//            0x00000040 - turn off for table-queue in
//            0x00000080 - turn off for identity
//            0x00000100 - turn off for granule iterator
//            0x00000200 - turn off for EVA functions
//            0x00000400 - turn off for PL/SQL
//            0x00000800 - turn off for upgrade
//            0x00001000 - turn off for database startup
//            0x00002000 - turn off for blobs and clobs
//            0x00004000 - turn off for tracing row source
//            0x00008000 - turn off rowset information in explain plan
//            0x00010000 - disable hash join rowsets fast path
//            0x00020000 - turn off for bloom create
//            0x00040000 - turn off for bloom use
//            0x00080000 - disable prefetch for hash join
//            0x00100000 - disable prefetch for bloom
//            0x00200000 - disable semi blocking hash join
//            0x00400000 - turn off rowset for fixed table

I tried the first few levels and found that both levels 1 and 2 eliminated the problem (and eliminated the appearance of the (rowset=200) entry in the projection information). Given the shape of the plan I had thought that just one of 1,2 or 4 might have been relevant so I was a little surprised to find that both 1 and 2 were effective – but that’s probably just a question of interpretation of the brief descriptions.

Update

I’ve asked the owner of the problem if it’s okay to post the script to create the tables and data – and the answer was yes: the content was already heavily camouflaged anyway. So here’s a file you can download if you want to test other environments: insert_script_12c_bug

It’s declared as a “.doc” file to get past the upload process, but it’s really a flat text file.

Update 16th Nov

The official “minimum impact” workaround is to set event 10055 at level 2097152 (disable semi blocking hash join). Alternatively there is also a patch available. See Mike Dietriech’s blog for details: https://blogs.oracle.com/UPGRADE/entry/update_for_switch_off_rowsets

Update 7th Dec

Mike Dietriech has updated his second posting to add details of a patch for this bug.

 

October 23, 2015

Histogram Limit

Filed under: Bugs,Histograms,Oracle,Statistics — Jonathan Lewis @ 8:03 pm GMT Oct 23,2015

A surprising question came up on OTN a couple of days ago:

Why does a query for “column = 999999999999999999” run slower than a query for “column > 999999999999999998” (that’s 18 digit numbers, if you don’t want to count them). With the equality predicate the query is very slow, with the range-based predicate perfomance is good.

In the absence of further information there are various reasons why this is possible – but the example in question was about a “versioning” table where the single very large value was used as the “not yet ended” value for the history of an id so, at a minimum, the table held columns (id, nstart, nend, other), and each id could appear many times with pairs of start and end values that supplied non-overlapping, covering ranges and one row that had the very large number as the end value.

Let’s jot down a few ideas about what the data (and stats) might look like.

Assuming every id appears “a few” time and every id has to have a “still valid” row this means that a very large fraction (say 10% to 25%, if “a few” means 4 to 9) of the rows hold the value 999999999999999999.

If you gather stats without a histogram then you should get the low and high, finding that the high is 999999999999999999 and that the range is enormous, and so the predicates “column = {high value}” and “column > {high value} – 1” should give very similar cardinalities.

If you collect stats with a histogram you should find the very popular high value even in a very small sample set (which is what happens with histogram collection in 11g, and even in 12c for hybrid histograms). In this case the histogram should spot the significance of the high value and again the two predicates should have very similar cardinalities.

At first sight there doesn’t seem to be a feasible way that the two cardinalities could be sufficiently different to cause a problem – so maybe there’s something about character conversion or maybe bind variable usage that hasn’t been mentioned. So to test a couple of the less likely ideas I built a data set using 11.2.0.4 – and found a bug:


create table t1 (
        id              number(18),
        nend            number(18),
        n1              number(18),
        small_vc        varchar2(10)
)
nologging
;

insert /*+ append */ into t1
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum -1               id,
        case mod(rownum - 1,4)
                when 0  then 999999999999999999
--              when 0  then 999999999999999
                        else mod(rownum - 1, 250000)
        end                     nend,
        rownum - 1              n1,
        rpad('x',10,'x')        small_vc
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 ; commit; select * from t1 where nend = 0; begin dbms_stats.gather_table_stats( ownname => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size auto'
        );
end;
/

I have an nend column that is set to 999999999999999999 every 4th row in the table and otherwise has 4 rows per value for 187,500 other (relatively low) values. It’s probably a reasonable initial model of the original data. I’ve run a query with a predicate referencing nend before gathering stats so that the (default) auto option will build a histogram for nend. Then I’ve checked the execution plans for two critical queries:


explain plan for
select  *
from    t1
where   nend = 999999999999999999
;

select * from table(dbms_xplan.display);

explain plan for
select  *
from    t1
where   nend > 999999999999999998
;

select * from table(dbms_xplan.display);

And here’s the surprise – the two plans, in order:

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     4 |   108 |   625   (9)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   |     4 |   108 |   625   (9)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NEND"=999999999999999999)


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   250K|  6591K|   627   (9)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T1   |   250K|  6591K|   627   (9)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("NEND">999999999999999998)


Clearly this should not happen – the equality test is way off, the inequality test is correct. The obvious first guess is that something funny has happened with the statistics so let’s see what they look like – the column stats (user_tab_cols) and the histogram stats (user_tab_histograms) seem like a good starting point:


select
        sample_size, num_distinct, histogram, num_buckets,
        substr(low_value,1,26) low_value, substr(high_value,1,26) high_value
from
        user_tab_cols
where
        table_name = 'T1'
and     column_name = 'NEND'
;

column endpoint_value format 999,999,999,999,999,999,999

select
        endpoint_number, endpoint_value -- , endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'NEND'
order by
        endpoint_number
;

And here are the results (with a couple of hundred uninteresting rows eliminated from the histogram):


      Sample     Distinct HISTOGRAM          Buckets LOW_VALUE                  HIGH_VALUE
------------ ------------ --------------- ---------- -------------------------- --------------------------
       5,541      190,176 HEIGHT BALANCED        254 C102                       C9646464646464646464


ENDPOINT_NUMBER               ENDPOINT_VALUE
--------------- ----------------------------
              0                           19
              1                        1,225
              2                        2,503
              3                        3,911
              4                        4,806
...
            188                      247,479
            189                      248,754
            190                      249,862
            254    1,000,000,000,000,000,000

Oracle will have started with a 100% sample to collect stats on all the columns, but taken a small sample to test the need for a histogram on the nend column – and that’s why the sample size of 5,541 has appeared, but that’s not relevant to the problem in hand. The big question comes from endpoint_number 254 – why is the highest value in the histogram 1e19 when we know (and the column stats show) that the highest value is actually 999999999999999999 ?!

It’s a question to which I don’t have an answer – but I do know that

  • if your high value is 15 digits long (all 9s) then the histogram shows the right high value
  • if your high value is more that 15 9s then the histogram shows the high value plus 1
  • the value collected by the query that Oracle runs is the actual value (i.e. 18 9s)
  • if you use set_column stats to set 18 9s as the high value you still get 1e19 in the histogram

Once you see the stats you can understand why the OP sees the odd performance problem. If the histogram identifies 1e19 as a (very) popular value, leaving 999999999999999 as an “average” value with only 4 rows; on the other hand the query for greater than 999999999999999998 can see that there really are 250K rows with higher values.

Footnote:

Interestingly 12c does the same with the stats – introducing the 1e19 in the histogram – but still manages, somehow, to calculate the correct cardinality in the equality case. (There is one slight difference in 12c, the histogram is a hybrid histogram, not a height-balanced histogram).

Update 24th Oct 2015:

I realised late last night that I had written about this behaviour before – though possible just as a response on OTN or in some unpublished notes. According to the notes in a script called histogram_numeric_bug.sql that I found on my laptop this is “Bug 18514507 : WRONG CARDINALITY ESTIMATES WHEN NUMERIC VALUE IS LONGER THAN 15 BYTES”. (except that the base bugs that that one is linked to are probably not the same bug).

I suspect the problem is related to the way that character histograms are built based on a numeric representation of the string that takes the first few characters of the string, treats that resulting N bytes as a hex number, converts to decimal and then applies round(N,-14) to restrict the precision stored. The effect with strings is that (broadly speaking) you get fifteen digits precision – which is exactly what I seem to be seeing with numbers.

The reason that 12c can get the right answers despite storing the wrong endpoint_value is that it’s also storing to_char() of the right value as the endpoint_actual_value – possibly doing this any time it has had to round the endpoint_value (for character strings the endpoint_actual_value was only populated if two entries in the endpoint_value were the same).

 

 

 

 

September 22, 2015

Result Cache

Filed under: 12c,Bugs,Infrastructure,Oracle,Performance — Jonathan Lewis @ 10:11 am GMT Sep 22,2015

Yesterday I thought I’d spend half an hour before breakfast creating a little demonstration of a feature; some time about midnight I felt it was time to stop because I’d spent enough time chasing around a couple of bugs that produced wrong results in a variety of ways. Today’s short post is just little warning: be VERY careful what you do with the PL/SQL result cache – if you use the results of database queries in the cache you may end up with inconsistent results in your application. Here’s one very simple example of what can go wrong, starting with a little script:


alter system flush shared_pool;

truncate table gtt1;
drop table gtt1;
create global temporary table gtt1 (n1 number)
on commit preserve rows
;

insert into gtt1 values(1);
commit;

execute dbms_stats.gather_table_stats(user,'gtt1');


create or replace function f_cache
return number 
result_cache 
-- relies_on (gtt1)
is
        m_ret number;
begin
        select  max(n1)
        into    f_cache.m_ret
        from    gtt1
        ;

        return f_cache.m_ret;
end;
/

execute dbms_output.put_line(f_cache)

Here’s the output from a session that’s just connected and run this script (the table already existed from an earlier run):

SQL> @temp

System altered.


Table truncated.


Table dropped.


Table created.


1 row created.


Commit complete.


PL/SQL procedure successfully completed.


Function created.

1

PL/SQL procedure successfully completed.

SQL> 

Let’s call this session A, and I’m going to connect with another session which I’ll call session B. The following extracts are cut-and-pastes as I alternate between sessions:

Session B:

SQL> execute dbms_output.put_line(f_cache);
1

PL/SQL procedure successfully completed.

SQL> insert into gtt1 values(0);

1 row created.

SQL> execute dbms_output.put_line(f_cache);
0

PL/SQL procedure successfully completed.


Session B has just seen the data inserted into a global temporary table by session A; but after inserting a row into the table it now sees what it perceives to be the correct answer.

Session A:

SQL> truncate table gtt1;

Table truncated.

SQL> execute dbms_output.put_line(f_cache);
1

PL/SQL procedure successfully completed.

Despite truncating the table, session A still sees the data which has been eliminated unrecoverably.

Session B (where I hadn’t yet committed):


SQL> commit;

Commit complete.

SQL> 

Session A (where I’ve done nothing new):

SQL> execute dbms_output.put_line(f_cache);


PL/SQL procedure successfully completed.

SQL> 

The row has finally “disappeared” because session B committed.

Session B (where I haven’t done anything since committing):


SQL> execute dbms_output.put_line(f_cache);


PL/SQL procedure successfully completed.

SQL> 

Session B no longer sees the data because it’s now seeing what session A has just seen.

Warning.

This is just one of several ways I managed to get surprising inconsistencies when using the (cross-session) PL/SQL Result Cache. Oracle (in 12c) is supposed to know what the cache relies on and react accordingly, but it doesn’t manage to do it properly (even if you include the relies_on clause) – if you query the database in your PL/SQL you may find strange things happen.

The most important point to note in this example is that a session that wasn’t necessarily doing anything wrong got the wrong results because of the actions of another session that Oracle should have protected it from.

The good thing about this is example that it’s documented (sort of) – as it says in the manual: “to be result cached … does not reference … temporary tables ..”. So that should warn people off copying my example; on the other hand the problem I’m seeing arises because Oracle seems to be trying to use the result cache when the manuals suggests it shouldn’t be.

Conclusion

Do not mix the pl/sql result cache with database queries. The cache is public but (unlike the buffer cache) it is not guaranteed to give you read-consistency.

If you think this was an unrealistic example and you don’t need to worry about it – I’ll post a couple more examples in the next couple of weeks. They’ll be slightly special cases again, but I find the Oracle world is full of special cases.

Update

This behaviour is now visible on MoS as “Bug 21907155 : PL/SQL RC FN SHOWS ONE SESSION THE CONTENTS OF ANOTHER SESSION’S GTT”

September 2, 2015

IN/EXISTS bugs

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 8:11 am GMT Sep 2,2015

Here’s a simple data set – I’m only interested in three of the columns in the work that follows, but it’s a data set that I use for a number of different models:


execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	trunc(dbms_random.value(0,1000))	n_1000,
	trunc(dbms_random.value(0,750))		n_750,
	trunc(dbms_random.value(0,600))		n_600,
	trunc(dbms_random.value(0,400))		n_400,
	trunc(dbms_random.value(0,90))		n_90,
	trunc(dbms_random.value(0,72))		n_72,
	trunc(dbms_random.value(0,40))		n_40,
	trunc(dbms_random.value(0,3))		n_3
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;
create table t2 nologging 
as
select * from t1
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

The columns I want to consider are n_3, n_400, and n_1000. As their names suggest the columns have 3, 400, and 1000 distinct values respectively and since I’ve used the dbms_random.value() function to generate the data the distinct values are fairly evenly spread across the million rows of the table.

Consider, then, the following two queries:


select
        *
from
        t1
where
        exists (
                select  null
                from    t2
                where   n_1000 = 0
                and     t2.n_400 = t1.n_400
                and     t2.n_3 = t1.n_3
        )
;


select
        *
from
        t1
where
        (t1.n_400, t1.n_3) in (
                select  t2.n_400, t2.n_3
                from    t2
                where   t2.n_1000 = 0
        )
;

The first point to check is that these two queries are logically equivalent.

Once you’re happy with that idea we can work out, informally, how many rows we should expect the queries ought to return: there are 1,200 combinations for (n_400, n_3) so each combination should return roughly 833 rows; if we pick 1,000 rows from the 1 million available we can expect to see 679 of those combinations (that’s Alberto Dell’Era’s “selection without replacement” formula that Oracle uses for adjusting num_distinct to allow for filter predicates). So we might reasonably suggest that the final number of rows as 833 * 679 = 565,607. It turns out that that’s a pretty good estimate – when I ran the query the result was actually 567,018 rows.

So what does Oracle produce for the two execution plans – here are the result from 12c (EXISTS first, then IN):


===================
Multi-column EXISTS
===================
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   920K|    34M|  1259  (11)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI|      |   920K|    34M|  1259  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N_400"="T1"."N_400" AND "T2"."N_3"="T1"."N_3")
   2 - filter("N_1000"=0)

===================
Equivalent IN query
===================
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   833K|    30M|  1259  (11)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI|      |   833K|    30M|  1259  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N_400"="T2"."N_400" AND "T1"."N_3"="T2"."N_3")
   2 - filter("T2"."N_1000"=0)

The first thing to note is that the shape of the plans is identical, and the predicate sections are identical – but the final cardinalities are different. Clearly at least one of the cardinalities has to be wrong by a significant amount (7.5% or 10.4%, depending which way round you want to look at it). If you run the test on 11.2.0.4 you find that both plans give the same estimated row count – and it’s the 920,000 rows; so arguably 12c has “fixed” the IN subquery calculation, bringing it closer to a reasonable prediction, but it hasn’t fixed the EXISTS subquery calculation. That 833K prediction, by the way, is what you would expect to see with this data with a basic join – and a semi-join shouldn’t be able to produce more data than  a join.

But both predictions are way off the (informal) expectation, so how have they appeared ?

Working backwards it’s easy to spot that: 833K = 833 * 1,000: Oracle is behaving as if every single row identified in the subquery will produce a separate combination of (n_400, n_3). If we reverse engineer 920K we get: 920K / 833 = 1104 – it would appear that the optimizer thinks the 1,000 rows produced by the subquery will produce 1,104 distinct combinations of (n_400, n_3) so we how did the impossible 1,104 appear in the arithmetic.

If you apply the “selection without replacement” formula to picking 1,000 rows with 400 distinct values from 1,000,000 rows the expected number of distinct values (with rounding) will be 368; if you apply the formula for picking 1,000 rows with 3 distinct values from 1,000,000 rows the expected number will be 3. And 3 * 368 = 1,104. (Remember that in my original estimate I applied the formula after multiplying out the combination of distinct values). The optimizer is using its standard methods, but using internediate results in an unsuitable fashion.

It’s impossible to say what the impact of this particular code path – and the change on the upgrade – might be. The optimizer has over-estimated by 47% in one case and 62% in the other but (a) there may be something about my data that exaggerated an effect that few people will see in the wild and (b) in many cases getting in the right ballpark is enough to get a reasonable plan, and a factor of 2 is the right ballpark.

Of course, a few people will be unlucky with a few queries on the upgrade where the estimate changes – after all a single row difference in the estimate can cause the optimizer to flip between a hash join and a nested loop – but at least you’ve got a little extra information that might help when you see a bad estimate on an important semi-join.

So is there a workaround ? Given that I’ve got 12c, the obvious thing to try is to create a column group at both ends of the semi-join and see what happens. It shouldn’t really make any difference because column groups are targeted at the problems of correlated column – but we might as well try it:


execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns (n_400,n_3) size 1')
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for columns (n_400,n_3) size 1')

Unfortunately when I did this the final cardinality estimate for both queries dropped to just 833 (the absence of a K on the end isn’t a typo!).

Manually unnesting got me closer:


select
        *
from
        (
        select  distinct n_3, n_400
        from    t2
        where   n_1000 = 0
        )       sq,
        t1
where   
        sq.n_400 = t1.n_400
and     sq.n_3 = t1.n_3
;

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   649K|    33M|  1260  (11)| 00:00:01 |
|*  1 |  HASH JOIN           |      |   649K|    33M|  1260  (11)| 00:00:01 |
|   2 |   VIEW               |      |   779 | 20254 |   612   (8)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |   779 |  8569 |   612   (8)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SQ"."N_400"="T1"."N_400" AND "SQ"."N_3"="T1"."N_3")
   4 - filter("N_1000"=0)

The cardinality of 649K is (allowing for rounding) 833 * 779; so we need to know where the 779 came from. It’s the optimizer standard arithmetic for “distinct” – multiply the N individual selectivities together then divide by the sqrt(2) “N-1” times. So we apply the “selection without replacement formula twice”:

  • adjusted selectivity of n_400 = 367.21
  • adjusted selectivity of n_3 = 3
  • 367.21 * 3 / sqrt(2) = 779

If you create column group statistics for (n_400, n_3) this doesn’t change the optimizer’s estimate for the number of distinct combinations after selection – maybe that’s another enhancement in the pipeline – but, at least in this case, the manual unnesting has got us a little closer to the right estimates without any statistical intervention.

Footnote:

Just for the sake of completeness, here are the plans (with yet more cardinality predictions) that you get if you block the unnesting:


select 
	*
from 
	t1 
where 
	exists (
		select	
			/*+ no_unnest */
			null  
		from	t2 
		where	n_1000 = 0 
		and	t2.n_400 = t1.n_400 
		and	t2.n_3 = t1.n_3
	)
;



---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1179 | 33012 |   766K (12)| 00:00:30 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|    26M|   632  (11)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    11 |   638  (12)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T2" "T2" WHERE
              "N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2))
   3 - filter("N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2)



=====================================
Unnesting blocked and subquery pushed
=====================================
select 
	*
from 
	t1 
where 
	exists (
		select	
			/*+ no_unnest push_subq */
			null  
		from	t2 
		where	n_1000 = 0 
		and	t2.n_400 = t1.n_400 
		and	t2.n_3 = t1.n_3
	)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 50000 |  1367K|  1271  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL | T1   | 50000 |  1367K|   632  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |     1 |    11 |   638  (12)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST */ 0 FROM "T2"
              "T2" WHERE "N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2))
   2 - filter("N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2)

The 1179 comes from the magic of sqrt(2):  1179 = 1,000,000 / (400 * 3 / sqrt(2)).

The 50,000 is just the basic “I dunno, let’s call it 5%”.

 

Reference script: aggregate_selectivity_c.sql

 

July 27, 2015

Subquery Factoring (10)

Filed under: Bugs,CBO,Oracle,Subquery Factoring,Troubleshooting — Jonathan Lewis @ 1:26 pm GMT Jul 27,2015

What prompted me to write my previous note about subquerying was an upgrade to 12c, and a check that a few critical queries would not do something nasty on the upgrade. As ever it’s always interesting how many little oddities you can discover while looking closely as some little detail of how the optimizer works. Here’s an oddity that came up in the course of my playing around investigation in 12.1.0.2 – first some sample data:


create table t1
nologging
as
select * from all_objects;

create index t1_i1 on t1(owner) compress nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns owner size 254'
        );
end;
/

The all_objects view is convenient as a tool for modelling what I wanted to do since it has a column with a small number of distinct values and an extreme skew across those values. Here’s a slightly weird query that shows an odd costing effect:


with v1 as (
        select /*+ inline */ owner from t1 where owner > 'A'
)
select count(*) from v1 where owner = 'SYS'
union all
select count(*) from v1 where owner = 'SYSTEM'
;

Since the query uses the factored subquery twice and there’s a predicate on the subquery definition, I expect to see materialization as the default, and that’s what happened (even though I’ve engineered the query so that materialization is more expensive than executing inline). Here are the two plans from 12.1.0.2 (the same pattern appears in 11.2.0.4, though the costs are a little less across the board):


=======================
Unhinted (materializes)
=======================

---------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |     2 |   132 |    25  (20)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D661B_876C2CB |       |       |            |          |
|*  3 |    INDEX FAST FULL SCAN    | T1_I1                      | 85084 |   498K|    21  (15)| 00:00:01 |
|   4 |   UNION-ALL                |                            |       |       |            |          |
|   5 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  6 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|   7 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
|   8 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  9 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|  10 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

=============
Forced inline
=============

--------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     2 |    12 |    22  (14)| 00:00:01 |
|   1 |  UNION-ALL             |       |       |       |            |          |
|   2 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  3 |    INDEX FAST FULL SCAN| T1_I1 | 38784 |   227K|    21  (15)| 00:00:01 |
|   4 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  5 |    INDEX RANGE SCAN    | T1_I1 |   551 |  3306 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

I’m not surprised that the optimizer materialized the subquery – as I pointed out in my previous article, the choice seems to be rule-based (heuristic) rather than cost-based. What surprises me is that the cost for the default plan is not self-consistent – the optimizer seems to have lost the cost of generating the temporary table. The cost of the materialized query plan looks as if it ought to be 21 + 13 + 13 = 47. Even if the optimizer were smart enough to assume that the temporary table would be in the cache for the second scan (and therefore virtually free to access) we ought to see a cost of 21 + 13 = 34. As it is we have a cost of 25, which is 13 + 13 (or, if you check the 10053 trace file, 12.65 + 12.65, rounded).

Since the choice to materialize doesn’t seem to be cost-based (at present) this doesn’t really matter – but it’s always nice to see, and be able to understand, self-consistent figures in an execution plan.

Footnote

It is worth pointing out as a side note that materialization can actually be more expensive than running in-line, even for very simple examples. Subquery factoring seems to have become more robust and consistent over recent releases in terms of consistency of execution plans when the subqueries are put back inline, but you still need to think a little bit before rewriting a query for cosmetic (i.e. totally valid “readability”) reasons just to check whether the resulting query is going to produce an unexpected, and unexpectedly expensive, materialization.

July 22, 2015

Invalidation

Filed under: ANSI Standard,Bugs,Oracle — Jonathan Lewis @ 12:45 pm GMT Jul 22,2015

Someone who attended my sessions at the Bucharest Oracle Summit earlier on this year sent me an example of a quirky little bug, possibly related to the newer “fine-grained” invalidation mechanisms, possibly related to ANSI syntax SQL, that’s very easy to reproduce. (That’s always nice for Oracle support – a perfect test case.)

All it takes is two tables and a packaged procedure that queries those tables. The package is coded to do something that should not be allowed in production code; but “should not” and “is not” are very different things. For anyone who wants to play with the example, here’s the script to create the necessary objects:


drop package pkg_test;
drop table t2 purge;
drop table t1 purge;

create table t1 (id1 number, val1 varchar2(10));
create table t2 (id2 number, val2 varchar2(10));

insert into t1 values(1,rpad('x',10,'x'));
insert into t2 values(1,rpad('x',10,'x'));

execute dbms_stats.gather_table_stats(user,'t1')
execute dbms_stats.gather_table_stats(user,'t2')

create or replace package pkg_test is
   procedure pr_call;
end pkg_test;
/

create or replace package body pkg_test as

   procedure pr_call is

      cursor cur_ids is
         select *                   -- Naughty !
           from t1
           join t2
             on t2.id2 = t1.id1
         ;

      rec_id cur_ids%rowtype := null;

   begin
      open cur_ids;
      fetch cur_ids into rec_id;
      close cur_ids;
      dbms_output.put_line(rec_id.val1 || '-' || rec_id.val2);

   exception
      when others then
         if cur_ids%isopen then
            close cur_ids;
         end if;
         raise;

   end pr_call;

end pkg_test;
/

Having created the procedure I’m now going to call it – and then add a column to table t1. What’s that going to do to a packaged procedure with a “select *”?

Pause for thought …

Here’s some SQL to run the test.


set serveroutput on

prompt   *** Make a first call to the procedure: no error ***
execute  pkg_test.pr_call

prompt   *** add a column to one of the tables
alter table t1 add col_test varchar2(20);

prompt   *** Make two more calls to the procedure: ouch! ***
execute  pkg_test.pr_call
execute  pkg_test.pr_call

prompt  *** Recompile before a third call ***
execute  dbms_ddl.alter_compile('package body', user, 'pkg_test')
execute  pkg_test.pr_call

Unless I’ve managed to cut-n-paste the wrong bits of code, you would have got the following error for the 2nd and 3rd calls to the package:


BEGIN
*
ERROR at line 1:
ORA-00932: inconsistent datatypes: expected - got -
ORA-06512: at "TEST_USER.PKG_TEST", line 25
ORA-06512: at line 2

The package body should (I believe) have invalidated and recompiled itself for the second execution, and even if it failed on the first attempt surely it should have invalidated itself on the ORA-932 and recompiled itself and succeeded on the third execution.  (If you remove the exception clause you’ll find that the error is intially raised at the fetch, by the way).

If we change the “select *” to explicitly name the columns we want, viz:“select t1.id1, t1.val1, t2.id2, t2.val2” we don’t get the ORA-00932 errors (just as we would probably expect). What we might not expect is that the errors also disappear if we leave the “select *” in place but change the query from ANSI syntax to traditional Oracle syntax.

Footnote:

Obviously you shouldn’t use the lazy “*” notation in any production code – it can cause several different problems (including the dangers of “whoops, I didn’t mean to make that one invisible”) – but if you do you may find that you end up with packaged procedures that crash for no apparent reason until you recompile them. Perhaps ORA-00932 is the only possible error message, but maybe it’s possible to cause other errors to appear. Even worse, though I haven’t tried to force it yet, you may find that you can construct cases where the package reports no error but modifies the wrong data.

I’ve tested this code on versions 11.2.0.4 and 12.1.0.2 and see the same results on both.

July 20, 2015

12c Downgrade

Filed under: 12c,Bugs,CBO,Oracle — Jonathan Lewis @ 1:12 pm GMT Jul 20,2015

No, not really – but sometimes the optimizer gets better and gives you worse performance as a side effect when you upgrade. Here’s an example where 11.2.0.4 recognised (with a few hints) the case for a nested loop semi-join and 12c went a bit further and recognised the opportunity for doing a cunning “semi_to_inner” transformation … which just happened to do more work than the 11g plan.

Here’s a data set to get things going, I’ve got “parent” and “child” tables, but in this particular demonstration I won’t be invoking referential integrity:


create table chi
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1                              id,
        trunc((rownum-1)/10)                    n1,
        trunc(dbms_random.value(0,1000))        n2,
        rpad('x',1000)                          padding
from
        generator
;

create table par
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1      id,
        rpad('x',1000)  padding
from
        generator
where
        rownum <= 1e3
;

alter table par modify id not null;
alter table par add constraint par_pk primary key(id)
-- deferrable
;

-- Now gather stats on the tables.

The code uses my standard framework that could generate a few million rows even though it’s only generating 1,000 in par and 10,000 in chi. The presence of the commented “deferrable” for the primary key constraint is for a secondary demonstration.

You’ll notice that the 1,000 values that appear in chi.n1 and chi.n2 are matched by the 1,000 rows that appear in the primary key of par – in some other experiment I’ve got two foreign keys from chi to par. Take note that the values in n1 are very well clustered because of the call to trunc() while the values in n2 are evenly scattered because of the call to dbms_random() – the data patterns are very different although the data content is very similar (the randomised data will still produce, on average, 10 rows per value).

So here’s the test code:


set serveroutput off
set linesize 156
set trimspool on
set pagesize 60

alter session set statistics_level = all;

prompt  =============================
prompt  Strictly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

prompt  =============================
prompt  Randomly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

set serveroutput on
alter session set statistics_level = typical;

In both cases I’ve hinted the query quite heavily, using internally generated query block names, into running with a nested loop semi-join from chi to par. Since there are 10,000 rows in chi with no filter predicates, you might expect to see the probe into the par table starting 10,000 times returning (thanks to our perfect data match) one row for each start. Here are the run-time plans with rowsource execution stats from 11.2.0.4

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.14 |    1450 |   1041 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.14 |    1450 |   1041 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.12 |    1450 |   1041 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.07 |    1434 |   1037 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   1000 |   1048 |     0   (0)|   1000 |00:00:00.01 |      16 |      4 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.12 |    5544 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.12 |    5544 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.10 |    5544 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.02 |    1434 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   4033 |   1048 |     0   (0)|   4033 |00:00:00.02 |    4110 |
-----------------------------------------------------------------------------------------------------

Notice how we do 1,000 starts of operation 4 when the data is well ordered, and 4,033 starts when the data is randomly ordered. For a semi-join nested loop the run-time engine uses the same caching mechanism as it does for scalar subqueries – a fact you can corroborate by removing the current hints and putting the /*+ no_unnest */ hint into the subquery so that you get a filter subquery plan, in which you will note exactly the same number of starts of the filter subquery.

As an extra benefit you’ll notice that the index probes for the well-ordered data have managed to take advantage of buffer pinning (statistic “buffer is pinned count”) – keeping the root block and most recent leaf block of the par_pk index pinned almost continually through the query; while the randomised data access unfortunately required Oracle to unpin and repin the index leaf blocks (even though there were only 2 in the index) as the scan of chi progessed.

Time to upgrade to 12.1.0.2 and see what happens:

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |    1448 |   1456 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |    1448 |   1456 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.20 |    1448 |   1456 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.06 |      16 |     27 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |   11588 |   1429 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |   11588 |   1429 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.19 |   11588 |   1429 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.07 |   10156 |      0 |
--------------------------------------------------------------------------------------------------------------

Take a close look at operation 2 – it’s no longer a NESTED LOOP SEMI, the optimizer has got so smart (recognising the nature of the primary key on par) that it’s done a “semi_to_inner” transformation. But a side effect of the transformation is that the scalar subquery caching mechanism no longer applies so we probe the par table 10,000 times. When the driving data is well-ordered this hasn’t made much difference to the buffer gets (and related latch activity), but when the data is randomised the extra probes ramp the buffer gets up even further.

The timings (A-time) on these experiments are not particularly trustworthy – the differences between cached reads and direct path reads introduced more variation than the difference in Starts and Buffers, and the total CPU load is pretty small anyway – and I suspect that this difference won’t make much difference to most people most of the time. No doubt, though, there will be a few cases where a small change like this could have a noticeable effect on some important queries.

Footnote

There is a hint /*+ no_semi_to_inner(@queryblock object_alias) */ that I thought might persuade the optimizer to stick with the semi-join, but it didn’t have any effect. Since the “semi to inner” transformation (and the associated hints) are available in 11.2.0.4 I was a little puzzled that (a) I didn’t see the same transformation in the 11g test, and (b) that I couldn’t hint the transformation.  This makes me wonder if there’s a defect in 11g that might be fixed in a future patch.

It’s also nice to think that the scalar subquery caching optimisation used in semi-joins might eventually become available  to standard joins (in cases such as “join to parent”, perhaps).

October 31, 2014

first_rows(10)

Filed under: Bugs,CBO,Execution plans,Oracle — Jonathan Lewis @ 5:31 pm GMT Oct 31,2014

No, not the 10th posting about first_rows() this week – whatever it may seem like – just an example that happens to use the “calculate costs for fetching the first 10 rows” optimizer strategy and does it badly. I think it’s a bug, but it’s certainly a defect that is a poster case for the inherent risk of using anything other than all_rows optimisation.  Here’s some code to build a couple of sample tables:


begin
	dbms_stats.set_system_stats('MBRC',16);
	dbms_stats.set_system_stats('MREADTIM',12);
	dbms_stats.set_system_stats('SREADTIM',5);
	dbms_stats.set_system_stats('CPUSPEED',1000);
end;
/

create table t1
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(dbms_random.value(1,1000))	n1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create index t1_n1 on t1(id, n1);

create table t2
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	rownum					id,
	trunc(dbms_random.value(10001,20001))	x1,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;

create index t2_i1 on t2(x1);

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);

end;
/


create or replace view  v1
as
select 
	id, n1, small_vc, padding
from	t1 
where	n1 between 101 and 300
union all
select 
	id, n1, small_vc, padding
from	t1 
where	n1 between 501 and 700
;

The key feature of this demonstration is the UNION ALL view and what the optimizer does with it when we have first_rows_N optimisation – this is a simplified model of a production problem I was shown a couple of years ago, so nothing special, nothing invented. Here’s a query that behaves badly:


select
	/*+ gather_plan_statistics */
	v1.small_vc,
	v1.n1
from
	v1,
	t2
where
	t2.id = v1.id
and	t2.x1 = 15000
;

I’m going to execute this query in three different ways – as is, using all_rows optimisation; as is, using first_rows_10 optimisation, then using all_rows optimisation but with the necessary hints to make it follow the first_rows_10 execution path. Here are the resulting plans from an instance of 12.1.0.2 (the same thing happens in 11.2.0.4):


first_rows_10 plan
----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |     1 |    35 |   107   (0)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |     1 |    35 |   107   (0)| 00:00:01 |
|   2 |   VIEW                               | V1    |    24 |   600 |     4   (0)| 00:00:01 |
|   3 |    UNION-ALL                         |       |       |       |            |          |
|*  4 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL                | T1    |    12 |   240 |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

all_rows plan
------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   1 |  NESTED LOOPS                          |       |    40 |  1400 |   904   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |   100 |  1000 |   103   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |   100 |       |     3   (0)| 00:00:01 |
|   4 |   VIEW                                 | V1    |     1 |    25 |     8   (0)| 00:00:01 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     1 |    20 |     4   (0)| 00:00:01 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

first_rows_10 plan hinted under all_rows optimisation
---------------------------------------------------------------------------------------------- 
| Id  | Operation                    | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | 
---------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT             |       |   200 |  8600 |       |  6124   (3)| 00:00:01 | 
|*  1 |  HASH JOIN                   |       |   200 |  8600 |    17M|  6124   (3)| 00:00:01 |
|   2 |   VIEW                       | V1    |   402K|    12M|       |  5464   (3)| 00:00:01 | 
|   3 |    UNION-ALL                 |       |       |       |       |            |          | 
|*  4 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2731   (3)| 00:00:01 | 
|*  5 |     TABLE ACCESS FULL        | T1    |   201K|  3933K|       |  2733   (3)| 00:00:01 | 
|   6 |   TABLE ACCESS BY INDEX ROWID| T2    |   100 |  1000 |       |   103   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |   100 |       |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

I’m not sure why the first_rows_10 plan uses “table access by rowid batched”, but I’d guess it’s because the optimiser calculates that sorting the index rowids before visiting the table may have a small benefit on the speed of getting the first 10 rows – eventually I’ll get around to checking the 10053 trace file. The important thing, though, is the big mistake in the strategy, not the little difference in table access.

In the first_rows_10 plan the optimizer has decided building an in-memory hash table from the UNION ALL of the rows fetched from the two copies of the t1 table will be fast and efficient; but it’s made that decision based on the assumption that it will only get 10 rows from each copy of the table – and at run-time it HAS to get all the relevant t1 rows to build the hash table before it can get any t2 rows. We can get some idea of the scale of this error when we look at the hinted plan under all_rows optimisation – it’s a lot of redundant data and a very expensive hash table build.

In contrast the all_rows plan does an efficient indexed access into the t2 table then, for each row, does a join predicate pushdown into the union all view using an indexed access path. If we only wanted to fetch 10 rows we could stop after doing a minimum amount of work. To demonstrate the error more clearly I’ve re-run the experiment for the first two plans from SQL*PLus, setting the arraysize to 11, the pagesize to 5, and stopping after the first 10 rows. Here are the plans showing the rowsource execution stats:


first_rows_10 plan
------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |   107 (100)|     12 |00:00:00.43 |   35150 |       |       |          |
|*  1 |  HASH JOIN                           |       |      1 |      1 |   107   (0)|     12 |00:00:00.43 |   35150 |    24M|  3582K|   23M (0)|
|   2 |   VIEW                               | V1    |      1 |     24 |     4   (0)|    400K|00:00:06.48 |   35118 |       |       |          |
|   3 |    UNION-ALL                         |       |      1 |        |            |    400K|00:00:04.20 |   35118 |       |       |          |
|*  4 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.71 |   17559 |       |       |          |
|*  5 |     TABLE ACCESS FULL                | T1    |      1 |     12 |     2   (0)|    200K|00:00:00.63 |   17559 |       |       |          |
|   6 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |       |       |          |
|*  7 |    INDEX RANGE SCAN                  | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------

all_rows plan
-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |      1 |        |   904 (100)|     12 |00:00:00.01 |     213 |
|   1 |  NESTED LOOPS                          |       |      1 |     43 |   904   (1)|     12 |00:00:00.01 |     213 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED  | T2    |      1 |    100 |   103   (0)|     28 |00:00:00.01 |      32 |
|*  3 |    INDEX RANGE SCAN                    | T2_I1 |      1 |    100 |     3   (0)|     28 |00:00:00.01 |       4 |
|   4 |   VIEW                                 | V1    |     28 |      1 |     8   (0)|     12 |00:00:00.01 |     181 |
|   5 |    UNION ALL PUSHED PREDICATE          |       |     28 |        |            |     12 |00:00:00.01 |     181 |
|   6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    212K|     4   (0)|      8 |00:00:00.01 |      93 |
|*  7 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      8 |00:00:00.01 |      85 |
|   8 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1    |     28 |    213K|     4   (0)|      4 |00:00:00.01 |      88 |
|*  9 |      INDEX RANGE SCAN                  | T1_N1 |     28 |      1 |     3   (0)|      4 |00:00:00.01 |      84 |
-----------------------------------------------------------------------------------------------------------------------

If I had set the optimizer_mode to first_rows_10 because I really only wanted to fetch (about) 10 rows then I’ve managed to pay a huge overhead in buffer visits, memory and CPU for the privilege – the all_rows plan was much more efficient.

Remember – we often see cases where the first_rows(n) plan will do more work to get the whole data set in order to be able to get the first few rows more quickly (the simplest example is when the optimizer uses a particular index to get the first few rows of a result set in order without sorting rather than doing a (faster) full tablescan with sort. This case, though, is different: the optimizer is choosing to build a hash table as if it only has to put 10 rows into that hash table when it actually HAS to build the whole has table before it can take any further steps – we don’t get 10 rows quicker and the rest more slowly; we just get 10 very slow rows.

Footnote:

It’s possible that this is an example of bug 9633142: (FIRST_ROWS OPTIMIZER DOES NOT PUSH PREDICATES INTO UNION INLINE VIEW) but that’s reported as fixed in 12c, with a couple of patches for 11.2.0.2/3. However, setting “_fix_control”=’4887636:off’, does bypass the problem. (The fix control, introduced in 11.1.0.6 has description: “remove restriction from first K row optimization”)

October 19, 2014

Plan depth

Filed under: 12c,Bugs,Execution plans,Oracle,subqueries — Jonathan Lewis @ 6:20 pm GMT Oct 19,2014

A recent posting on OTN reminded me that I haven’t been poking Oracle 12c very hard to see which defects in reporting execution plans have been fixed. The last time I wrote something about the problem was about 20 months ago referencing 11.2.0.3; but there are still oddities and irritations that make the nice easy “first child first” algorithm fail because the depth calculated by Oracle doesn’t match the level that you would get from a connect-by query on the underlying plan table. Here’s a simple fail in 12c:


create table t1
as
select
	rownum 			id,
	lpad(rownum,200)	padding
from	all_objects
where	rownum <= 2500
;

create table t2
as
select	* from t1
;

-- call dbms_stats to gather stats

explain plan for
select
	case mod(id,2)
		when 1 then (select max(t1.id) from t1 where t1.id <= t2.id)
		when 0 then (select max(t1.id) from t1 where t1.id >= t2.id)
	end id
from	t2
;

select * from table(dbms_xplan.display);

It ought to be fairly clear that the two inline scalar subqueries against t1 should be presented at the same level in the execution hierarchy; but here’s the execution plan you get from Oracle:

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  2500 | 10000 | 28039   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE      |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  4 |     TABLE ACCESS FULL| T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   5 |  TABLE ACCESS FULL   | T2   |  2500 | 10000 |    11   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"<=:B1)
   4 - filter("T1"."ID">=:B1)

As you can see, the immediate (default?) visual impression you get from the plan is that one of the subqueries is subordinate to the other. On the other hand if you check the id and parent_id columns from the plan_table you’ll find that lines 1 and 3 are both direct descendents of line 0 – so they ought to have the same depth. The plan below is what you get if you run the 8i query from utlxpls.sql against the plan_table.


SQL> select id, parent_id from plan_table;

        ID  PARENT_ID
---------- ----------
         0
         1          0
         2          1
         3          0
         4          3
         5          0

--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |     2K|    9K|  28039 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  TABLE ACCESS FULL        |T2        |     2K|    9K|     11 |       |       |
--------------------------------------------------------------------------------

So next time you see a plan and the indentation doesn’t quite seem to make sense, perhaps a quick query to select the id and parent_id will let you check whether you’ve found an example where the depth calculation produces a misleading result.

 

Update 20th Oct 2014

A question via twitter – does the error also show up with dbms_xplan.display_cursor(), SQL tuning sets, AWR, etc. or is it just a defect of explain plan. Since the depth is (probably) a derived value for display purposes that Oracle doesn’t use internally for executing the plan I would be inclined to assume that the defect is universal, but I’ve only checked it through explain plan/display, and through execution/display_cursor().

 

 

 

September 4, 2014

Group By Bug

Filed under: 12c,Bugs,dbms_xplan,Execution plans,Oracle — Jonathan Lewis @ 5:11 pm GMT Sep 4,2014

This just in from OTN Database Forum – a surprising little bug with “group by elimination” exclusive to 12c.


alter session set nls_date_format='dd-Mon-yyyy hh24:mi:ss';

select
       /* optimizer_features_enable('12.1.0.1')*/
       trunc (ts,'DD') ts1, sum(fieldb) fieldb
from (
  select
        ts, max(fieldb) fieldb
  from (
  select trunc(sysdate) - 1/24 ts, 1 fieldb from dual
  union all
  select trunc(sysdate) - 2/24 ts, 2 fieldb from dual
  union all
  select trunc(sysdate) - 3/24 ts, 3 fieldb from dual
  union all
  select trunc(sysdate) - 4/24 ts, 4 fieldb from dual
  union all
  select trunc(sysdate) - 5/24 ts, 5 fieldb from dual
  )
  group by ts
)
group by trunc (ts,'DD')
/

You might expect to get one row as the answer – but this is the result I got, with the execution plan pulled from memory:


TS1                      FIELDB
-------------------- ----------
03-Sep-2014 00:00:00          1
03-Sep-2014 00:00:00          5
03-Sep-2014 00:00:00          4
03-Sep-2014 00:00:00          2
03-Sep-2014 00:00:00          3

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    11 (100)|          |
|   1 |  HASH GROUP BY   |      |     5 |    60 |    11  (10)| 00:00:01 |
|   2 |   VIEW           |      |     5 |    60 |    10   (0)| 00:00:01 |
|   3 |    UNION-ALL     |      |       |       |            |          |
|   4 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   6 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |     FAST DUAL    |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

You’ll notice that I’ve got an “optimizer_features_enable()” comment in the code: if I change it into a hint I get the following (correct) result and plan:


TS1                      FIELDB
-------------------- ----------
03-Sep-2014 00:00:00         15

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |       |       |    12 (100)|          |
|   1 |  HASH GROUP BY   |      |     5 |    60 |    12  (17)| 00:00:01 |
|   2 |   VIEW           |      |     5 |    60 |    11  (10)| 00:00:01 |
|   3 |    HASH GROUP BY |      |     5 |    60 |    11  (10)| 00:00:01 |
|   4 |     VIEW         |      |     5 |    60 |    10   (0)| 00:00:01 |
|   5 |      UNION-ALL   |      |       |       |            |          |
|   6 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   7 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   8 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|   9 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
|  10 |       FAST DUAL  |      |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------

Somehow 12.1.0.2 has managed to get confused by the combination of “group by ts” and “group by trunc(ts,’DD’)” and has performed “group-by elimination” when it shouldn’t have. If you use the ‘outline’ option for dbms_xplan.display_cursor() you’ll find that the bad result reports the hint elim_groupby(@sel$1), which leads to an alternative solution to hinting the optimizer_features level. Start the code like this:


select
       /*+ qb_name(main) no_elim_groupby(@main) */
       trunc (ts,'DD') ts1, sum(fieldb) fieldb
from  ...

The (no_)elim_groupby is a hint that appeared in v$sql_hints only in the 12.1.0.2.

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 5,939 other followers