Oracle Scratchpad

January 24, 2016

Semijoin_driver

Filed under: bitmaps,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 11:42 am GMT Jan 24,2016

Here’s one of those odd little tricks that (a) may help in a couple of very special cases and (b) may show up at some future date – or maybe it already does – in the optimizer if it is recognised as a solution to a more popular problem. It’s about an apparent restriction on how the optimizer uses the BITMAP MERGE operation, and to demonstrate a very simple case I’ll start with a data set with just one bitmap index:


create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1000)        n1,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index t1_b1 on t1(n1);

select index_name, leaf_blocks, num_rows from user_indexes;

/*
INDEX_NAME           LEAF_BLOCKS   NUM_ROWS
-------------------- ----------- ----------
T1_B1                        500       1000
*/

Realistically we don’t expect to use a single bitmap index to access data from a large table; usually we expect to have queries that give the optimizer the option to choose and combine several bitmap indexes (possibly driving through dimension tables first) to reduce the target row set in the table to a cost-effective level.

In this example, though, I’ve created a column data set that many people might view as “inappropriate” as the target for a bitmap index – in one million rows I have one thousand distinct values, it’s not a “low cardinality” column – but, as Richard Foote (among others) has often had to point out, it’s wrong to think that bitmap indexes are only suitable for columns with a very small number of distinct values. Moreover, it’s the only index on the table, so no chance of combining bitmaps.

Another thing to notice about my data set is that the n1 column has been generated by the mod() function; because of this the column cycles through the 1,000 values I’ve allowed for it, and this means that the rows for any given value are scattered widely across the table, but it also means that if I find a row with the value X in it then there could well be a row with the value X+4 (say) in the same block.

I’ve reported the statistics from user_indexes at the end of the sample code. This shows you that the index holds 1,000 “rows” – i.e. each key value requires only one bitmap entry to cover the whole table, with two rows per leaf block. (By comparison, a B-tree index on the column was 2,077 leaf blocks uncompressed, or 1,538 leaf blocks when compressed.)
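If you want to reproduce the B-tree comparison, here’s a sketch of the sort of test those figures presumably came from – not my original script, the index name t1_n1_bt is invented, and remember that (before 12.2) you can’t create two indexes on the same column list, so the bitmap index has to go first:

drop index t1_b1;

create index t1_n1_bt on t1(n1) nologging;
select index_name, leaf_blocks, num_rows from user_indexes;

drop index t1_n1_bt;
create index t1_n1_bt on t1(n1) compress 1 nologging;
select index_name, leaf_blocks, num_rows from user_indexes;

rem     restore the original state of the test
drop index t1_n1_bt;
create bitmap index t1_b1 on t1(n1);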

So here’s the query I want to play with, followed by the run-time execution plan with stats (in this case from a 12.1.0.2 instance):


alter session set statistics_level = all;

select
        /*+
                qb_name(main)
        */
        max(small_vc)
from
        t1
where
        n1 in (1,5)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      1 |        |      1 |00:00:00.03 |    2006 |      4 |
|   1 |  SORT AGGREGATE                       |       |      1 |      1 |      1 |00:00:00.03 |    2006 |      4 |
|   2 |   INLIST ITERATOR                     |       |      1 |        |   2000 |00:00:00.03 |    2006 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      2 |   2000 |   2000 |00:00:00.02 |    2006 |      4 |
|   4 |     BITMAP CONVERSION TO ROWIDS       |       |      2 |        |   2000 |00:00:00.01 |       6 |      4 |
|*  5 |      BITMAP INDEX SINGLE VALUE        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |      4 |
------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("N1"=1 OR "N1"=5))

The query is selecting 2,000 rows from the table, for n1 = 1 and n1 = 5, and the plan shows us that the optimizer probes the bitmap index twice (operation 5), once for each value, fetching all the rows for n1 = 1, then fetching all the rows for n1 = 5. This entails 2,000 buffer gets. However, we know that for every row where n1 = 1 there is another row nearby (probably in the same block) where n1 = 5 – it would be nice if we could pick up the 1 and the 5 at the same time and do less work.

Technically the optimizer has the necessary facility to do this – it’s known as the BITMAP MERGE – Oracle can read two or more entries from a bitmap index, superimpose the bits (effectively a BITMAP OR), then convert to rowids and visit the table. Unfortunately there are cases (and it seems to be only the simple cases) where this doesn’t appear to be allowed even when we – the users – can see that it might be a very effective strategy. So can we make it happen – and since I’ve asked the question you know that the answer is almost sure to be yes.

Here’s an alternate (messier) SQL statement that achieves the same result:


select
        /*+
                qb_name(main)
                semijoin_driver(@subq)
        */
        max(small_vc)
from
        t1
where
        n1 in (
                select /*+ qb_name(subq) */
                        *
                from    (
                        select 1 from dual
                        union all
                        select 5 from dual
                        )
        )
;

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |      1 |00:00:00.02 |    1074 |       |       |          |
|   1 |  SORT AGGREGATE                      |       |      1 |      1 |      1 |00:00:00.02 |    1074 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |   2000 |   2000 |00:00:00.02 |    1074 |       |       |          |
|   3 |    BITMAP CONVERSION TO ROWIDS       |       |      1 |        |   2000 |00:00:00.01 |       6 |       |       |          |
|   4 |     BITMAP MERGE                     |       |      1 |        |      1 |00:00:00.01 |       6 |  1024K|   512K| 8192  (0)|
|   5 |      BITMAP KEY ITERATION            |       |      1 |        |      2 |00:00:00.01 |       6 |       |       |          |
|   6 |       VIEW                           |       |      1 |      2 |      2 |00:00:00.01 |       0 |       |       |          |
|   7 |        UNION-ALL                     |       |      1 |        |      2 |00:00:00.01 |       0 |       |       |          |
|   8 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   9 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|* 10 |       BITMAP INDEX RANGE SCAN        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  10 - access("N1"="from$_subquery$_002"."1")

Key points from this plan – and I’ll comment on the SQL in a moment: The number of buffer visits is roughly halved (In many cases we picked up two rows as we visited each buffer); operation 4 shows us that we did a BITMAP MERGE, and we can see in operations 5 to 10 that we did a BITMAP KEY ITERATION (which is a bit like a nested loop join – “for each row returned by child 1 (operation 6) we executed child 2 (operation 10)”) to probe the index twice and get two strings of bits that operation 4 could merge before operation 3 converted to rowids.

For a clearer picture of how we visit the table, here are the first few rows and last few rows from a version of the two queries where we simply select the ID column rather than aggregating on the small_vc column:

select  id from ...

Original query structure
         1
      1001
      2001
      3001
...
    997005
    998005
    999005

2000 rows selected.

Modified query structure:

         1
         5
      1001
      1005
      2001
      2005
...
    998001
    998005
    999001
    999005
    
2000 rows selected.
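For reference, the listing above came from a pair of statements like the following – a sketch rather than the exact text I ran; the only change from the earlier queries is that they select id instead of aggregating on small_vc:

select
        /*+
                qb_name(main)
        */
        id
from
        t1
where
        n1 in (1,5)
;

select
        /*+
                qb_name(main)
                semijoin_driver(@subq)
        */
        id
from
        t1
where
        n1 in (
                select /*+ qb_name(subq) */
                        *
                from    (
                        select 1 from dual
                        union all
                        select 5 from dual
                        )
        )
;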

As you can see, one query returns all the n1 = 1 rows then all the n1 = 5 rows while the other query alternates as it walks through the merged bitmap. You may recall the Exadata indexing problem (now addressed, of course) from a few years back where the order in which rows were visited after a (B-tree) index range scan made a big difference to performance. This is the same type of issue – when the optimizer’s default plan gets the right data in the wrong order we may be able to find ways of modifying the SQL to visit the data in a more efficient order. In this case we save only fractions of a second because all the data is buffered, but it’s possible that in a production environment with much larger tables many, or all, of the re-visits could turn into physical reads.

Coming back to the SQL, the key to the re-write is to turn my IN-list into a subquery, and then tell the optimizer to use that subquery as a “semijoin driver”. This is essentially the mechanism used by the Star Transformation, where the optimizer rewrites a simple join so that each dimension table (typically) appears twice, first as an IN subquery driving the bitmap selection then as a “joinback”. But (according to the manuals) a star transformation requires at least two dimension tables to be involved in a join to the central fact table – and that may be why the semi-join approach is not considered in this (and slightly more complex) case.

 

 

My reference: bitmap_merge.sql, star_hack3.sql

January 11, 2016

Subquery Effects

Filed under: Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 12:50 pm GMT Jan 11,2016

Towards the end of last year I used a query with a couple of “constant” subqueries as a focal point for a blog note on reading parallel execution plans. One of the comments on that note raised a question about cardinality estimates and, coincidentally, I received an email about the cost calculations for a similar query a few days later.

Unfortunately there are all sorts of anomalies, special cases, and changes that show up across versions when subqueries come into play – it’s only in recent versions of 11.2, for example, that a very simple example I’ve got of three equivalent statements producing the same execution plan also reports the same costs and cardinality. (The queries are: table with IN subquery, table with EXISTS subquery, table joined to a “manually unnested” subquery – the three plans take the unnested subquery shape.)
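For illustration, the three statements I have in mind take this sort of shape – a sketch of the pattern rather than the actual test script:

select  t1.*
from    t1
where   t1.n1 in (select t2.n1 from t2);

select  t1.*
from    t1
where   exists (select null from t2 where t2.n1 = t1.n1);

select  t1.*
from    t1,
        (select distinct n1 from t2) v2
where   t1.n1 = v2.n1;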

I’m just going to pick out one particular anomaly, which is a costing error with multiple subqueries when “OR-ed”. Here’s my sample data set:


create table t1
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 20000
;


create table t2
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 25000
;

create table t3
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 30000
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t2',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t3',
                method_opt       => 'for all columns size 1'
        );
end;
/

The three tables are slightly different sizes so that it will be easy to see different costs of tablescans, and there are no indexes, so everything I do in the queries will be tablescans. Here are six queries I’m going to test – they all scan t1, with “constant” subqueries against t2 and/or t3. The first pair is just to show you the basic cost of the query with a single subquery; the second pair shows you the default action with two subqueries in two different orders; the final pair shows you what happens with two subqueries when you block subquery pushing.


select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n2 > (select avg(t2.n2) from t2)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n3 > (select avg(t3.n3) from t3)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select avg(t2.n2) from t2)
         or t1.n3 > (select avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select avg(t3.n3) from t3)
         or t1.n2 > (select avg(t2.n2) from t2)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
         or t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
         or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
        )
;

Here are the first two plans, pulled from memory (which you might have guessed thanks to the “disappearing subquery predicate” in the predicate section). These examples came from 12.1.0.2, but the same happens in 11.2.0.4:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N2">))

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   123 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N3">))

As you can see, the cost of the query is the cost of the t1 tablescan plus the cost of running the t2 or t3 subquery once: 111 = 49 + 62, and 123 = 49 + 74.

(As a general guideline, recent versions of the optimizer tend to allow for subqueries by including “cost of subquery” * “number of times the optimizer thinks it will execute” – in this case the optimizer knows that the subquery will run exactly once).
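In other words, as a rough guideline (not the optimizer’s exact internal formula):

cost(query) = cost(t1 tablescan) + sum( cost(subquery) * expected executions )

        111 = 49 + (62 * 1)     -- t1 scan plus the t2 subquery
        123 = 49 + (74 * 1)     -- t1 scan plus the t3 subquery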

But what happens when we test the query that applies BOTH subqueries to the tablescan ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N2"> OR "T1"."N3">)))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N3"> OR "T1"."N2">)))

The cost of the query in both cases is just the cost of the tablescan of t1 – the subqueries are, apparently, free. You can check from the predicate section, by the way, that the subqueries are applied in the order they appear in the original statement.

Does anything change if the subqueries are not pushed ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N2"> OR "T1"."N3">))
   3 - filter("T1"."N1">10000)

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   124 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N3"> OR "T1"."N2">))
   3 - filter("T1"."N1">10000)

The two plans have different costs – and the cost is the cost of the tablescan of t1 plus the cost of just the first subquery in the filter predicate list.

The non-pushed subqueries show up another anomaly: you’ll notice that the t1 tablescan reports 10,001 rows cardinality, but the FILTER operation doesn’t have an associated cardinality so we can’t see how many rows the optimizer thinks will survive the subqueries. So let’s run a query that allows us to see the surviving row estimate:


select
        max(n1)
from
        (
        select
                /*+ no_eliminate_oby */
                t1.n1
        from
                t1
        where
                t1.n1 > 10000
        and     (
                   t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
                or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
                )
        order by
                n1
        )
;

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |       |       |   126 (100)|          |
|   1 |  SORT AGGREGATE        |      |     1 |    13 |            |          |
|   2 |   VIEW                 |      | 10001 |   126K|   126   (5)| 00:00:01 |
|   3 |    SORT ORDER BY       |      | 10001 |   146K|   126   (5)| 00:00:01 |
|*  4 |     FILTER             |      |       |       |            |          |
|*  5 |      TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   6 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |       TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   8 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   9 |       TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("T1"."N3"> OR "T1"."N2">))
   5 - filter("T1"."N1">10000)

As you can see, the SORT ORDER BY operation thinks it’s going to handle 10,001 rows – it looks as if the optimizer arithmetic hasn’t applied the usual subquery guess of 5% for the two subqueries. (When the subqueries were automatically pushed you saw a cardinality of 975 – which is 5% for subquery t2 plus (due to OR) 5% for subquery t3 minus 5% of 5% (=25) for the overlap – this is the standard OR arithmetic.)
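Just to make the 975 arithmetic explicit:

        5% of 10,001            = 500.05        -- t2 subquery
        5% of 10,001            = 500.05        -- t3 subquery
        5% of 5% of 10,001      =  25.0         -- the overlap

        selectivity = 0.05 + 0.05 - (0.05 * 0.05) = 0.0975
        0.0975 * 10,001 = 975.1                 -- rounded to the 975 reported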

tl;dr

Although the optimizer code has been enhanced in many places for dealing with subquery estimates, there are still some odd errors and inconsistencies that you need to be aware of. The examples I’ve shown may not be particularly significant in terms of what they do, but the pattern is one that you may recognise in more complex queries.

 

Reference script: subq_cost_anomaly_2.sql

 

January 6, 2016

NLS Mess

Filed under: Bugs,CBO,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 1:18 pm GMT Jan 6,2016

The Oracle database has all sorts of little details built into it to help it deal with multi-national companies, but since they’re not commonly used you can find all sorts of odd “buggy” bits of behaviour when you start to look closely. I have to put “buggy” in quotes because some of the reported oddities are the inevitable consequences of (for example) how multi-byte character sets have to work; but some of the oddities look as if they simply wouldn’t be there if the programmer writing the relevant bit of code had remembered that they also had to cater for some NLS feature.

Here’s an example of the type of unexpected behaviour that can appear. There probably are some bugs in the area I’m going to demonstrate but, at first glance, I thought I was looking at an acceptable limitation imposed by a generic requirement. The example came from AskTom, which is why the data set isn’t my usual “t1” generation (and the formatting and capitalisation isn’t according to my usual standards).

The problem involves Case Insensitive indexing.


ALTER session SET nls_sort=binary_ci;
ALTER session SET nls_comp=linguistic;

CREATE TABLE log_data(
  account_id NUMBER,
  log_type NUMBER,
  sys_name VARCHAR2(30),
  log_time TIMESTAMP,
  msg varchar2(4000)
)
nologging
;

insert /*+ append */ into log_data(
  account_id,
  log_type,
  sys_name,
  log_time,
  msg
)
select
        5,
        2,
        dbms_random.string('a',1),
        sysdate + dbms_random.value,
        rpad('x',200)
from
        dual
connect by
        level <= 26000
;


create index log_date on log_data (
        account_id, 
        log_type, 
--      sys_name,
        NLSSORT(sys_name,'NLS_SORT=BINARY_CI'),
        log_time
)
nologging
;
  
rem     ======================================================================
rem     Need to gather stats AFTER index creation because of the hidden column
rem     ======================================================================
  
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'LOG_DATA',
                method_opt       => 'for all columns size 1'
        );
end;
/

And here’s the query I want to optimize:


SELECT 
        *
FROM
  (
    SELECT
        sys_name, log_time,  substr(msg,1,40) msg
    FROM log_data
    WHERE
      account_id=5
      AND log_type=2
      AND sys_name='a'
    ORDER BY
      log_time  desc
  )
WHERE
  rownum <= 10
;

The requirement of the query is that we see the ten most recent entries for a given combination of account_id, log_type and sys_name (ignoring case in sys_name). The original table has tens of millions of rows, of course, with many combinations, and some of the combinations have a very large number of entries, hence the desire to find an access path that gets just the 10 rows we want without getting all the rows for a combination and sorting them before returning the ten.

Normally we would just create an index that started with the 3 columns used in the equality and ending with the column in the order by clause, and that would be enough for the optimizer to see the option for a “sort order by nosort” operation to get the required data through an index range scan; so that’s the index the code sample creates, except that since we’ve enabled case insensitive sorting we need to use a function-based index to hold the case-insensitive version of sys_name.
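For reference, the “plain” index behind the next plan would presumably be the version with sys_name itself in place of the nlssort() expression – i.e. the commented-out line in the script above:

create index log_date on log_data (
        account_id,
        log_type,
        sys_name,
        log_time
)
nologging
;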

Here’s the execution plan we would get if we DIDN’T use the nlssort() function in the index – I’ve run the query in 11.2.0.4 and pulled the plan from memory with rowsource execution stats enabled:


---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |   605 (100)|     10 |00:00:00.02 |    1065 |       |       |          |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.02 |    1065 |       |       |          |
|   2 |   VIEW                         |          |      1 |    500 |   605   (1)|     10 |00:00:00.02 |    1065 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY       |          |      1 |    500 |   605   (1)|     10 |00:00:00.02 |    1065 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID| LOG_DATA |      1 |    500 |   603   (1)|    966 |00:00:00.01 |    1065 |       |       |          |
|*  5 |      INDEX RANGE SCAN          | LOG_DATE |      1 |    500 |   103   (3)|    966 |00:00:00.01 |     100 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2)
       filter(NLSSORT("SYS_NAME",'nls_sort=''BINARY_CI''')=HEXTORAW('6100') )

Notice particularly the filter predicate at operation 5: that’s the thing we need to get into the index before we can avoid picking up excess data and sorting it. Notice also in the A-Rows column that we acquired 966 rows from the table before sorting and discarding all but 10 of them at operation 3.

Notice especially how important it is to look at the predicate section of an execution plan to gain a full understanding of what’s happening.

So here’s the execution plan we get by default with the function-based index in place:


----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |      1 |        |    13 (100)|     10 |00:00:00.01 |     969 |       |       |          |
|*  1 |  COUNT STOPKEY                  |          |      1 |        |            |     10 |00:00:00.01 |     969 |       |       |          |
|   2 |   VIEW                          |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |     969 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY        |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |     969 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|    966 |00:00:00.01 |     969 |       |       |          |
|*  5 |      INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|    966 |00:00:00.01 |       5 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

It didn’t work ! (Check the A-Rows at operations 4 and 5, and the sort that we didn’t want at operation 3 where the data is finally reduced to 10 rows.)

But there’s something odd going on here – look at the predicate section: our three predicates are all access predicates for the index range scan descending. We are doing exactly what we want to do with the index, but we’re not stopping after the 10 rows that we need, we’re getting all of them (in the order we want) and then doing a trivial sort and discard. Look at the Cost column – the cost at operation 4 is exactly what we might expect for the 10 rows we want to see, and the E-rows at line 5 is clearly based on our “first 10 rows” requirement.

This raises two questions:

  1. What’s gone wrong ?
  2. Can we work around the problem ?

The answer to (1) is, I think, that there’s a bug in the code. Looking at the 10053 trace file I can see the optimizer correctly handling the arithmetic of the virtual column (the sys_nc00006$) representing the function in the index and then getting to the point where it goes into a code section relating to “Recost for ORDER BY”, and brings back the original function as a filter predicate – I think that in the recosting it may be losing track of the fact that sys_nc00006$ and nlssort(sys_name, ‘nls_sort=binary_ci’) are the same thing and therefore can’t apply the rule about “Equality on 1st N columns, order by on the remainder”.
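If you want to repeat that analysis, here’s a minimal sketch for generating the 10053 (CBO) trace – the tracefile identifier is arbitrary; run these statements and then force a hard parse of the problem query (e.g. by changing a literal or adding a comment):

alter session set tracefile_identifier = 'nls_10053';
alter session set events '10053 trace name context forever, level 1';

-- run (hard parse) the query here

alter session set events '10053 trace name context off';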

There are several answers to (2).

Workarounds

The honest hack

The first one is simply to fall back to the old (probably version 7, possibly version 8) requirement for getting the “sort order by nosort” operation – put all the index columns into the order by clause. Unfortunately the optimizer then did a tablescan rather than an index range scan because my data set was so small, so I had to hack the system stats temporarily to make the tablescan very expensive:


begin
        dbms_stats.set_system_stats('MBRC',2);
        dbms_stats.set_system_stats('MREADTIM',20); 
        dbms_stats.set_system_stats('SREADTIM',5);
        dbms_stats.set_system_stats('CPUSPEED',1000); 
end;
/

... order by account_id desc, log_type desc, sys_name desc, log_time desc

Unfortunately the optimizer still went wrong – it did an ASCENDING index range scan sorting all the data. I actually had to hint the code to use the index in descending order to get the following execution plan:


------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |  1215 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |   1000 |  1215   (1)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |  1006   (1)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |   1000 |     5   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

The A-Rows tells us we’ve accessed the minimum data set, and the absence of the SORT ORDER BY STOPKEY operation tells us that we’ve avoided doing the sort. Notice, though, that the cost is the cost that would have been appropriate if we had accessed all 1,000 rows that matched the equality predicates. This is an example of a plan that you couldn’t really trust if all you had done was an “explain plan” rather than running the query and checking the rowsource execution stats. If you ignore the A-Rows it looks as if the plan WOULD get all the data in order and only eliminate the redundant rows at operation 1.
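For completeness, the hinted version of the query looked something like this – a sketch using the standard index_desc() hint (naming the index created earlier) rather than my exact text:

SELECT
        *
FROM
  (
    SELECT
        /*+ index_desc(log_data log_date) */
        sys_name, log_time,  substr(msg,1,40) msg
    FROM log_data
    WHERE
      account_id=5
      AND log_type=2
      AND sys_name='a'
    ORDER BY
      account_id desc, log_type desc, sys_name desc, log_time desc
  )
WHERE
  rownum <= 10
;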

The silly surprise

The original author of the problem came up with this one. Put in two predicates which, between them, are equivalent to the original requirement:


where ...
and     sys_name >= 'a'
and     sys_name <= 'a'

Clearly this is totally silly – the optimizer can fold this pair of predicates into the single predicate “sys_name = ‘a'”, so it shouldn’t make any difference. But here’s the execution plan:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |    13 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

Yes, it’s (structurally) exactly the same plan, with exactly the same predicate section except that (a) it gets there without being hinted, (b) the Cost column looks appropriate all down the line, and (c) the E-Rows value for the VIEW operator would have helped us appreciate that the correct elimination was (probably) going to happen if all we had done was the Explain Plan.

The dirty hack

I know the name of the hidden column that’s causing the problem, and I know how to generate the value it has to be – so let’s give Oracle exactly what it needs to see rather than allowing its internal transformation to rewrite the SQL:

...
AND sys_nc00006$ = nlssort('a','nls_sort=binary_ci')
...


------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |    13 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "SYS_NC00006$"=HEXTORAW('6100') )

We get exactly the plan we need – and the silly thing about this example is that it’s a case where we get the plan we want by EXPLICITLY transforming the SQL to reproduce the transformation that Oracle had done IMPLICITLY and then messed up !

Final Choice
Of the three options – the dirty hack is definitely a no-no in production; the “double the predicate” trick is undesirable because it may depend in some unexpected way on a particular optimizer bug or on some statistical detail that could change; so I’d choose the hinted path with the (nominally) redundant columns.

One final point about this solution: we actually needed to include only the sys_name column in the order by clause to use the descending range scan and early stop – which is basically another indication that it’s something about the function-based column that is breaking the normal code path.

Reference Script: nls_sort_anomaly.sql

November 5, 2015

Column Groups

Filed under: CBO,extended stats,Oracle,Statistics — Jonathan Lewis @ 6:48 am GMT Nov 5,2015

I think the “column group” variant of extended stats can be amazingly useful in helping the optimizer to generate good execution plans because of the way they supply better details about cardinality; unfortunately we’ve already seen a few cases (don’t forget to check the updates and comments) where the feature is disabled, and another example of this appeared on OTN very recently.

Modifying the example from OTN to make a more convincing demonstration of the issue, here’s some SQL to prepare a demonstration:


create table t1 ( col1 number, col2 number, col3 date);

insert  into t1
select 1 ,1 ,to_date('03-Nov-2015') from dual
union all
select 1, 2, to_date('03-Nov-2015')  from dual
union all
select 1, 1, to_date('03-Nov-2015')  from dual
union all
select 2, 2, to_date('03-Nov-2015')  from dual
union all   
select 1 ,1 ,null  from dual
union all  
select 1, 1, null  from dual
union all  
select 1, 1, null  from dual
union all
select 1 ,1 ,to_date('04-Nov-2015')  from dual
union all  
select 1, 1, to_date('04-Nov-2015')  from dual
union all  
select 1, 1, to_date('04-Nov-2015')  from dual
;

begin
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'T1',
                method_opt      => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'T1',
                method_opt      => 'for columns (col1, col2, col3)'
        );
end;
/

I’ve collected stats in a slightly unusual fashion because I want to make it clear that I’ve got “ordinary” stats on the table, with a histogram on the column group (col1, col2, col3). You’ll notice that this combination is a bit special – of the 10 rows in the table there are three with the values (1,1,null) and three with the values (1,1,’04-Nov-2015′), so some very clear skew to the data, which results in Oracle gathering a frequency histogram on the column group.

These two combinations are remarkably similar, so what happens when we execute a query to find them – since there are no indexes the plan will be a tablescan, but what will we see as the cardinality estimate ? Surely it will be the same for both combinations:


select  count(*)
from    t1
where
        col1 = 1
and     col2 = 1
and     col3 = '04-Nov-2015'
;

select  count(*)
from    t1
where
        col1 = 1
and     col2 = 1
and     col3 is null

Brief pause for thought …

and here are the execution plans, including predicate section – in the same order (from 11.2.0.4 and 12.1.0.2):


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     3 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("COL1"=1 AND "COL2"=1 AND "COL3"=TO_DATE(' 2015-11-04
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |    12 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("COL3" IS NULL AND "COL1"=1 AND "COL2"=1)

The predictions are different – the optimizer has used the histogram on the column group for the query with “col3 = to_date()”, but not for the query with “col3 is null”. That’s a bit of a shame really because there are bound to be cases where some queries would benefit enormously from having a column group used even when some of its columns are subject to “is null” tests.

Analysis

The demonstration above isn’t sufficient to prove the point, of course; it merely shows an example of a suspiciously bad estimate. Here are a few supporting details – first we show that both the NULL and the ’04-Nov-2015′ combinations do appear in the histogram. We do this by checking the column stats, the histogram stats, and the values that would be produced by the hashing function for the critical combinations:


set null "n/a"

select distinct
        col3,
        mod(sys_op_combined_hash(col1, col2, col3), 9999999999)
from    t1
where
        col3 is null
or      col3 = to_date('04-Nov-2015')
order by
        2
;

column endpoint_actual_value format a40
column column_name           format a32
column num_buckets           heading "Buckets"

select
        column_name,
        num_nulls, num_distinct, density,
        histogram, num_buckets
from
        user_tab_cols
where
        table_name = 'T1'

break on column_name skip 1

select
        column_name,
        endpoint_number, endpoint_value,
        endpoint_actual_value -- , endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name not like 'COL%'
order by
        table_name, column_name, endpoint_number
;

(For an explanation of the sys_op_combined_hash() function see this URL).

Here’s the output from the three queries:


COL3      MOD(SYS_OP_COMBINED_HASH(COL1,COL2,COL3),9999999999)
--------- ----------------------------------------------------
04-NOV-15                                           5347969765
n/a                                                 9928298503

COLUMN_NAME                       NUM_NULLS NUM_DISTINCT    DENSITY HISTOGRAM          Buckets
-------------------------------- ---------- ------------ ---------- --------------- ----------
COL1                                      0            2         .5 NONE                     1
COL2                                      0            2         .5 NONE                     1
COL3                                      3            2         .5 NONE                     1
SYS_STU2IZIKAO#T0YCS1GYYTTOGYE            0            5        .05 FREQUENCY                5


COLUMN_NAME                      ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
-------------------------------- --------------- -------------- ----------------------------------------
SYS_STU2IZIKAO#T0YCS1GYYTTOGYE                 1      465354344
                                               4     5347969765
                                               6     6892803587
                                               7     9853220028
                                              10     9928298503

As you can see, there’s a histogram only on the combination and Oracle has found 5 distinct values for the combination. At endpoint 4 you can see the combination that includes 4th Nov 2015 (with the interval 1 – 4 indicating a frequency of 3 rows) and at endpoint 10 you can see the combination that includes the null (again with an interval indicating 3 rows). The stats are perfect to get the job done, and yet the optimizer doesn’t seem to use them.
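If you want to turn those cumulative endpoint numbers into per-bucket frequencies, a quick sketch against the same view does the job:

select
        endpoint_value,
        endpoint_number,
        endpoint_number
                - lag(endpoint_number,1,0)
                        over (order by endpoint_number)         frequency
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'SYS_STU2IZIKAO#T0YCS1GYYTTOGYE'
order by
        endpoint_number
;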

If we examine the optimizer trace file (event 10053) we can see concrete proof that this is the case when we examine the “Single Table Access Path” sections for the two queries – here’s a very short extract from each trace file, the first for the query with “col3 = to_date()”, the second for “col3 is null”:


ColGroup (#1, VC) SYS_STU2IZIKAO#T0YCS1GYYTTOGYE
  Col#: 1 2 3    CorStregth: 1.60
ColGroup Usage:: PredCnt: 3  Matches Full: #1  Partial:  Sel: 0.3000


ColGroup (#1, VC) SYS_STU2IZIKAO#T0YCS1GYYTTOGYE
  Col#: 1 2 3    CorStregth: 1.60
ColGroup Usage:: PredCnt: 2  Matches Full:  Partial:

Apparently “col3 is null” is not a predicate!

The column group can be used only if you have equality predicates on all the columns. This is a little sad – the only time that sys_op_combined_hash() will return a null is (I think) when all its inputs are null, so there is one very special case for null handling with column groups – and even then the num_nulls for the column group would tell the optimizer what it needed to know. As it is, we have exactly the information we need to get a good cardinality estimate for the second query, but the optimizer is not going to use it.

Summary

If you create a column group to help the optimizer with cardinality calculations it will not be used for queries where any of the underlying columns is used in an “is null” predicate. This is coded into the optimizer; it doesn’t appear to be an accident.

Reference script: column_group_null.sql

September 30, 2015

Estimate_percent

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 5:10 pm GMT Sep 30,2015

Here’s a live one from OTN – here are a couple of extracts from the problem statement:

We’re experiencing an issue where it seems that the query plan changes from day to day for a particular procedure that runs once a night.
It’s resulting in a performance variance of 10 second completion time vs 20 minutes (nothing in between).
It started occurring about 2 months ago and now it’s becoming more prevalent where the bad query plan is coming up more often.
I noticed that the query plans vary for a simple query.
We do run gather statistics every night. (DBMS_STATS.GATHER_SCHEMA_STATS (ownname=>sys_context( ‘userenv’, ‘current_schema’ ), estimate_percent => 1);)

The query and two execution plans look like this:

select count(*) from cs_bucket_member_v2 where bucket_type='P' and sec_id > 0 and order_id=0;

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     |     1 |    12 |   155   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE              |                     |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CS_BUCKET_MEMBER_V2 |  1148 | 13776 |   155   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | CS_BUCKET_MEMBER_N1 |  1272 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("BUCKET_TYPE"='P' AND "SEC_ID">0)
   3 - access("ORDER_ID"=0)


------------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |     1 |    12 | 11215   (2)| 00:01:41 |
|   1 |  SORT AGGREGATE    |                     |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| CS_BUCKET_MEMBER_V2 |  1522K|    17M| 11215   (2)| 00:01:41 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ORDER_ID"=0 AND "SEC_ID">0 AND "BUCKET_TYPE"='P')

There are a couple of bits of information that would be useful – such as the database version, the number of rows in the table, the number of distinct values in each column, and whether any of the columns have histograms – but there are a couple of reasonable guesses that we might make about the problem. Notice particularly that the number of rows estimated from the index range scan is 1272 and only a small volume is then eliminated by the table filter predicates on sec_id and bucket_type. This suggests that the optimizer has information that tells it that most of the rows in the table have sec_id > 0 and bucket_type = ‘P’, which in turn suggests that there’s a histogram on bucket_type.

Rather than stating the most obvious guesses about the problem, though, I’ll start by creating a data set and emulating the problem, starting from an empty schema on 11.2.0.4:

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual
        connect by 
                level <= 1e4
)
select
        rownum                  sec_id,
        case
                when mod(rownum,1000) = 0
                        then 'X'
                        else 'P'
        end                     bucket_type,
        case
                when rownum < 1e6 - 50000 
                        then mod(rownum-1,1e5)
                        else 1000
        end                     order_id,
        lpad(rownum,10,'0')     id_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create index t1_i1 on t1(order_id) nologging; 

select count(*) from t1 where order_id = 1000 and bucket_type = 'P' and sec_id > 1000;

The column names in the table match those needed by the query, and the bucket_type column has a very skewed distribution that will eliminate very little data; the sec_id column is also not going to eliminate data, but it’s very evenly distributed with no large gaps so not a good candidate for a histogram in any case. The order_id column has 50,000 rows out of 1,000,000 (5%) set to a single value, and most of those special rows are at the end of the table – it’s a pretty good candidate for a histogram (if Oracle spots it, and if we actually write queries to access that data).
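A quick sanity check on the skew (a sketch – the popular order_id should report roughly 50,000 rows):

select  order_id, count(*)
from    t1
group by
        order_id
having  count(*) > 100
order by
        count(*) desc
;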

I’ve run a query that references all three columns so that the default method_opt of “for all columns size auto” will apply to them when I gather stats. So here’s the code that gathers stats and checks the resulting execution plans, first for “auto_sample_size” then for the 1% used by the OP:


set autotrace traceonly explain

begin
        dbms_stats.gather_schema_stats(
/*              estimate_percent => 1, */
                ownname          => user
        );
end;
/

select count(*) from t1 where order_id = 1000 and bucket_type = 'P' and sec_id > 1000;

begin
        dbms_stats.gather_schema_stats(
                estimate_percent => 1,
                ownname          => user
        );
end;
/

select count(*) from t1 where order_id = 1000 and bucket_type = 'P' and sec_id > 1000;

set autotrace off

And here are the two plans – in the same order:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |  2333   (4)| 00:00:12 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 51063 |   598K|  2333   (4)| 00:00:12 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ORDER_ID"=1000 AND "SEC_ID">1000 AND "BUCKET_TYPE"='P')


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    12 |    23   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |       |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| T1    |    20 |   240 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |    20 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------
   2 - filter("SEC_ID">1000 AND "BUCKET_TYPE"='P')
   3 - access("ORDER_ID"=1000)


[Update: Following on from a question in the comments, I’ve expanded this section, and wandered a little off-topic]

I don’t know why, but with a 1% sample (which really did sample 10,000 rows) the optimizer didn’t spot the need for a histogram on order_id, but with the auto_sample_size (which sampled 5,500 – yes, half as many rows) the optimizer spotted the need for the histogram. Checking the trace files the only difference visible in the sample SQL was the presence in the 1% sample of the id_vc and padding columns which were not present in the auto_sample_size trace.
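
As a quick check of which columns ended up with histograms after each call to gather_schema_stats(), a query like the following (my addition, not part of the original test) against the data dictionary will tell you:

select
        column_name, num_distinct, num_nulls, histogram, num_buckets
from
        user_tab_col_statistics
where
        table_name = 'T1'
order by
        column_name
;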

According to the manuals when the method_opt is “for all columns size auto”, then

“Oracle determines the columns on which to collect histograms based on data distribution and the workload of the columns.”

There is nothing in the manuals to suggest that there is a deliberate link between the auto_sample_size and estimate_percent, and there is room for ambiguity in how we interpret this bit of text in the manual so the difference in the SQL used and the effects thereof requires (a) some hand-waving, and/or (b) lots more experimentation.  At the moment I’m prepared to go for hand-waving:

Hypothesis 1: auto_sample_size did not sample the id_vc and padding columns because the (100%) sample taken had given Oracle enough information to decide that the data distribution of those columns was not skewed enough to merit further consideration; but it sampled the three columns that had been used in a fashion that might be helped by a histogram. This sampling spotted the benefit of a histogram on order_id and bucket_type but decided that sec_id didn’t need a histogram.

Hypothesis 2: the 1% sample got pretty close to the same results in its estimates of number of distinct values for id_vc and padding as the (100%) auto_sample_size, but still decided to do a sampled test for the data distribution (the manual seems to suggest that the histograms will only be considered if there has been some use of the columns in predicates, but doesn’t explicitly preclude the possibility of creating the histogram on the basis of just the data distribution). After doing the 1% sample to analyze the data for suitability of a histogram the result suggested that only the histogram on bucket_type would be beneficial.  (In fact, after the first sample Oracle took a second 1% histogram sample on just the order_id before deciding that a histogram on order_id wasn’t needed.)

Bottom line on this: I don’t know if the auto_sample_size “accidentally” eliminated a couple of columns from histogram sampling and if a larger fixed sample size (say 50%, or even 100%) might result in Oracle eliminating a few columns from the histogram; or maybe the code path for histogram samples with auto_sample_size in place is actually a different code path. The only thing I can say is that the two sets of events that appeared from my demonstration don’t seem to be entirely self-consistent, but it would probably take most of a day doing experiments to narrow down the variation in behaviour to a few “best guess” ideas of what’s going on behind the scenes – though unwrapping the code might lead to a more accurate answer more quickly.

Moral

Histograms are tricky things – and you can only make things worse in 11g by NOT using the auto_sample_size.

Footnote

Based on previous experience – my “obvious” guess about the OP’s data was that there was a special-case value for order_id, that the rows for that value were fairly well clustered, probably towards the end of the table, and constituted a small percentage of the table, and that the rest of the data reported “a few” rows per value. That’s why I built the model you see above.

September 4, 2015

Histogram Tip

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 8:32 am GMT Sep 4,2015

I’ve just responded to the call for items for the “IOUG Quick Tips” booklet for 2015 – so it’s probably about time to post the quick tip that I put into the 2014 issue. It’s probably nothing new to most readers of the blog, but sometimes an old thing presented in a new way offers fresh insights or better comprehension.

Histogram Tips

A histogram, created in the right way, at the right time, and supported by the correct client-side code, can be a huge benefit to the optimizer; but if you don’t create and use them wisely they can easily become a source of inconsistent performance, and the automatic statistics gathering can introduce an undesirable overhead during the overnight batch. This note explains how you can create histograms very cheaply on the few columns where they are most likely to have a beneficial effect.

set_column_stats

The dbms_stats package has many procedures and functions built into it that allow us to see (get) and manipulate (set) the stored statistics; in particular it holds the two procedures get_column_stats() and set_column_stats(), and we can use them to create a histogram whenever we want at very low cost. Here’s an example that could be modified to suit a character column in a table where you’ve previously collected some stats. It uses a copy of all_objects, limited to exactly 10,000 rows.


create table t1 as
select
	*
from
	all_objects
where
	rownum <= 10000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/


declare

	m_distcnt		number;		-- num_distinct
	m_density		number;		-- density
	m_nullcnt		number;		-- num_nulls
	m_avgclen		number;		-- avg_col_len

	srec			dbms_stats.statrec;
	c_array		dbms_stats.chararray;

begin

	dbms_stats.get_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'object_type', 
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec			=> srec,
		avgclen		=> m_avgclen
	); 

	c_array		:= dbms_stats.chararray('A', 'B', 'C', 'X', 'Y');
	srec.bkvals	:= dbms_stats.numarray (  2,   2,   2, 500, 494);
--	srec.rpcnts	:= dbms_stats.numarray (  0,   0,   0,   0,   0);
	srec.epc := 5;

	dbms_stats.prepare_column_values(srec, c_array);

	m_distcnt	:= 5;
	m_density	:= 1/(5000);

	dbms_stats.set_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'object_type', 
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec			=> srec,
		avgclen		=> m_avgclen
	); 

end;
/

Key features of the code: as you can see, the two calls have identical parameters which identify the table and column name (there is an optional parameter for a (sub) partition name), and most of the basic statistics about the column. The histogram (and the low and high values) is accessed through a special record type, and we can populate that record type by supplying an ordered list of values, a matching list of frequencies, and a count of how many values we have supplied.

Since my code is fixing stats on a varchar2() column I’ve declared an array of type dbms_stats.chararray to hold the list of values I want to see in a frequency histogram – there are other array types for dates, raw, number, etc. I’ve then used the structure of the stats record I had declared to hold the list of frequencies (srec.bkvals – possibly a short name for “bucket values”) and the count (srec.epc – “end-point count”).

The call to dbms_stats.prepare_column_values() takes my two arrays and massages them into the correct format for storage as a histogram that I can then write into the data dictionary with the closing call to dbms_stats.set_column_stats(). Before making that call, though, I’ve also set the “num_distinct” variable to tell the optimizer that there are 5 distinct values for the column (it makes sense, but isn’t absolutely necessary, for the num_distinct to match the number of values in the array), and set the “density” to a value that I would like the optimizer to use in its calculations if someone asks for a value that is not in my list.

I’ve included (but commented out) a line that’s relevant to the new histogram mechanisms in 12c – the srec.rpcnts (“repeat counts”) array is used in “hybrid histograms”. It’s not relevant to my example where I’m trying to create a pure frequency histogram, but if I don’t set the array I get an Oracle error: “ORA-06532: Subscript outside of limit”.

Results

There’s one important point to the method that isn’t instantly visible in the code – I created my table with 10,000 rows and there will be no nulls for object_type; but if you sum the array of frequencies it comes to exactly 1,000. This apparent contradiction is not a problem – the optimizer will compare the histogram figures to the total number of non-null entries it has recorded (in other words user_tables.num_rows – user_tab_columns.num_nulls), and scale up the histogram accordingly. This means that a query for ‘A’ should return an estimated row count of 20 (rather than 2), ‘X’ should return 5,000 (rather than 500) and ‘D’ should return 2 (10,000 rows * 1/5000, the selectivity I had set for non-existent values).
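
If you want to see what the call to set_column_stats() has actually stored (and check that the endpoint numbers accumulate to 1,000) you can query the dictionary – a quick sketch; depending on version the character values may be reported only through the encoded endpoint_value rather than endpoint_actual_value:

select
        endpoint_number, endpoint_value, endpoint_actual_value
from
        user_tab_histograms
where
        table_name  = 'T1'
and     column_name = 'OBJECT_TYPE'
order by
        endpoint_number
;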

With a little editing to save space, here’s a cut-n-paste from an SQL*Plus session running against 12c:


SQL> set autotrace traceonly explain
SQL> select * from t1 where object_type = 'A';

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    20 |  2100 |    22   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    20 |  2100 |    22   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_TYPE"='A')

SQL> select * from t1 where object_type = 'X';

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  5000 |   512K|    22   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  5000 |   512K|    22   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_TYPE"='X')

SQL> select * from t1 where object_type = 'D';

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   105 |    22   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   105 |    22   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_TYPE"='D')

Conclusion

It is very easy to create truly representative histograms (if you know your data) and the resources required to do so are minimal. If you see instability caused by bad luck, or bad timing, when gathering a histogram then you could benefit enormously from writing code to construct the histograms you need.

Footnote

In 12c the introduction of the “approximate NDV” strategy for collecting frequency histograms, and the introduction of the “Top-frequency” histogram, have made automatic gathering of histograms on columns with a relatively small number of distinct values much faster and safer – but timing may still be an issue, and the resources needed to gather a safe hybrid histogram may still justify a hand-coded approach.

 

September 2, 2015

IN/EXISTS bugs

Filed under: 12c,Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 8:11 am GMT Sep 2,2015

Here’s a simple data set – I’m only interested in three of the columns in the work that follows, but it’s a data set that I use for a number of different models:


execute dbms_random.seed(0)

create table t1
nologging
as
with generator as (
	select	--+ materialize
		rownum id 
	from dual 
	connect by 
		level <= 1e4
)
select
	trunc(dbms_random.value(0,1000))	n_1000,
	trunc(dbms_random.value(0,750))		n_750,
	trunc(dbms_random.value(0,600))		n_600,
	trunc(dbms_random.value(0,400))		n_400,
	trunc(dbms_random.value(0,90))		n_90,
	trunc(dbms_random.value(0,72))		n_72,
	trunc(dbms_random.value(0,40))		n_40,
	trunc(dbms_random.value(0,3))		n_3
from
	generator	v1,
	generator	v2
where
	rownum <= 1e6
;
create table t2 nologging 
as
select * from t1
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T2',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

The columns I want to consider are n_3, n_400, and n_1000. As their names suggest the columns have 3, 400, and 1000 distinct values respectively and since I’ve used the dbms_random.value() function to generate the data the distinct values are fairly evenly spread across the million rows of the table.

Consider, then, the following two queries:


select
        *
from
        t1
where
        exists (
                select  null
                from    t2
                where   n_1000 = 0
                and     t2.n_400 = t1.n_400
                and     t2.n_3 = t1.n_3
        )
;


select
        *
from
        t1
where
        (t1.n_400, t1.n_3) in (
                select  t2.n_400, t2.n_3
                from    t2
                where   t2.n_1000 = 0
        )
;

The first point to check is that these two queries are logically equivalent.

Once you’re happy with that idea we can work out, informally, how many rows the queries ought to return: there are 1,200 combinations for (n_400, n_3) so each combination should return roughly 833 rows; if we pick 1,000 rows from the 1 million available we can expect to see 679 of those combinations (that’s Alberto Dell’Era’s “selection without replacement” formula that Oracle uses for adjusting num_distinct to allow for filter predicates). So we might reasonably suggest a final figure of roughly 833 * 679 = 565,607 rows. It turns out that that’s a pretty good estimate – when I ran the query the result was actually 567,018 rows.
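
For anyone who wants to reproduce the arithmetic, here’s a quick query – a sketch only, assuming the usual form of the “selection without replacement” approximation, num_distinct * (1 – (1 – sample/num_rows) ^ (num_rows/num_distinct)):

select
        round(1200 * (1 - power(1 - 1000/1e6, 1e6/1200)))       expected_combinations,  -- approx. 679
        833 * round(1200 * (1 - power(1 - 1000/1e6, 1e6/1200))) informal_row_estimate   -- approx. 565,607
from
        dual
;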

So what does Oracle produce for the two execution plans – here are the result from 12c (EXISTS first, then IN):


===================
Multi-column EXISTS
===================
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   920K|    34M|  1259  (11)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI|      |   920K|    34M|  1259  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N_400"="T1"."N_400" AND "T2"."N_3"="T1"."N_3")
   2 - filter("N_1000"=0)

===================
Equivalent IN query
===================
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   833K|    30M|  1259  (11)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT SEMI|      |   833K|    30M|  1259  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N_400"="T2"."N_400" AND "T1"."N_3"="T2"."N_3")
   2 - filter("T2"."N_1000"=0)

The first thing to note is that the shape of the plans is identical, and the predicate sections are identical – but the final cardinalities are different. Clearly at least one of the cardinalities has to be wrong by a significant amount (7.5% or 10.4%, depending which way round you want to look at it). If you run the test on 11.2.0.4 you find that both plans give the same estimated row count – and it’s the 920,000 rows; so arguably 12c has “fixed” the IN subquery calculation, bringing it closer to a reasonable prediction, but it hasn’t fixed the EXISTS subquery calculation. That 833K prediction, by the way, is what you would expect to see with this data with a basic join – and a semi-join shouldn’t be able to produce more data than  a join.

But both predictions are way off the (informal) expectation, so how have they appeared ?

Working backwards it’s easy to spot that 833K = 833 * 1,000: Oracle is behaving as if every single row identified in the subquery will produce a separate combination of (n_400, n_3). If we reverse engineer 920K we get 920K / 833 = 1,104 – it would appear that the optimizer thinks the 1,000 rows produced by the subquery will produce 1,104 distinct combinations of (n_400, n_3); so how did the impossible 1,104 appear in the arithmetic?

If you apply the “selection without replacement” formula to picking 1,000 rows with 400 distinct values from 1,000,000 rows the expected number of distinct values (with rounding) will be 368; if you apply the formula for picking 1,000 rows with 3 distinct values from 1,000,000 rows the expected number will be 3. And 3 * 368 = 1,104. (Remember that in my original estimate I applied the formula after multiplying out the combination of distinct values). The optimizer is using its standard methods, but using intermediate results in an unsuitable fashion.

It’s impossible to say what the impact of this particular code path – and the change on the upgrade – might be. The optimizer has over-estimated by 47% in one case and 62% in the other but (a) there may be something about my data that exaggerated an effect that few people will see in the wild and (b) in many cases getting in the right ballpark is enough to get a reasonable plan, and a factor of 2 is the right ballpark.

Of course, a few people will be unlucky with a few queries on the upgrade where the estimate changes – after all a single row difference in the estimate can cause the optimizer to flip between a hash join and a nested loop – but at least you’ve got a little extra information that might help when you see a bad estimate on an important semi-join.

So is there a workaround? Given that I’ve got 12c, the obvious thing to try is to create a column group at both ends of the semi-join and see what happens. It shouldn’t really make any difference because column groups are targeted at the problems of correlated columns – but we might as well try it:


execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for columns (n_400,n_3) size 1')
execute dbms_stats.gather_table_stats(user,'t2',method_opt=>'for columns (n_400,n_3) size 1')
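
As a quick check (my addition, not part of the original test) you can confirm that the column group has been created as an extension on each table:

select table_name, extension_name, extension
from   user_stat_extensions
where  table_name in ('T1','T2')
order by table_name
;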

Unfortunately when I did this the final cardinality estimate for both queries dropped to just 833 (the absence of a K on the end isn’t a typo!).

Manually unnesting got me closer:


select
        *
from
        (
        select  distinct n_3, n_400
        from    t2
        where   n_1000 = 0
        )       sq,
        t1
where   
        sq.n_400 = t1.n_400
and     sq.n_3 = t1.n_3
;

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   649K|    33M|  1260  (11)| 00:00:01 |
|*  1 |  HASH JOIN           |      |   649K|    33M|  1260  (11)| 00:00:01 |
|   2 |   VIEW               |      |   779 | 20254 |   612   (8)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |   779 |  8569 |   612   (8)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T2   |  1000 | 11000 |   610   (8)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T1   |  1000K|    26M|   628  (11)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SQ"."N_400"="T1"."N_400" AND "SQ"."N_3"="T1"."N_3")
   4 - filter("N_1000"=0)

The cardinality of 649K is (allowing for rounding) 833 * 779; so we need to know where the 779 came from. It’s the optimizer’s standard arithmetic for “distinct” – multiply the N individual selectivities together then divide by sqrt(2) “N-1” times. So we apply the “selection without replacement” formula twice (a quick SQL check of the arithmetic follows the list):

  • adjusted selectivity of n_400 = 367.21
  • adjusted selectivity of n_3 = 3
  • 367.21 * 3 / sqrt(2) = 779
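
Here’s a little query that reproduces those three steps – again just a sketch using the same “selection without replacement” expression as before:

select
        round(400 * (1 - power(1 - 1000/1e6, 1e6/400)), 2)              adj_n_400,      -- approx. 367.21
        round(  3 * (1 - power(1 - 1000/1e6, 1e6/3)))                   adj_n_3,        -- approx. 3
        round(400 * (1 - power(1 - 1000/1e6, 1e6/400)) * 3 / sqrt(2))   distinct_est    -- approx. 779
from
        dual
;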

If you create column group statistics for (n_400, n_3) this doesn’t change the optimizer’s estimate for the number of distinct combinations after selection – maybe that’s another enhancement in the pipeline – but, at least in this case, the manual unnesting has got us a little closer to the right estimates without any statistical intervention.

Footnote:

Just for the sake of completeness, here are the plans (with yet more cardinality predictions) that you get if you block the unnesting:


select 
	*
from 
	t1 
where 
	exists (
		select	
			/*+ no_unnest */
			null  
		from	t2 
		where	n_1000 = 0 
		and	t2.n_400 = t1.n_400 
		and	t2.n_3 = t1.n_3
	)
;



---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1179 | 33012 |   766K (12)| 00:00:30 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|    26M|   632  (11)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |    11 |   638  (12)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ NO_UNNEST */ 0 FROM "T2" "T2" WHERE
              "N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2))
   3 - filter("N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2)



=====================================
Unnesting blocked and subquery pushed
=====================================
select 
	*
from 
	t1 
where 
	exists (
		select	
			/*+ no_unnest push_subq */
			null  
		from	t2 
		where	n_1000 = 0 
		and	t2.n_400 = t1.n_400 
		and	t2.n_3 = t1.n_3
	)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 50000 |  1367K|  1271  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL | T1   | 50000 |  1367K|   632  (11)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2   |     1 |    11 |   638  (12)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT /*+ PUSH_SUBQ NO_UNNEST */ 0 FROM "T2"
              "T2" WHERE "N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2))
   2 - filter("N_1000"=0 AND "T2"."N_400"=:B1 AND "T2"."N_3"=:B2)

The 1179 comes from the magic of sqrt(2):  1179 = 1,000,000 / (400 * 3 / sqrt(2)).

The 50,000 is just the basic “I dunno, let’s call it 5%”.
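
Putting numbers to those two claims – a sketch of the arithmetic:

select
        round(1e6 / (400 * 3 / sqrt(2)))        filter_estimate,        -- approx. 1,179
        round(1e6 * 0.05)                       pushed_subq_estimate    -- 50,000
from
        dual
;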

 

Reference script: aggregate_selectivity_c.sql

 

September 1, 2015

Index Usage – 4

Filed under: CBO,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 6:41 pm GMT Sep 1,2015

Here’s a thought that came to me while I was writing up a note about identifying redundant indexes a few minutes ago. Sometimes you end up supporting applications with unexpected duplication of data and indexes and need to find ways to reduce overheads. Here’s some code modelling a scenario that I’ve seen more often than I like (actually, just once would be more often than I’d like):


create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e5
)
select
        rownum                                          id,
        trunc(sysdate,'MM') + (rownum-1)/1440           date_time,
        trunc(sysdate,'MM') + trunc((rownum-1)/1440)    date_only,
        rpad('x',100)                                   padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

I’ve got a table holding one row per minute since the start of the month; there’s a column which holds the date and time accurate to the minute, and another column which is supposed to hold just the date part. Is it possible to create a single index that allows Oracle to handle queries relatively efficiently whether they refer to date_time or date_only? As a starting step could we get an index range scan on the same index for both of the following queries:


select
        max(id)
from
        t1
where
        date_only between sysdate-1 and sysdate
;


select
        max(id)
from
        t1
where
        date_time between sysdate-1 and sysdate
;

As Bob the Builder likes to say: “yes we can”.

There are a few lines of SQL between the table creation and the stats gathering that I didn’t show you. The first creates the constraint that describes the relationship between date_time and date_only – one is the truncated version of the other; the second defines the index we need, and the third (unfortunately) has to be there to declare the date_time column as a mandatory column:

alter table t1
        add constraint t1_trunc_date
        check(
                  date_only = trunc(date_time)
              and (   (date_only is null and date_time is null)
                   or (date_only is not null and date_time is not null)
              )
        )
;

create index t1_i1 on t1(trunc(date_time)) nologging;

alter table t1 modify (date_time not null);

(Given the requirement for date_time to be not null to get my indexing strategy to work, we could simplify the t1_trunc_date constraint to just (date_only = trunc(date_time)) if we declared date_only to be not null as well).
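
For completeness, here’s a sketch of what that simplified version might look like – an illustration only, assuming you’re happy to make date_only mandatory and to recreate the constraint:

alter table t1 modify (date_only not null);

alter table t1 drop constraint t1_trunc_date;

alter table t1
        add constraint t1_trunc_date
        check (date_only = trunc(date_time))
;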

With the extra lines of SQL included here are the resulting execution plans for the two queries (running on 11.2.0.4, but you get the same plans on 12.1.0.2):


=======================================
date_only between sysdate-1 and sysdate
=======================================

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    21 |    92   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE               |       |     1 |    21 |            |          |
|*  2 |   FILTER                      |       |       |       |            |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T1    |  4306 | 90426 |    92   (2)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | T1_I1 |  4306 |       |    13   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYSDATE@!>=SYSDATE@!-1)
   3 - filter("DATE_ONLY"<=SYSDATE@! AND "DATE_ONLY">=SYSDATE@!-1)
   4 - access(TRUNC(INTERNAL_FUNCTION("DATE_TIME"))>=SYSDATE@!-1 AND
              TRUNC(INTERNAL_FUNCTION("DATE_TIME"))<=SYSDATE@!)
=======================================
date_time between sysdate-1 and sysdate
=======================================

---------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |     1 |    21 |    92   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE               |       |     1 |    21 |            |          |
|*  2 |   FILTER                      |       |       |       |            |          |
|*  3 |    TABLE ACCESS BY INDEX ROWID| T1    |  1442 | 30282 |    92   (2)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | T1_I1 |  4306 |       |    13   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SYSDATE@!>=SYSDATE@!-1)
   3 - filter("DATE_TIME"<=SYSDATE@! AND "DATE_TIME">=SYSDATE@!-1)
   4 - access(TRUNC(INTERNAL_FUNCTION("DATE_TIME"))>=TRUNC(SYSDATE@!-1) AND
              TRUNC(INTERNAL_FUNCTION("DATE_TIME"))<=TRUNC(SYSDATE@!))

The optimizer has managed to generate extra predicates in both cases by applying transitive closure to the critical constraint to produce queries that can be addressed (with some inefficiencies) through the single index.

Within limits, therefore, I can reduce two indexes to a single index. The strategy isn’t ideal but it may be appropriate in a few special cases. There are several problems that should be considered carefully:

  • The date_time column has to be declared not null for this optimization strategy to appear – that’s going to limit its applicability.
  • You may have more complex code where the transformation simply can’t be made to appear.
  • The introduction of the trunc() function may change the optimizer’s arithmetic in ways that cause plans to change for the worse
  • (Most important) The index range scan is always a multiple of 24 hours, with the excess data discarded after you reach the table. If you have lots of time-based queries for short time intervals (e.g. less than 8 hours) then the extra work done may outweigh the benefit of reducing the number of indexes – especially if all the excess table visits turn into randomly scattered single block reads.

Despite these drawbacks you may decide that you have a case where the strategy is “good enough” to help you reduce the workload on your system at some critical times during the day or night.

 

August 7, 2015

CBO catchup

Filed under: 12c,CBO,Oracle,Partitioning — Jonathan Lewis @ 1:10 pm GMT Aug 7,2015

It’s interesting to watch the CBO evolving and see how an enhancement in one piece of code doesn’t necessarily echo through to all the other places it seems to fit. Here’s an example of an enhancement that spoiled (or, rather, made slightly more complicated) a little demonstration I had been running for about the last 15  years  – but (in a fashion akin to another partitioning limitation) doesn’t always work in exactly the way you might expect.

I wrote a note some time ago about the way that the optimizer could pre-compute the result of what I called a “fixed subquery” (such as “select 9100 from dual”) and take advantage of the value it derived to do a better job of estimating the cardinality for a query. That’s a neat feature (although it may cause some 3rd party applications a lot of pain as plans change on the upgrade to 11.2.0.4 or 12c) but it doesn’t work everywhere you might hope.

I’m going to create two (small) tables with the same data, but one of them is going to be a simple heap table and the other is going to be partitioned by range; then I’m going to run the same queries against the pair of them and show you the differences in execution plans. First the tables:


create table t1
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        generator       v1
where
        rownum <= 1e4
;

create table pt1(
        id, v1, padding
)
partition by range (id) (
        partition p02000 values less than ( 2001),
        partition p04000 values less than ( 4001),
        partition p06000 values less than ( 6001),
        partition p08000 values less than ( 8001),
        partition p10000 values less than (10001)
)
as
select * from t1
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'PT1',
                granularity      =>'ALL',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

alter table  t1 add constraint  t1_pk primary key(id);
alter table pt1 add constraint pt1_pk primary key(id) using index local;

create or replace function f(i_in  number)
return number
is
begin
        return i_in;
end;
/

Note that I’ve used ‘ALL’ as my granularity option – for such small tables this should mean that the statistics at the partition and global level are as accurate as they can be. And since the data is defined to be uniform I don’t expect the partitioning to introduce any peculiarities in the optimizer’s calculations of selectivity and cardinality. I’ve created the indexes after gathering stats on the tables – this is 12c (and 11.2.0.4) so the index stats will be collected with a 100% sample as the indexes are created. Finally I’ve created a function that simply returns its numeric input.

Now let’s run a couple of queries against the simple table and check the cardinality (Rows) predicted by the optimizer – the two plans follow the code that generated them:

set serveroutput off

select  max(v1)
from    t1
where   id between (select 500 from dual)
           and     (select 599 from dual)
;

select * from table(dbms_xplan.display_cursor);

select  max(v1)
from    t1
where   id between (select f(500) from dual)
           and     (select f(599) from dual)
;

select * from table(dbms_xplan.display_cursor);

======================
Actual Execution Plans
======================

select max(v1) from t1 where id between (select 500 from dual)
  and     (select 599 from dual)

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE                      |       |     1 |    15 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   101 |  1515 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_PK |   101 |       |     2   (0)| 00:00:01 |
|   4 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">= AND "ID"<=)

select max(v1) from t1 where id between (select f(500) from dual)
     and     (select f(599) from dual)

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE                      |       |     1 |    15 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    25 |   375 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_PK |    45 |       |     2   (0)| 00:00:01 |
|   4 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
|   5 |     FAST DUAL                        |       |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">= AND "ID"<=)

In the first plan the optimizer has recognised the values 500 and 599, so its range-based calculation has produced a (matching, and nearly correct) prediction of 101 rows. In the second plan the function call has hidden the values so the optimizer has had to use the arithmetic for “ranges with unknown values” – which means it uses fixed guesses for the selectivity: 0.45% for the index and 0.25% for the table. Maybe in a future release the f(500) will be evaluated in the same way that we can trigger in-list calculation with the precompute_subquery hint.
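
Those fixed guesses line up with the 45 and 25 in the second plan – a trivial check of the arithmetic:

select
        round(1e4 * 0.0045)     index_estimate,         -- 45, matching the index range scan
        round(1e4 * 0.0025)     table_estimate          -- 25, matching the table access
from
        dual
;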

Now we repeat the query, but using the partitioned table – showing only the trimmed output from dbms_xplan.display_cursor():

select max(v1) from pt1 where id between (select 500 from dual)
   and     (select 599 from dual)

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |        |       |       |     4 (100)|          |       |       |
|   1 |  SORT AGGREGATE                             |        |     1 |    15 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR                  |        |   101 |  1515 |     4   (0)| 00:00:01 |   KEY |   KEY |
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PT1    |   101 |  1515 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  4 |     INDEX RANGE SCAN                        | PT1_PK |   101 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ID">= AND "ID"<=)

select max(v1) from pt1 where id between (select f(500) from dual)
      and     (select f(599) from dual)

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |        |       |       |     3 (100)|          |       |       |
|   1 |  SORT AGGREGATE                             |        |     1 |    15 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR                  |        |    25 |   375 |     3   (0)| 00:00:01 |   KEY |   KEY |
|   3 |    TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PT1    |    25 |   375 |     3   (0)| 00:00:01 |   KEY |   KEY |
|*  4 |     INDEX RANGE SCAN                        | PT1_PK |    45 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|   5 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |      FAST DUAL                              |        |     1 |       |     2   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("ID">= AND "ID"<=)

It’s great to see that the predicted cardinalities match the simple heap version exactly – but can you see anything odd about either of these plans ?

 

 

Pause for thought …

 

 

There’s nothing odd about the second plan but there’s a little puzzle in the first.

In theory, it seems, the optimizer is aware that the first query covers the range 500 – 599; so why are the pstart/pstop columns for operations 2-4 showing KEY-KEY? That combination usually means the optimizer knows it will have some partition key values at run-time and will be able to do run-time partition elimination, but doesn’t know what those key values are at parse time.

In this very simple case it’s (probably) not going to make any difference to the performance – but it may be worth some careful experimentation in more complex cases where you might have been hoping to get strict identification of partitions and partition-wise joins taking place. Yet another topic to put on the todo list of “pre-emptive investigations” with a reminder to re-run the tests from time to time.

 

 

July 27, 2015

Subquery Factoring (10)

Filed under: Bugs,CBO,Oracle,Subquery Factoring,Troubleshooting — Jonathan Lewis @ 1:26 pm GMT Jul 27,2015

What prompted me to write my previous note about subquery factoring was an upgrade to 12c, and a check that a few critical queries would not do something nasty on the upgrade. As ever it’s always interesting how many little oddities you can discover while looking closely at some little detail of how the optimizer works. Here’s an oddity that came up in the course of my investigation in 12.1.0.2 – first some sample data:


create table t1
nologging
as
select * from all_objects;

create index t1_i1 on t1(owner) compress nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1 for columns owner size 254'
        );
end;
/

The all_objects view is convenient as a tool for modelling what I wanted to do since it has a column with a small number of distinct values and an extreme skew across those values. Here’s a slightly weird query that shows an odd costing effect:


with v1 as (
        select /*+ inline */ owner from t1 where owner > 'A'
)
select count(*) from v1 where owner = 'SYS'
union all
select count(*) from v1 where owner = 'SYSTEM'
;

Since the query uses the factored subquery twice and there’s a predicate on the subquery definition, I expect to see materialization as the default, and that’s what happened (even though I’ve engineered the query so that materialization is more expensive than executing inline). Here are the two plans from 12.1.0.2 (the same pattern appears in 11.2.0.4, though the costs are a little less across the board):


=======================
Unhinted (materializes)
=======================

---------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |     2 |   132 |    25  (20)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D661B_876C2CB |       |       |            |          |
|*  3 |    INDEX FAST FULL SCAN    | T1_I1                      | 85084 |   498K|    21  (15)| 00:00:01 |
|   4 |   UNION-ALL                |                            |       |       |            |          |
|   5 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  6 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|   7 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
|   8 |    SORT AGGREGATE          |                            |     1 |    66 |            |          |
|*  9 |     VIEW                   |                            | 85084 |  5483K|    13  (24)| 00:00:01 |
|  10 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D661B_876C2CB | 85084 |   498K|    13  (24)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

=============
Forced inline
=============

--------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     2 |    12 |    22  (14)| 00:00:01 |
|   1 |  UNION-ALL             |       |       |       |            |          |
|   2 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  3 |    INDEX FAST FULL SCAN| T1_I1 | 38784 |   227K|    21  (15)| 00:00:01 |
|   4 |   SORT AGGREGATE       |       |     1 |     6 |            |          |
|*  5 |    INDEX RANGE SCAN    | T1_I1 |   551 |  3306 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

I’m not surprised that the optimizer materialized the subquery – as I pointed out in my previous article, the choice seems to be rule-based (heuristic) rather than cost-based. What surprises me is that the cost for the default plan is not self-consistent – the optimizer seems to have lost the cost of generating the temporary table. The cost of the materialized query plan looks as if it ought to be 21 + 13 + 13 = 47. Even if the optimizer were smart enough to assume that the temporary table would be in the cache for the second scan (and therefore virtually free to access) we ought to see a cost of 21 + 13 = 34. As it is we have a cost of 25, which is 13 + 13 (or, if you check the 10053 trace file, 12.65 + 12.65, rounded).

Since the choice to materialize doesn’t seem to be cost-based (at present) this doesn’t really matter – but it’s always nice to see, and be able to understand, self-consistent figures in an execution plan.

Footnote

It is worth pointing out as a side note that materialization can actually be more expensive than running inline, even for very simple examples. Subquery factoring seems to have become more robust over recent releases in terms of the consistency of the execution plans it produces when the subqueries are put back inline, but you still need to think a little bit before rewriting a query for cosmetic (i.e. totally valid “readability”) reasons, just to check whether the resulting query is going to produce an unexpected, and unexpectedly expensive, materialization.

July 24, 2015

Subquery Factoring (9)

Filed under: CBO,Oracle,Subquery Factoring,Tuning — Jonathan Lewis @ 12:34 pm GMT Jul 24,2015

Several years ago (eight to be precise) I wrote a note suggesting that Oracle will not materialize a factored subquery unless it is used at least twice in the main query. I based this conclusion on a logical argument about the cost of creating and using a factored subquery and, at the time, I left it at that. A couple of years ago I came across an example where, even with two uses of a factored subquery, Oracle still didn’t materialize even though the cost of doing so would reduce the cost of the query – but I never got around to writing up the example, so here it is:


create table t1
as
select
        object_id, data_object_id, created, object_name, rpad('x',1000) padding
from
        all_objects
where
        rownum <= 10000
;

exec dbms_stats.gather_table_stats(user,'T1')

explain plan for
with gen as (
        select /*+ materialize */ object_id, object_name from t1
)
select
        g1.object_name,
        g2.object_name
from
        gen g1,
        gen g2
where
        g2.object_id = g1.object_id
;

select * from table(dbms_xplan.display);

You’ll notice that my original table has very wide rows, but my factored subquery selects a “narrow” subset of those rows. My target is to have an example where doing a tablescan is very expensive but the temporary table holding the extracted data is much smaller and cheaper to scan.

I’ve included a materialize hint in the SQL above, but you need to run the code twice, once with, and once without the hint. Here are the two plans – unhinted first:


============================
Unhinted - won't materialize
============================

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |   468K|   428   (2)| 00:00:03 |
|*  1 |  HASH JOIN         |      | 10000 |   468K|   428   (2)| 00:00:03 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |   234K|   214   (2)| 00:00:02 |
---------------------------------------------------------------------------

==================================
Hinted to materialize - lower cost
==================================

--------------------------------------------------------------------------------------------------------- 
| Id  | Operation                  | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT           |                            | 10000 |   585K|   227   (2)| 00:00:02 |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |            |          |
|   2 |   LOAD AS SELECT           | SYS_TEMP_0FD9D6664_9DAAEB7 |       |       |            |          | 
|   3 |    TABLE ACCESS FULL       | T1                         | 10000 |   234K|   214   (2)| 00:00:02 | 
|*  4 |   HASH JOIN                |                            | 10000 |   585K|    13   (8)| 00:00:01 | 
|   5 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   6 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 | 
|   7 |    VIEW                    |                            | 10000 |   292K|     6   (0)| 00:00:01 | 
|   8 |     TABLE ACCESS FULL      | SYS_TEMP_0FD9D6664_9DAAEB7 | 10000 |   234K|     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Clearly the optimizer isn’t considering the costs involved. If I add the predicate “where object_id > 0” (which identifies ALL the rows in the table), materialization occurs unhinted (with the same costs reported as for the hinted plan above). My tentative conclusion is that the transformation is a heuristic one that follows the rule “two or more appearances of the subquery and some indication of row selection in the subquery rowsource”. (In fact if the rowsource is “select * from pipeline_function” the requirement for subsetting doesn’t seem to apply.)
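
For reference, this is the variation that did materialize without any hint – just a sketch of the rewrite; the predicate identifies every row but seems to satisfy the “row selection” part of the heuristic:

with gen as (
        select  object_id, object_name
        from    t1
        where   object_id > 0           -- matches every row, but allows materialization
)
select
        g1.object_name,
        g2.object_name
from
        gen g1,
        gen g2
where
        g2.object_id = g1.object_id
;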

The plans above came from 11.2.0.4 but I got the same result, with a slight difference in costs, in 12.1.0.2. It’s worth pointing out that despite Oracle apparently ignoring the costs when deciding whether or not to materialize, it still seems to report self-consistent values after materialization: the 227 for the plan above is the 214 for creating the temporary table plus the 13 for deriving the hash join of the two copies of the temporary table.

July 20, 2015

12c Downgrade

Filed under: 12c,Bugs,CBO,Oracle — Jonathan Lewis @ 1:12 pm GMT Jul 20,2015

No, not really – but sometimes the optimizer gets better and gives you worse performance as a side effect when you upgrade. Here’s an example where 11.2.0.4 recognised (with a few hints) the case for a nested loop semi-join and 12c went a bit further and recognised the opportunity for doing a cunning “semi_to_inner” transformation … which just happened to do more work than the 11g plan.

Here’s a data set to get things going, I’ve got “parent” and “child” tables, but in this particular demonstration I won’t be invoking referential integrity:


create table chi
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1                              id,
        trunc((rownum-1)/10)                    n1,
        trunc(dbms_random.value(0,1000))        n2,
        rpad('x',1000)                          padding
from
        generator
;

create table par
as
with generator as (
        select  --+ materialize
                rownum  id
        from dual
        connect by
                level <= 1e4
)
select
        rownum - 1      id,
        rpad('x',1000)  padding
from
        generator
where
        rownum <= 1e3
;

alter table par modify id not null;
alter table par add constraint par_pk primary key(id)
-- deferrable
;

-- Now gather stats on the tables.

The code uses my standard framework that could generate a few million rows even though it’s only generating 1,000 in par and 10,000 in chi. The presence of the commented “deferrable” for the primary key constraint is for a secondary demonstration.

You’ll notice that the 1,000 values that appear in chi.n1 and chi.n2 are matched by the 1,000 rows that appear in the primary key of par – in some other experiment I’ve got two foreign keys from chi to par. Take note that the values in n1 are very well clustered because of the call to trunc() while the values in n2 are evenly scattered because of the call to dbms_random() – the data patterns are very different although the data content is very similar (the randomised data will still produce, on average, 10 rows per value).

So here’s the test code:


set serveroutput off
set linesize 156
set trimspool on
set pagesize 60

alter session set statistics_level = all;

prompt  =============================
prompt  Strictly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n1
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

prompt  =============================
prompt  Randomly ordered driving data
prompt  =============================

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n2
        )
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline alias cost'));

set serveroutput on
alter session set statistics_level = typical;

In both cases I’ve hinted the query quite heavily, using internally generated query block names, into running with a nested loop semi-join from chi to par. Since there are 10,000 rows in chi with no filter predicates, you might expect to see the probe into the par table starting 10,000 times, returning (thanks to our perfect data match) one row for each start. Here are the run-time plans, with rowsource execution stats, from 11.2.0.4:

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.14 |    1450 |   1041 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.14 |    1450 |   1041 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.12 |    1450 |   1041 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.07 |    1434 |   1037 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   1000 |   1048 |     0   (0)|   1000 |00:00:00.01 |      16 |      4 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

-----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   190 (100)|      1 |00:00:00.12 |    5544 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.12 |    5544 |
|   2 |   NESTED LOOPS SEMI |        |      1 |  10065 |   190   (4)|  10000 |00:00:00.10 |    5544 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10065 |   186   (2)|  10000 |00:00:00.02 |    1434 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |   4033 |   1048 |     0   (0)|   4033 |00:00:00.02 |    4110 |
-----------------------------------------------------------------------------------------------------

Notice how we do 1,000 starts of operation 4 when the data is well ordered, and 4,033 starts when the data is randomly ordered. For a semi-join nested loop the run-time engine uses the same caching mechanism as it does for scalar subqueries – a fact you can corroborate by removing the current hints and putting the /*+ no_unnest */ hint into the subquery so that you get a filter subquery plan, in which you will note exactly the same number of starts of the filter subquery.
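For reference, a sketch of the filter-subquery variant I mean (my own restatement – strip out the other hints and add no_unnest to the subquery); the starts of the filter operation should match the 1,000 and 4,033 figures above:

select
        count(*)
from
        chi
where   exists (
                select  /*+ no_unnest */
                        null
                from    par
                where   par.id = chi.n1         -- or chi.n2 for the randomised case
        )
;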

As an extra benefit you’ll notice that the index probes for the well-ordered data have managed to take advantage of buffer pinning (statistic “buffer is pinned count”) – keeping the root block and most recent leaf block of the par_pk index pinned almost continually through the query; while the randomised data access unfortunately required Oracle to unpin and repin the index leaf blocks (even though there were only 2 in the index) as the scan of chi progressed.
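If you want to check the pinning effect yourself, one simple (if slightly crude) approach is to snapshot the session statistics before and after each query and take the difference – a sketch, assuming you have the privileges to query v$mystat and v$statname:

select  sn.name, ms.value
from    v$mystat ms, v$statname sn
where   ms.statistic# = sn.statistic#
and     sn.name in ('buffer is pinned count', 'session logical reads')
;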

Time to upgrade to 12.1.0.2 and see what happens:

=============================
Strictly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |    1448 |   1456 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |    1448 |   1456 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.20 |    1448 |   1456 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.06 |      16 |     27 |
--------------------------------------------------------------------------------------------------------------

=============================
Randomly ordered driving data
=============================

--------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |      1 |        |   189 (100)|      1 |00:00:00.22 |   11588 |   1429 |
|   1 |  SORT AGGREGATE     |        |      1 |      1 |            |      1 |00:00:00.22 |   11588 |   1429 |
|   2 |   NESTED LOOPS      |        |      1 |  10000 |   189   (4)|  10000 |00:00:00.19 |   11588 |   1429 |
|   3 |    TABLE ACCESS FULL| CHI    |      1 |  10000 |   185   (2)|  10000 |00:00:00.03 |    1432 |   1429 |
|*  4 |    INDEX UNIQUE SCAN| PAR_PK |  10000 |      1 |     0   (0)|  10000 |00:00:00.07 |   10156 |      0 |
--------------------------------------------------------------------------------------------------------------

Take a close look at operation 2 – it’s no longer a NESTED LOOPS SEMI; the optimizer has got so smart (recognising the nature of the primary key on par) that it’s done a “semi_to_inner” transformation. But a side effect of the transformation is that the scalar subquery caching mechanism no longer applies, so we probe the par table 10,000 times. When the driving data is well-ordered this hasn’t made much difference to the buffer gets (and related latch activity), but when the data is randomised the extra probes ramp the buffer gets up even further.

The timings (A-time) on these experiments are not particularly trustworthy – the differences between cached reads and direct path reads introduced more variation than the difference in Starts and Buffers, and the total CPU load is pretty small anyway – and I suspect that this difference won’t make much difference to most people most of the time. No doubt, though, there will be a few cases where a small change like this could have a noticeable effect on some important queries.

Footnote

There is a hint /*+ no_semi_to_inner(@queryblock object_alias) */ that I thought might persuade the optimizer to stick with the semi-join, but it didn’t have any effect. Since the “semi to inner” transformation (and its associated hints) is available in 11.2.0.4 I was a little puzzled that (a) I didn’t see the same transformation in the 11g test, and (b) I couldn’t hint the transformation. This makes me wonder if there’s a defect in 11g that might be fixed in a future patch.
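For completeness, this is roughly how I tried to apply the hint in the 12c test – the query block name is the one the optimizer generated for my query (so yours may differ), and, as noted, the plan still showed the ordinary nested loop join:

select
        /*+
                leading(@sel$5da710d3 chi@sel$1 par@sel$2)
                full   (@sel$5da710d3 chi@sel$1)
                use_nl (@sel$5da710d3 par@sel$2)
                index  (@sel$5da710d3 par@sel$2 (par.id))
                no_semi_to_inner(@sel$5da710d3 par@sel$2)
        */
        count(*)
from
        chi
where   exists (
                select null
                from par
                where par.id = chi.n1
        )
;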

It’s also nice to think that the scalar subquery caching optimisation used in semi-joins might eventually become available  to standard joins (in cases such as “join to parent”, perhaps).

July 17, 2015

Descending Indexes

Filed under: CBO,Oracle,Statistics,Troubleshooting — Jonathan Lewis @ 8:42 am GMT Jul 17,2015

I’ve written about optimizer defects with descending indexes before now but a problem came up on the OTN database forum a few days ago that made me decide to look very closely at an example where the arithmetic was clearly defective. The problem revolves around a table with two indexes, one on a date column (TH_UPDATE_TIMESTAMP) and the other a compound index which starts with the same column in descending order (TH_UPDATE_TIMESTAMP DESC, TH_TXN_CODE). The optimizer was picking the “descending” index in cases where it was clearly the wrong index (even after the statistics had been refreshed and all the usual errors relating to date-based indexes had been discounted). Here’s an execution plan from the system which shows that there’s something wrong with the optimizer:


SELECT COUNT(*) FROM TXN_HEADER WHERE TH_UPDATE_TIMESTAMP BETWEEN SYSDATE-30 AND SYSDATE;

---------------------------------------------------------------------------------------
| Id  | Operation          | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                  |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE    |                  |     1 |     8 |            |          |
|*  2 |   FILTER           |                  |       |       |            |          |
|*  3 |    INDEX RANGE SCAN| TXN_HEADER_IDX17 |  1083K|  8462K|     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

There are two clues here: first, Oracle has used the (larger) two-column index when the single-column index is (almost guaranteed to be) the better choice simply because it will be smaller; secondly, we have a cost of 4 to acquire 1M rowids through an (implicitly b-tree) index range scan, which would require at least 250,000 index entries per leaf block to keep the cost that low (or 2,500 if you set the optimizer_index_cost_adj to 1; so it might just be possible if you had a 32KB block size).
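A quick back-of-envelope check of that second clue (my arithmetic, not from the original system):

select
        round(1083000 / 4)              entries_per_leaf_for_cost_4,    -- ca. 270,000
        round(1083000 / (4 * 100))      entries_per_leaf_if_oica_is_1   -- ca. 2,700
from
        dual
;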

The OP worked hard to supply information we asked for, and to follow up any suggestions we made; in the course of this we got the information that the table had about 90M rows and this “timestamp” column had about 45M distinct values ranging from 6th Sept 2012 to 2nd July 2015 with no nulls.

Based on this information I modelled the problem in an instance of 11.2.0.4 (the OP was using 11.2.0.3).


create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                                                          id,
        to_date('06-SEP-2012 15:13:00','dd-mon-yyyy hh24:mi:ss') +
                trunc((rownum - 1 )/4) / (24 * 60)                      upd_timestamp,
        mod(rownum - 1,10)                                              txn_code,
        rpad('x',50)                                                    padding
from
        generator       v1,
        generator       v2
where
        rownum <= 4 * 1030  *  24 * 60
;

create index t1_asc on t1(upd_timestamp) nologging;
create index t1_desc on t1(upd_timestamp desc) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1',
                cascade          => false
        );
end;
/

My data set has 4 rows per minute from 6th Sept 2012 to 3rd July 2015, with both an ascending and descending index on the upd_timestamp column. For reference, here are the statistics about the two indexes:
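(The query that produced these figures isn’t in the original listing, but it was along the lines of the following.)

select
        index_name, sample_size, blevel, leaf_blocks, clustering_factor
from
        user_indexes
where
        table_name = 'T1'
order by
        index_name desc
;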


INDEX_NAME               SAMPLE     BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- ---------- ---------- ----------- -----------------
T1_DESC                 5932800          2       17379            154559
T1_ASC                  5932800          2       15737             59825

The ascending index is smaller with a significantly better clustering_factor, so for queries where either index would be a viable option the ascending index is the one (we think) that the optimizer should choose. In passing, the 5.9M index entries match the number of rows in the table exactly – these stats were computed automatically as the indexes were created.

Here’s a simple query with execution plan (with rowsource execution stats):

select max(id) from t1 where upd_timestamp between
to_date('01-jun-2015','dd-mon-yyyy')                and
to_date('30-jun-2015','dd-mon-yyyy')

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |      1 |        |    29 (100)|      1 |00:00:01.29 |    4710 |
|   1 |  SORT AGGREGATE              |         |      1 |      1 |            |      1 |00:00:01.29 |    4710 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1      |      1 |    167K|    29   (0)|    167K|00:00:00.98 |    4710 |
|*  3 |    INDEX RANGE SCAN          | T1_DESC |      1 |    885 |     5   (0)|    167K|00:00:00.35 |     492 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."SYS_NC00005$">=HEXTORAW('878CF9E1FEF8FEFAFF')  AND
              "T1"."SYS_NC00005$"<=HEXTORAW('878CF9FEF8FEF8FF') )
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00005$")>=TO_DATE(' 2015-06-01 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND SYS_OP_UNDESCEND("T1"."SYS_NC00005$")<=TO_DATE(' 2015-06-30 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))

The optimizer’s index estimate of 885 rowids is a long way off the actual rowcount of 167,000 – even though I have perfect stats describing uniform data and the query is simple enough that the optimizer should be able to get a good estimate. Mind you, the predicate section looks a lot messier than you might expect, and the table estimate is correct (though not consistent with the index estimate, of course).
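The next step was to hide the descending index; the statement isn’t shown in the original listing, but it was simply along the lines of:

alter index t1_desc invisible;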

Here’s the plan I get when I make the descending index invisible:


--------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |      1 |        |  2146 (100)|      1 |00:00:01.25 |    2134 |
|   1 |  SORT AGGREGATE              |        |      1 |      1 |            |      1 |00:00:01.25 |    2134 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1     |      1 |    167K|  2146   (1)|    167K|00:00:00.94 |    2134 |
|*  3 |    INDEX RANGE SCAN          | T1_ASC |      1 |    167K|   453   (2)|    167K|00:00:00.31 |     446 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("UPD_TIMESTAMP">=TO_DATE(' 2015-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "UPD_TIMESTAMP"<=TO_DATE(' 2015-06-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

With the descending index invisible we can see that the cardinality estimate for the ascending index is correct, and notice how much higher this makes the cost. It’s not surprising that the optimizer picked the wrong index with such a difference in cost. The question now is why the optimizer got the index cardinality (i.e. selectivity, hence cost) so badly wrong.

The answer is that the optimizer has made the same mistake that applications make by storing dates as character strings. It’s using the normal range-based calculation for the sys_op_descend() values recorded against the virtual column and has lost all understanding of the fact that these values represent dates. I can demonstrate this most easily by creating one more table, inserting a couple of rows, gathering stats, and showing you what the internal storage of a couple of “descending” dates looks like.


drop table t2;

create table t2 (d1 date);
create index t2_i1 on t2(d1 desc);

insert into t2 values('01-Jun-2015');
insert into t2 values('30-Jun-2015');
commit;

select d1, sys_nc00002$ from t2;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1',
                cascade          => false
        );
end;
/

column endpoint_value format 999,999,999,999,999,999,999,999,999,999,999,999
break on table_name skip 1

select
        table_name, column_name, endpoint_number, endpoint_value, to_char(endpoint_value,'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') end_hex
from
        user_tab_histograms
where
        table_name in ('T1','T2')
and     column_name like 'SYS%'
order by
        table_name, column_name, endpoint_number
;

I’ve put the dates 1st June 2015 and 30th June 2015 into the table because those were the values I used in the where clause of my query. Here are the results showing the internal representation of the (descending) index column and the stored low and high values for the virtual columns in both t1 and t2.


D1        SYS_NC00002$
--------- ------------------------
01-JUN-15 878CF9FEF8FEF8FF
30-JUN-15 878CF9E1FEF8FEFAFF

TABLE_NAME           COLUMN_NAME          ENDPOINT_NUMBER                                   ENDPOINT_VALUE END_HEX
-------------------- -------------------- --------------- ------------------------------------------------ ---------------------------------
T1                   SYS_NC00005$                       0  703,819,340,111,320,000,000,000,000,000,000,000    878CF8FCEFF30BCEBC8823F7000000
                     SYS_NC00005$                       1  703,880,027,955,346,000,000,000,000,000,000,000    878FF6F9EFF20256F3B404F7400000

T2                   SYS_NC00002$                       0  703,819,411,001,549,000,000,000,000,000,000,000    878CF9E1FEF8F24C1C7CED46200000
                     SYS_NC00002$                       1  703,819,419,969,389,000,000,000,000,000,000,000    878CF9FEF8FEEED28AC5B22A200000

If you check the t2 data (sys_nc00002$) values against the end_hex values in the histogram data you’ll see they match up to the first 6 bytes (12 digits). Oracle has done its standard processing – take the first 6 bytes of the column, convert to decimal, round to the most significant 15 digits (technically round(value, -21)), convert and store the result as hex.
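You can reproduce most of that arithmetic from SQL – a sketch of my own check (not from the original article): take the first 6 bytes of the stored value for 1st June, convert from hex to decimal, scale them into position as the leading bytes of a 15-byte number, then apply the round(); the result should agree with the stored endpoint_value for t2 to all but the last couple of digits (the trailing bytes of the raw value supply the rest):

select
        round(
                to_number(substr('878CF9FEF8FEF8FF', 1, 12), 'XXXXXXXXXXXX') * power(256, 9),
                -21
        )       approx_endpoint_value
from
        dual
;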

So let’s do some arithmetic. The selectivity of a range scan that is in-bounds is (informally): “range we want” / “total range”. I’ve set up t2 to show us the values we will need to calculate the range we want, and I’ve reported the t1 values to allow us to calculate the total range, we just have to subtract the lower value (endpoint number 0) from the higher value for the two sys_nc0000N$ columns. So (ignoring the 21 zeros everywhere) our selectivity is:

  • (703,819,419,969,389 – 703,819,411,001,549) / ( 703,880,027,955,346 – 703,819,340,111,320) = 0.00014777
  • We have 5.9M rows in the table, so the cardinality estimate should be about: 5,932,800 * 0.00014777 = 876.69

The actual cardinality estimate was 885 – but we haven’t allowed for the exact form of the range-based predicate: we should add 1/num_distinct at both ends because the range is a closed range (greater than or EQUAL to, less than or EQUAL to) – which takes the cardinality estimate up to 884.69 which rounds to the 885 that the optimizer produced.
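The same arithmetic as a query, using the endpoint figures above and ignoring the trailing 21 zeros (my own restatement – it should report something very close to the 876.69 quoted):

select
        round(
                5932800 *
                (
                        (703819419969389 - 703819411001549) /
                        (703880027955346 - 703819340111320)
                ),
                2
        )       base_cardinality
from
        dual
;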

Conclusion

This note shows that Oracle is (at least for dates) losing information about the underlying data when dealing with the virtual columns associated with descending columns in indexes. As demonstrated, this can lead to extremely bad selectivity estimates for predicates based on the original columns, and those selectivity estimates make it possible for Oracle to choose the wrong index and introduce a spuriously low cost into the calculation of the full execution plan.

Footnote

This note seems to match bug note 11072246 “Wrong Cardinality estimations for columns with DESC indexes”, but that bug is reported as fixed in 12.1, while this test case reproduces in 12.1.0.2.

 

July 13, 2015

Missing Bloom

Filed under: CBO,Execution plans,Oracle,Partitioning — Jonathan Lewis @ 1:37 pm GMT Jul 13,2015

Here’s a little surprise that came up on the OTN database forum a few days ago. Rather than describe it, I’m just going to create a data set to demonstrate it, initially using 11.2.0.4 although the same thing happens on 12.1.0.2. The target is a query that joins to a range/hash composite partitioned table and uses a Bloom filter to do partition pruning at the subpartition level. (Note to self: is it possible to see Bloom filters that operate at both the partition and subpartition level from a single join? I suspect not.) Here’s my code to create and populate both the partitioned table and a driving table for the query:


create table pt_composite_1 (
        part_key        number(8),
        subp_key        number(8),
        small_vc        varchar2(40),
        padding         varchar2(100)
)
nologging
partition by range(part_key)
subpartition by hash (subp_key)
subpartition template (
        subpartition g1,
        subpartition g2,
        subpartition g3,
        subpartition g4
)
(
        partition p01 values less than ( 10),
        partition p02 values less than ( 20),
        partition p03 values less than ( 30),
        partition p04 values less than ( 40),
        partition p05 values less than ( 50),
        partition p06 values less than ( 60),
        partition p07 values less than ( 70),
        partition p08 values less than ( 80),
        partition p09 values less than ( 90),
        partition p10 values less than (100),
        partition p11 values less than (110),
        partition p12 values less than (120)
)
;

insert into pt_composite_1 (
        part_key, subp_key, small_vc, padding
)
select
        trunc(dbms_random.value(0,120)) part_key,
        trunc(dbms_random.value(0,50))  subp_key,
        to_char(trunc((rownum-1)/20))   small_vc,
        rpad('x',100)                   padding
from
        dual
connect by
        rownum <= 25000
;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

insert /*+ append */ into pt_composite_1 select * from pt_composite_1;
commit;

create table driver (
        part_key        number(8),
        subp_key        number(8),
        test            number(4)
)
;

execute dbms_random.seed(0)

insert into driver
select
        trunc(dbms_random.value(0,120)) part_key,
        trunc(dbms_random.value(0,50))  subp_key,
        mod(rownum - 1, 30)
from
        dual
connect by
        level <= 60
;

begin
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'driver',
                method_opt      => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'pt_composite_1',
                method_opt      => 'for all columns size 1',
                granularity     => 'all'
        );
end;
/

So I’ve got a table with 12 partitions, each hash subpartitioned into 4 subpartitions, a total of 400,000 rows, and a driving table with 60 rows with two rows per value for column test, which probably means two separate subpartitions identified for most values of test. I set this data up to do a number of different experiments but the only result I’m going to report here is about the sub-partition key. Here’s a query that selects all the data from the partitioned table that matches the subp_key value from a subset of the driver table:


select
        ptc.part_key, ptc.subp_key, count(*), max(ptc.small_vc)
from
        pt_composite_1  ptc
where
        (ptc.subp_key) in (
                select  subp_key
                from    driver
                where   test = 0
        )
group by
        ptc.part_key, ptc.subp_key
;

The optimizer has the option to unnest the subquery and turn the query into a semi-join (which, as the plan below shows, it runs as a hash join “right semi” with the driver data as the build table), and we might hope to see a hash join with Bloom filtering being used to restrict the hash subpartitions that we visit. (We’ve (probably) picked two values for the subp_key, so we don’t expect to visit more than 2 of the hash subpartitions from each of the range partitions.) Here’s the execution plan I got, with rowsource execution statistics:


-----------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |      1 |        |       |       |    238 |00:00:01.74 |    2329 |       |       |          |
|   1 |  HASH GROUP BY                |                |      1 |    238 |       |       |    238 |00:00:01.74 |    2329 |   960K|   960K| 1377K (0)|
|*  2 |   HASH JOIN RIGHT SEMI        |                |      1 |  15997 |       |       |  15856 |00:00:01.69 |    2329 |  2440K|  2440K|  905K (0)|
|   3 |    PART JOIN FILTER CREATE    | :BF0000        |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|*  4 |     TABLE ACCESS FULL         | DRIVER         |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|   5 |    PARTITION RANGE ALL        |                |      1 |    400K|     1 |    12 |    104K|00:00:01.04 |    2306 |       |       |          |
|   6 |     PARTITION HASH JOIN-FILTER|                |     12 |    400K|:BF0000|:BF0000|    104K|00:00:00.63 |    2306 |       |       |          |
|   7 |      TABLE ACCESS FULL        | PT_COMPOSITE_1 |     12 |    400K|     1 |    48 |    104K|00:00:00.22 |    2306 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PTC"."SUBP_KEY"="SUBP_KEY")
   4 - filter("TEST"=0)

Oracle has unnested the subquery and converted it to a semi-join, executed here as a hash join “right semi”. While building the in-memory hash table it has constructed a Bloom filter at operation 3 of the plan to help it eliminate hash subpartitions, and used that Bloom filter at operation 6 of the plan. Our query does nothing to eliminate any of the range partitions so we can see operation 5 is a “partition range all”, and the application of the Bloom filter at operation 6 starts 12 times, once for each range partition. As we can see from operation 7, the Bloom filter generated by our selection from the driver table happened to identify just one subpartition – we start the TABLE (subpartition) ACCESS FULL 12 times, once for each range partition. If our driver data (and the Bloom filter) had identified 2 subpartitions we would have seen operation 7 start 24 times.

So we’ve met our first target – demonstrating that we can get a Bloom filter to eliminate at the subpartition level. Now we need to break things – the OP had a problem with a query that used Bloom filters on one system but didn’t use them for (nominally) the same setup on another system. Here’s my first attempt, with the resulting execution plan:


alter table pt_composite_1 add partition p13 values less than (130) subpartitions 8;

--------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name           | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                |      1 |        |       |       |    238 |00:00:01.75 |    2628 |       |       |          |
|   1 |  HASH GROUP BY             |                |      1 |   3310 |       |       |    238 |00:00:01.75 |    2628 |   960K|   960K| 2529K (0)|
|*  2 |   HASH JOIN RIGHT SEMI     |                |      1 |  16000 |       |       |  15856 |00:00:01.71 |    2628 |  2440K|  2440K|  743K (0)|
|*  3 |    TABLE ACCESS FULL       | DRIVER         |      1 |      2 |       |       |      2 |00:00:00.01 |      23 |       |       |          |
|   4 |    PARTITION RANGE ALL     |                |      1 |    400K|     1 |    13 |    104K|00:00:01.05 |    2605 |       |       |          |
|   5 |     PARTITION HASH SUBQUERY|                |     13 |    400K|KEY(SQ)|KEY(SQ)|    104K|00:00:00.64 |    2605 |       |       |          |
|   6 |      TABLE ACCESS FULL     | PT_COMPOSITE_1 |     13 |    400K|     1 |    56 |    104K|00:00:00.22 |    2306 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------------

I’ve added a new partition – with a different number of subpartitions from the table default. The Bloom filter has disappeared and the optimizer has decided to do subquery pruning instead. Drop the partition and recreate it with 2 subpartitions and the same thing happens; drop it and recreate it with 4 subpartitions and we’re back to a Bloom filter. It seems that the Bloom filter depends on every partition having the same number of subpartitions. (That’s not too surprising – the code to handle a Bloom filter when there are a variable number of subpartitions could get a little messy, and there probably aren’t many sites that use variable numbers of subpartitions.)
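The statements for those two checks aren’t in the original listing, but they were along the lines of:

alter table pt_composite_1 drop partition p13;
alter table pt_composite_1 add partition p13 values less than (130) subpartitions 2;

-- re-run the test query, then:

alter table pt_composite_1 drop partition p13;
alter table pt_composite_1 add partition p13 values less than (130) subpartitions 4;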

You might note from the Starts value for operation 5 (the subquery line) that the subquery had to run 13 times. Checking the 10046 trace file we can see the following SQL:


SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 0, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 1, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 2, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 3, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 4, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 5, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 6, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 7, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 8, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 9, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 10, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 11, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1
SELECT distinct TBL$OR$IDX$PART$NUM("PT_COMPOSITE_1", 0, 2, 12, "SUBP_KEY") FROM (SELECT "DRIVER"."SUBP_KEY" "SUBP_KEY" FROM "DRIVER" "DRIVER" WHERE "DRIVER"."TEST"=0) ORDER BY 1

This is the optimizer trying to work out, for each of the 13 partitions, which subpartitions it needs to visit. In my case this resulted in a full tablescan of the driver table for each partition. For hash subpartitions, at least, this does seem to be overkill (and can anyone say “bind variables”) – wouldn’t it be possible to run the query once for the partition with the most subpartitions and then derive the correct subpartition number for all other cases? Maybe, but perhaps that’s just too much special-case code, or maybe it’s on the todo list. Realistically we might guess that a driver table would be very small compared to the size of the subpartitions you were eventually going to scan, so the extra work may be a tiny fraction of the total workload – in which case the added complexity might be seen as too much investment (and risk) for too little return. Maybe in a future release there will be a bit of patching to reduce this overhead.

Conclusion

You may find that some execution plans involving hash subpartitions become less efficient if you don’t keep the number of subpartitions per partition constant across the entire table. I’ve only tested with range/hash composites, but there may be other variations of composite partitioning where a similar change in plans occurs.

Footnote

I haven’t done any exhaustive investigation yet, but so far I haven’t been able to create a data set, or perhaps a query, that allows the optimizer to create a Bloom filter (or two) from the driving table and then filter both the range partitions and the hash subpartitions. The closest I’ve come is a plan that shows a Bloom filter being used to filter the range partitions followed by a pruning subquery for the hash subpartitions.

July 7, 2015

CBO series

Filed under: CBO,Oracle — Jonathan Lewis @ 1:32 pm GMT Jul 7,2015

Update 9th July 2015:  part 4 now published.

I’ve changed the catalogue from a post to a page so that it gets a static address: https://jonathanlewis.wordpress.com/cbo-series/

I’ll leave this posting here for a while, but will probably remember to remove it some time in the future.
