Oracle Scratchpad

August 16, 2016

Month End

Filed under: audit,CBO,Non-technical — Jonathan Lewis @ 1:04 pm BST Aug 16,2016

A question about parallel query and cardinality estimates appeared on OTN a little while ago that prompted me to write this note about helping the optimizer do the best job with the least effort.  (A critical point in the correct answer to the original question is that parallel query may lead to “unexpected” dynamic sampling, which can make a huge difference to the choice of execution plans, but that’s another matter.)

The initial cardinality error in the plan came from the following predicate on a “Date dimension” table:


      AR_DATE.CALENDAR_DT   = AR_DATE.MONTH_END_DT 
AND   AR_DATE.CALENDAR_DT  >= ADD_MONTHS(TRUNC(SYSDATE,'fmyy'), -36) 
AND   AR_DATE.MONTH_END_DT >= ADD_MONTHS(TRUNC(SYSDATE,'fmyy'), -36)

In the parallel plan the estimated number of rows on a full tablescan of the table was 742, while on the serial plan the same tablescan produced a cardinality of 1. You will appreciate that having an estimate of 1 (or less) that is nearly three orders of magnitude wrong is likely to lead to a very bad execution plan.

My first thought when I saw this was (based on a purely intuitive interpretation): “there’s one day every month that’s the last day of the month and we’re looking at roughly that last 36 months so we might predict a cardinality of about 36”. That’s still a long way off the 742 estimate and 1,044 actual for the parallel query, but it’s a warning flag that the serial estimate is probably an important error – it’s also an example of the very simple “sanity checking” mental exercises that can accompany almost any execution plan analysis.

My second thought (which happened to be wrong, and would only have been right some time well before version 10.2.0.5) was that the optimizer would treat the add_months() expressions as unknown values and assign a selectivity of 5% to each of the predicates, reducing the combined selectivity to 1/400th of the selectivity it gave to the first predicate. In fact the optimizer evaluates the expressions and would have used the normal (required range / total range) calculation for those two predicates.

It’s the first predicate that I want to examine, though – how does the optimizer calculate a selectivity for it ? Here’s some code to generate sample data to use for testing.


rem
rem     Script:         month_end.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2016
rem

create table t1
nologging
as
select
        rownum                                                   id,
        to_date('01-Jan-2010','dd-mon-yyyy') + rownum - 1       calendar_date,
        add_months(
                trunc(to_date('01-Jan-2010','dd-mon-yyyy') + rownum - 1 ,'MM' ),
                1
        ) - 1                                                   month_end_date
from
        dual
connect by
        level <= trunc(sysdate) - to_date('01-jan_2010','dd-mon-yyyy') + 1
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

This clunky bit of code gives me consecutive dates from 1st Jan 2010 up to “today” with the month_end_date column holding the month end date corresponding to the row’s calendar_date. So now we can check what the optimizer makes of the predciate calendar_date = month_end_date:


set autotrace on explain

select count(*) from t1 where calendar_date = month_end_date;

  COUNT(*)
----------
        79

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    16 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |    16 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |    16 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CALENDAR_DATE"="MONTH_END_DATE")

Looking at operation 2 we can see that, in effect, the optimizer has considered two independent predicates “calendar_date = {unknown}” and “month_end_date = {unknown}” and taken the lower of the two selectivities – which means the cardinality estimate is 1 because the calendar_date column is unique across this table.

There are various ways to work around problems like this. One of the simplest would be to tell Oracle to sample this table with the (table-level) hint /*+ dynamic_sampling(t1 1) */; in fact, since this predicate is effectively treated as an examination of two predicates the (cursor-level) hint /*+ dynamic_sampling(4) */ would also cause sampling to take place – note that level 4 or higher is required to trigger sampling for “multiple” predicates on a single table. As a general guideline we always try to minimise the risk of side effects so if this problem were embedded in a much larger query I would prefer the table-level hint over the cursor-level hint.

There are other options, though, that would allow you to bypass sampling – provided you can modify the SQL. The script I used to create this table also included the following statement:


alter table t1 add (
        date_offset1 generated always as (calendar_date - month_end_date) virtual,
        date_flag generated always as (case when calendar_date - month_end_date = 0 then 'Y' end) virtual
);

In 12c I would declare these virtual columns to be invisible to avoid problems with any SQL that didn’t use explicit column lists. For demonstration purposes I’ve set up two options – I can find the rows I want with one of two obvious predicates:

    date_offset1 = 0
    date_flag = 'Y'

In fact there’s a third predicate I could use that doesn’t need to know about the virtual columns:

    calendar_date - month_end_date = 0

Unfortunately I can’t arbitrarily swap the order of the two dates in the last predicate, and the optimizer won’t spot that it is also equivalent to “calendar_date = month_end_date”. Here are a few execution plans – for which the only significant bit is the cardinality estimate of the full tablescans:


select count(*) from t1 where date_flag = 'Y';

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2 |     4  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     2 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |    79 |   158 |     4  (25)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DATE_FLAG"='Y')



select count(*) from t1 where date_offset1 = 0;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |    78 |   312 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("DATE_OFFSET1"=0)



select count(*) from t1 where calendar_date - month_end_date = 0;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     4 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |    78 |   312 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."DATE_OFFSET1"=0)

It’s interesting to note that the optimizer has transformed the last predicate into the equivalent virtual column expression to do the arithmetic. You might also note that the date_flag option is slightly more accurate, but that’s because it’s based on an expression which is null for the rows we don’t want while the date_offset1 column has a value for every row and a little bit of arithmetical rounding comes into play. You might also note that there’s a small cost difference – which I’d ascribe to the CPU cost that the optimizer has added for the CASE expression being applied on top of the simple date arithmetic.

Of course, whatever else you might play around with when working around a cardinality problem like this, I think the strategic aim for a data warehouse system would be to get a REAL flag column on the table and populate it at data loading time if month-end dates played an important part in the proceedings – though I have to say that the virtual flag column is one I do rather fancy.

 

July 26, 2016

Lost Concatenation

Filed under: Bugs,CBO,Oracle — Jonathan Lewis @ 10:46 am BST Jul 26,2016

This note models one feature of a problem that came up at a client site recently from a system running 12.1.0.2 – a possible bug in the way the optimizer handles a multi-column in-list that can lead to extremely bad cardinality estimates.

The original query was a simple three table join which produced a bad plan with extremely bad cardinality estimates; there was, however, a type-mismatch in one of the predicates (of the form “varchar_col = numeric”), and when this design flaw was addressed the plan changed dramatically and produced good cardinality estimates. The analysis of the plan, 10053 trace, and 10046 trace files done in-house suggested that the problem might relate in some way to an error in the handling of SQL Plan Directives to estimate cardinalities.

This was one of my “solve it in a couple of hours over the internet” assignments and I’d been sent a sample of the original query with the 10046 and 10053 trace files, and a modified version of the query that bypassed the problem, again including the 10046 and 10053 trace files, with a request to explain the problem and produce a simple test case to pass to Oracle support.

The first thing I noticed was that there was something very strange about the execution plan. Here’s the query and plan in from my simplified model, showing the same anomaly:


select  /*+ no_expand */
        count(*)
from    t1, t2
where
        t2.shipment_order_id = t1.order_id
and     (t1.id, t2.v1) in ( (5000, 98), (5000, 99))
;

-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |       |       |       |   331 (100)|          |
|   1 |  SORT AGGREGATE                       |       |     1 |    19 |       |            |          |
|*  2 |   HASH JOIN                           |       |     1 |    19 |  2056K|   331   (5)| 00:00:01 |
|   3 |    TABLE ACCESS FULL                  | T2    |   100K|   878K|       |   219   (3)| 00:00:01 |
|   4 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   100K|   976K|       |     2   (0)| 00:00:01 |
|   5 |     BITMAP CONVERSION TO ROWIDS       |       |       |       |       |            |          |
|   6 |      BITMAP OR                        |       |       |       |       |            |          |
|   7 |       BITMAP CONVERSION FROM ROWIDS   |       |       |       |       |            |          |
|*  8 |        INDEX RANGE SCAN               | T1_PK |       |       |       |     1   (0)| 00:00:01 |
|   9 |       BITMAP CONVERSION FROM ROWIDS   |       |       |       |       |            |          |
|* 10 |        INDEX RANGE SCAN               | T1_PK |       |       |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."SHIPMENT_ORDER_ID"="T1"."ORDER_ID")
       filter((
                  (TO_NUMBER("T2"."V1")=98 AND "T1"."ID"=5000) 
               OR (TO_NUMBER("T2"."V1")=99 AND "T1"."ID"=5000)
       ))
   8 - access("T1"."ID"=5000)
  10 - access("T1"."ID"=5000)

Before going on I meed to remind you that this is modelling a production problem. I had to use a hint to block a transformation that the optimizer wanted to do with my data set and statistics, I’ve got a deliberate type-mismatch in the data definitions, and there’s a simple rewrite of the SQL that would ensure that Oracle does something completely different).

The thing that caught my eye was the use of the bitmap transformation (operations 5,7,9) using exactly the same index range scan twice (operations 8,10). Furthermore, though not visible in the plan, the index in question was (as the name suggests) the primary key index on the table and it was a single column index – and “primary key = constant” should produce an “index unique scan” not a range scan.

Once you’ve added in the fact that operations 8 and 10 are the same “primary key = constant” predicates, you can also pick up on the fact that the cardinality calculation for the table access to table t1 can’t possibly produce more than one row – but it’s reporting a cardinality estimate of 100K rows (which happens to be the number of rows in the table.)

As a final point, you can see that there are no “Notes” about Dynamic Statistics or SQL Directives – this particular issue is not caused by anything to do with 12c sampling. In fact, having created the model, I ran it on 11.2.0.4 and got the same strange bitmap conversion and cardinality estimate. In the case of the client, the first pass the optimizer took went through exactly the same sort of process and produced a plan which was (probably) appropriate for a query where the driving table was going to produce (in their case) an estimated 4 million rows – but not appropriate for the actual 1 row that should have been identified.

In my example, if I allowed concatenation (i.e. removed the no_expand hint) I got the following plan:


------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |       |       |       |     8 (100)|          |
|   1 |  SORT AGGREGATE                        |       |     1 |    19 |            |          |
|   2 |   CONCATENATION                        |       |       |       |            |          |
|   3 |    NESTED LOOPS                        |       |     1 |    19 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID        | T1    |     1 |    10 |     2   (0)| 00:00:01 |
|*  5 |      INDEX UNIQUE SCAN                 | T1_PK |     1 |       |     1   (0)| 00:00:01 |
|*  6 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     1 |     9 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |    NESTED LOOPS                        |       |     1 |    19 |     4   (0)| 00:00:01 |
|   9 |     TABLE ACCESS BY INDEX ROWID        | T1    |     1 |    10 |     2   (0)| 00:00:01 |
|* 10 |      INDEX UNIQUE SCAN                 | T1_PK |     1 |       |     1   (0)| 00:00:01 |
|* 11 |     TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     1 |     9 |     2   (0)| 00:00:01 |
|* 12 |      INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1"."ID"=5000)
   6 - filter(TO_NUMBER("T2"."V1")=99)
   7 - access("T2"."SHIPMENT_ORDER_ID"="T1"."ORDER_ID")
  10 - access("T1"."ID"=5000)
  11 - filter((TO_NUMBER("T2"."V1")=98 AND (LNNVL(TO_NUMBER("T2"."V1")=99) OR
              LNNVL("T1"."ID"=5000))))
  12 - access("T2"."SHIPMENT_ORDER_ID"="T1"."ORDER_ID")

This is a much more appropriate plan – and similar to the type of plan the client saw when they eliminated the type-mismatch problem (I got a completely different plan when I used character values ’98’ and ’99’ in the in-list or when I used a numeric column with numeric literals).

Examining my 10053 trace file I found the following:

  • In the BASE STATISTICAL INFORMATION, the optimizer had picked up column statistics about the order_id column, but not about the id column in the in-list – this explained why the cardinality estimate was 100K, Oracle had “lost” the predicate.
  • In the “SINGLE TABLE ACCESS PATH”, the optimizer had acquired the statistics about the id column and calculated the cost of using the t1_pk index to access the table for a single key (AllEqUnique), then calculated the cost of doing a bitmap conversion twice (remember we have two entries in the in-list – it looks like the optimizer has “rediscovered” the predicate). But it had still kept the table cardinality of 4M.

After coming up with a bad plan thanks to this basic cardinality error, the 10053 trace file for the client’s query then went on to consider or-expansion (concatenation). Looking at this part of their trace file I could see that the BASE STATISTICAL INFORMATION now included the columns relevant to the in-list and the SINGLE TABLE ACCESS PATH cardinalities were suitable. Moreover when we got to the GENERAL PLANS the join to the second table in the join order showed a very sensible cost and cardinality – unfortunately, having been sensible up to that point, the optimizer then decided that an SQL Plan Directive should be used to generate a dynamic sampling query to check the join cardinality and the generated query again “lost” the in-list predicate, resulting in a “corrected” cardinality estimate of 6M instead of a correct cardinality estimate of 1. As usual, this massive over-estimate resulted in Oracle picking the wrong join method with a huge cost for the final join in the client’s query – so the optimizer discarded the or-expansion transformation and ran with the bad bitmap/hash join plan.

Bottom line for the client – we may have seen the same “lose the predicate” bug appearing in two different ways, or we may have seen two different “lose the predicate” bugs – either way a massive over-estimate due to “lost” predicates during cardinality calculations resulted in Oracle picking a very bad plan.

Footnote:

If you want to do further testing on the model, here’s the code to generate the data:


create table t1
nologging
as
with generator as (
        select  rownum id
        from    dual
        connect by
                level <= 1e4
)
select
        rownum                                  id,
        rownum                                  order_id,
        rpad('x',100)                           padding
from
        generator, generator
where
        rownum <= 1e5
;

execute dbms_stats.gather_table_stats(user,'t1')

alter table t1 modify order_id not null;
alter table t1 add constraint t1_pk primary key(id);


create table t2
nologging
as
with generator as (
        select  rownum id
        from    dual
        connect by
                level <= 1e4
)
select
        rownum                                  shipment_order_id,
        mod(rownum-1,1000)                      n1,
        cast(mod(rownum-1,1000) as varchar2(6)) v1,
        rpad('x',100)                           padding
from
        generator, generator
where
        rownum <= 1e5
;

execute dbms_stats.gather_table_stats(user,'t2')

alter table t2 modify shipment_order_id not null;
create index t2_i1 on t2(shipment_order_id);

The interesting question now is WHY does Oracle lose the predicate – unfortunately my model may be too simplistic to allow us to work that out, but it might be sufficient to make it easy for an Oracle developer to see what’s going on and how best to address it. There is one bug on MoS (23343961) that might be related in some way, but I wasn’t convinced that the description was really close enough.

Update

This issue is now recorded on MoS as: Bug 24350407 : WRONG CARDINALITY ESTIMATION IN PRESENCE OF BITMAP OR

 

July 7, 2016

Invisible Bug

Filed under: 12c,Bugs,CBO,Indexing,Oracle — Jonathan Lewis @ 5:27 pm BST Jul 7,2016

At this Wednesday’s Oracle Midlands event someone asked me if Oracle would use the statistics on invisible indexes for the index sanity check. I answered that there had been a bug in the very early days of invisible indexes when the distinct_key statistic on the index could be used even though the index itself would not be considered as a candidate in the plan (and the invisible index is still used to avoid foreign key locking – even in 12c – it’s only supposed to be invisible to the optimizer).

The bug was fixed quite a long time ago – but a comment on the “Index Sanity” article has introduced me to a related bug that is still present in 11.2.0.4 where the presence of an invisible index can affect an execution plan. Here’s a little model (run under 11.2.0.4) to demonstrate:

rem
rem     Script:         invisible_index_bug.sql
rem     Author:         Jonathan Lewis
rem

execute dbms_random.seed(0)

drop table t2;
drop table t1;

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        cast(mod(rownum,1000) as number(8,0))           n1,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;

create table t2
as
select
        rownum id,
        trunc(dbms_random.value(0,10000)) n1
from
        dual
connect by
        level <= 100
;
begin 
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

column n1 new_value m_n1
select n1 from t2 where id = 50;
clear columns

set autotrace traceonly explain

select
        t1.*
from
        t1, t2
where
        t2.n1 = &m_n1
;

create unique index t2_i1 on t2(n1)
-- invisible
;

select
        t1.*
from
        t1, t2
where
        t2.n1 = &m_n1
;

set autotrace off

All I’ve done is create a couple of tables then do a join that we might expect to see executed as a cartesian merge join; at one point I was going to make the data more complicated and include a join condition, but decided to keep things small and simple so it’s a silly example but it is sufficient to make the point. The funny little bit about selecting an n1 value from t2 was also in anticipation of a more complex example but it does, at least, ensure I query for a value that is in range.

Here are the two execution plans from 11.2.0.4 – the key feature is that the plan changes after the invisible index is created:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  1000K|   119M|  2263   (3)| 00:00:12 |
|   1 |  MERGE JOIN CARTESIAN|      |  1000K|   119M|  2263   (3)| 00:00:12 |
|*  2 |   TABLE ACCESS FULL  | T2   |     1 |     4 |     2   (0)| 00:00:01 |
|   3 |   BUFFER SORT        |      |  1000K|   115M|  2261   (3)| 00:00:12 |
|   4 |    TABLE ACCESS FULL | T1   |  1000K|   115M|  2261   (3)| 00:00:12 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."N1"=5308)


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  1000K|   119M|  2263   (3)| 00:00:12 |
|   1 |  NESTED LOOPS      |      |  1000K|   119M|  2263   (3)| 00:00:12 |
|*  2 |   TABLE ACCESS FULL| T2   |     1 |     4 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T1   |  1000K|   115M|  2261   (3)| 00:00:12 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."N1"=5308)

Notice how the plan without the invisible index does a “sort” (actually a “buffer sort” so it’s just copying the data into local memory), while the plan with the not quite invisible enough index in place gets away with just a full tablescan. This is bug 16564891, associated with base bug 16544878.

The bug notes say “fixed in 12.2”, but in Oracle 12.1.0.2 the first plan appears in both cases, and we have to make the index visible to get the second plan. (Take note of the need for the “negative” test to prove the point; the fact that the same plan appears for both cases doesn’t, by itself, prove that the bug was fixed, we have to show that the plan would have changed if the bug had still been present).

I believe the problem isn’t the problem of Oracle using the statistics when it shouldn’t; the change appears because in 11g Oracle incorrectly allows itself to see the uniqueness of the index and infer that table t2 is a “single row” table. In 12c the optimizer calculates that there will probably be only one row but that doesn’t stop it choosing the merge join cartesian as the “insurance bet” against having to do more than one tablescan of the t1 table. We can see this difference in the 10053 trace files, the 11g file has an entry for the “Single Table Access Path” for t2 that reads:

1-ROW TABLES:  T2[T2]#0

If you read the bug note for bug 16564891 you’ll see that it has a more realistic example of the problem – and it may give you some idea of where you might run into the bug. In general I don’t think many people are likely to come across the problem since it revolves around uniqueness, which is rather an important property, and there can’t be many occasions when someone decides to add (or test dropping) a unique index. Given that the example in the bug looks like “add a unique index to a dimension table that’s joining to a fact table” that may be a good pointer to where you’re most likely to run into the problem — when you’re trying to enforce data correctness in a data warehouse.

 

June 28, 2016

Index Sanity

Filed under: CBO,extended stats,Indexing,Oracle,Statistics — Jonathan Lewis @ 8:43 am BST Jun 28,2016

By popular demand (well, one person emailed me to ask for it) I’m going to publish the source code for a little demo I’ve been giving since the beginning of the millennium – it concerns indexes and the potential side effects that you can get when you drop an index that you’re “not using”. I think I’ve mentioned the effect several times in the history of this blog, but I can’t find an explicit piece of demo code, so here it is – starting at the conclusion – as a cut and paste from an SQL*Plus session running against an 11g instance:


SQL> set autotrace traceonly explain
select
        t1.small_vc, t2.small_vc, t3.small_vc
from
        t1, t2, t3
where
        t1.n1 between 40 and 50
and     t2.id1 = t1.id1
and     t2.ind_pad = t1.ind_pad
and     t2.id2 = t1.id2
and     t3.id = t1.id1
 11  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 1184213596

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |   484 | 64856 |   227   (2)| 00:00:02 |
|*  1 |  HASH JOIN          |      |   484 | 64856 |   227   (2)| 00:00:02 |
|*  2 |   HASH JOIN         |      |   484 | 57596 |    14   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T2   |    20 |  1160 |     4   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T1   |   484 | 29524 |    10   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL | T3   |  5000 | 75000 |   213   (2)| 00:00:02 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T3"."ID"="T1"."ID1")
   2 - access("T2"."ID1"="T1"."ID1" AND "T2"."IND_PAD"="T1"."IND_PAD"
              AND "T2"."ID2"="T1"."ID2")
   4 - filter("T1"."N1"<=50 AND "T1"."N1">=40)

SQL> drop index t2_i1;

Index dropped.

select
        t1.small_vc, t2.small_vc, t3.small_vc
from
        t1, t2, t3
where
        t1.n1 between 40 and 50
and     t2.id1 = t1.id1
and     t2.ind_pad = t1.ind_pad
and     t2.id2 = t1.id2
and     t3.id = t1.id1
 11  ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2290830436

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    52 |  6968 |    67   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |    52 |  6968 |    67   (2)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |    52 |  6968 |    67   (2)| 00:00:01 |
|*  3 |    HASH JOIN                 |       |    52 |  6188 |    14   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL        | T2    |    20 |  1160 |     4   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS FULL        | T1    |   484 | 29524 |    10   (0)| 00:00:01 |
|*  6 |    INDEX UNIQUE SCAN         | T3_PK |     1 |       |     0   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID| T3    |     1 |    15 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T2"."ID1"="T1"."ID1" AND "T2"."IND_PAD"="T1"."IND_PAD" AND
              "T2"."ID2"="T1"."ID2")
   5 - filter("T1"."N1"<=50 AND "T1"."N1">=40)
   6 - access("T3"."ID"="T1"."ID1")

Starting from the top – I’ve enabled autotrace which, technically, could mean that the plans are not the ones I’d see at run-time, but you can take my word for it that in 11g they are the run-time plans; then I’ve supplied a query that produces a plan with 3 full tablescans, two hash joins, and no index usage at all.

You’ll notice at operation 3 of the plan that table t2 is very small – only 20 rows selected, with no predicates that could have filtered that result down from a large table (take my word for it the stats have just been collected) so, as the ancient mythology would have it, we don’t really need an index on that table (a quick check tells me that the index wasn’t there to enforce uniqueness). Immediately after the first execution plan you can see that I’ve dropped an index called t2_i1 – trust me that IS the index on table t2.

We “run” the original query again, it gets re-optimised (and there’s no question of cardinality feedback or any other feature coming into play) and we get a different plan.

Dropping, or adding, a multi-column index to a table could change execution plans – even if the index is not present in the plan.

The reason for this is the “index sanity check”. When the optimizer is doing its cardinality estimates, if it see equality conditions on the set of columns that make up an index it can use the distinct_keys statistic from the index in the calculation rather than using the standard calculation of multiplying together the num_distinct of the separate columns. In earlier versions of Oracle there were some restrictions about uniqueness, but the limitations were removed in 11.1.0.7.

In my case there were 10 distinct values for id1, just one value for ind_pad, and 20 distinct values for id2 – but a total of only 20 distinct values for the combination. With an index in place on the combination the optimizer used the value 20 in its calculation, in the absence of the index it used the value 200 – that factor of 10 led to a drop in the join cardinality estimate from 484 rows to 52 rows – at which point the optimizer calculations made the next step in the plan change from a hash join to a nested loop join.

If you want to reproduce the demo, here’s the full script – the data isn’t a realistic data set, and I’ve had to use various non-standard settings to make the script as repeatable as possible – I’ve built the data set in a tablespace using an 8KB block size, 1MB uniform extents and manual (freelist) segment space management.


rem
rem     Script:         index_sanity.sql
rem     Author:         Jonathan Lewis
rem

drop table t3;
drop table t2;
drop table t1;

execute dbms_random.seed(0);

begin   
        begin           execute immediate 'purge recyclebin';
        exception       when others then null;
        end; 

        begin
                dbms_stats.set_system_stats('MBRC',16);
                dbms_stats.set_system_stats('MREADTIM',10);
                dbms_stats.set_system_stats('SREADTIM',5);
                dbms_stats.set_system_stats('CPUSPEED',1000);
        exception
                when others then null;
        end;

end;
/

create table t1
as
select
        mod(rownum,10)          id1,
        mod(rownum,20)          id2,
        rpad('x',40,'x')        ind_pad,
        mod(rownum,100)         n1,
        lpad(rownum,10,'0')     small_vc,
        rpad('x',50)            padding
from
        all_objects
where
        rownum  <= 4000
;

create table t2 
pctfree 99
pctused 1
as
select
        mod(rownum,10)          id1,
        mod(rownum,20)          id2,
        rpad('x',40,'x')        ind_pad,
        mod(rownum,100)         n1, 
        lpad(rownum,10,'0')     small_vc,
        rpad('x',200)           padding
from
        all_objects
where
        rownum <= 20
;

create table t3
pctfree 95
pctused 1
as
select
        rownum          id,
        rpad(rownum,10) small_vc,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 5000
;
begin
        dbms_stats.gather_table_stats(
                ownname => user,
                tabname => 'T1',
                method_opt => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname => user,
                tabname => 'T2',
                method_opt => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname => user,
                tabname => 'T3',
                method_opt => 'for all columns size 1'
        );

end;
/

create        index t1_i1 on t1(id1, ind_pad, id2) pctfree 91;
create        index t2_i1 on t2(id1, ind_pad, id2) pctfree 91;
alter table t3 add constraint t3_pk primary key (id);

set autotrace traceonly explain

select
        t1.small_vc, t2.small_vc, t3.small_vc
from
        t1, t2, t3
where
        t1.n1 between 40 and 50
and     t2.id1 = t1.id1
and     t2.ind_pad = t1.ind_pad
and     t2.id2 = t1.id2
and     t3.id = t1.id1
;

-- alter index t1_i1 invisible;
-- alter index t2_i1 invisible;

drop index t1_i1;
-- drop index t2_i1;

accept X prompt "Press return to coninue"

select
        t1.small_vc, t2.small_vc, t3.small_vc
from
        t1, t2, t3
where
        t1.n1 between 40 and 50
and     t2.id1 = t1.id1
and     t2.ind_pad = t1.ind_pad
and     t2.id2 = t1.id2
and     t3.id = t1.id1
;

set autotrace off

You’ll notice from the commented lines in the above that the effect appears whether you drop the index or make it invisible, also that there’s a similar index on the t1 table that matches the index on the t2 table – I could get the effect from dropping or making invisible either index.

There is a saving grace in 11g – if I do drop, or make invisible, one of these indexes I can protect myself against the statistical effect by create a column group on the same set of columns, and the num_distinct from the column group would serve the same purpose as the distinct_keys from the index.

June 17, 2016

Cardinality trick

Filed under: CBO,Oracle — Jonathan Lewis @ 1:02 pm BST Jun 17,2016

In the absence of a virtual column or function-based index, the optimizer uses a basic selectivity guess of 1% for a predicate of the form: “function(column) = constant”; but there is (at least) one special case where it gets clever; simple type conversion:


create table t1 nologging
as
select  cast(
                case
                        when mod(rownum,1000) = 0 then 0
                        when mod(rownum,100)  = 0 then 1
                                                  else 9
                end as varchar2(1)
        ) v1
from
        all_objects
where   rownum <= 50000
;

execute dbms_stats.gather_table_stats(user,'t1')

set autotrace on explain

select count(*) from t1 where v1 = 9;
select count(*) from t1 where sign(v1) = 1;

set autotrace off

If you think about the table creation script you’ll agree that there are 49,500 rows where v1 = ‘9’ so the first query could (in theory) produce an estimated cardinality of 49,500. However I’ve got a datatype error in the predicate and I haven’t created a histogram – and that’s not very helpful in two different ways. In general Oracle will use a guessed selectivity of 1% after applying a function to a column with equality, which would make it report an estimated cardinality of 500 for my sample query, but in this case Oracle uses the number of distinct values for the column (i.e. 3) to infer a value for the number of distinct values for the funciton and uses that in the first query:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2 |    25   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     2 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 16667 | 33334 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TO_NUMBER("V1")=9)

On the other hand, while the optimizer “knows” that the number of distinct values for the varchar2 will match the number of distinct numerical equivalents (not that that’s actually true), it has no idea how many of the varchar2 values will equate to negative, zero, or positive values, so the 1% selectivity re-appears for the second query:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     2 |    25   (4)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     2 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   500 |  1000 |    25   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(SIGN(TO_NUMBER("V1"))=1)

It shouldn’t surprise you to see that you would also get 500 as the estimated cardinality if the predicate were to read “sign(v1) = 2” — a value that the sign() function can’t take. The optimizer is using a generic rule, it doesn’t know the specifics of the function you’re using.

Footnote:

If you’re wondering when the number of distinct character values doesn’t match the number of distinct numeric values (and assuming all the character values are valid for conversion to numeric) just remember that the same number can be represented in different ways, for example you might change the original cast() that I used the generate the data to:

        cast(
                case
                        when mod(rownum,1000) = 0 then '0'
                        when mod(rownum, 100) = 0 then '1'
                        when mod(rownum,   2) = 0 then '9'
                                                  else '09'
                end as varchar2(2)
        ) v1

Now we have 4 distinct character values (so the optimizer’s estimate would drop to 15,000) but only 3 distinct numeric equivalents.

This, by the way, is why the optimizer transforms a predicate like “character_column = {numeric value}” into “to_number(character_column) = {numeric value}”, rather than converting it to “character_column = to_char({numeric value})”. A character string can only represent one numeric value while a numeric value can be displayed as an infinite number of different character strings (assuming the availability of the appropriate number of typing monkeys).

 

May 25, 2016

CBO++

Filed under: CBO,Oracle,Tuning — Jonathan Lewis @ 1:23 pm BST May 25,2016

While browsing the web recently for articles on the HyperLogLog algorithm that Oracle uses for some of its approximate functions, I came upon a blog post written in Jan 2014 with the title Use Subqueries to Count Distinct 50X Faster. There are various ways that subqueries can be used to rewrite queries for improved performance, but when the title caught my eye I couldn’t think of a way in which they could improve “count distinct”.  It turned out that the word “subquery” was being used (quite correctly) in the sense of “inline view” while my mind had immediately turned to subqueries in the select list or where clause.

The article started by pointing out that if you have a query that does a join then aggregates the result you might be able to improve performance by finding a way of rewriting the query to aggregate before doing the join. (See this note from 2008). The article then went one step further to optimise a “count distinct” by wrapping a “select count” around a “select distinct” inline view as follows:

Original
--------
  select
    dashboard_id,
    count(distinct user_id) as ct
  from time_on_site_logs 
  group by dashboard_id

Rewrite
-------
select 
    inline.dashboard_id, 
    count(1) as ct
  from (
    select distinct dashboard_id, user_id
    from time_on_site_logs
  ) as inline
  group by inline.dashboard_id

(I’ve reproduced only the central part of the query being examined and I’ve changed the name of the inline view to eliminate the potential visual confusion due to the word “distinct” appearing in its name in the original).

The article was written using the Postgres SQL with the comment that the technique was universal; and this brings me to the point of the post. The technique can be applied to Oracle’s dialect of SQL. Both ideas are good ideas whose effectiveness depends on the data patterns, data volume, and (potentially) indexing; but you may not need to rewrite the code because the optimizer is programmed to know that the ideas are good and it can transform your query to the appropriate form internally. The “place group by” transformation appeared in 11.1.0.6 in 2007, and the “transform distinct aggregation” appeared in 11.2.0.1 in 2009.

Here’s a litte demo of Oracle handling a variation of the query I’ve shown above:


rem     Script: transform_distinct_agg.sql
rem     Dated:  May 2016
rem     Author: J.P.Lewis

create table t1 nologging 
as 
select  * 
from    all_objects 
where   rownum <= 60000
;
execute dbms_stats.gather_table_stats(user,'t1', method_opt=>'for all columns size 1')

alter session set statistics_level = all;

select owner, count(distinct object_type) from t1 group by owner;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

prompt  ===============
prompt  Rewritten query
prompt  ===============

select  owner, count(1)
from    (
         select distinct owner, object_type
         from   t1
        ) distinct_types
group by
        owner
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

Here are the two execution plans, pulled from memory – with the outline and some other peripheral lines deleted:


-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |      1 |        |      5 |00:00:00.23 |     865 |       |       |          |
|   1 |  HASH GROUP BY       |           |      1 |      5 |      5 |00:00:00.23 |     865 |  1452K|  1452K|  728K (0)|
|   2 |   VIEW               | VM_NWVW_1 |      1 |     78 |     30 |00:00:00.23 |     865 |       |       |          |
|   3 |    HASH GROUP BY     |           |      1 |     78 |     30 |00:00:00.23 |     865 |  4588K|  1708K| 2497K (0)|
|   4 |     TABLE ACCESS FULL| T1        |      1 |  60000 |  60000 |00:00:00.12 |     865 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------

===============
Rewritten query
===============

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      5 |00:00:00.23 |     865 |       |       |          |
|   1 |  HASH GROUP BY       |      |      1 |      5 |      5 |00:00:00.23 |     865 |  1452K|  1452K|  735K (0)|
|   2 |   VIEW               |      |      1 |     78 |     30 |00:00:00.23 |     865 |       |       |          |
|   3 |    HASH UNIQUE       |      |      1 |     78 |     30 |00:00:00.23 |     865 |  4588K|  1708K| 1345K (0)|
|   4 |     TABLE ACCESS FULL| T1   |      1 |  60000 |  60000 |00:00:00.12 |     865 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Apart from the change from “HASH UNIQUE” to “HASH GROUP BY” the two plans are the same, using the same resources – the UNIQUE being a special case of the algorithm for the GROUP BY. Here (with some cosmetic editing) is the SQL of the “unparsed query” taken from the 10053 (CBO) trace file – notice how similar it is to the text suggested by the original article, in particular the inline view to get the distinct list of owner and object_type (using a group by with no aggregated columns, rather than a distinct):

SELECT 
        VM_NWVW_1.$vm_col_2 OWNER,
        COUNT(VM_NWVW_1.$vm_col_1) COUNT(DISTINCTOBJECT_TYPE)
FROM    (
                SELECT
                        T1.OBJECT_TYPE $vm_col_1,
                        T1.OWNER $vm_col_2
                FROM    TEST_USER.T1 T1
                GROUP BY 
                        T1.OWNER,T1.OBJECT_TYPE
        ) VM_NWVW_1
GROUP BY
        VM_NWVW_1.$vm_col_2
;

The Oracle optimizer is pretty good at finding efficient transformations for the query you wrote so, rather than rewriting a query (with the option for making a mistake as you do so), you may only need to add a couple of hints to generate a suitable SQL Plan Baseline that you can attach to the original query.

Footnote:

Sometimes the optimizer will decide not to transform when it should, or decide to transform when it shouldn’t, so it’s nice to know that there are hints to block transformations – here’s the effect of adding /*+ qb_name(main) no_transform_distinct_agg(main) */ to my query:


----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      5 |00:00:00.25 |     865 |       |       |          |
|   1 |  SORT GROUP BY     |      |      1 |      5 |      5 |00:00:00.25 |     865 |  4096 |  4096 | 4096  (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |  60000 |  60000 |00:00:00.12 |     865 |       |       |          |
----------------------------------------------------------------------------------------------------------------

The interesting thing to note here is that even though the query took a little longer to complete the amount of memory allocated to run the query in memory was only 4K compared to the 2M needed by the transformed query (In this example both workareas would have been in existence at the same time – that won’t be true of every query using multiple workareas.) This isn’t significant in this trivial case, but it demonstrates the point that sometimes there is no one best path – you can choose the path that protects the resource that’s under most pressure.

May 23, 2016

Virtual Partitions

Filed under: 12c,CBO,Infrastructure,Oracle,Partitioning — Jonathan Lewis @ 1:16 pm BST May 23,2016

Here’s a story of (my) failure prompted by a recent OTN posting.

The OP wants to use composite partitioning based on two different date columns – the table should be partitioned by range on the first date and subpartitioned by month on the second date. Here’s the (slightly modified) table creation script he supplied:


rem
rem     Script: virtual_partition.sql
rem     Dated:  May 2016
rem

CREATE TABLE M_DTX
(
        R_ID    NUMBER(3),
        R_AMT   NUMBER(5),
        DATE1   DATE,
        DATE2   DATE,
        VC GENERATED ALWAYS AS (EXTRACT(MONTH FROM DATE2))
)
PARTITION BY RANGE (DATE1) interval (numtoyminterval(1,'MONTH'))
SUBPARTITION BY LIST (VC)
        SUBPARTITION TEMPLATE (
                SUBPARTITION M1 VALUES (1),
                SUBPARTITION M2 VALUES (2),
                SUBPARTITION M3 VALUES (3),
                SUBPARTITION M4 VALUES (4),
                SUBPARTITION M5 VALUES (5),
                SUBPARTITION M6 VALUES (6),
                SUBPARTITION M7 VALUES (7),
                SUBPARTITION M8 VALUES (8),
                SUBPARTITION M9 VALUES (9),
                SUBPARTITION M10 VALUES (10),
                SUBPARTITION M11 VALUES (11),
                SUBPARTITION M12 VALUES (12)
        )
        (
        PARTITION M_DTX_2015060100 VALUES LESS THAN (TO_DATE('2015-06-01 00:00:01', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
        )
;

There’s nothing particularly exciting about this – until you get to the query requirement – the user wants to query on date1 and date2, and doesn’t know about the virtual month column, e.g. (and, I know that there should be a to_date() or ANSI equivalent here):

SELECT * FROM m_dtx WHERE date1 = trunc(sysdate) AND date2 = '01-Jun-2016';

Now, as a general rule, you don’t expect partition elimination to occur unless the partitioning column appears with a predicate that make elimination possible, so your first response to this query is that it could eliminate on date1, but can’t possibly eliminiate on vc because vc isn’t in the where clause. However it’s possible that the partitioning code might be coded to recognise that the subpartition is on a virtual column that is derived from date2, so perhaps it could generate a new predicate before optimising, for example:

date2 = '01-Jun-2016'  => vc = 6

Unfortunately, your first response is correct – the optimizer doesn’t get this clever, and doesn’t do the sub-partition elimination. Here’s the execution plan from 12.1.0.2 for the sample query, followed by the execution plan when I explicitly add the predicate vc = 6.


SQL_ID  8vk1a05uv16mb, child number 0
-------------------------------------
SELECT /*+ dynamic_sampling(0) */  * FROM m_dtx WHERE date1 =
trunc(sysdate) AND date2 = to_date('01-Jun-2016','dd-mon-yyyy')

Plan hash value: 3104206240

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |       |       |    15 (100)|          |       |       |
|   1 |  PARTITION RANGE SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
|   2 |   PARTITION LIST ALL   |       |     1 |    57 |    15   (7)| 00:00:01 |     1 |    12 |
|*  3 |    TABLE ACCESS FULL   | M_DTX |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("DATE2"=TO_DATE(' 2016-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "DATE1"=TRUNC(SYSDATE@!)))



SQL_ID  33q012bdhjrpn, child number 0
-------------------------------------
SELECT /*+ dynamic_sampling(0) */  * FROM m_dtx WHERE date1 =
trunc(sysdate) AND date2 = to_date('01-Jun-2016','dd-mon-yyyy') and vc
= 6

Plan hash value: 938710559

------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |       |       |    15 (100)|          |       |       |
|   1 |  PARTITION RANGE SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
|   2 |   PARTITION LIST SINGLE|       |     1 |    57 |    15   (7)| 00:00:01 |     6 |     6 |
|*  3 |    TABLE ACCESS FULL   | M_DTX |     1 |    57 |    15   (7)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter(("DATE2"=TO_DATE(' 2016-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "DATE1"=TRUNC(SYSDATE@!)))


Note how the predicate vc = 6  doesn’t show up in the predicate section in either case, but the execution plan shows PARTITION LIST ALL at operation 2 when we omit the predicate and PARTITION LIST SINGE when we include it (with suitable values also appearing for Pstart and Pstop). (The cost, by the way, is the cost of scanning a whole (range)partition whether or not the optimizer expects to restrict that scan to just one sub-partition.)

So the optimizer isn’t quite clever enough (yet). BUT … the optimizer can be very clever with constraints, combining constraints with predicates and applying transitive closure to produce new predicates – so maybe we could get the optimizer to do this if we helped it a little bit. Given the table definition supplied I’m going to assume that the date2 column is supposed to be non-null, so let’s add some truthful constraints/declarations to the table definition:


alter table m_dtx modify date2 not null;
alter table m_dtx modify vc  not null;
alter table m_dtx add constraint md_ck_vc check (vc = extract(month from date2));

Alas, this didn’t make any difference to the execution plan. But it did do something surprising to my attempts to load data into the table:


insert into m_dtx (r_id, r_amt, date1, date2)
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        mod(rownum, 1000),
        rownum,
        trunc(sysdate,'yyyy') + dbms_random.value(0,365),
        trunc(sysdate,'yyyy') + dbms_random.value(0,365)
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4
;

insert into m_dtx (r_id, r_amt, date1, date2)
*
ERROR at line 1:
ORA-01400: cannot insert NULL into (???)

So the array insert with the virtual column doesn’t like the NOT NULL constraint on the virtual column because vc is, presumably, still null when the constraint is checked (though there’s no problem with single row inserts with the values() clause – I wonder what happens with the PL/SQL “FORALL” clause) – so let’s remove the not null constraint on vc and see what happens.


insert into m_dtx (r_id, r_amt, date1, date2)
*
ERROR at line 1:
ORA-02290: check constraint (TEST_USER.MD_CK_VC) violated

Unsurprisingly, given the fact that Oracle didn’t like the not null constraint, the critical check constraint also fails. This, by the way, is odd because a check constraint should accept a row when the constraint doesn’t evaluate to FALSE, so (a) vc can’t have been evaluated at this point or the constraint would evaluate to TRUE – which is not FALSE, and (b) vc at this point can no longer be null or the constraint would evaluate to NULL – which is not FALSE: so what “value” has vc got that makes the constraint check return FALSE ?

Bottom line:

I can see some scope for an optimizer enhancement that tries to find eliminating predicates from virtual columns; and I think there’s a need for ensuring that we can safely add constraints to virtual columns – after all we might want to create an index on a virtual column and sometimes we need a NOT NULL declaration to ensure that an index-only execution path can be found. Unfortunately I have to end this blog without finding an immediate solution for the OP.

Despite this failure, though, there are cases (as I showed a couple of years ago) where the optimizer in 12c can get clever enough to recognize the connection between a queried date column and the virtual partitioning column based on that date column.

May 3, 2016

Debugging

Filed under: CBO,compression,Execution plans,Infrastructure,Oracle,Uncategorized — Jonathan Lewis @ 8:11 am BST May 3,2016

The OTN database forum supplied a little puzzle a few days ago – starting with the old, old, question: “Why is the plan with the higher cost taking less time to run?”

The standard (usually correct) answer to this question is that the optimizer doesn’t know all it needs to know to predict what’s going to happen, and even if it had perfect information about your data the model used isn’t perfect anyway. This was the correct answer in this case, but with a little twist in the tail that made it a little more entertaining. Here’s the query, with the two execution plans and the execution statistics from autotrace:


SELECT  /* INDEX(D XPKCLIENT_ACCOUNT) */ 
        E.ECID,A.acct_nb
FROM    
        client_account d, 
        client         e, 
        account        a
where
        A.acct_nb ='00000000000000722616216'</li>


AND     D.CLNT_ID = E.CLNT_ID
AND     D.ACCT_ID=A.ACCT_ID;

Plan (A) with a full tablescan of client_account – cost 808, runtime 1.38 seconds, buffer gets 17,955


-------------------------------------------------------------------------------------------------
| Id | Operation                      | Name           | Rows  | Bytes  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |                |     1 |    59  |   808 (14) | 00:00:10 |
|  1 |  NESTED LOOPS                  |                |     1 |    59  |   808 (14) | 00:00:10 |
|  2 |   NESTED LOOPS                 |                |     1 |    59  |   808 (14) | 00:00:10 |
|* 3 |    HASH JOIN                   |                |     1 |    42  |   806 (14) | 00:00:10 |
|  4 |     TABLE ACCESS BY INDEX ROWID| ACCOUNT        |     1 |    30  |     5  (0) | 00:00:01 |
|* 5 |      INDEX RANGE SCAN          | XAK1ACCOUNT    |     1 |        |     4  (0) | 00:00:01 |
|  6 |     TABLE ACCESS FULL          | CLIENT_ACCOUNT |  9479K|   108M |   763 (10) | 00:00:09 |
|* 7 |    INDEX UNIQUE SCAN           | XPKCLIENT      |     1 |        |     1  (0) | 00:00:01 |
|  8 |   TABLE ACCESS BY INDEX ROWID  | CLIENT         |     1 |    17  |     2  (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
     0  recursive calls
     0  db block gets
 17955  consistent gets
     0  physical reads
     0  redo size
   623  bytes sent via SQL*Net to client
   524  bytes received via SQL*Net from client
     2  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
     1  rows processed

Plan (B) with an index fast full scan on a client_account index – cost 1,190, runtime 0.86 seconds, buffer gets 28696


----------------------------------------------------------------------------------------------------
| Id | Operation                      | Name              | Rows  | Bytes  | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT               |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|  1 |  NESTED LOOPS                  |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|  2 |   NESTED LOOPS                 |                   |     1 |    59  |  1190  (8) | 00:00:14 |
|* 3 |    HASH JOIN                   |                   |     1 |    42  |  1188  (8) | 00:00:14 |
|  4 |     TABLE ACCESS BY INDEX ROWID| ACCOUNT           |     1 |    30  |     5  (0) | 00:00:01 |
|* 5 |      INDEX RANGE SCAN          | XAK1ACCOUNT       |     1 |        |     4  (0) | 00:00:01 |
|  6 |     INDEX FAST FULL SCAN       | XPKCLIENT_ACCOUNT | 9479K |   108M |  1145  (5) | 00:00:13 |
|* 7 |    INDEX UNIQUE SCAN           | XPKCLIENT         |     1 |        |     1  (0) | 00:00:01 |
|  8 |   TABLE ACCESS BY INDEX ROWID  | CLIENT            |     1 |    17  |     2  (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
     0  recursive calls
     0  db block gets
 28696  consistent gets
     0  physical reads
     0  redo size
   623  bytes sent via SQL*Net to client
   524  bytes received via SQL*Net from client
     2  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
     1  rows processed

Note, particularly, that the two plans are the same apart from operation 6 where a full tablescan changes to an index fast full scan, predicting the same number of rows but with an increase of 50% in the cost; the increase in cost is matched by an increase in the reported workload – a 60% increase in the number of consistent reads and no disk reads or recursive SQL in either case. Yet the execution time (on multiple repeated executions) dropped by nearly 40%.

So what’s interesting and informative about the plan ?

The cost of a tablescan or an index fast full scan is easy to calculate; broadly speaking it’s “size of object” / “multiblock read count” * k, where k is some constant relating to the hardware capability. The costs in these plans and the autotrace statistics seem to be telling us that the index is bigger than the table, while the actual run times seem to be telling us that the index has to be smaller than the table.

It’s easy for an index to be bigger than its underlying table, of course; for example, if this table consisted of nothing but two short columns the index could easily be bigger (even after a rebuild) because it would be two short columns plus a rowid. If that were the case here, though, we would expect the time to fast full scan the index to be higher than the time to scan the table.

So two thoughts crossed my mind as I looked at operation 6:

  • Mixing block sizes in a database really messes up the optimizer costing, particularly for tablescans and index fast full scans. Maybe the table had been built in a tablespace using 32KB  blocks while the index had been built in a tablespace using the more common 8KB blocksize – I didn’t want to start working out the arithmetic but that might be just enough to produce the contradiction.
  • Maybe the table was both bigger AND smaller than the index – bigger because it held more data, smaller because it had been compressed. If so then the difference in run-time would be the overhead of decompressing the rows before projecting and comparing the data.

Conveniently the OP has included an extract from the 10053 trace:


Table Stats::
  Table: CLIENT_ACCOUNT  Alias:  D
    #Rows: 9479811  #Blks:  18110  AvgRowLen:  71.00  ChainCnt:  0.00
  Column (#1): CLNT_ID(
    AvgLen: 6 NDV: 1261035 Nulls: 0 Density: 0.000001 Min: 0 Max: 4244786
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 239
  Column (#2): ACCT_ID(
    AvgLen: 6 NDV: 9479811 Nulls: 0 Density: 0.000000 Min: 1 Max: 22028568
    Histogram: HtBal  #Bkts: 254  UncompBkts: 254  EndPtVals: 255

Index Stats::
  Index: XPKCLIENT_ACCOUNT  Col#: 1 2
    LVLS: 2  #LB: 28543  #DK: 9479811  LB/K: 1.00  DB/K: 1.00  CLUF: 1809449.00

Note that the index is called xpclient_account – which suggests “primary key” –  and the number of distinct keys in the index (#DK) matches the number of rows in the table(#Rows). The index and table stats seem to be consistent so we’re not looking at a problem of bad statistics.

Now to do some simple (ballpark) arithmetic: for the table can we check if  “rows * average row length / 8K =  blocks”. We can read the numbers directly from the trace file:  9,500,000 * 71 / 8,000 = 84,000.  It’s wrong by a factor of about 4 (so maybe it’s a 32K block, and maybe I could rule out that possibility by including more detail in the arithmetic – like allowing properly for the block header, row overheads, pctfree etc).

For the index – we believe it’s the primary key, so we know the number of rows in the index – it’s the same as the number of distinct keys. As for the length of an index entry, we have the index definition (col#: 1 2) and we happen to have the column stats about those columns so we know their average length. Allowing for the rowid and length bytes we can say that the average index entry is (6 +1) + (6 + 1) + 6 = 20 bytes.  So the number of leaf blocks should be roughy 9,500,000 * 20 / 8,000 = 23,750. That’s close enough given the reported 28,543 and the fact that I haven’t bothered to worry about row overheads, block overheads and pctfree.

The aritmetic provides an obvious guess – which turned out to be correct: the table is compressed, the index isn’t. The optimizer hasn’t allowed for the CPU cost of decompressing the compressed rows, so the time required to decompress 9.5M rows doesn’t appear in the execution plan.

Footnote.

Looking at the column stats, it looks like there are roughly 8 acct_ids for each clnt_id, so it would probably be sensible to compress the primary key index (clnt_id, acct_id) on the first column as this would probably reduce the size of the index by about 20%.

Better still – the client_account table has very short rows – it looks like a typical intersection table with a little extra data carried. Perhaps this is a table that should be an index-organized table with no overflow. It looks like there should also be an index (acct_id, clnt_id) on this table to optimse the path from account to client and this would become a secondary index – interestingly being one of those rare cases where the secondary index on an IOT might actually be a tiny bit smaller than the equivalent index on a heap table because (in recent versions of Oracle) primary key columns that are included in the secondary key are not repeated in the index structure. (It’s a little strange that this index doesn’t seem to exist already – you might have expected it to be there given the OP’s query, and given that it’s an “obvious” requirement as an index to protect the foreign key.)

The only argument against the IOT strategy is that the table clearly compresses very well as a heap table, so a compressed heap table plus two B-tree indexes might be more cost-effective than an IOT with a single secondary index.

 

April 1, 2016

Set Operations

Filed under: 12c,CBO,Execution plans,Oracle — Jonathan Lewis @ 2:20 pm BST Apr 1,2016

A recent post on the OTN database forum highlights a couple of important points ideas for optimising SQL. There are: (a) is there a logically equivalent way of stating the SQL and (b) is there a different “natural language” way of posing the problem.

The posting starts with a query, part of an execution plan, and a request to “get rid of the tablescan”. I guessed originally that the query came from an 11g instance, and the OP gave us some code to create the tables and indexes, so I’ve modelled the tables to get the indicated plan (then filled in the original numbers). This is the query, and my cosmetically adjusted version of the plan output that the OP probably got:


SELECT a.hotel_code
  FROM lf_hotel_temp a
WHERE a.service_id = : p_service_id
       AND (NOT EXISTS (SELECT *
          FROM lf_ts_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code)
        or NOT EXISTS (SELECT *
          FROM lf_gta_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code) 
       or  NOT EXISTS (SELECT *
          FROM lf_hb_roomtype_properties b
         WHERE a.hotel_code = b.hotel_code))

-------------------------------------------------------------------------------
| Id  | Operation          | Name                     | Rows  |  Bytes | Cost |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                          | 12613 | 113517 |  135 |
|*  1 |  FILTER            |                          |       |        |      |
|*  2 |   TABLE ACCESS FULL| LF_HOTEL_TEMP            | 88433 | 795897 |  135 |
|*  3 |   INDEX RANGE SCAN | LF_TS_ROOMTYPE_PROP_IDX  |     1 |      7 |    1 |
|*  4 |   INDEX RANGE SCAN | LF_GTA_ROOMTYPE_PROP_IDX |     1 |      9 |    1 |
|*  5 |   INDEX RANGE SCAN | LF_HB_ROOMTYPE_PROP_IDX  |     2 |     14 |    3 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( NOT EXISTS (SELECT 0 FROM "LF_TS_ROOMTYPE_PROPERTIES" "B" WHERE
              "B"."HOTEL_CODE"=:B1) OR  NOT EXISTS (SELECT 0 FROM "LF_GTA_ROOMTYPE_PROPERTIES" "B"
              WHERE "B"."HOTEL_CODE"=:B2) OR  NOT EXISTS (SELECT 0 FROM "LF_HB_ROOMTYPE_PROPERTIES"
              "B" WHERE "B"."HOTEL_CODE"=:B3))
   2 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
   3 - access("B"."HOTEL_CODE"=:B1)
   4 - access("B"."HOTEL_CODE"=:B1)
   5 - access("B"."HOTEL_CODE"=:B1)

We were told in the original posting that there’s a primary key on lf_hotel_temp declared on (hotel_code, service_id), and we were given the definitions, sizes, and index declarations of all the table in a follow-up posting. It turns out that lf_hotel_temp consists of just those two columns and holds 278,000 rows: the optimizer’s estimate for the number of rows identified by a single service_id is over 88,000, and the nature of the query tells us that the optimizer would have to examine every one of those rows to check if it satisfied any of the three subqueries.

So how might Oracle access the rows ?  Given that the only columns used will all be in the primary key index (which implies not null constraints) there are four basic options: tablescan, index fast full scan, index full scan, and index skip scan. Given the most likely data content (i.e. lots of different hotel_codes), we can assume the skip scan would be a very bad idea. We can be sure that an index fast full scan will be lower cost than an index full scan – for anything except tiny indexes. Ultimately the question is really “why a tablescan instead of an index fast full scan?”. As I pointed out, though, the table consists of just those two columns – which means it’s perfectly reasonable for the index to be larger than the table as each entry of the index will consist of the two columns AND a rowid.

The first interesting bit

The question of why the access to lf_hotel_temp was by tablescan rather than some indexed method isn’t really interesting. The interesting bit is how (in principle) we might make the plan more efficient (if it really needs it); and this leads to two key, and general purpose, observations. As Andrew Sayer pointed out on the thread, we have a compound predicate:

    (not exists A OR not exists B OR not exists C)

and this is logically equivalent to

   not (exists A AND exists B AND exists C)

If we rewrite the query to reflect this equivalence could the optimizer find a different, better way of executing it:


select  /*+ dynamic_sampling(0) */
        a.hotel_code
from    lf_hotel_temp a
where
        a.service_id = :p_service_id
and     not(
                exists (
                        select  null
                        from    lf_ts_roomtype_properties ts
                        where   ts.hotel_code = a.hotel_code
                )
            and exists (
                        select  null
                        from    lf_gta_roomtype_properties gta
                        where   gta.hotel_code = a.hotel_code
                )
            and exists (
                        select  null
                        from    lf_hb_roomtype_properties hb
                        where   hb.hotel_code = a.hotel_code
                )
        )
;

Of course, I didn’t have the original data; so I copied the DDL supplied in the OTN thread and added a little DML to insert a few rows in the tables. The data I used looked like this:


insert into lf_hotel_temp (hotel_code, service_id) values ('A',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('B',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('C',1);
insert into lf_hotel_temp (hotel_code, service_id) values ('D',1);

-- insert into lf_ts_roomtype_properties values ( 'A','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'B','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_ts_roomtype_properties values ( 'D','x','x',0,1,'x');

-- insert into lf_gta_roomtype_properties values ( 'A','x','x',0,1,'x');
-- insert into lf_gta_roomtype_properties values ( 'B','x','x',0,1,'x');
insert into lf_gta_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_gta_roomtype_properties values ( 'D','x','x',0,1,'x');

-- insert into lf_hb_roomtype_properties values ( 'A','x','x',0,1,'x');
-- insert into lf_hb_roomtype_properties values ( 'B','x','x',0,1,'x');
-- insert into lf_hb_roomtype_properties values ( 'C','x','x',0,1,'x');
insert into lf_hb_roomtype_properties values ( 'D','x','x',0,1,'x');
commit;

It’s possible that with different data volumes you’d get different execution plans, but in 11g the optimizer transformed my query back into the original form – in other words it recognised the equivalence of “not (A and B and C)” and rewrote it as “(not A or not B or not C)” !

However, I also have 12c available, and I had created a script to build a model, so I ran the test on 12c. Both versions of the query produced the following plan:


----------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                       | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                            |     1 |  2027 |     8  (13)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI |                            |     1 |  2027 |     8  (13)| 00:00:01 |
|   2 |   VIEW                | VW_SQ_1                    |    82 |   984 |     6   (0)| 00:00:01 |
|*  3 |    HASH JOIN SEMI     |                            |    82 |  2952 |     6   (0)| 00:00:01 |
|*  4 |     HASH JOIN         |                            |    82 |  1968 |     4   (0)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| LF_GTA_ROOMTYPE_PROPERTIES |    82 |   984 |     2   (0)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| LF_HB_ROOMTYPE_PROPERTIES  |    82 |   984 |     2   (0)| 00:00:01 |
|   7 |     TABLE ACCESS FULL | LF_TS_ROOMTYPE_PROPERTIES  |    82 |   984 |     2   (0)| 00:00:01 |
|*  8 |   INDEX FULL SCAN     | LF_HOTEL_TEMP_PK           |   101 |   198K|     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("VW_COL_1"="A"."HOTEL_CODE")
   3 - access(SYS_OP_MAP_NONNULL("HB"."HOTEL_CODE")=SYS_OP_MAP_NONNULL("TS"."HOTEL_CODE"))
   4 - access(SYS_OP_MAP_NONNULL("HB"."HOTEL_CODE")=SYS_OP_MAP_NONNULL("GTA"."HOTEL_CODE"))
   8 - access("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
       filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))

Ignore the numbers (I hadn’t collected stats, which is why I added the /*+ dynamic_sampling(0) */ hint – with stats in place 12c produced the FILTER plan that 11g had produced) the key feature is that Oracle has managed to transform my three filter subqueries into a single join subquery and then transformed the resulting subquery into an anti-join. It’s a pretty amazing transformation – the optimizer did it automatically in 12c, but if you are aware of the logical equivalence then you may find cases where you can turn “OR’s” into “AND’s” and help the optimizer to find transformations that it can’t find automatically.

The second interesting bit

If you think about the meaning behind the query (prompted, perhaps, by the logical equivalence described above) you might rephrase the question as “find me the hotel codes that fail to appear in all three related tables” – in English this is ambigious and open to catastrophic mis-interpretation so you might have another go and say “find me the hotel codes that appear in every one of the three related tables – those are the hotel codes I don’t want”. This latter expression, of course, is exactly what Oracle is doing by joining the three tables and then doing the “not exists”/anti-join against the result. Obviously you could translate the new English form into SQL by hand, with a three table join in a “not exists” subquery.

I actually took a different approach (which might, or might not, be efficient – depending on the actual data and indexes).  I translated the new English statement into the following:


select  /*+ dynamic_sampling(0) */
        hotel_code
from    lf_hotel_temp
where   service_id = :p_service_id
minus   (
        select  hotel_code
        from    lf_ts_roomtype_properties
        where   hotel_code is not null
        intersect
        select  hotel_code
        from    lf_gta_roomtype_properties
        where   hotel_code is not null
        intersect
        select  hotel_code
        from    lf_hb_roomtype_properties
        where   hotel_code is not null
        )
;

The three way intersection gets me the list of hotels that appear in all three tables; the minus operator takes the list of hotel with the correct service_id and eliminates from it the hotels that appear in the intersection – giving me the result I want.

For my tiny data set, this is the plan I got:

--------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                          |     1 |  2159 |     8  (50)| 00:00:01 |
|   1 |  MINUS                |                          |       |       |            |          |
|   2 |   SORT UNIQUE NOSORT  |                          |     1 |  2015 |     2  (50)| 00:00:01 |
|*  3 |    INDEX FULL SCAN    | LF_HOTEL_TEMP_PK         |     1 |  2015 |     1   (0)| 00:00:01 |
|   4 |   INTERSECTION        |                          |       |       |            |          |
|   5 |    INTERSECTION       |                          |       |       |            |          |
|   6 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     2  (50)| 00:00:01 |
|*  7 |      INDEX FULL SCAN  | LF_TS_ROOMTYPE_PROP_IDX  |     4 |    48 |     1   (0)| 00:00:01 |
|   8 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     2  (50)| 00:00:01 |
|*  9 |      INDEX FULL SCAN  | LF_GTA_ROOMTYPE_PROP_IDX |     4 |    48 |     1   (0)| 00:00:01 |
|  10 |    SORT UNIQUE NOSORT |                          |     4 |    48 |     2  (50)| 00:00:01 |
|* 11 |     INDEX FULL SCAN   | LF_HB_ROOMTYPE_PROP_IDX  |     4 |    48 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
       filter("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
   7 - filter("HOTEL_CODE" IS NOT NULL)
   9 - filter("HOTEL_CODE" IS NOT NULL)
  11 - filter("HOTEL_CODE" IS NOT NULL)

Important note: I am not claiming that this use of set operators will be more efficient than a filter subquery or anti-join/semi-join approach, performance ultimately depends on the volume and patterns in the data combined with the available indexing. In this case you can almost see the classic performance compromise that we often see in Oracle – even in the trade-off between something as simple as choosing between a hash join and a nested loop join – should we operate this query as a tiny number of “bulk” operations, or as a (potentially) large number of tiny, high-precision operations.

If the original query was spending all it’s time on CPU running lots of subqueries, or doing lots of single block random I/Os because of the random ordering of the subqueries, then perhaps a couple of brute force “db file parallel read” index full scans would be a friendlier use of the available resources, run more quickly, and have less impact on every other user.

 

January 24, 2016

Semijoin_driver

Filed under: bitmaps,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 11:42 am BST Jan 24,2016

Here’s one of those odd little tricks that (a) may help in a couple of very special cases and (b) may show up at some future date – or maybe it already does – in the optimizer if it is recognised as a solution to a more popular problem. It’s about an apparent restriction on how the optimizer uses the BITMAP MERGE operation, and to demonstrate a very simple case I’ll start with a data set with just one bitmap index:


create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                  id,
        mod(rownum,1000)        n1,
        rpad('x',10,'x')        small_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create bitmap index t1_b1 on t1(n1);

select index_name, leaf_blocks, num_rows from user_indexes;

/*
INDEX_NAME           LEAF_BLOCKS   NUM_ROWS
-------------------- ----------- ----------
T1_B1                        500       1000
*/

Realistically we don’t expect to use a single bitmap index to access data from a large table, usually we expect to have queries that give the optimizer the option to choose and combine several bitmap indexes (possibly driving through dimension tables first) to reduce the target row set in the table to a cost-effective level.

In this example, though, I’ve created a column data set that many people might view as “inappropriate” as the target for a bitmap index – in one million rows I have one thousand distinct values, it’s not a “low cardinality” column – but, as Richard Foote (among others) has often had to point out, it’s wrong to think that bitmap indexes are only suitable for columns with a very small number of distinct values. Moreover, it’s the only index on the table, so no chance of combining bitmaps.

Another thing to notice about my data set is that the n1 column has been generated by the mod() function; because of this the column cycles through the 1,000 values I’ve allowed for it, and this means that the rows for any given value are scattered widely across the table, but it also means that if I find a row with the value X in it then there could well be a row with the value X+4 (say) in the same block.

I’ve reported the statistics from user_indexes at the end of the sample code. This shows you that the index holds 1,000 “rows” – i.e. each key value requires only one bitmap entry to cover the whole table, with two rows per leaf block.  (By comparison, a B-tree index oon the column was 2,077 leaf block uncompressed, or 1,538 leaf blocks when compressed).

So here’s the query I want to play with, followed by the run-time execution plan with stats (in this case from a 12.1.0.2 instance):


alter session set statistics_level = all;

select
        /*+
                qb_name(main)
        */
        max(small_vc)
from
        t1
where
        n1 in (1,5)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last outline'));

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |       |      1 |        |      1 |00:00:00.03 |    2006 |      4 |
|   1 |  SORT AGGREGATE                       |       |      1 |      1 |      1 |00:00:00.03 |    2006 |      4 |
|   2 |   INLIST ITERATOR                     |       |      1 |        |   2000 |00:00:00.03 |    2006 |      4 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      2 |   2000 |   2000 |00:00:00.02 |    2006 |      4 |
|   4 |     BITMAP CONVERSION TO ROWIDS       |       |      2 |        |   2000 |00:00:00.01 |       6 |      4 |
|*  5 |      BITMAP INDEX SINGLE VALUE        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |      4 |
------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(("N1"=1 OR "N1"=5))

The query is selecting 2,000 rows from the table, for n1 = 1 and n1 = 5, and the plan shows us that the optimizer probes the bitmap index twice (operation 5), once for each value, fetching all the rows for n1 = 1, then fetching all the rows for n1 = 5. This entails 2,000 buffer gets. However, we know that for every row where n1 = 1 there is another row nearby (probably in the same block) where n1 = 5 – it would be nice if we could pick up the 1 and the 5 at the same time and do less work.

Technically the optimizer has the necessary facility to do this – it’s known as the BITMAP MERGE – Oracle can read two or more entries from a bitmap index, superimpose the bits (effectively a BITMAP OR), then convert to rowids and visit the table. Unfortunately there are cases (and it seems to be only the simple cases) where this doesn’t appear to be allowed even when we – the users – can see that it might be a very effective strategy. So can we make it happen – and since I’ve asked the question you know that the answer is almost sure to be yes.

Here’s an alternate (messier) SQL statement that achieves the same result:


select
        /*+
                qb_name(main)
                semijoin_driver(@subq)
        */
        max(small_vc)
from
        t1
where
        n1 in (
                select /*+ qb_name(subq) */
                        *
                from    (
                        select 1 from dual
                        union all
                        select 5 from dual
                        )
        )
;

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |      1 |        |      1 |00:00:00.02 |    1074 |       |       |          |
|   1 |  SORT AGGREGATE                      |       |      1 |      1 |      1 |00:00:00.02 |    1074 |       |       |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |   2000 |   2000 |00:00:00.02 |    1074 |       |       |          |
|   3 |    BITMAP CONVERSION TO ROWIDS       |       |      1 |        |   2000 |00:00:00.01 |       6 |       |       |          |
|   4 |     BITMAP MERGE                     |       |      1 |        |      1 |00:00:00.01 |       6 |  1024K|   512K| 8192  (0)|
|   5 |      BITMAP KEY ITERATION            |       |      1 |        |      2 |00:00:00.01 |       6 |       |       |          |
|   6 |       VIEW                           |       |      1 |      2 |      2 |00:00:00.01 |       0 |       |       |          |
|   7 |        UNION-ALL                     |       |      1 |        |      2 |00:00:00.01 |       0 |       |       |          |
|   8 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|   9 |         FAST DUAL                    |       |      1 |      1 |      1 |00:00:00.01 |       0 |       |       |          |
|* 10 |       BITMAP INDEX RANGE SCAN        | T1_B1 |      2 |        |      2 |00:00:00.01 |       6 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  10 - access("N1"="from$_subquery$_002"."1")

Key points from this plan – and I’ll comment on the SQL in a moment: The number of buffer visits is roughly halved (In many cases we picked up two rows as we visited each buffer); operation 4 shows us that we did a BITMAP MERGE, and we can see in operations 5 to 10 that we did a BITMAP KEY ITERATION (which is a bit like a nested loop join – “for each row returned by child 1 (operation 6) we executed child 2 (operation 10)”) to probe the index twice and get two strings of bits that operation 4 could merge before operation 3 converted to rowids.

For a clearer picture of how we visit the table, here are the first few rows and last few rows from a version of the two queries where we simply select the ID column rather than aggregating on the small_vc column:

select  id from ...

Original query structure
         1
      1001
      2001
      3001
...
    997005
    998005
    999005

2000 rows selected.

Modified query structure:

         1
         5
      1001
      1005
      2001
      2005
...
    998001
    998005
    999001
    999005
    
2000 rows selected.

As you can see, one query returns all the n1 = 1 rows then all the n1 = 5 rows while the other query alternates as it walks through the merged bitmap. You may recall the Exadata indexing problem (now addressed, of course) from a few years back where the order in which rows were visited after a (B-tree) index range scan made a big difference to performance. This is the same type of issue – when the optimizer’s default plan gets the right data in the wrong order we may be able to find ways of modifying the SQL to visit the data in a more efficient order. In this case we save only fractions of a second because all the data is buffered, but it’s possible that in a production environment with much larger tables many, or all, of the re-visits could turn into physical reads.

Coming back to the SQL, the key to the re-write is to turn my IN-list into a subquery, and then tell the optimizer to use that subquery as a “semijoin driver”. This is essentially the mechanism used by the Star Tranformation, where the optimizer rewrites a simple join so that each dimension table (typically) appears twice, first as an IN subquery driving the bitmap selection then as a “joinback”. But (according to the manuals) a star transformation requires at least two dimension tables to be involved in a join to the central fact table – and that may be why the semi-join approach is not considered in this (and slightly more complex) cases.

 

 

My reference: bitmap_merge.sql, star_hack3.sql

January 11, 2016

Subquery Effects

Filed under: Bugs,CBO,Execution plans,Oracle,subqueries — Jonathan Lewis @ 12:50 pm BST Jan 11,2016

Towards the end of last year I used a query with a couple of “constant” subqueries as a focal point for a blog note on reading parallel execution plans. One of the comments on that note raised a question about cardinality estimates and, coincidentally, I received an email about the cost calculations for a similar query a few days later.

Unfortunately there are all sorts of anomalies, special cases, and changes that show up across versions when subqueries come into play – it’s only in recent versions of 11.2, for example, that a very simple example I’ve got of three equivalent statements that produce the same execution plan report the same costs and cardinality. (The queries are:  table with IN subquery, table with EXISTS subquery, table joined to “manually unnested” subquery – the three plans take the unnested subquery shape.)

I’m just going to pick out one particular anomaly, which is a costing error with multiple subqueries when “OR-ed”. Here’s my sample data set:


create table t1
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 20000
;


create table t2
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 25000
;

create table t3
nologging
as
select
        rownum                  n1,
        rownum                  n2,
        rownum                  n3,
        lpad(rownum,10)         small_vc,
        rpad('x',100,'x')       padding
from dual
connect by
        level <= 30000
;
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t1',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t2',
                method_opt       => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'t3',
                method_opt       => 'for all columns size 1'
        );
end;
/

The three tables are slightly different sizes so that it will be easy to see different costs of tablescans, and there are no indexes to everything I do in the queries will be tablescans. Here are six queries I’m going to test – they all scan t1, with “constant” subqueries against t2 and/or t3. The first pair is just to show you the basic cost of the query with a single subquery, the second pair shows you the default action with two subqueries in two different orders, the final pair shows you what happens with two subqueries when you block subquery pushing.


select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n2 > (select avg(t2.n2) from t2)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     t1.n3 > (select avg(t3.n3) from t3)
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select avg(t2.n2) from t2)
         or t1.n3 > (select avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select avg(t3.n3) from t3)
         or t1.n2 > (select avg(t2.n2) from t2)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
         or t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
        )
;

select
        max(t1.n1)
from
        t1
where
        t1.n1 > 10000
and     (
            t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
         or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
        )
;

Here are the first two plans, pulled from memory (which you might have guessed thanks to the “disappearing subquery predicate” in the predicate section. These examples came from 12.1.0.2, but the same happens in 11.2.0.4:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N2">))

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   123 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   500 |  5000 |    49   (3)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND "T1"."N3">))

As you can see, the cost of the query is the cost of the t1 tablescan plus the cost of running the t2 or t3 subquery once: 111 = 49 + 62, and 123 = 49 + 74.

(As a general guideline, recent versions of the optimizer tend to allow for subqueries by including “cost of subquery” * “number of times the optimizer thinks it will execute” – in this case the optimizer knows that the subquery will run exactly once).

But what happens when we test the query that applies BOTH subqueries to the tablescan ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N2"> OR "T1"."N3">)))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |    50 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   975 | 14625 |    50   (4)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   4 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   5 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   6 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N1">10000 AND ("T1"."N3"> OR "T1"."N2">)))

The cost of the query in both cases is just the cost of the tablescan of t1 – the subqueries are, apparently, free. You can check from the predicate section, by the way, that the subqueries are applied in the order they appear in original statement.

Does anything change if the subqueries are not pushed ?


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   111 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N2"> OR "T1"."N3">))
   3 - filter("T1"."N1">10000)

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |       |       |   124 (100)|          |
|   1 |  SORT AGGREGATE      |      |     1 |    15 |            |          |
|*  2 |   FILTER             |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   4 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   5 |     TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   6 |    SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |     TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("T1"."N3"> OR "T1"."N2">))
   3 - filter("T1"."N1">10000)

The two plans have different costs – and the cost is the cost of the tablescan of t1 plus the cost of just the first subquery in the filter predciate list.

The non-pushed subqueries show up another anomaly: you’ll notice that the t1 tablescan reports 10,001 rows cardinality, but the FILTER operation doesn’t have an associated cardinality so we can’t see how many rows the optimizer thinks will survive the subqueries. So let’s run a query that allows us to see the surviving row estimate:


select
        max(n1)
from
        (
        select
                /*+ no_eliminate_oby */
                t1.n1
        from
                t1
        where
                t1.n1 > 10000
        and     (
                   t1.n3 > (select /*+ no_push_subq */ avg(t3.n3) from t3)
                or t1.n2 > (select /*+ no_push_subq */ avg(t2.n2) from t2)
                )
        order by
                n1
        )
;

-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |       |       |   126 (100)|          |
|   1 |  SORT AGGREGATE        |      |     1 |    13 |            |          |
|   2 |   VIEW                 |      | 10001 |   126K|   126   (5)| 00:00:01 |
|   3 |    SORT ORDER BY       |      | 10001 |   146K|   126   (5)| 00:00:01 |
|*  4 |     FILTER             |      |       |       |            |          |
|*  5 |      TABLE ACCESS FULL | T1   | 10001 |   146K|    50   (4)| 00:00:01 |
|   6 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   7 |       TABLE ACCESS FULL| T3   | 30000 |   146K|    74   (3)| 00:00:01 |
|   8 |      SORT AGGREGATE    |      |     1 |     5 |            |          |
|   9 |       TABLE ACCESS FULL| T2   | 25000 |   122K|    62   (4)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("T1"."N3"> OR "T1"."N2">))
   5 - filter("T1"."N1">10000)

As you can see, the SORT ORDER BY operation thinks it’s going to handle 10,0001 rows – it looks as if the optimizer arithmetic hasn’t applied the usual subquery guess of 5% for the two subqueries. (When the subqueries were automatically pushed you saw a cardinality of 975 – which is 5% for subquery t2 plus (due to OR) 5% for subquery t3 minus 5% of 5% (=25) for the overlap – this is the standard OR arithmetic)

tl;dr

Although the optimizer code has been enhanced in many places for dealing with subquery estimates, but there are still some odd errors and inconsistencies that you need to be aware of. The examples I’ve shown may not be particularly significant in terms of what they do, but the pattern is one that you may recognise in more complex queries.

 

Reference script: subq_cost_anomaly_2.sql

 

January 6, 2016

NLS Mess

Filed under: Bugs,CBO,Execution plans,Function based indexes,Indexing,Oracle — Jonathan Lewis @ 1:18 pm BST Jan 6,2016

The Oracle database has all sorts of little details built into it to help it deal with multi-national companies, but since they’re not commonly used you can find all sorts of odd “buggy” bits of behaviour when you start to look closely. I have to put “buggy” in quotes because some of the reported oddities are the inevitable consequences of (for example) how multi-byte character sets have to work; but some of the oddities look as if they simply wouldn’t be there if the programmer writing the relevant bit of code had remembered that they also had to cater for some NLS feature.

Here’s an example of the type of unexpected behaviour that can appear. There probably are some bugs in the area I’m going to demonstrate but, at first glance, I thought I was looking at an acceptable limitation imposed by a generic requirement. The example came from AskTom. which is why the data set isn’t my usual “t1” generation (and the formatting and capitalisation isn’t according to my usual standards).

The problem involves Case Insensitive indexing.


ALTER session SET nls_sort=binary_ci;
ALTER session SET nls_comp=linguistic;

CREATE TABLE log_data(
  account_id NUMBER,
  log_type NUMBER,
  sys_name VARCHAR2(30),
  log_time TIMESTAMP,
  msg varchar2(4000)
)
nologging
;

insert /*+ append */ into log_data(
  account_id,
  log_type,
  sys_name,
  log_time,
  msg
)
select
        5,
        2,
        dbms_random.string('a',1),
        sysdate + dbms_random.value,
        rpad('x',200)
from
        dual
connect by
        level <= 26000
;


create index log_date on log_data (
        account_id, 
        log_type, 
--      sys_name,
        NLSSORT(sys_name,'NLS_SORT=BINARY_CI'),
        log_time
)
nologging
;
  
rem     ======================================================================
rem     Need to gather stats AFTER index creation because of the hidden column
rem     ======================================================================
  
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'LOG_DATA',
                method_opt       => 'for all columns size 1'
        );
end;
/

And here’s the query I want to optimize:


SELECT 
        *
FROM
  (
    SELECT
        sys_name, log_time,  substr(msg,1,40) msg
    FROM log_data
    WHERE
      account_id=5
      AND log_type=2
      AND sys_name='a'
    ORDER BY
      log_time  desc
  )
WHERE
  rownum <= 10
;

The requirement of the query is that we see the ten most recent entries for a given combination of account_id, log_type and sys_name (ignoring case in sys_name). The orginal table has tens of millions of rows, of course, with many combinations, and some of the combinations have a very large number of entries hence the desire to find an access path that gets just the 10 rows we want without getting all the rows for a combination and sorting them before returning the ten.

Normally we would just create an index that started with the 3 columns used in the equality and ending with the column in the order by clause, and that would be enough for the optimizer to see the option for a “sort order by nosort” operation to get the required data through an index range scan; so that’s the index the code sample creates, except that since we’ve enabled case insensitive sorting we need to use a function-based index to hold the case-insensitive version of sys_name.

Here’s the execution plan we would get if we DIDN’T use the nlssort() function in the index – I’ve run the query in 11.2.0.4 and pulled the plan from memory with rowsource execution stats enabled:


---------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |   605 (100)|     10 |00:00:00.02 |    1065 |       |       |          |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.02 |    1065 |       |       |          |
|   2 |   VIEW                         |          |      1 |    500 |   605   (1)|     10 |00:00:00.02 |    1065 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY       |          |      1 |    500 |   605   (1)|     10 |00:00:00.02 |    1065 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID| LOG_DATA |      1 |    500 |   603   (1)|    966 |00:00:00.01 |    1065 |       |       |          |
|*  5 |      INDEX RANGE SCAN          | LOG_DATE |      1 |    500 |   103   (3)|    966 |00:00:00.01 |     100 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2)
       filter(NLSSORT("SYS_NAME",'nls_sort=''BINARY_CI''')=HEXTORAW('6100') )

Notice particularly the filter predicate at operation 5: that’s the thing we need to get into the index before we can avoid picking up excess data and sorting it. Notice also in the A-Rows column that we acquired 966 rows from the table before sorting and discarding all but 10 of them at operation 3.

Notice especially how important it is to look at the predicate section of an execution plan to gain a full understanding of what’s happening.

So here’s the execution plan we get by default with the function-based index in place:


----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |      1 |        |    13 (100)|     10 |00:00:00.01 |     969 |       |       |          |
|*  1 |  COUNT STOPKEY                  |          |      1 |        |            |     10 |00:00:00.01 |     969 |       |       |          |
|   2 |   VIEW                          |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |     969 |       |       |          |
|*  3 |    SORT ORDER BY STOPKEY        |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |     969 |  2048 |  2048 | 2048  (0)|
|   4 |     TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|    966 |00:00:00.01 |     969 |       |       |          |
|*  5 |      INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|    966 |00:00:00.01 |       5 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   3 - filter(ROWNUM<=10)
   5 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

It didn’t work ! (Check the A-Rows at operations 4 and 5, and the sort that we didn’t want at operation 3 where the data is finally reduced to 10 rows.

But there’s something odd going on here – look at the predicate section: our three predicates are all access predicates for the index range scan descending. We are doing exactly what we want to do with the index, but we’re not stopping after the 10 rows that we need, we’re getting all of them (in the order we want) and then doing a trivial sort and discard. Look at the Cost column – the cost at operation 4 is exactly what we might expect for the 10 rows we want to see, and the E-rows at line 5 is clearly based on our “first 10 rows” requirement.

This raises two questions:

  1. What’s gone wrong ?
  2. Can we work around the problem ?

The answer to (1) is, I think, that there’s a bug in the code. Looking at the 10053 trace file I can see the optimizer correctly handling the arithmetic of the virtual column (the sys_nc000006$) representing the function in the index and then getting to the point where it goes into a code section relating to “Recost for ORDER BY”, and brings back the original function as a filter predicate – I think that in the recosting it may be losing track of the fact that sys_nc000006$ and nlssort(sys_name, ‘nls_sort=binary_ci’) are the same thing and therefore can’t apply the rule about “Equality on 1st N columns, order by on the remainder”.

There are several answers to (2).

Workarounds

The honest hack

The first one is simply to fall back to the old (probably version 7, possibly version 8) requirement for getting the “sort order by nosort” operation – put all the index columns into the order by clause. Unfortunately the optimizer then did a tablescan rather than an index range scan because my data set was so small, so I had to hack the system stats temporarily to make the tablescan very expensive:


begin
        dbms_stats.set_system_stats('MBRC',2);
        dbms_stats.set_system_stats('MREADTIM',20); 
        dbms_stats.set_system_stats('SREADTIM',5);
        dbms_stats.set_system_stats('CPUSPEED',1000); 
end;
/

... order by account_id desc, log_type desc, sys_name desc, lot_time desc

Unfortunately the optimizer still went wrong – it did an ASCENDING index range scan sorting all the data. I actually had to hint the code to use the index in descending order to get the following execution plan:


------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |  1215 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |   1000 |  1215   (1)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |  1006   (1)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |   1000 |     5   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

The A-Rows tells us we’ve accessed the minimum data set, and the absence of the SORT ORDER BY STOPKEY operation tells us that we’ve avoided doing the sort. Notice, though that the cost is the cost that would have been appropriate if we have accessed all 1,000 rows that matched the equality predicates. This is an example of a plan that you couldn’t really trust if all you had done was an “explain plan” rather than running the query and checking the rowsource execution stats. If you ignore the A-Rows it looks as if the plan WOULD get all the data in order and only eliminate the redundant rows at operation 1.

The silly surprise

The original author of the problem came up with this one. Put in two predicates which, between them are equivalent to the original requirement:


where ...
and     sys_name >= 'a'
and     sys_name <= 'a'

Clearly this is totally silly – the optimizer can fold this pair of predicates into the single predicate “sys_name = ‘a'”, so it shouldn’t make any difference. But here’s the execution plan:

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |    13 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "LOG_DATA"."SYS_NC00006$"=HEXTORAW('6100') )

Yes, it’s (structurally) exactly the same plan, with exactly the same predicate section except that (a) it gets there without being hinted, (b) the Cost column looks appropriate all down the line, and (c) the E-Rows value for the VIEW operator would have helped us appreciate that the correct elimination was (probably) going to happen if all we had done was the Explain Plan.

The dirty hack

I know the name of the hidden column that’s causing the problem, and I know how to generate the value it has to be – so let’s give Oracle exactly what it needs to see rather than allowing its internal transformation to rewrite the SQL:

...
AND sys_nc00006$ = nlssort('a','nls_sort=binary_ci')
...


------------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |      1 |        |    13 (100)|     10 |00:00:00.01 |      13 |
|*  1 |  COUNT STOPKEY                 |          |      1 |        |            |     10 |00:00:00.01 |      13 |
|   2 |   VIEW                         |          |      1 |     11 |    13   (0)|     10 |00:00:00.01 |      13 |
|   3 |    TABLE ACCESS BY INDEX ROWID | LOG_DATA |      1 |   1000 |    13   (0)|     10 |00:00:00.01 |      13 |
|*  4 |     INDEX RANGE SCAN DESCENDING| LOG_DATE |      1 |     11 |     2   (0)|     10 |00:00:00.01 |       3 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(ROWNUM<=10)
   4 - access("ACCOUNT_ID"=5 AND "LOG_TYPE"=2 AND "SYS_NC00006$"=HEXTORAW('6100') )

We get exactly the plan we need – and the silly thing about this example is that it’s a case where we get the plan we want by EXPLICITLY transforming the SQL to reproduce the transformation that Oracle had done IMPLICITLY and then messed up !

Final Choice
Of the three options – the dirty hack is definitely a no-no in production; the “double the predicate” trock is undesirable because it may depend in some unexpected way on a particular optimizer bug or on some statistical detail that could change; so I’d choose the hinted path with the (nominally) redundant columns.

One final point about this solution, we actually needed to include only the sys_name in the order by clause to use the descending range scan and early stop – which is basically another indication that it’s something about the function-based column that is breaking the normal code path.

Reference Script: nls_sort_anomaly.sql

November 5, 2015

Column Groups

Filed under: CBO,extended stats,Oracle,Statistics — Jonathan Lewis @ 6:48 am BST Nov 5,2015

I think the “column group” variant of extended stats can be amazingly useful in helping the optimizer to generate good execution plans because of the way they supply better details about cardinality; unfortunately we’ve already seen a few cases (don’t forget to check the updates and comments) where the feature is disabled, and another example of this appeared on OTN very recently.

Modifying the example from OTN to make a more convincing demonstration of the issue, here’s some SQL to prepare a demonstration:


create table t1 ( col1 number, col2 number, col3 date);

insert  into t1
select 1 ,1 ,to_date('03-Nov-2015') from dual
union all
select 1, 2, to_date('03-Nov-2015')  from dual
union all
select 1, 1, to_date('03-Nov-2015')  from dual
union all
select 2, 2, to_date('03-Nov-2015')  from dual
union all   
select 1 ,1 ,null  from dual
union all  
select 1, 1, null  from dual
union all  
select 1, 1, null  from dual
union all
select 1 ,1 ,to_date('04-Nov-2015')  from dual
union all  
select 1, 1, to_date('04-Nov-2015')  from dual
union all  
select 1, 1, to_date('04-Nov-2015')  from dual
;

begin
        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'T1',
                method_opt      => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname         => user,
                tabname         => 'T1',
                method_opt      => 'for columns (col1, col2, col3)'
        );
end;
/

I’ve collected stats in a slightly unusual fashion because I want to make it clear that I’ve got “ordinary” stats on the table, with a histogram on the column group (col1, col2, col3). You’ll notice that this combination is a bit special – of the 10 rows in the table there are three with the values (1,1,null) and three with the values (1,1,’04-Nov-2015′), so some very clear skew to the data which results in Oracle gathering a frequency histogram on the table.

These two combinations are remarkably similar, so what happens when we execute a query to find them – since there are no indexes the plan will be a tablescan, but what will we see as the cardinality estimate ? Surely it will be the same for both combinations:


select  count(*)
from    t1
where
        col1 = 1
and     col2 = 1
and     col3 = '04-Nov-2015'
;

select  count(*)
from    t1
where
        col1 = 1
and     col2 = 1
and     col3 is null

Brief pause for thought …

and here are the execution plans, including predicate section – in the same order (from 11.2.0.4 and 12.1.0.2):


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     3 |    36 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("COL1"=1 AND "COL2"=1 AND "COL3"=TO_DATE(' 2015-11-04
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     1 |    12 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("COL3" IS NULL AND "COL1"=1 AND "COL2"=1)

The predictions are different – the optimizer has used the histogram on the column group for the query with “col3 = to_date()”, but not for the query with “col3 is null”. That’s a bit of a shame really because there are bound to be cases where some queries would benefit enormously from having a column group used even when some of its columns are subject to “is null” tests.

Analysis

The demonstration above isn’t sufficient to prove the point, of course; it merely shows an example of a suspiciously bad estimate. Here are a few supporting details – first we show that both the NULL and the ’04-Nov-2015′ combinations do appear in the histogram. We do this by checking the column stats, the histogram stats, and the values that would be produced by the hashing function for the critical combinations:


set null "n/a"

select distinct
        col3,
        mod(sys_op_combined_hash(col1, col2, col3), 9999999999)
from    t1
where
        col3 is null
or      col3 = to_date('04-Nov-2015')
order by
        2
;

column endpoint_actual_value format a40
column column_name           format a32
column num_buckets           heading "Buckets"

select
        column_name,
        num_nulls, num_distinct, density,
        histogram, num_buckets
from
        user_tab_cols
where
        table_name = 'T1'

break on column_name skip 1

select
        column_name,
        endpoint_number, endpoint_value,
        endpoint_actual_value -- , endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name not like 'COL%'
order by
        table_name, column_name, endpoint_number
;

(For an explanation of the sys_op_combined_hash() function see this URL).

Here’s the output from the three queries:


COL3      MOD(SYS_OP_COMBINED_HASH(COL1,COL2,COL3),9999999999)
--------- ----------------------------------------------------
04-NOV-15                                           5347969765
n/a                                                 9928298503

COLUMN_NAME                       NUM_NULLS NUM_DISTINCT    DENSITY HISTOGRAM          Buckets
-------------------------------- ---------- ------------ ---------- --------------- ----------
COL1                                      0            2         .5 NONE                     1
COL2                                      0            2         .5 NONE                     1
COL3                                      3            2         .5 NONE                     1
SYS_STU2IZIKAO#T0YCS1GYYTTOGYE            0            5        .05 FREQUENCY                5


COLUMN_NAME                      ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE
-------------------------------- --------------- -------------- ----------------------------------------
SYS_STU2IZIKAO#T0YCS1GYYTTOGYE                 1      465354344
                                               4     5347969765
                                               6     6892803587
                                               7     9853220028
                                              10     9928298503

As you can see, there’s a histogram only on the combination and Oracle has found 5 distinct values for the combination. At endpoint 4 you can see the combination that includes 4th Nov 2015 (with the interval 1 – 4 indicating a frequency of 3 rows) and at endpoint 10 you can see the combination that includes the null (again with an interval indicating 3 rows). The stats are perfect to get the job done, and yet the optimizer doesn’t seem to use them.

If we examine the optimizer trace file (event 10053) we can see concrete proof that this is the case when we examine the “Single Table Access Path” sections for the two queries – here’s a very short extract from each trace file, the first for the query with “col3 = to_date()”, the second for “col3 is null”:


ColGroup (#1, VC) SYS_STU2IZIKAO#T0YCS1GYYTTOGYE
  Col#: 1 2 3    CorStregth: 1.60
ColGroup Usage:: PredCnt: 3  Matches Full: #1  Partial:  Sel: 0.3000


ColGroup (#1, VC) SYS_STU2IZIKAO#T0YCS1GYYTTOGYE
  Col#: 1 2 3    CorStregth: 1.60
ColGroup Usage:: PredCnt: 2  Matches Full:  Partial:

Apparently “col3 is null” is not a predicate!

The column group can be used only if you have equality predicates on all the columns. This is a little sad – the only time that the sys_op_combined_hash() will return a null is (I think) when all its input are null, so there is one very special case for null handling with column groups – and even then the num_nulls for the column group would tell the optimizer what it needed to know. As it is, we have exactly the information we need to get a good cardinality estimate for the second query, but the optimizer is not going to use it.

Summary

If you create a column group to help the optimizer with cardinality calculations it will not be used for queries where any of the underlying columns is used in an “is null” predicate. This is coded into the optimizer, it doesn’t appear to be an accident.

Reference script: column_group_null.sql

September 30, 2015

Estimate_percent

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 5:10 pm BST Sep 30,2015

Here’s a live one from OTN – here are a couple of extracts from the problem statement:

We’re experiencing an issue where it seems that the query plan changes from day to day for a particular procedure that runs once a night.
It’s resulting in a performance variance of 10 second completion time vs 20 minutes (nothing in between).
It started occurring about 2 months ago and now it’s becoming more prevalent where the bad query plan is coming up more often.
I noticed that the query plans vary for a simple query.
We do run gather statistics every night. (DBMS_STATS.GATHER_SCHEMA_STATS (ownname=>sys_context( ‘userenv’, ‘current_schema’ ), estimate_percent => 1);)

The query and two execution plans look like this:

select count(*) from cs_bucket_member_v2 where bucket_type='P' and sec_id > 0 and order_id=0;

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     |     1 |    12 |   155   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE              |                     |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CS_BUCKET_MEMBER_V2 |  1148 | 13776 |   155   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | CS_BUCKET_MEMBER_N1 |  1272 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("BUCKET_TYPE"='P' AND "SEC_ID">0)
   3 - access("ORDER_ID"=0)


------------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |     1 |    12 | 11215   (2)| 00:01:41 |
|   1 |  SORT AGGREGATE    |                     |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| CS_BUCKET_MEMBER_V2 |  1522K|    17M| 11215   (2)| 00:01:41 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ORDER_ID"=0 AND "SEC_ID">0 AND "BUCKET_TYPE"='P')

There are a couple of bits of information that would be useful – such as the database version, the number of rows in the table, the number of distinct values in each column, and whether any of the columns have histograms – but there are a couple of reasonable guesses that we might make about the problem. Notice particularly that the number of rows estimated from the index ranges scan is 1272 and only a small volume is then eliminated by the table filter predicates on sec_id and bucket_type. This suggests that the optimizer has information that tells it that most of the rows in the table have sec_id > 0 and bucket_type = ‘P’, and you might note that that suggests that there’s a histogram on bucket_type.

Rather than stating the most obvious guesses about the problem, though, I’ll start by creating a data set and emulating the problem, starting from an empty schema on 11.2.0.4:

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id 
        from dual
        connect by 
                level <= 1e4
)
select
        rownum                  sec_id,
        case
                when mod(rownum,1000) = 0
                        then 'X'
                        else 'P'
        end                     bucket_type,
        case
                when rownum < 1e6 - 50000 
                        then mod(rownum-1,1e5)
                        else 1000
        end                     order_id,
        lpad(rownum,10,'0')     id_vc,
        rpad('x',100,'x')       padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6

create index t1_i1 on t1(order_id) nologging; 

select count(*) from t1 where order_id = 1000 and bucket_type = 'P' and sec_id > 1000;

The column names in the table match those needed by the query, and the bucket_p column has a very skewed distribution that will eliminate very little data; the sec_id column is also not going to eliminate data, but it’s very evenly distributed with no large gaps so not a good candidate for a histogram in any case. The order_id has 50,000 rows out of 1,000,000 (5%) set of a single value, and most of those special rows are at the end of the table – it’s a pretty good candidate for a histogram (if Oracle spots it, and if we actually write queries to access that data).

I’ve run a query that references all three columns so that the default method_opt of “for all columns size auto” will apply to them when I gather stats. So here’s the code that gathers stats and checks the resulting execution plans, first for “auto_sample_size” then for the 1% used by the OP:


set autotrace traceonly explain

begin
        dbms_stats.gather_schema_stats(
/*              estimate_percent => 1, */
                ownname          => user
        );
end;
/

select count(*) from t1 where order_id = 1000 and bucket_type = 'P' and sec_id > 1000;

begin
        dbms_stats.gather_schema_stats(
                estimate_percent => 1,
                ownname          => user
        );
end;
/

select count(*) from t1 where order_id = 1000 and bucket_type = 'P' and sec_id > 1000;

set autotrace off

And here are the two plans – in the same order:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |  2333   (4)| 00:00:12 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   | 51063 |   598K|  2333   (4)| 00:00:12 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("ORDER_ID"=1000 AND "SEC_ID">1000 AND "BUCKET_TYPE"='P')


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    12 |    23   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |       |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| T1    |    20 |   240 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |    20 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):  
---------------------------------------------------
   2 - filter("SEC_ID">1000 AND "BUCKET_TYPE"='P')
   3 - access("ORDER_ID"=1000)


[Update: Following on from a question in the comments, I’ve expanded this section, and wandered a little off-topic]

I don’t know why, but with a 1% sample (which really did sample 10,000 rows) the optimizer didn’t spot the need for a histogram on order_id, but with the auto_sample_size (which sampled 5,500 – yes, half as many rows) the optimizer spotted the need for the histogram. Checking the trace files the only difference visible in the sample SQL was the presence in the 1% sample of the id_vc and padding columns which were not present in the auto_sample_size trace.

According to the manuals when the method_opt is “for all columns size auto”, then

“Oracle determines the columns on which to collect histograms based on data distribution and the workload of the columns.”

There is nothing in the manuals to suggest that there is a deliberate link between the auto_sample_size and estimate_percent, and there is room for ambiguity in how we interpret this bit of text in the manual so the difference in the SQL used and the effects thereof requires (a) some hand-waving, and/or (b) lots more experimentation.  At the moment I’m prepared to go for hand-waving:

Hypothesis 1: auto_sample_size did not sample the id_vc and padding columns because the (100%) sample taken had given Oracle enough information to decide that the data distribution of those columns was not skewed enough to merit further consideration; but it sampled the three columns that had been used in a fashion that might be helped by a histogram. This sampling spotted the benefit of a histogram on order_id and bucket_type but decided that sec_id didn’t need a histogram

Hypothesis 2: the 1% sample got pretty close to the same results in its estimates of number of distinct values for id_vc and padding as the (100%) auto_sample_size, but still decided to do a sampled test for the data distribution (the manual seems to suggest that the histograms will only be considered if there has been some use of the columns in predicates, but doesn’t explicitly preclude the possibility of creating the histogram on the basis of just the data distribution). After doing the 1% sample to analyze the data for suitability of a histogram the result suggested that only the histogram on bucket_type would be beneficial.  (In fact, after the first sample Oracle took a second 1% histogram sample on just the order_id before deciding that it a histogram on order_id wasn’t needed.)

Bottom line on this: I don’t know if the auto_sample_size “accidentally” eliminated a couple of columns from histogram sampling and if a larger fixed sample size (say 50%, or even 100%) might result in Oracle eliminating a few columns from the histogram; or maybe the code path for histogram samples with auto_sample_size in place is actually a different code path. The only thing I can say is that the two sets of events that appeared from my demonstration don’t seem to be entirely self-consistent, but it would probably take most of a day doing experiments to narrow down the variation in behaviour to a few “best guess” ideas of what’s going on behind the scenes – though unwrapping the code might lead to a more accurate answer more quickly.

Moral

Histograms are tricky things – and you can only make things worse in 11g by NOT using the auto_sample_size.

Footnote

Based on previous experience – my “obvious” guess about the OP’s data was that there was a special-case value for order_id, that the rows for that value were fairly well clustered, probably towards the end of the table, and constituted a small percentage of the table, and that the rest of the data reported “a few” rows per value. That’s why I built the model you see above.

September 4, 2015

Histogram Tip

Filed under: CBO,Histograms,Oracle,Statistics — Jonathan Lewis @ 8:32 am BST Sep 4,2015

I’ve just responded to the call for items for the “IOUG Quick Tips” booklet for 2015 – so it’s probably about time to post the quick tip that I put into the 2014 issue. It’s probably nothing new to most readers of the blog, but sometimes an old thing presented in a new way offers fresh insights or better comprehension.

Histogram Tips

A histogram, created in the right way, at the right time, and supported by the correct client-side code, can be a huge benefit to the optimizer; but if you don’t create and use them wisely they can easily become a source of inconsistent performance, and the automatic statistics gathering can introduce an undesirable overhead during the overnight batch. This note explains how you can create histograms very cheaply on the few columns where they are most likely to have a beneficial effect.

set_column_stats

The dbms_stats package have many procedures and functions built into it that allow us to see (get) and manipulate (set) the stored statistics; in particular it holds two functions get_column_stats() and set_column_stats(), and we can use these procedures to create a histogram very cheaply whenever we want at very low cost. Here’s an example that could be modified to suit a character column in a table where you’ve previously collected some stats. It uses a copy of all_objects, limited to exactly 10,000 rows.


create table t1 as
select
	*
from
	all_objects
where
	rownum <= 10000 ; begin dbms_stats.gather_table_stats( ownname => user,
		tabname		 =>'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/


declare

	m_distcnt		number;		-- num_distinct
	m_density		number;		-- density
	m_nullcnt		number;		-- num_nulls
	m_avgclen		number;		-- avg_col_len

	srec			dbms_stats.statrec;
	c_array		dbms_stats.chararray;

begin

	dbms_stats.get_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'object_type', 
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec			=> srec,
		avgclen		=> m_avgclen
	); 

	c_array		:= dbms_stats.chararray('A', 'B', 'C', 'X', 'Y');
	srec.bkvals	:= dbms_stats.numarray (  2,   2,   2, 500, 494);
--	srec.rpcnts	:= dbms_stats.numarray (  0,   0,   0,   0,   0);
	srec.epc := 5;

	dbms_stats.prepare_column_values(srec, c_array);

	m_distcnt	:= 5;
	m_density	:= 1/(5000);

	dbms_stats.set_column_stats(
		ownname		=> user,
		tabname		=> 't1',
		colname		=> 'object_type', 
		distcnt		=> m_distcnt,
		density		=> m_density,
		nullcnt		=> m_nullcnt,
		srec			=> srec,
		avgclen		=> m_avgclen
	); 

end;
/

Key features of the code: as you can see, the two calls have identical parameters which identify the table and column name (there is an optional parameter for a (sub) partition name), and most of the basic statistics about the column. The histogram (or low and high values) are accessed through a special record type, and we can populate that record type by supplying an ordered list of values, a matching list of frequencies, and a count of how many values we have supplied.

Since my code is fixing stats on a varchar2() column I’ve declared an array of type dbms_stats.chararray to hold the list of values I want to see in a frequency histogram – there are other array types for dates, raw, number, etc. I’ve then used the structure of the stats record I had declared to hold the list of frequencies (srec.bkvals – possibly a short name for “bucket values”) and the count (srec.epc“end-point count”).

The call to dbms_stats.prepare_column_stats() takes my two arrays and massages them into the correct format for storage as a histogram that I can then write into the data dictionary with the closing call to dbms_stats.set_column_stats(). Before making that call, though, I’ve also set the “num_distinct” variable to tell the optimizer that there are 5 distinct values for the column (it makes sense, but isn’t absolutely necessary, for the num_distinct to match the number of values in the array), and set the “density” to a value that I would like the optimizer to use in it calculations if someone asks for a value that is not in my list.

I’ve included (but commented out) a line that’s relevant to the new histogram mechanisms in 12c –the srec.rpcnts (“repeat counts”) array is used in “hybrid histograms”. It’s not relevant to my example where I’m trying to create a pure frequency histogram, but if I don’t set the array I get an Oracle error: “ORA-06532: Subscript outside of limit”.

Results

There’s one important point to the method that isn’t instantly visible in the code – I created my table with 10,000 rows and there will be no nulls for object_type; but if you sum the array of frequencies it comes to exactly 1,000. This apparent contradiction is not a problem – the optimizer will compare the histogram figures to the total number of non-null entries it has recorded (in other words user_tables.num_rowsuser_tab_columns.num_nulls), and scale up the histogram accordingly. This means that a query for ‘A’ should return an estimated row count of 20 (rather than 2), ‘X’ should return 5,000 (rather than 500) and ‘D’ should return 2 (10,000 rows * 1/5000, the selectivity I had set for non-existent values).

With a little editing to save space, here’s a cut-n-paste from an SQL*Plus session running against 12c:


SQL> set autotrace traceonly explain
SQL> select * from t1 where object_type = 'A';

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    20 |  2100 |    22   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    20 |  2100 |    22   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_TYPE"='A')

SQL> select * from t1 where object_type = 'X';

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  5000 |   512K|    22   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  5000 |   512K|    22   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_TYPE"='X')

SQL> select * from t1 where object_type = 'D';

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |   105 |    22   (5)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |   105 |    22   (5)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("OBJECT_TYPE"='D')

Conclusion

It is very easy to create truly representative histograms (if you know your data) and the resources required to do so are minimal. If you see instability due to bad luck, or bad timing, gathering a histogram then you benefit enormously from writing code to construct histograms.

Footnote

In 12c the introduction of the “approximate NDV” strategy to collecting frequency histograms, and the introduction of the “Top-frequency” histogram has made automatic gathering of histograms on columns with a relatively small number of distinct values much faster and safer – but timing may still be an issue, and the resources needed to gather a safe hybrid histogram may still justify a hand-coded approach.

 

Next Page »

Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 6,668 other followers