Oracle Scratchpad

February 24, 2010

CBO Surprise 2

Filed under: CBO,Execution plans,Partitioning — Jonathan Lewis @ 7:45 pm BST Feb 24,2010

In an earlier article I gave a description of how splitting a single date ranges into a pair of date ranges with an OR would change the arithmetic and so run the risk (or introduce the benefit) of changing the execution plan.

At the time I made a couple of comments about other details that could be demonstrated by the same query – but postponed saying anything about them. This follow-up article addresses the omission.

Transitivity Bug

The first point was that the original query to split the date range was wrong because it assumed (without telling the database) that the date column contained date-only values. The query, therefore, had to be rewritten to allow for the possibility of values with a time-component. But if you have a date column that does contain date-only information why not add a check constraint to the table to tell the database that that’s the case.

As a reminder, this is what the execution plans looked like without the check constraint:

select
	small_vc
from	t1
where
	bit_col = 5
and	t1.date_col between to_date('01-feb-2010','dd-mon-yyyy')
	            and     to_date('25-feb-2010','dd-mon-yyyy')
;

----------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost  |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1312 | 28864 |   435 |
|*  1 |  TABLE ACCESS FULL| T1   |  1312 | 28864 |   435 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("BIT_COL"=5 AND "T1"."DATE_COL">=TO_DATE('2010-02-01
              00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
              "T1"."DATE_COL"<=TO_DATE('2010-02-25 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))

select
	small_vc
from	t1
where
	bit_col = 5
and	(
	t1.date_col between to_date('01-feb-2010','dd-mon-yyyy')
	            and     to_date('12-feb-2010','dd-mon-yyyy')
	or
	t1.date_col between to_date('13-feb-2010','dd-mon-yyyy')
	            and     to_date('25-feb-2010','dd-mon-yyyy')
	)
;

-------------------------------------------------------------------------------------
| Id  | Operation                         | Name    | Rows  | Bytes |TempSpc| Cost  |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |         |  1269 | 27918 |       |   434 |
|   1 |  TABLE ACCESS BY INDEX ROWID      | T1      |  1269 | 27918 |       |   434 |
|   2 |   BITMAP CONVERSION TO ROWIDS     |         |       |       |       |       |
|   3 |    BITMAP AND                     |         |       |       |       |       |
|   4 |     BITMAP CONVERSION FROM ROWIDS |         |       |       |       |       |
|*  5 |      INDEX RANGE SCAN             | T1_BIT  |       |       |       |    10 |
|   6 |     BITMAP OR                     |         |       |       |       |       |
|   7 |      BITMAP CONVERSION FROM ROWIDS|         |       |       |       |       |
|   8 |       SORT ORDER BY               |         |       |       |   424K|       |
|*  9 |        INDEX RANGE SCAN           | T1_DATE |       |       |       |    36 |
|  10 |      BITMAP CONVERSION FROM ROWIDS|         |       |       |       |       |
|  11 |       SORT ORDER BY               |         |       |       |   456K|       |
|* 12 |        INDEX RANGE SCAN           | T1_DATE |       |       |       |    39 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("BIT_COL"=5)
   9 - access("T1"."DATE_COL">=TO_DATE('2010-02-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss') AND "T1"."DATE_COL"<=TO_DATE('2010-02-12 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))
  12 - access("T1"."DATE_COL">=TO_DATE('2010-02-13 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss') AND "T1"."DATE_COL"<=TO_DATE('2010-02-25 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))

So let’s add the check constraint and see what happens:

alter table t1 add constraint t1_ck_date check (date_col = trunc(date_col));

Here’s the execution plan of the “single range” version of the query:


-----------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    66 |   127 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| T1      |     3 |    66 |   127 |
|*  2 |   INDEX RANGE SCAN          | T1_DATE |    66 |       |    71 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("BIT_COL"=5)
   2 - access("T1"."DATE_COL">=TO_DATE('2010-02-01 00:00:00',
              'yyyy-mm-dd hh24:mi:ss') AND "T1"."DATE_COL"<=TO_DATE('2010-02-25
              00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
       filter(TRUNC(INTERNAL_FUNCTION("DATE_COL"))>=TO_DATE('2010-02-01
              00:00:00', 'yyyy-mm-dd hh24:mi:ss') AND
              TRUNC(INTERNAL_FUNCTION("DATE_COL"))<=TO_DATE('2010-02-25 00:00:00',
              'yyyy-mm-dd hh24:mi:ss'))

Note how the cardinality estimates have changed dramatically with a consequent change in execution plan. Note, in particular, the extra predicate using an “internal_function()” that has appeared as the filter predicate on the index. This is the effect of transitive closure being applied to our constraint to generate a predicate.

The optimizer generally applies a selectivity of 5% when it sees “function of column” with a range operator – and we have two such predicates so the optimizer has multiplied two lots of 5% for a net selectivity of 0.25% (in the absence of the constraint the index selectivity would have been 26242).

If the extreme change to an index range scan with an appalling estimate of cardinality has appeared with the single date range, what’s going to happen to the split date range? Check the following plan carefully:

-------------------------------------------------------------------------------------
| Id  | Operation                         | Name    | Rows  | Bytes |TempSpc| Cost  |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |         |  1269 | 27918 |       |   434 |
|   1 |  TABLE ACCESS BY INDEX ROWID      | T1      |  1269 | 27918 |       |   434 |
|   2 |   BITMAP CONVERSION TO ROWIDS     |         |       |       |       |       |
|   3 |    BITMAP AND                     |         |       |       |       |       |
|   4 |     BITMAP CONVERSION FROM ROWIDS |         |       |       |       |       |
|*  5 |      INDEX RANGE SCAN             | T1_BIT  |       |       |       |    10 |
|   6 |     BITMAP OR                     |         |       |       |       |       |
|   7 |      BITMAP CONVERSION FROM ROWIDS|         |       |       |       |       |
|   8 |       SORT ORDER BY               |         |       |       |   424K|       |
|*  9 |        INDEX RANGE SCAN           | T1_DATE |       |       |       |    36 |
|  10 |      BITMAP CONVERSION FROM ROWIDS|         |       |       |       |       |
|  11 |       SORT ORDER BY               |         |       |       |   456K|       |
|* 12 |        INDEX RANGE SCAN           | T1_DATE |       |       |       |    39 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("BIT_COL"=5)
   9 - access("T1"."DATE_COL">=TO_DATE('2010-02-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss') AND "T1"."DATE_COL"<=TO_DATE('2010-02-12 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))
  12 - access("T1"."DATE_COL">=TO_DATE('2010-02-13 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss') AND "T1"."DATE_COL"<=TO_DATE('2010-02-25 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))

There’s no change – the predicate due to transitive closure simply doesn’t appear. Strangely, though, the 10053 trace file does show the typical check constraint consideration taking place, but it doesn’t result in any modification to the original predicates.

As David Aldridge pointed out in a comment on the previous article, the errors introduced by double-counting the generated predicates should have been fixed in 10.2.0.4 – and certainly the first query behaved correctly when I tested it (with a slight change in data definition) against 11.1.0.6.

Partition Sophistication

The other point I made in the previous article related to how carefully you should supply range-based predicates when using partitioned tables; “between” and range partitioning don’t really go together perfectly because a range partition is defined by “greater than or equal to lower bound and strictly less than upper bound”, whereas “between” equates to “greater than or equal to lower bound and less than or equal to upper bound”.

Let’s take just our single range query again, but create the table with range partitioning by day – I’ve taken a slight shortcut in my code and used daily partitions for February and one partition before and after February.

We’ll start with the query as given (“date_col between 1st Feb and 25th Feb”) –  and this is the execution plan we get:

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |  1312 | 28864 |   115 |       |       |
|   1 |  PARTITION RANGE ITERATOR|      |  1312 | 28864 |   115 |     3 |    27 |
|*  2 |   TABLE ACCESS FULL      | T1   |  1312 | 28864 |   115 |     3 |    27 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("BIT_COL"=5 AND "T1"."DATE_COL"<=TO_DATE('2010-02-25
              00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

There are two things to note – we have an explicit partition start and stop so we have limited the query quite strictly, and the date comparison with 1st Feb has disappeared from the filter predicates because Oracle “knows” that its selection of partitions guarantees that that test will be true for every row. On the other hand partition 27 is the one that include 25th Feb and therefore goes up to “just before” 26th Feb – so Oracle has to test every row of every partition because there may be some unwanted data in the last partition.

Since we know that the only value allowed in that last partition is date-only, we could rewrite the last bit of the predicate as “date_col < 26th Feb”, at which point the plan changes to the following:

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |  1313 | 28886 |   115 |       |       |
|   1 |  PARTITION RANGE ITERATOR|      |  1313 | 28886 |   115 |     3 |    27 |
|*  2 |   TABLE ACCESS FULL      | T1   |  1313 | 28886 |   115 |     3 |    27 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("BIT_COL"=5)

We are still accessing partitions 3 to 27, but the optimizer know that every row in that set of partitions MUST satisfy both date conditions – so it doesn’t test them at all. In cases like this the CPU saving can be quite significant.

Interestingly, if you try to enforce the correctness of the query by adding the constraint “date_col = trunc(date_col)” in 10.2.0.3, the optimizer doesn’t play nicely. Apart from the minor change in the 25th/26th predicate, the plan is the same in both cases:

----------------------------------------------------------------------------------------------
| Id  | Operation                          | Name    | Rows  | Bytes | Cost  | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |         |     3 |    66 |   101 |       |       |
|   1 |  PARTITION RANGE ITERATOR          |         |     3 |    66 |   101 |     3 |    27 |
|*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID| T1      |     3 |    66 |   101 |     3 |    27 |
|*  3 |    INDEX RANGE SCAN                | T1_DATE |    66 |       |    74 |     3 |    27 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("BIT_COL"=5)
   3 - access("T1"."DATE_COL">=TO_DATE('2010-02-01 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss') AND "T1"."DATE_COL"<TO_DATE('2010-02-26 00:00:00', 'yyyy-mm-dd
              hh24:mi:ss'))
       filter(TRUNC(INTERNAL_FUNCTION("DATE_COL"))>=TO_DATE('2010-02-01 00:00:00',
              'yyyy-mm-dd hh24:mi:ss') AND TRUNC(INTERNAL_FUNCTION("DATE_COL"))<TO_DATE('2010-02-26
              00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

The date predicates have reappeared, as has the predicate generated by transitive closure. We also see that the cardinality has, as consequence of transitive closure, dropped by a factor of 400 (that 5% of 5% for two ranges) leading to Oracle using an index to pick up every single row in each partition.

In this case, even 11.1.0.6 lets us down – it gets the arithmetic right and does the table (partition) scan that we expect, but it still turns the constraint into a predicate so that our plan looks like this for the “good” use of dates:

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost  | Pstart| Pstop |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |  1313 | 28886 |   115 |       |       |
|   1 |  PARTITION RANGE ITERATOR|      |  1313 | 28886 |   115 |     3 |    27 |
|*  2 |   TABLE ACCESS FULL      | T1   |  1313 | 28886 |   115 |     3 |    27 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("BIT_COL"=5 AND TRUNC(INTERNAL_FUNCTION("DATE_COL"))>=TO_DA
              TE(' 2010-02-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              TRUNC(INTERNAL_FUNCTION("DATE_COL"))<TO_DATE(' 2010-02-26 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

Such is life in the optimizer fast-lane: you try to do things right, and (sometimes) the optimizer lets you down.

3 Comments »

  1. Jonathan,

    Do you recommend creating the column constraint date_col = trunc(date_col) for all date columns if you planning to use them in your query as a predicate? Or we need to use the column constraint date_col = trunc(date_col) only if we see performance issues due to this issue?

    Thanks
    Aswath Rao

    Comment by Aswath RAo — February 27, 2010 @ 2:17 am BST Feb 27,2010 | Reply

    • Aswath Rao

      My general directive is that check constraints protect your data from programmer error. Anything which is true about your data should be embedded in the database as a constraint if possible. This makes the check constraint the default choice until you discover that it causes optimisation problems that you can’t work around. (Or can only find workarounds that produce problems that are worse than the problem you’re trying to solve).

      Comment by Jonathan Lewis — February 28, 2010 @ 1:02 pm BST Feb 28,2010 | Reply

  2. Thanks Jonathan.

    Thanks
    Aswath Rao

    Comment by Aswath Rao — February 28, 2010 @ 4:52 pm BST Feb 28,2010 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Follow

Get every new post delivered to your Inbox.

Join 4,088 other followers