Oracle Scratchpad

October 20, 2016

Conjunctive Normal Form

Filed under: 12c,Exadata,in-memory,Oracle — Jonathan Lewis @ 1:00 pm GMT Oct 20,2016

I recently tweeted about a comment I’d picked up at the Trivadis performance days regarding tablescans and performance.

“If you can write your SQL in conjunctive normal form it can help the optimizer to offload more predicates”

Inevitably someone asked me if I had an example to demonstrate this – I didn’t, and still don’t really, but here’s an interesting demo based on an example from the Oracle In-Memory blog showing how the optimizer will rearrange your filter predicates before passing them to the tablescan code for evaluation against an inmemory table.


rem
rem     Script:         in_memory_conjunctive.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2016
rem     Purpose:
rem
rem     Last tested
rem             12.1.0.2
rem

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                          id,
        trunc(dbms_random.value(1,501)) qty,
        mod(rownum,200) + 1             part_no,
        lpad(rownum,10,'0')             v1,
        lpad('x',50,'x')                padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e7
;

prompt  ==========
prompt  Base query
prompt  ==========

select
        count(v1)
from
        t1
where
        (qty > 495 or (qty < 3 and part_no = 50))
;

prompt  ===============
prompt  predicate added
prompt  ===============

select
        count(v1)
from
        t1
where
        (qty > 495 or qty < 3) and (qty > 495 or (qty < 3 and part_no = 50))
;

prompt  =================
prompt  Ordered predicate
prompt  =================

select  /*+ ordered_predicates */
        count(v1)
from
        t1
where
        (qty > 495 or qty < 3) and (qty > 495 or (qty < 3 and part_no = 50))
;

The second and third queries add a predicate to the first query which, unfortunately, changes the estimated cardinality even though it has no effect on the result. This predicate is one that the inmemory code path would add if the table were declared to be inmemory. I’ve got two versions of the query, one with the (deprecated) ordered_predicates hint, because in my initial tests the optimizer swapped the order of the predicates and I wanted to see if the ordering was at all critical.
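A quick way to convince yourself that the extra predicate really is redundant is to evaluate the different forms of the where clause over every combination of values the script can generate. The following Python sketch is not part of the original test: the function names are invented for illustration, and it assumes qty ranges over 1..500 and part_no over 1..200, as the create table statement implies. It also checks the fully distributed conjunctive form, (A or B) and (A or C), which is slightly different from the version Oracle reports.

```python
from itertools import product

def base(qty, part_no):
    # Where clause of the base query: A or (B and C)
    return qty > 495 or (qty < 3 and part_no == 50)

def with_extra(qty, part_no):
    # Base query with the extra predicate in front: (A or B) and (A or (B and C))
    return (qty > 495 or qty < 3) and base(qty, part_no)

def full_cnf(qty, part_no):
    # Fully distributed form: (A or B) and (A or C)
    return (qty > 495 or qty < 3) and (qty > 495 or part_no == 50)

def mismatches():
    # Every (qty, part_no) pair the data generator can produce (assumed ranges)
    domain = product(range(1, 501), range(1, 201))
    return [
        (q, p) for q, p in domain
        if not (base(q, p) == with_extra(q, p) == full_cnf(q, p))
    ]

print(len(mismatches()))  # 0: all three forms select exactly the same rows
```

Since the three forms agree on every possible row, any difference in the optimizer's estimates or in the run time comes from how the predicates are costed and evaluated, not from what they select.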

Here’s the plan for the base query – first before declaring the table inmemory, then after declaring the table inmemory:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14739 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |   100K|  1862K| 14739   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))

------------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |       |       |  1974 (100)|          |
|   1 |  SORT AGGREGATE             |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS INMEMORY FULL| T1   |   100K|  1862K|  1974  (44)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - inmemory((("QTY">495 OR "QTY"<3) AND ("QTY">495 OR ("QTY"<3 AND "PART_NO"=50))))
       filter(("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)))

And here, after setting the table back to no inmemory, are the plans for the second and third queries. Note particularly the different order of the predicates in the predicate section: the predicate order matches the inmemory predicate order only if I use the ordered_predicates hint:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14741 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1404 | 26676 | 14741   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((("QTY">495 OR ("QTY"<3 AND "PART_NO"=50)) AND ("QTY">495
              OR "QTY"<3)))

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       | 14741 (100)|          |
|   1 |  SORT AGGREGATE    |      |     1 |    19 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |  1404 | 26676 | 14741   (6)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((("QTY">495 OR "QTY"<3) AND ("QTY">495 OR ("QTY"<3 AND
              "PART_NO"=50))))
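The drop in the Rows estimate from 100K to 1404 is consistent with the optimizer treating the two ANDed predicates as independent and multiplying their selectivities. The Python sketch below reproduces both figures using the textbook selectivity formulas. It rests on assumptions that are not shown in the output above: no histograms, column statistics for qty of low = 1 and high = 500, and 200 distinct values for part_no. The helper names are invented for illustration.

```python
NUM_ROWS = 10_000_000          # rows in t1
QTY_LOW, QTY_HIGH = 1, 500     # assumed low/high values for qty
PART_NO_NDV = 200              # assumed number of distinct part_no values

def sel_or(a, b):
    # Selectivity of (a OR b), assuming independence
    return a + b - a * b

def sel_and(a, b):
    # Selectivity of (a AND b), assuming independence
    return a * b

sel_a = (QTY_HIGH - 495) / (QTY_HIGH - QTY_LOW)   # qty > 495
sel_b = (3 - QTY_LOW) / (QTY_HIGH - QTY_LOW)      # qty < 3
sel_c = 1 / PART_NO_NDV                           # part_no = 50

# Base query: A or (B and C)
sel_base = sel_or(sel_a, sel_and(sel_b, sel_c))

# Extra predicate: (A or B) and (A or (B and C))
sel_extra = sel_and(sel_or(sel_a, sel_b), sel_base)

base_rows = round(NUM_ROWS * sel_base)
extra_rows = round(NUM_ROWS * sel_extra)
print(base_rows, extra_rows)  # roughly 100K for the base query, 1404 with the extra predicate
```

Under these assumptions the redundant predicate contributes a selectivity of about 1.4%, which is exactly the factor by which the estimate shrinks even though the result set is unchanged.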

Finally the run times – after running the queries a few times each to check for consistency:

  • Base query: 0.82 seconds
  • Query with extra predicate: 0.86 seconds
  • Query with extra predicate and forced order of predicate evaluation: 0.71 seconds

The query with the predicate arrangement matching the inmemory rewrite actually ran 13% faster than the original. Unfortunately the rewrite without the ordered_predicates hint ran slower, which is a bit of a shame but understandable: the first predicate is the more complex one, and then the code has to run a completely redundant second predicate. I was a little surprised at how much slower it was, but the table is 10M rows and we’re only looking at sub-second times anyway.
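For reference, the percentages implied by the three timings can be worked out as follows. This is plain arithmetic on the figures quoted above; the dictionary keys and the function name are just labels chosen for this sketch.

```python
# Elapsed times in seconds, as reported above
timings = {
    "base": 0.82,
    "extra predicate": 0.86,
    "extra predicate, ordered": 0.71,
}

def pct_change(elapsed, baseline):
    # Positive means slower than the baseline, negative means faster
    return (elapsed - baseline) / baseline * 100

changes = {
    name: round(pct_change(elapsed, timings["base"]), 1)
    for name, elapsed in timings.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

The hinted query comes out a little over 13% faster than the base query, and the unhinted rewrite a little under 5% slower.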

My table was fully cached and just under 112,000 blocks, so not very large, and this was running a serial query on a basic Oracle instance. Nevertheless there is a difference in execution time that is more than just “random noise”. If this is an indication of how a little unsightly tweaking of SQL for small data sets can make a difference, you can imagine that there might be a worthwhile benefit in considering ways of tweaking your predicates that could make a significant difference to execution time if the extra predicates end up being pushed down to storage on an Exadata machine.

Footnote:

Another “not quite” example I happen to have written about a few months ago is a case where rewriting “not exists() OR not exists() OR not exists()” as “not (exists() AND exists() AND exists())” allowed Oracle to rewrite three subqueries as a single subquery with a three-table join.
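The two forms in that rewrite are logically identical by De Morgan's law. A small Python sketch can confirm it by brute force, with the booleans e1, e2 and e3 standing in for the results of the three exists() subqueries (hypothetical names; an exists() test is either true or false, so two-valued logic is sufficient here).

```python
from itertools import product

def or_of_nots(e1, e2, e3):
    # not exists() OR not exists() OR not exists()
    return (not e1) or (not e2) or (not e3)

def not_of_ands(e1, e2, e3):
    # not (exists() AND exists() AND exists())
    return not (e1 and e2 and e3)

# Compare the two forms over all eight combinations of subquery results
same = all(
    or_of_nots(*row) == not_of_ands(*row)
    for row in product([False, True], repeat=3)
)
print(same)  # True
```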

 
