Oracle Scratchpad

December 21, 2006

Constraints

Filed under: CBO,Execution plans,Infrastructure — Jonathan Lewis @ 8:17 am BST Dec 21,2006

A little while ago I discussed one side-effect of transitive closure and predicate generation. Coincidentally, David Aldridge and Jeff Hunter have both come up with further examples of predicate generation – in their cases generated from check constraints – that can cause problems.

Constraints are a very good thing – they protect your database, and make sure that incorrect data cannot appear by accident, or through the back door; they can also give the optimizer more options for finding an efficient access path.

But – and it’s an important but – the optimizer and run-time engines are not yet handling generated predicates correctly.

Both Dave and Jeff have examples of a check constraint similar to trunc(date_col) = date_col. When this is applied as a generated predicate, it introduces a 1% scale factor into the selectivity. (As a general rule, predicates of the form function(col) = constant are given a selectivity of 1%, although in some special cases for built-in functions the underlying column selectivity is used instead.) The change in selectivity can, of course, make a big difference to the execution path.
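As a back-of-envelope sketch of that arithmetic (my assumption about how the fixed factor combines with the base selectivity, not a transcript of Oracle's actual costing code), the effect on a cardinality estimate looks like this:

```python
# Hypothetical sketch: assume each generated predicate of the form
# function(col) = constant multiplies the selectivity by a fixed 0.01.
def estimated_rows(base_rows: int, base_selectivity: float,
                   generated_predicates: int = 0) -> int:
    """Cardinality estimate after applying the fixed 1% factor once per
    generated function(col) = constant predicate."""
    selectivity = base_selectivity * (0.01 ** generated_predicates)
    return max(1, round(base_rows * selectivity))

# A predicate generated from a check constraint that is true for every row
# still cuts the estimate by a factor of 100.
print(estimated_rows(1_000_000, 1.0))     # 1000000 - no generated predicate
print(estimated_rows(1_000_000, 1.0, 1))  # 10000   - with generated predicate
```

An estimate that is 100 times too low can easily tip the optimizer into a different join order or access method, which is where the surprising execution paths come from.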

But in Jeff’s case, his query returns 355,000 rows and applies the generated predicate (which must be true, because it started life as a check constraint) to every single row. As a consequence the query with the check constraint in place takes 3.48 seconds; the query without the check constraint takes 0.53 seconds.

You can’t really do anything to work around this (at present) – but if you remember to check the filter_predicates and access_predicates from the execution plan (using dbms_xplan is the easy option for this) you may get an early clue about why your queries are doing something a little odd, or taking some extra (unexpected) time.

12 Comments »

  1. The part of this that grates my cheese is that the optimizer is sophisticated enough to infer the predicate from the constraint, but not smart enough to realise that the predicate is redundant with respect to the table data.

    I may not have thought this through enough, but these generated predicates really ought to be treated very differently by the optimizer. They are a different class of predicate entirely – neither access nor filter – as they are redundant in both respects. Maybe they ought to be described as “informational” predicates: they should be treated as a form of statistical information, IMHO, and no more.

    One major benefit comes from comparison with user-supplied predicates, of course, to detect cases where no rows are likely to be returned or where the user-supplied predicate is completely redundant.

    The treatment of constraints as it stands today is rather disappointing. Is there a hidden parameter to prevent predicates from being generated from constraints? (If there is, I expect it would probably prevent partition pruning as well.)

    Comment by David Aldridge — December 21, 2006 @ 7:32 pm BST Dec 21,2006 | Reply

  2. David, I think they’re working on it – the whole area of generated predicates is full of inconsistencies. It’s the sort of thing that’s likely to change silently in point releases.

    Comment by Jonathan Lewis — December 21, 2006 @ 8:17 pm BST Dec 21,2006 | Reply

  3. And who came up with “Oh, since this is a check constraint, it is 100 times better than not having one.”?

    Comment by Jeff Hunter — December 21, 2006 @ 8:29 pm BST Dec 21,2006 | Reply

  4. Jeff, Guilty as charged! So now I’ll make sure that every constraint carries a government health warning. But they are still a good idea.

    Comment by Jonathan Lewis — December 21, 2006 @ 9:41 pm BST Dec 21,2006 | Reply

  5. “the generated predicate….which must be true, because it started life as a check constraint”

    If you want to be precise, shouldn’t we add that the constraint must be both VALIDATED and NON-DEFERRED? A check constraint that isn’t VALIDATED, or is DEFERRED, may not be true for all rows. While the 1% selectivity is arbitrary, and most check constraints will be 100% true, a more complete resolution may require a statistic on the check constraint.

    Comment by Gary — December 21, 2006 @ 10:05 pm BST Dec 21,2006 | Reply

  6. Gary, the joys of blogging – someone, somewhere remembers the boundary conditions and special cases. Good point – if the constraint is either novalidate or deferrable then it won’t be used for predicate generation.

    In the short term, this might let you have your cake and eat it (i.e. have the constraint and not wreck the cardinality) – on the other hand there are probably some cases where this would eliminate the options for extra execution paths.

    Comment by Jonathan Lewis — December 21, 2006 @ 10:22 pm BST Dec 21,2006 | Reply

  7. Oh marvellous. And I just persuaded my boss to throw in a load of new check constraints for things like “record_sequence_no > 0” on the grounds that the database should hold all the information.

    This was recommended in a seminar by some bozo called Jonathan Lewis, as I recall. :)

    Comment by Dave Hemming — December 28, 2006 @ 9:31 am BST Dec 28,2006 | Reply

  8. Dave, Thanks for the referral :-(

    Looking on the bright side, a check constraint of the type “n1 > 0” does not seem to generate an extra predicate. At least, it doesn’t until you also happen to check for things like “n1 = trunc(n1)” – at which point you may get two extra predicates, one of which is very cunning: e.g., from the supplied predicate n1 = 30, I got:

      "N1"=30 AND TRUNC("N1")=30 AND TRUNC("N1")>0

    In theory, of course, you need not have a constraint like “n1 = trunc(n1)”, you need only declare the column as number(N,0) to ensure it can store only integer values.

    On the other hand, if you try to insert a non-integral value with the constraint in place you get Oracle error ORA-02290: check constraint violated; whereas if you simply declare the column to be integral any non-integral values are silently rounded to integer as you insert them.

    (Strangely, if you have both the declaration and the constraint in place, the rounding occurs before the constraint is checked, so the constraint will never fail).

    Comment by Jonathan Lewis — December 28, 2006 @ 10:07 am BST Dec 28,2006 | Reply

  9. I thought I might share an interesting issue related to a transitive predicate generation bug that I recently ran across while supporting a client – the bug actually identified improperly configured check constraints throughout a schema once we upgraded the database from 9.2.0.7 to 10.2.0.2. Here’s a simplified example, executed using Oracle XE 10.2.0.1 to demonstrate the issue.

    — The NULL in the following check constraint declaration makes it useless.

    SQL> CREATE TABLE sbobrows.test_bad(
      2    id INT CONSTRAINT test_pk PRIMARY KEY,
      3    status INT,
      4    CONSTRAINT status_ck CHECK(STATUS IN (0, 1, NULL))
      5  );

    Table created.

    — I can insert a row with any number into STATUS

    SQL> INSERT INTO sbobrows.test_bad (id, status) VALUES(1,-1);
    1 row created.
    SQL> INSERT INTO sbobrows.test_bad (id, status) VALUES(2,-10000);
    1 row created.
    SQL> INSERT INTO sbobrows.test_bad (id, status) VALUES(3,10000);
    1 row created.
    SQL> COMMIT;
    Commit complete.

    But because of a transitive predicate generation bug in 10.2.0.x – which (I guess) ought to recognize that the constraint is useless – queries that formerly worked with 9.2.0.7 now don’t return the rows that fail the check constraint condition. So my client freaked out because rows that shouldn’t have been there in the first place no longer appeared when the application ran against the newly upgraded database. For example:

    SQL> SELECT * FROM sbobrows.test_bad WHERE status = -1;

    no rows selected

    SQL> SELECT * FROM sbobrows.test_bad WHERE status = 10000;

    no rows selected

    SQL> SELECT * FROM sbobrows.test_bad;

            ID     STATUS
    ---------- ----------
             1         -1
             2     -10000
             3      10000

    Of course, the solution is to fix, invalidate, disable, or remove the constraint declarations, and then the problem goes away. I thought the problem was interesting enough to add to this discussion, and I hope it helps someone who might run into the same problem.

    Comment by Steve Bobrowski — February 20, 2007 @ 2:12 pm BST Feb 20,2007 | Reply

  10. Steve, nice example of how things go wrong. NULLs cause so many problems. Combined with constraints, it’s not surprising that someone, somewhere, had to make a mistake.

    Critical point, often overlooked, easily forgotten, so worth mentioning here: values pass a check constraint provided the check does not return false. But that isn’t the same as returning true. In this case, the check was equivalent to:

        col = 0
      or col = 1
      or col = null

    which would return true for 0 and 1, but null for any other value – and since null is not false, any value can be accepted.
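    This three-valued behaviour isn’t specific to Oracle: SQLite applies the same rule (a row is rejected only when the CHECK expression evaluates to false), so the trap can be demonstrated with a few lines of Python – a minimal sketch, with table and constraint names made up for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The NULL in the IN-list means the check can never return FALSE for a
# non-matching value: it returns NULL instead, and NULL is not FALSE.
conn.execute("""
    CREATE TABLE test_bad (
        id     INTEGER PRIMARY KEY,
        status INTEGER,
        CONSTRAINT status_ck CHECK (status IN (0, 1, NULL))
    )
""")

# -1 IN (0, 1, NULL) evaluates to NULL, so the row is accepted.
conn.execute("INSERT INTO test_bad (id, status) VALUES (1, -1)")
rows = conn.execute("SELECT status FROM test_bad").fetchall()
print(rows)  # [(-1,)]

# Without the NULL the check behaves as intended and rejects -1.
conn.execute("""
    CREATE TABLE test_good (
        id     INTEGER PRIMARY KEY,
        status INTEGER,
        CONSTRAINT status_ck2 CHECK (status IN (0, 1))
    )
""")
try:
    conn.execute("INSERT INTO test_good (id, status) VALUES (1, -1)")
except sqlite3.IntegrityError:
    print("rejected by check constraint")
```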

    Comment by Jonathan Lewis — February 20, 2007 @ 7:09 pm BST Feb 20,2007 | Reply

  11. Apparently function-based indexes also have this issue:

    CREATE TABLE KAK_T1 (COL1) AS
    SELECT 'S' || LEVEL
    FROM DUAL CONNECT BY LEVEL < 1000
    /
    CREATE TABLE KAK_T2 (COL1) AS
    SELECT 'S' || LEVEL
    FROM DUAL CONNECT BY LEVEL < 100
    /
    BEGIN DBMS_STATS.GATHER_TABLE_STATS ('APPS','KAK_T1'); END;
    /
    BEGIN DBMS_STATS.GATHER_TABLE_STATS ('APPS','KAK_T2'); END;
    /
    alter session set tracefile_identifier = TEST5
    /
    ALTER SESSION SET EVENTS '10053 trace name context forever, level 12'
    /
    SELECT /* query1 */ * FROM KAK_T1 WHERE COL1 IN (
    SELECT COL1 FROM KAK_T2)
    /
    ALTER SESSION SET EVENTS '10053 trace name context off'
    /
    CREATE INDEX KAK_T1_f1 ON kak_T1
    (SUBSTR("COL1",1,7))
    /
    BEGIN DBMS_STATS.GATHER_TABLE_STATS ('APPS','KAK_T1'); END;
    /
    BEGIN DBMS_STATS.GATHER_TABLE_STATS ('APPS','KAK_T2'); END;
    /
    alter session set tracefile_identifier = TEST6
    /
    ALTER SESSION SET EVENTS '10053 trace name context forever, level 12'
    /
    SELECT /* query2 */ * FROM KAK_T1 WHERE COL1 IN (
    SELECT COL1 FROM KAK_T2)
    /
    ALTER SESSION SET EVENTS '10053 trace name context off'
    /
    exit

    The plan of the first query (before function based index built):
    Current SQL statement for this session:
    SELECT /* query1 */ * FROM KAK_T1 WHERE COL1 IN (
    SELECT COL1 FROM KAK_T2)

    ============
    Plan Table
    ============
    ------------------------------------------------------------------
    | Id | Operation            | Name   | Rows | Bytes | Cost | Time     |
    ------------------------------------------------------------------
    |  0 | SELECT STATEMENT     |        |      |       |    5 |          |
    |  1 | HASH JOIN RIGHT SEMI |        |   99 |   693 |    5 | 00:00:01 |
    |  2 | TABLE ACCESS FULL    | KAK_T2 |   99 |   297 |    2 | 00:00:01 |
    |  3 | TABLE ACCESS FULL    | KAK_T1 |  999 |  3996 |    2 | 00:00:01 |
    ------------------------------------------------------------------
    Predicate Information:
    ----------------------
    1 - access("COL1"="COL1")

    …and after the function based index:

    Current SQL statement for this session:
    SELECT /* query2 */ * FROM KAK_T1 WHERE COL1 IN (
    SELECT COL1 FROM KAK_T2)

    ============
    Plan Table
    ============
    ------------------------------------------------------------------
    | Id | Operation            | Name   | Rows | Bytes | Cost | Time     |
    ------------------------------------------------------------------
    |  0 | SELECT STATEMENT     |        |      |       |    5 |          |
    |  1 | HASH JOIN RIGHT SEMI |        |    1 |     8 |    5 | 00:00:01 |
    |  2 | TABLE ACCESS FULL    | KAK_T2 |   99 |   297 |    2 | 00:00:01 |
    |  3 | TABLE ACCESS FULL    | KAK_T1 |  999 |  4995 |    2 | 00:00:01 |
    ------------------------------------------------------------------
    Predicate Information:
    ----------------------
    1 - access(SUBSTR("KAK_T1"."COL1",1,7)=SUBSTR("KAK_T2"."COL1",1,7) AND "COL1"="COL1")

    Comment by Kevin Kirkpatrick — April 23, 2007 @ 7:21 pm BST Apr 23,2007 | Reply

  12. Hi Jonathan,

    We have two third-party applications – let us call them appA and appB – which were developed and sold to our company by different vendors. Data is transferred from appA to appB. Some columns in appA are longer than the corresponding columns in appB, or the data type is VARCHAR in appA and NUMBER in appB, because of which we sometimes get ORA-12899: value too large for column string (actual: string, maximum: string), or non-numeric data errors. We don’t have a way to control the length or type of data entered from the front end. Could you please let us know which option would be better?

    1) Use check constraints
    2) Or Write triggers to control the length.

    The reason people give for choosing a trigger is that, since we have around 10 fields in the table whose lengths we have to check, a trigger can produce one message listing all the field names, whereas with check constraints we will get one message at a time. Also, I tried having both a trigger and a check constraint on a column and saw that the trigger fired first. Please let us know what would be the better option.

    Comment by Himanshu — August 23, 2009 @ 11:47 am BST Aug 23,2009 | Reply

