Oracle Scratchpad

August 27, 2010

Quiz Night.

Filed under: Infrastructure,Performance — Jonathan Lewis @ 6:52 pm GMT Aug 27,2010

I was on a customer site recently where I needed to add a NOT NULL constraint to a table of 200 million rows – without taking any downtime. It’s not difficult (provided you are happy with a check constraint rather than a column definition).

alter table t1
	add constraint t1_ck_colX_nn check (colX is not null)
	enable novalidate
;

The first step creates the constraint and enables it – but doesn’t validate it. This means that future data (and changes) will obey the constraint, but there may be illegal data already in the table that will not be checked. You will have an interruption to service doing this, as your session will wait to lock the table in share mode (mode 4) to add the constraint – so it will be blocked by current update transactions, and will block new update transactions. In a typical OLTP system this should result in just a brief pause.
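
If you want to confirm the state of the constraint at this point you can query the data dictionary – a minimal sketch (assuming the table is in your own schema):

select constraint_name, status, validated
from   user_constraints
where  table_name = 'T1'
;

Immediately after the first step the constraint should show STATUS = ‘ENABLED’ with VALIDATED = ‘NOT VALIDATED’.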

The second step validates the constraint, which needs a slightly more complex piece of code – perhaps something like the following:

declare
	resource_busy exception;
	pragma EXCEPTION_INIT(resource_busy, -54);	-- ORA-00054: resource busy
begin
	loop
		begin
			-- validate requests an immediate lock, so it fails
			-- at once if there are active transactions
			execute immediate
			'alter table t1 modify constraint t1_ck_colX_nn validate';
			dbms_output.put_line('Succeeded');
			exit;
		exception
			when resource_busy then
				dbms_output.put_line('Failed');
		end;
		dbms_lock.sleep(0.01);	-- pause 1/100 second, then retry
	end loop;
end;
/

This code tries to validate the constraint and goes into a loop, sleeping for 1/100 second, if it hits the “resource busy” error. Unlike the call to add the constraint, the call to validate it doesn’t wait for a lock – it requests an immediate lock, so it’s going to fail immediately if there are any active transactions on the table. Once it has the lock it drops it, so you’re not going to shut your users out while the validation takes place.

Finally, we get to the quiz.

I ran the code above on a table with 200 million rows. It took Oracle just 0.15 seconds to validate the constraint. How could that be possible?

Update 28th Aug:

What a lot of excellent responses. The answer in my case was that the table was list partitioned and I didn’t have a partition defined as capable of accepting NULL. But there are other options: as one of the comments said, the partitioning could have been range partitioning, in which case the absence of a “maxvalue” partition would leave nowhere for NULL to go. [See also comment 10 below]
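
For illustration, here’s a minimal sketch (the table and partition definitions are hypothetical, not the customer’s) of a list-partitioned table with no partition capable of accepting a NULL partitioning key:

create table t1 (
	colX	varchar2(10),
	padding	varchar2(100)
)
partition by list (colX) (
	partition p_a values ('A'),
	partition p_b values ('B')
);

Since no partition lists the value NULL and there is no DEFAULT partition, an insert supplying a NULL for colX would fail with ORA-14400 (“inserted partition key does not map to any partition”) – so the optimizer can tell at parse time that the validation query cannot possibly return a row.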

When validating the constraint, Oracle simply has to execute the following query as quickly as possible (sample taken from an 11.1.0.6 trace file):

select /*+ all_rows ordered */ A.rowid, :1, :2, :3 from "TEST_USER"."T1" A where not ( subobject_name is not null)

Here are three plans which I caused to appear by changing the infrastructure of the table:

STAT #6 id=1 cnt=0 pid=0 pos=1 obj=0 op='PARTITION LIST EMPTY (cr=0 pr=0 pw=0 time=0 us cost=309 size=37 card=1)'
STAT #6 id=2 cnt=0 pid=1 pos=1 obj=75588 op='TABLE ACCESS FULL TP (cr=0 pr=0 pw=0 time=0 us cost=309 size=37 card=1)'

STAT #4 id=1 cnt=0 pid=0 pos=1 obj=0 op='BITMAP CONVERSION TO ROWIDS (cr=2 pr=2 pw=2 time=0 us cost=1 size=37 card=1)'
STAT #4 id=2 cnt=0 pid=1 pos=1 obj=75586 op='BITMAP INDEX SINGLE VALUE T1_B1 (cr=2 pr=2 pw=2 time=0 us)'

STAT #6 id=1 cnt=0 pid=0 pos=1 obj=75587 op='INDEX RANGE SCAN T1_I1 (cr=3 pr=3 pw=3 time=0 us cost=2 size=37 card=1)'

The first came from my list-partitioned table, so the optimizer went for a plan that would search just the partition that was allowed to hold NULLs, and found at compile time that there wasn’t one. (But it still had to produce a plan that would describe the action it would have taken if it had found one.)

The second plan came from having a bitmap index on the column. NULLs are included in bitmap indexes, so an index access to find the first NULL would be the fastest thing that Oracle could do to test whether there were any rows failing the constraint.
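
A sketch of the sort of index involved (T1_B1 is the name from the trace above – though remember that bitmap indexes are usually a bad idea on tables subject to concurrent OLTP updates):

create bitmap index t1_b1 on t1(colX);

Because bitmap indexes include entries for NULL keys, the check for “colX is null” can be satisfied by the BITMAP INDEX SINGLE VALUE operation you see in the second plan.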

The last one came from having a multi-column B-tree index with the leading column as my constrained column and one of the other columns defined as NOT NULL. Because at least one of the columns in the index was declared NOT NULL (and I should point out that this particular constraint was both enabled and validated) the optimizer knew that every single row in the table had to be in the index. Because the index started with the correct column Oracle was able to do an index range scan on it to check for NULLs very quickly.
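
A sketch of the index that could produce the third plan (T1_I1 is the name from the trace; colY stands in for the hypothetical second column carrying an enabled, validated NOT NULL constraint):

create index t1_i1 on t1(colX, colY);

Since colY can never be null, no row of the table can have a completely null index key, so every row must appear in the index – and with colX as the leading column an index range scan for “colX is null” touches only the few leaf blocks where such entries would live.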

If I’d switched the column order in the index the optimizer could still have chosen to use the index for the check, but it would have to have chosen an index skip scan, index full scan, or index fast full scan to do so and this probably wouldn’t have matched my 0.15 second response time. (Unless I were using Exadata, perhaps.)

15 Comments »

  1. My guess would be that there is a bitmap index on that column, which Oracle could use in the check.

    Comment by Craig — August 27, 2010 @ 7:18 pm GMT Aug 27,2010 | Reply

  2. Was there an existing index on colx – perhaps with the same number of index entries as there are table rows?

    Comment by Margaret Norman — August 27, 2010 @ 7:22 pm GMT Aug 27,2010 | Reply

  3. I can think of 2 things. Either Craig’s suggestion of a bitmap. Or a multi-column b-tree index with colX as first column, and an existing not-null column as second column.

    For Margaret’s suggestion, I think it won’t work. If the index is only on colx, null values won’t be stored (you need a multi-column index for that). And the number of entries is not available in a read-consistent way until you full-scan the index.

    Comment by Arian — August 27, 2010 @ 7:55 pm GMT Aug 27,2010 | Reply

  4. Just thought of another reason.
    The column could be in a (validated) foreign key constraint. That would disallow the null value too.

    Comment by Arian — August 27, 2010 @ 8:06 pm GMT Aug 27,2010 | Reply

  5. I agree with Arian. I tried the same operation with and without an index on the column in question. Oracle used a fast full scan of the index when there was a multi-column index with one of the columns being not null.
    The trace file showed a full table scan in my first test (without the index) and an index fast full scan in the second test.
    Again, I am still not sure whether the validation from Jonathan’s test failed or not, because once it hits the first failure it is going to throw that exception.

    Comment by Kumar Ramalingam — August 27, 2010 @ 8:21 pm GMT Aug 27,2010 | Reply

  6. Maybe because your stopwatch broke. Or maybe you were in a jet going so fast that time slowed. Perhaps your server is from the future, when processing power is an order of magnitude faster. Maybe you ran it on a table where the constraint had already been validated. Wow, I can think of lots of reasons.

    Comment by Robert V — August 27, 2010 @ 9:50 pm GMT Aug 27,2010 | Reply

    • Robert,

      The future’s already here – Exadata with tens of gigabytes of SSD at the drive level. (That wasn’t it – but it probably could have been.)

      Comment by Jonathan Lewis — August 28, 2010 @ 7:56 am GMT Aug 28,2010 | Reply

  7. The table was partitioned (list or range) on this column?

    Comment by Flado — August 27, 2010 @ 10:56 pm GMT Aug 27,2010 | Reply

  8. I note in the docs:

    …validating a constraint that is already enabled does not require any DML locks during validation (unlike validating a previously disabled constraint). Enforcement guarantees that no violations are introduced during the validation. Hence, enabling without validating enables you to reduce the downtime typically associated with enabling a constraint.

    Some benefits of using constraints in this order are:

    No locks are held.
    All constraints can go to enable state concurrently.
    Constraint enabling is done in parallel.

    So maybe you have a highly parallel system. And maybe some quantity of the rows were added while you had the enable novalidate on.

    Comment by jgarry — August 27, 2010 @ 11:03 pm GMT Aug 27,2010 | Reply

  9. My guess: DBA_TAB_COL_STATISTICS has the column NUM_NULLS. Can the system use a zero value from here, knowing it is up to date (not stale)?

    For 200M rows we can also think of a possible partitioning in place, where partition-wise statistics or local indexes might be present. So if it’s not the stats, the parallel processing of partitions could also make things like this faster.

    Btw, when using 11g do you still need the pl/sql and the loop? Would “ALTER SESSION SET ddl_lock_timeout=…” work here, or does it not apply?

    Comment by Andras Gabor — August 28, 2010 @ 5:24 am GMT Aug 28,2010 | Reply

    • Andras,

      In this case, Oracle wouldn’t depend on num_nulls being zero. Statistics are always treated as estimates and potentially out of date – even the microsecond after a 100% compute.

      I haven’t tested the effects of ddl_lock_timeout. It isn’t really possible to guess from the description whether or not it should have an effect in this case. Perhaps some other reader has already experimented with it – or can point to a decent blog item about it.

      Comment by Jonathan Lewis — August 28, 2010 @ 7:53 am GMT Aug 28,2010 | Reply

  10. Jonathan,
    Just wanted to point out that it shouldn’t be necessary to not have a partition that can accept NULLs – just a partition that only accepts NULLs, e.g. DEFAULT for list or MAXVALUE for range. The partition would be empty (if the constraint was satisfied) and a full scan would be very quick (assuming there have never been NULLs in that column, otherwise the HWM would have been bumped). Under 11g it might even be “segmentless,” making the performance identical to your case; under 10g and earlier, it would require just a single LIO for the segment header so it could still fit in your 0.15 seconds.
    Cheers,

    Flado

    Comment by Flado — August 28, 2010 @ 9:13 am GMT Aug 28,2010 | Reply

    • Flado,

      Good point. In fact, several good points at once about the benefits of isolating values to partitions and the effects of highwater marks on performance. (I was looking at a table recently that had a highwater mark at 438000 blocks and no data – it was a materialized view log table that had suffered an accident at some point in the past, and refreshes weren’t going quite as fast as expected.)

      Comment by Jonathan Lewis — August 28, 2010 @ 9:24 am GMT Aug 28,2010 | Reply

  11. [...] may remember a short note I wrote a few days ago about adding a not null constraint to a table with 200 million rows without causing the system to stop for a few minutes. I did it with the intent of giving the [...]

    Pingback by Not NULL « Oracle Scratchpad — September 5, 2010 @ 5:56 pm GMT Sep 5,2010 | Reply

  12. Not commenting on the Quiz question, but on the CHECK CONSTRAINT instead of a Column Definition to impose NOT NULL: Tom Kyte writes about the advantage of a NOT NULL definition over a Constraint, as the Optimizer can better use the Column definition: http://www.oracle.com/technetwork/issue-archive/2010/10-nov/o60asktom-176254.html

    Hemant K Chitale

    Comment by Hemant K Chitale — October 12, 2010 @ 2:29 pm GMT Oct 12,2010 | Reply

