Oracle Scratchpad

October 31, 2006

Virtual columns revisited

Filed under: Indexing,Oracle,Statistics,Tuning — Jonathan Lewis @ 11:57 pm GMT Oct 31,2006

Here’s an idea I had some time ago, but have never implemented on a production system. I’d be interested to know if anyone has tried it.

Virtual columns are useful because they can give Oracle statistics about values derived from stored columns. So, for example, you might like to have a virtual column defined as salary + nvl(commission,0) so that the optimizer can estimate resonably accurately how many employess have salary + nvl(commission,0) > 100000.

But (until 11g, perhaps) you can’t create virtual columns unless you index them using the feature that Oracle describes as ‘function-based indexes’, and perhaps you have far too many virtual columns to make index creation viable. Here’s a workaround (maybe).

Create a single, huge, function-based index in an unusable state, containing all the virtual columns you would like to have – or at least the first 32 of them as that’s the largest number of columns you can have in a single index. Then gather stats and see what happens. For example:

create table t1
with generator as (
        select  --+ materialize
                rownum  id
        from    all_objects
        where   rownum <= 3000 -- > comment to avoid wordpress format  issue
        /*+ ordered use_merge(v2) */
        rownum                          id,
                sysdate + rownum/20,
        )                               date_char,
        dbms_random.string('U',6)       vc1,
        rpad('x',100)                   padding
        generator       v1,
        generator       v2
        rownum <= 10000 -- > comment to avoid wordpress format  issue

create index t1_i1 on t1(

alter session set skip_unusable_indexes = true;    

                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1',
                cascade          => true

I was running this test on a 9.2 database, so I’ve set parameter skip_unusable_indexes to true (which is the default state in 10g). Note, in particular, that I’ve created the index as unusable and that I’ve introduced two different ways of manipulating the date column that I’ve stored as a character on the table.

When you run the gather_table_stats() procedure, you will get an error because the index is unusable, but you will get column-level statistics created on the hidden columns that represent the virtual column definitions in the index:

        column_name, num_distinct, low_value, high_value
        table_name = 'T1'
and     column_name != 'PADDING'

------------ ------------ ------------------ ------------------
ID                  10000 C102               C302
DATE_CHAR             500 3230303631313031   3230303830333134
VC1                 10000 414142524F51       5A5A594F4743
SYS_NC00005$          500 786A0B01010101     786C030E010101
SYS_NC00006$         9888 41414252           5A5A594F
SYS_NC00007$            7 467269646179202020 5765646E6573646179        

Note particularly that column SYS_NC00007$ (the column representing the converson of the ‘pseudo-date’ column into the day of a week) has 7 distinct values, a low value of ‘Monday’ (space padded, interestingly to nine characters) and a high value of ‘Wednesday’). Look what these statistics do for queries tun from SQL*PLus with autotrace. enabled.

set autotrace traceonly explain      

        date_char between '20061115' and '20061215'


Execution Plan
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=20 Card=1 Bytes=9)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=20 Card=20 Bytes=180)      

        to_date(date_char,'YYYYMMDD') between to_date('15-Nov-2006','dd-mon-yyyy')
                                          and to_date('15-Dec-2006','dd-mon-yyyy')

Execution Plan
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=20 Card=1 Bytes=9)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'T1' (Cost=20 Card=641 Bytes=5769)      

Suddenly we get correct cardinalities for queries. (Note that the ‘pseudo-date’ column was based on sysdate + rownum / 20, so in a one-month range we expect about 620 rows – and the optimizer’s estimate is only out by one day’s worth at 641, which is pretty good when compared to being out by a factor of 31 when we use the original character-based column.

Any problems ?

None that I can think of. Of course it will be a little embarassing (and resource intensive) if someone, or something, tries to rebuild that index. And if you truncate the table the index will automatically be marked as usable and be maintained as new data arrives unless you remember to set it back to unusable immediately after the truncate.

It’s also a bit of a problem that any attempt to gather table stats (or schema or database stats) with the cascade option enabled will cause the procedure to fail when it hits the unusable indexes. So you’re going to have to write some code to do a special job of collecting statistics on any tables with these funny indexes so that you collect their statistics without the cascade option, and then collect the statistics on just their ‘proper’ indexes.

Finally, you’ll need to handle that skip_unusable_indexes parameter somehow – possibly through a logon trigger in versions of Oracle prior to 10g.

So maybe it’s worth a little testing: the threat seems small, the benefits could be huge. But I’ve only played with it on a small scale; possibly a near-real test (on a UAT, or full-scale Development system) would highlight some flaw I hadn’t thought of.  If you try it, let me know.


  1. I’m playing with it…but our scenario is a bit different…no time to explain in depth, but our functional column involves an NVL on a NULLable column to the max date of 31-DEC-9999(“Bad Defaults Affect Cardinality” is the slide you present I believe) – our problem is that we can get the virtual column index but can’t get a histogram on it so can’t see the skew we have on the column in order to assist the CBO to get the cardinalty right.

    One thing I have just seen though, is that you obtain the column low/high values from USER_TAB_COLS. I tried doing the same select, not realising I’d mistakenly typed USER_TAB_COLUMNS and I couldn’t see the SYS virtual column…on realising my typo and switching back to USER_TAB_COLS it appears…which I thought was an interesting little gotcha.

    Comment by Jeff Moss — November 2, 2006 @ 3:02 pm GMT Nov 2,2006 | Reply

  2. Jeff, There is a method_opt for gather_table_stats which does the job. Try ‘for all hidden columns size NNN’. This is available from 9.2 if not earlier.

    Comment by Jonathan Lewis — November 2, 2006 @ 4:11 pm GMT Nov 2,2006 | Reply

  3. …I knew about method_opt for getting the histograms – I’ve worked out now that when I was looking for the histogram I was using a little script of mine to give me lots of optimiser stats for the table in question…unfortunately, as I alluded to in the last post, I found that my script happened to be using the DBA_TAB_COLUMNS view rather than the DBA_TAB_COLS view…which meant I couldn’t actually see the virtual column at all let alone whether there was a histogram on it…after fixing my script and retesting all this I find it’s exactly as you say, the histogram can be created and it does help with getting the correct cardinality for a query with a predicate on a column with skewed data in it.

    Now our problem is that the value we are testing for BETWEEN the from_date AND NVL(to_date,TO_DATE(’31-DEC-9999′,’DD-MON-YYYY’)) comes from a scalar subquery select…so the optimiser doesn’t know what value it is and can’t get the right cardinality – with a literal date value it works fine…more of an architectural approach issue now I guess.

    Comment by Jeff Moss — November 3, 2006 @ 7:53 am GMT Nov 3,2006 | Reply

  4. Can’t remember which version it came into, but you can also directly reference hidden column names, eg “for column SYS_… size 10” etc

    Comment by Connor — November 4, 2006 @ 2:35 am GMT Nov 4,2006 | Reply

  5. Excellent!!!
    (Few months too late… but just what I need right now :-)

    Comment by Mirjana — May 4, 2007 @ 7:57 am BST May 4,2007 | Reply

  6. Indeed, virtual columns are in 11g now!

    Comment by Shakespeare — August 30, 2007 @ 3:32 pm BST Aug 30,2007 | Reply

  7. […] **1: In fact, you can collect statistics on “virtual columns” pre-11g by creating function-based indexes in an unusable state. The method works, but it’s not really a good […]

    Pingback by Virtual Columns « Oracle Scratchpad — April 16, 2009 @ 7:26 pm BST Apr 16,2009 | Reply

RSS feed for comments on this post.

Comments and related questions are welcome.

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by

%d bloggers like this: