Oracle Scratchpad

September 27, 2013

Virtual Stats

Filed under: CBO,Execution plans,Oracle,Statistics — Jonathan Lewis @ 6:49 am BST Sep 27,2013

Or – to be more accurate – real statistics on a virtual column.

This is one of the “10 top tips” that I came up with for my session with Maria Colgan at OOW13: a method of giving the optimizer more information that might improve execution plans when you can’t change the code. I’ll start with a small data set including a virtual column, and a couple of problem queries:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from	dual
	connect by
		level <= 1e4
)
select
 	rownum		n1,
 	rownum		n2
from
 	generator	v2
;

alter table t1 add (
 	m2	generated always as (mod(n2,3)) virtual
) ; 

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 => 'T1',
		method_opt 	 => 'for all columns size 1'
	);
end;
/

explain plan for
select * from t1 where mod(n1,3) = 0;

select * from table(dbms_xplan.display);

explain plan for
select * from t1 where mod(n1,3) != 0;

select * from table(dbms_xplan.display);

In many cases the wrong cardinality will make a big difference to the optimizer’s choice of execution plan, and anything we can do “legally” to help the optimizer estimate a better cardinality may give us a better plan. I’m only intending to make a point about cardinality estimates with this example, so I haven’t bothered to create a subtle data set, or any indexes – so when I look at the execution plans I’ll only be looking at the “rows” column of the output.

Given the mod(n1,3) predicates and the 10,000 rows in the table we can estimate that the two cardinalities ought to be 3,333 and 6,667 respectively – but what do the plans show us:

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT  |      |   100 |  1000 |     5  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   100 |  1000 |     5  (20)| 00:00:01 |

Predicate Information (identified by operation id):
   1 - filter(MOD("N1",3)=0)

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT  |      |   500 |  5000 |     5  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   500 |  5000 |     5  (20)| 00:00:01 |

Predicate Information (identified by operation id):
   1 - filter(MOD("N1",3)<>0)

The estimates are 100 and 500 respectively – examples of the optimizer’s basic guessing strategy of 1% for “function of column = constant” and 5% for “function of column != constant”. (There are a couple of variations on this theme, but 1% and 5% commonly appear.)

You’ll notice, of course, that n1 and n2 are defined in the same way, so what do the estimates look like when I change the predicates to reference n2? There’s an enormous difference:

explain plan for
select * from t1 where mod(n2,3) = 0;

select * from table(dbms_xplan.display);

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT  |      |  3333 | 33330 |     5  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  3333 | 33330 |     5  (20)| 00:00:01 |

Predicate Information (identified by operation id):
   1 - filter("T1"."M2"=0)

explain plan for
select * from t1 where mod(n2,3) != 0;

select * from table(dbms_xplan.display);

| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT  |      |  6667 | 66670 |     5  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |  6667 | 66670 |     5  (20)| 00:00:01 |

Predicate Information (identified by operation id):
   1 - filter("T1"."M2"<>0)

The optimizer has got the estimates right! And the reason becomes clear when you look at the predicate sections of the execution plans. The optimizer has recognised that the supplied predicates match the definition of the virtual column, so it has rewritten the queries to use the virtual column, eliminating the references to the mod() function, and it has accurate statistics about that column.

Think carefully about the significance of what we’ve done. Without changing the code we’ve made it possible for the optimizer to do a better job because it’s no longer guessing.

As I said on the day – this tip (like all tips) is just the tip of the iceberg – take it, think about it, and apply it sensibly and carefully.

Update 4th Oct 2013

It didn’t take long for an iceberg to appear: I was running a training session for a client in Prague yesterday and presented an example of using virtual columns to change execution plans without changing the code, and one member of the audience (I forget whether he was a developer or a DBA) promptly came up with this very important question: “What if your code includes statements like insert into t1 values (1,1);”

Of course, your code should always list the column names in the insert statement, but some code doesn’t. In 11g the code will fail with Oracle error “ORA-00947: not enough values”. There is a workaround: rather than creating a virtual column create extended stats to represent the expression. The internal, virtual column representing the expression will then be invisible and the ORA-00947 will not occur.
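As a sketch of that workaround, using the table and expression from the example above (gather stats afterwards so the hidden column acquires its own statistics):

```sql
-- Create extended stats on the expression instead of declaring an explicit
-- virtual column: the hidden column this creates is invisible to
-- "insert into t1 values (...)", so ORA-00947 is avoided.
select	dbms_stats.create_extended_stats(
		ownname		=> user,
		tabname		=> 'T1',
		extension	=> '(mod(n2,3))'
	)
from	dual;

begin
	dbms_stats.gather_table_stats(
		ownname		=> user,
		tabname		=> 'T1',
		method_opt	=> 'for all columns size 1'
	);
end;
/
```

The call returns the system-generated name of the hidden column (something of the form SYS_STU…), which is how the expression will appear in user_tab_cols.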

There is a slight downside to this strategy, though – you’re only allowed 20 sets of extended stats on a single table. That’s quite a lot, but for some people it may not be enough; the limit on virtual columns is dictated by the standard 1,000 column limit for a table.

In 12c there is a nicer alternative – simply declare the virtual column to be invisible. Conveniently Oracle won’t see it for the insert – but you can still use it explicitly by name, and if you use the defining expression in a WHERE clause the optimizer will still recognise that the expression exists as a virtual column with real statistics.
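A sketch of the 12c declaration, repeating the m2 definition from the start of the article (check the exact keyword order against the SQL Reference for your version):

```sql
-- 12c: declare the virtual column invisible so that
-- "insert into t1 values (...)" no longer sees it.
alter table t1 add (
	m2 invisible generated always as (mod(n2,3)) virtual
);
```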

Footnote: a follow-on thought I had about visibility was that some screen-painters automatically generate a single update statement intended to update every column that has been selected by the screen: so if you display a virtual column on-screen (rather than displaying the result of the expression) then the update statement will fail with Oracle error “ORA-54017: UPDATE operation disallowed on virtual columns.”
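In other words, a generated update along the lines of the following (hypothetical) statement would fail:

```sql
-- Hypothetical screen-painter update: including the virtual column m2
-- in the set list raises ORA-54017.
update	t1
set	n1 = 1,
	n2 = 1,
	m2 = 0		-- virtual column: cannot be updated
where	rownum = 1;
```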

Update 16th Jan 2014

Another iceberg has just appeared – in a question on OTN.

What’s the side effect of a virtual column based on a function that takes a (relatively) long time to run; and by “relatively” long time I mean that it won’t take long enough for a user inserting or updating a couple of rows to notice the difference – say 10 micro-seconds if it’s pure CPU, say 1 millisecond if it does something like reading a database block (as the function from the example on OTN probably does) ?

When you gather stats for the table Oracle has to run that function for every row in the sample – which is the whole table if you’re using 11g with the auto_sample_size (hence approximate NDV) enabled. In the case of the poster on OTN they’d added a function-based index – which means a hidden virtual column on the table – and seen their stats collection time change from 3 minutes to 2 hours.

Oracle doesn’t store the value of a virtual column on the table – it calculates it at run time when you query the table for that column. It does store the value if the column is part of an index, of course, but only in the index itself, and the stats collection routines don’t (yet) have a mechanism for scanning an index to calculate the column stats for such a column.

A little footnote to this: if you add a virtual column to a table and then gather stats on the table you will see that the average row length increases by the average length of the virtual column. This suggests at first sight that Oracle is actually storing the values for the column; but a block dump will show that this is not the case – it’s just an example of the way in which the full code set for a new feature tends to have little bits missing round the edges: the code to calculate the average column length blindly calls the definition of the virtual column to calculate the length of the result, and the code to calculate the average row length simply sums the average column lengths – neither piece of code (yet) allows for the fact that the column is virtual. (If you use the analyze command, I think it takes the row length from the block by comparing address offsets within the block – which means it gets a more consistent answer.)


  1. When I read your article about histograms, I noticed the following text:

    “There is a limitation to the virtual column / function-based index approach (whichever method you use) – you have to change the application code to take advantage of it – the “proper” virtual columns in 11g make the code look neater than the FBI code, but there still has to be a change, e.g (for the FBI approach):”

    I was going to point out in a comment on the histogram article that, to take advantage of a virtual column, it might not be necessary to change the code, but in the end I decided not to publish the comment. I am happy to see you proving that here in this new article on virtual stats.

    And, for this particular case of a virtual column definition that contains a literal value, mod(n2,3), I was wondering what would happen if cursor_sharing has been set to FORCE:

    SQL> alter session set cursor_sharing=FORCE;
    Session altered.
    SQL> select * from t1 where mod(n2,3)= 0;
    3333 rows selected.
    SQL>  select * from table(dbms_xplan.display_cursor);
    SQL_ID  bv18rk2yf2ck9, child number 0
    select * from t1 where mod(n2,:"SYS_B_0")= :"SYS_B_1"
    Plan hash value: 3617692013
    | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    |   0 | SELECT STATEMENT  |      |       |       |     8 (100)|          |
    |*  1 |  TABLE ACCESS FULL| T1   |   100 |  1100 |     8   (0)| 00:00:01 |
    Predicate Information (identified by operation id):
       1 - filter(MOD("N2",:SYS_B_0)=:SYS_B_1)

    The CBO got it wrong in this case.

    Best regards

    Comment by Mohamed Houri — September 29, 2013 @ 3:55 pm BST Sep 29,2013 | Reply

  2. Mohamed,

    Thanks for pointing out the side effects that cursor_sharing can have on this strategy. It’s another of those “what if” types of questions that has to be raised as early on as possible before committing to a new feature.

    Comment by Jonathan Lewis — October 4, 2013 @ 9:39 am BST Oct 4,2013 | Reply

  3. […] or boundary condition, or edge case. I’ve already picked this up once as an addendum to an earlier blog note on virtual stats, which linked to an article on OTN describing how the time taken to collect stats on a table […]

    Pingback by Auto Sample Size | Oracle Scratchpad — March 2, 2014 @ 6:39 pm BST Mar 2,2014 | Reply

  4. Josef,

    Very easy to read a little more into that sentence in the manual than is really there.
    I made a similar point, but perhaps with less risk of misinterpretation in the first of a three-parter on 12c histogram:

    Comment by Jonathan Lewis — July 8, 2015 @ 7:09 pm BST Jul 8,2015 | Reply


Comments and related questions are welcome.
