In my previous post, I made the comment:
In general, if you have a three-column index that starts with the same columns in the same order as the two-column index then the three-column index will be bigger and have a higher clustering_factor.
So what scenarios can you come up with that fall outside the general case ?
Alternatively, what argument could you put forward that justifies the general claim ?
I’ll try to respond to comments on this post a little more quickly than the last one, but I still have quite a lot of other comments to catch up on.

We can get higher CF if will use large value for pctfree for two-columns index or create it within tablespace with smaller block size.
Comment by Valentin Nikotin — January 19, 2012 @ 9:06 am UTC Jan 19,2012 |
i.e. index will be bigger, CF does’t depend on index storing.
Comment by Valentin Nikotin — January 19, 2012 @ 9:30 am UTC Jan 19,2012 |
Also three-column index may be rebuilt recently.
Comment by Valentin Nikotin — January 19, 2012 @ 9:34 am UTC Jan 19,2012 |
An example how to get smaller values for CF with 3-col index: Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> create table t1 (a int, b int, c int, pad varchar2(3000) default lpad('x',3000,'x')); Table created. SQL> insert into t1(a, b, c) values(2, 2, 2); 1 row created. SQL> insert into t1(a, b, c) values(null, null, null); 1 row created. SQL> insert into t1(a, b, c) values(1, 1, 0); 1 row created. SQL> insert into t1(a, b, c) values(2, 2, 1); 1 row created. SQL> commit; Commit complete. SQL> create index i1 on t1 (a, b, c); Index created. SQL> create index i2 on t1 (a, b); Index created. SQL> select i.index_name, 2 i.clustering_factor, 3 i.leaf_blocks 4 from user_indexes i where i.index_name in ('I1','I2'); INDEX_NAME CLUSTERING_FACTOR LEAF_BLOCKS ------------------------------ ----------------- ----------- I1 2 1 I2 3 1Comment by Valentin Nikotin — January 19, 2012 @ 12:41 pm UTC Jan 19,2012 |
Valentin,
Another good point – it is possible to find edge cases (particularly with a small number of index entries) where the order of data arrival can have a surprising impact. In your case the effect is also largely dependent on the great length of the table rows.
Comment by Jonathan Lewis — January 26, 2012 @ 10:32 pm UTC Jan 26,2012 |
Hi Jonathan,
if you use index key compression on the three-column index, it can be smaller and the clustering_factor can vary.
Regards
Stefan
Comment by Stefan — January 19, 2012 @ 11:18 am UTC Jan 19,2012 |
Clustering_factor does not depend on the way index is stored but only on indexed columns.
Comment by Valentin Nikotin — January 19, 2012 @ 12:58 pm UTC Jan 19,2012 |
Hello Valentin,
sorry i misinterpret it. You are right, that the clustering_factor depends on how the data of the table is sorted in relation to the index and this does not change in case of index key compression.
Regards
Stefan
Comment by Stefan — January 19, 2012 @ 1:34 pm UTC Jan 19,2012 |
Stefan,
As Valentin has pointed out, compression won’t affect the clustering_factor, but it will affect the size (number of leaf blocks) in the index – and that does have some impact on the cost of using the index; so if you have different compression on the indexes you can get counter-intuitive results.
Comment by Jonathan Lewis — January 26, 2012 @ 10:34 pm UTC Jan 26,2012 |
Well, if you have a lot of ties for the second column and the third column in context of the key path is well correlated with the physical order of rows in blocks, then I suppose you could get a better cluster factor. (I haven’t tested this – yet.) If Oracle is already ordering by file, block, and row within ties across possibly many leaf blocks spanning the tie, then this wouldn’t help [and points to a possible enhancement if they aren't already doing it.] If it does help, then possibly the reduction in rowid reference key size for partially repeated values could, I suppose, win a trade-off and the extra bit of key length required for the third column. I guess I know what I’ll be doing the next little while…
I suppose too that the third column’s key could consistently fit in the leaf block space that won’t fit another full row, then the index would be the same size, not bigger.
I’m not sure why you claim the 3rd key makes the cluster factor worse. Even if Oracle already sorts ties within the second key, the third key could represent an equally good cluster factor, right? Is this just a quibble on the <= versus < boundary? Probably.
This test will delay the start of my reading your new book, which has arrived.
Comment by Mark Farnham — January 19, 2012 @ 2:37 pm UTC Jan 19,2012 |
If we have index on column A then within the group defined by any fixed value a1 all index entries will be ordered by rowid, thus when we are going through the leaf rows we get minimal possible number of changes of relevant table blocks.
If we add additional column B into the index, it may change the order of rowid when we are going through the leaf rows, thus the number of changes of relevant table blocks may also change (increase from the minimum).
So far so good, but when we jump to the next group defined by a2 (…<a1<a2<…), we can get one additional change of table block for 1-col index as compared with 2-col index in the following situation:
the table has rows with A = a1 and A = a2 in the same block (B1) and also rows with A = a2 in the previous (order by rowid) block (B2),
but column B might help to order the index entries within group defined by a2 so that they will go first from the block B1 and then from the block B2 for A = a2.
Thus we will get one less change for the 2-col index as compared with 1-col. Same is for 2- and 3- columns indexes.
The simplest situation (for 2-col and 3-col indexes) is presented in the previous comment.
Comment by Valentin Nikotin — January 19, 2012 @ 6:16 pm UTC Jan 19,2012 |
Valentin,
I think your first couple of statements summarise the informal argument about why we generally assume a two-column index will have a lower clustering_factor than a three-column index – the inclusion of the rowid in the index entry is very important, so if we have six rows with the same (two-column) key, three rows in each of two blocks (call them B1, B2) the block component of the index entries would be for (B1, B1, B1, B2, B2, B2). Adding an extra column to the index could then very easily change the order of visiting the rows to (B1, B2, B1, B2, B1, B2).
On the other hand, as you point out, there are patterns of data where an odd synchronsiation of values and block locations could contradict the general intuition.
Comment by Jonathan Lewis — January 26, 2012 @ 11:10 pm UTC Jan 26,2012 |
Jonathan,
for a bitmap index the number of columns should have no impact on the clustering factor – as it is equal to num_rows. In a b*tree index the clustering factor should remain unchanged if the third column is a constant (or null). Maybe the value of these indexes would be limited…
Comment by Martin Preiss — January 19, 2012 @ 6:33 pm UTC Jan 19,2012 |
for the sake of completeness: the CF would be the same, but the three-column indexes would be bigger in both cases
Comment by Martin Preiss — January 19, 2012 @ 6:36 pm UTC Jan 19,2012 |
Martin,
I wasn’t thinking about bitmap indexes when I posed the question, but my first thought is that if (‘x’,'y’) is a key value in the two-column index then it may have one index entry; if you add a third column you may have N index entries like (‘x’,'y’,'a’), (‘x’,'y’,'b’) – with a corresponding increase in the clustering_factor. This won’t change the cost of using the index, of course, since the clustering_factor of a bitmap index isn’t used in the cost.
Comment by Jonathan Lewis — January 26, 2012 @ 10:40 pm UTC Jan 26,2012 |
Following stronger statement is true. For many “not very unique” indexes we can create a function (falsely deterministic) such that if you create the FBI with the columns from the original index and this function will be after them, the clustering factor of the resulting index will be smaller than the original.
See simple example below.
create or replace procedure decrease_cf ( i_table_owner varchar2, i_table_name varchar2, i_index_owner varchar2, i_index_name varchar2 ) authid current_user is l_pk_cols varchar2(32767); l_pk_cols_arg varchar2(32767); l_pk_cols_pred varchar2(32767); l_cols varchar2(32767); l_cols_pred varchar2(32767); l_tbl_name_cf varchar2(30) := substr(i_table_name, 1, 23)||'_4INDCF'; l_ind_bno varchar2(30) := substr(l_tbl_name_cf,1,26)||'_BNO'; l_fnc_name_cf varchar2(30) := 'FNC_'||substr(i_table_name, 1, 19)||'_4INDCF'; l_ind_name_cf varchar2(30) := substr(i_index_name, 1, 27)||'_CF'; l_cnt number; l_stmt_ctas varchar2(32767) := q'{ create table "$tbl_name_cf$" as with T as ( select dbms_rowid.rowid_block_number(rowid) bno, dense_rank()over(order by $col_list$) rn, min(dbms_rowid.rowid_block_number(rowid))over(partition by $col_list$) minbno, max(dbms_rowid.rowid_block_number(rowid))over(partition by $col_list$) maxbno, $col_list$ from "$tbl_owner$"."$tbl_name$" group by dbms_rowid.rowid_block_number(rowid), $col_list$ ) select $col_list$, bno from t t1 where minbno <> bno and exists (select 1 from t t2 where t2.maxbno = t1.bno and t2.rn = t1.rn - 1) }'; l_stmt_fnc varchar2(32767) := q'{ create or replace function "$fnc_name_cf$" ($pk_cols_arg$) return number deterministic is l_res number; begin select 1 - count(*) into l_res from "$tbl_owner$"."$tbl_name$" t , "$tbl_name_cf$" tc where tc.bno = dbms_rowid.rowid_block_number(t.rowid) $pk_cols_pred$ $cols_pred$; return l_res; end; }'; begin select rtrim(listagg('"'||cc.column_name||'",' )within group(order by cc.position), ','), rtrim(listagg('"'||cc.column_name||'" '|| '"'||c.owner||'".'|| '"'||c.table_name||'".'|| '"'||cc.column_name||'"%type,' )within group(order by cc.position), ','), listagg('and t."'||cc.column_name||'" = "$fnc_name_cf$"."'||cc.column_name||'" ' )within group(order by cc.position) into l_pk_cols, l_pk_cols_arg, l_pk_cols_pred from all_constraints c , all_cons_columns cc where c.owner = cc.owner and c.table_name = cc.table_name and c.constraint_type = 'P' and c.owner = i_table_owner and c.table_name = i_table_name; if l_pk_cols is null then dbms_output.put_line('table hasn''t pk'); return; end if; --dbms_output.put_line('l_pk_cols = '||l_pk_cols); --dbms_output.put_line('l_pk_cols_arg = '||l_pk_cols_arg); --dbms_output.put_line('l_pk_cols_pred = '||l_pk_cols_pred); select rtrim(listagg('"'||ic.column_name||'",')within group(order by ic.column_position), ','), listagg('and t."'||ic.column_name||'" = tc."'||ic.column_name||'" ' )within group(order by ic.column_position) into l_cols, l_cols_pred from all_ind_columns ic where ic.index_owner = i_index_owner and ic.index_name = i_index_name and ic.table_owner = i_table_owner and ic.table_name = i_table_name; if l_cols is null then dbms_output.put_line('index not found'); return; end if; --dbms_output.put_line('l_cols = '||l_cols); --dbms_output.put_line('l_cols_pred = '||l_cols_pred); for rec in (select 1 from user_indexes where index_name = l_ind_name_cf) loop execute immediate 'drop index "'||l_ind_name_cf||'"'; dbms_output.put_line('index '||l_ind_name_cf ||' is droped'); end loop; for rec in (select 1 from user_procedures where procedure_name = l_fnc_name_cf) loop execute immediate 'drop function "'||l_fnc_name_cf||'"'; dbms_output.put_line('index '||l_fnc_name_cf ||' is droped'); end loop; for rec in (select 1 from user_tables where table_name = l_tbl_name_cf) loop execute immediate 'drop table "'||l_tbl_name_cf||'" purge'; dbms_output.put_line('table '||l_tbl_name_cf ||' is droped'); end loop; l_stmt_ctas := replace (l_stmt_ctas, '$tbl_name_cf$', l_tbl_name_cf); l_stmt_ctas := replace (l_stmt_ctas, '$tbl_owner$', i_table_owner); l_stmt_ctas := replace (l_stmt_ctas, '$tbl_name$', i_table_name); l_stmt_ctas := replace (l_stmt_ctas, '$col_list$', l_cols); --dbms_output.put_line(l_stmt_ctas); execute immediate l_stmt_ctas; execute immediate 'create index "'||l_ind_bno||'" on "'||l_tbl_name_cf||'"(bno)'; execute immediate 'select count(*) from "'||l_tbl_name_cf||'"' into l_cnt; -- dbms_output.put_line('table '||l_tbl_name_cf ||'is created with '||l_cnt||' rows'); if l_cnt = 0 then dbms_output.put_line('index can''t be improved'); return; end if; l_stmt_fnc := replace (l_stmt_fnc, '$pk_cols_arg$', l_pk_cols_arg); l_stmt_fnc := replace (l_stmt_fnc, '$tbl_owner$', i_table_owner); l_stmt_fnc := replace (l_stmt_fnc, '$tbl_name$', i_table_name); l_stmt_fnc := replace (l_stmt_fnc, '$tbl_name_cf$', l_tbl_name_cf); l_stmt_fnc := replace (l_stmt_fnc, '$pk_cols_pred$', l_pk_cols_pred); l_stmt_fnc := replace (l_stmt_fnc, '$cols_pred$', l_cols_pred); l_stmt_fnc := replace (l_stmt_fnc, '$fnc_name_cf$', l_fnc_name_cf); --dbms_output.put_line(l_stmt_fnc); execute immediate l_stmt_fnc; execute immediate 'create index "'||l_ind_name_cf||'" on "' ||i_table_owner||'"."'||i_table_name||'" (' ||l_cols||', "'||l_fnc_name_cf||'"('||l_pk_cols||'))'; dbms_stats.gather_table_stats('"'||i_table_owner||'"', '"'||i_table_name||'"', estimate_percent => 100, cascade => true); for rec in (select i.owner, i.index_name, i.clustering_factor from all_indexes i where i.index_name in (l_ind_name_cf, i_index_name)) loop dbms_output.put_line(rpad(rec.owner||'.'||rec.index_name, 62, ' ')||rec.clustering_factor); end loop; end; /Comment by Valentin Nikotin — January 19, 2012 @ 11:06 pm UTC Jan 19,2012 |
Сorrect typos and slightly improve the function
create or replace procedure decrease_cf ( i_table_owner varchar2, i_table_name varchar2, i_index_owner varchar2, i_index_name varchar2 ) authid current_user is l_pk_cols varchar2(32767); l_pk_cols_arg varchar2(32767); l_pk_cols_pred varchar2(32767); l_cols varchar2(32767); l_cols_pred varchar2(32767); l_tbl_name_cf varchar2(30) := substr(i_index_name, 1, 23)||'_4INDCF'; l_ind_bno varchar2(30) := substr(l_tbl_name_cf,1,26)||'_BNO'; l_fnc_name_cf varchar2(30) := 'FNC_'||substr(i_index_name, 1, 19)||'_4INDCF'; l_ind_name_cf varchar2(30) := substr(i_index_name, 1, 27)||'_CF'; l_stmt_ind_cf varchar2(32767); l_cnt number; l_stmt_ctas varchar2(32767) := q'{ create table "$tbl_name_cf$" as with T as ( select dbms_rowid.rowid_block_number(rowid) bno, dense_rank()over(order by $col_list$) rn, min(dbms_rowid.rowid_block_number(rowid))over(partition by $col_list$) minbno, dense_rank()over(partition by $col_list$ order by dbms_rowid.rowid_block_number(rowid) desc) rn2, $col_list$ from "$tbl_owner$"."$tbl_name$" group by dbms_rowid.rowid_block_number(rowid), $col_list$ ) , T1 as ( select rn, max(decode(rn2, 2, bno)) maxbno2, max(decode(rn2, 1, bno)) maxbno1 from T group by rn ) , REQ(rn, bno, maxbno) as ( select rn, null bno, maxbno1 maxbno from T1 where rn = 1 union all select T1.rn, T.bno, decode(T.bno, T1.maxbno1, T1.maxbno2, T1.maxbno1) maxbno from REQ r join T1 on T1.rn = r.rn + 1 left join T on T.rn = T1.rn and T.bno = r.maxbno and T.bno <> T.minbno ) select $col_list$, bno from T where exists (select 1 from REQ where REQ.rn = T.rn and REQ.bno = T.bno) }'; l_stmt_fnc varchar2(32767) := q'{ create or replace function "$fnc_name_cf$" ($pk_cols_arg$) return number deterministic is l_res number; begin select 1 - count(*) into l_res from "$tbl_owner$"."$tbl_name$" t , "$tbl_name_cf$" tc where tc.bno = dbms_rowid.rowid_block_number(t.rowid) $pk_cols_pred$ $cols_pred$; return l_res; end; }'; begin select rtrim(listagg('"'||cc.column_name||'",' )within group(order by cc.position), ','), rtrim(listagg('"'||cc.column_name||'" '|| '"'||c.owner||'".'|| '"'||c.table_name||'".'|| '"'||cc.column_name||'"%type,' )within group(order by cc.position), ','), listagg('and t."'||cc.column_name||'" = "$fnc_name_cf$"."'||cc.column_name||'" ' )within group(order by cc.position) into l_pk_cols, l_pk_cols_arg, l_pk_cols_pred from all_constraints c , all_cons_columns cc where c.owner = cc.owner and c.table_name = cc.table_name and c.constraint_type = 'P' and c.owner = i_table_owner and c.table_name = i_table_name; if l_pk_cols is null then dbms_output.put_line('table hasn''t pk'); return; end if; --dbms_output.put_line('l_pk_cols = '||l_pk_cols); --dbms_output.put_line('l_pk_cols_arg = '||l_pk_cols_arg); --dbms_output.put_line('l_pk_cols_pred = '||l_pk_cols_pred); select rtrim(listagg('"'||ic.column_name||'",')within group(order by ic.column_position), ','), listagg('and t."'||ic.column_name||'" = tc."'||ic.column_name||'" ' )within group(order by ic.column_position) into l_cols, l_cols_pred from all_ind_columns ic where ic.index_owner = i_index_owner and ic.index_name = i_index_name and ic.table_owner = i_table_owner and ic.table_name = i_table_name; if l_cols is null then dbms_output.put_line('index not found'); return; end if; --dbms_output.put_line('l_cols = '||l_cols); --dbms_output.put_line('l_cols_pred = '||l_cols_pred); for rec in (select 1 from user_indexes where index_name = l_ind_name_cf) loop execute immediate 'drop index "'||l_ind_name_cf||'"'; dbms_output.put_line('index '||l_ind_name_cf ||' is droped'); end loop; for rec in (select 1 from user_procedures where procedure_name = l_fnc_name_cf) loop execute immediate 'drop function "'||l_fnc_name_cf||'"'; dbms_output.put_line('index '||l_fnc_name_cf ||' is droped'); end loop; for rec in (select 1 from user_tables where table_name = l_tbl_name_cf) loop execute immediate 'drop table "'||l_tbl_name_cf||'" purge'; dbms_output.put_line('table '||l_tbl_name_cf ||' is droped'); end loop; l_stmt_ctas := replace (l_stmt_ctas, '$tbl_name_cf$', l_tbl_name_cf); l_stmt_ctas := replace (l_stmt_ctas, '$tbl_owner$', i_table_owner); l_stmt_ctas := replace (l_stmt_ctas, '$tbl_name$', i_table_name); l_stmt_ctas := replace (l_stmt_ctas, '$col_list$', l_cols); --dbms_output.put_line(substr(l_stmt_ctas, 1, 1000)); --dbms_output.put_line(substr(l_stmt_ctas, 1001)); --return; execute immediate l_stmt_ctas; execute immediate 'create index "'||l_ind_bno||'" on "'||l_tbl_name_cf||'"(bno)'; execute immediate 'select count(*) from "'||l_tbl_name_cf||'"' into l_cnt; dbms_output.put_line('table '||l_tbl_name_cf ||' is created with '||l_cnt||' rows'); if l_cnt = 0 then dbms_output.put_line('index can''t be improved'); return; end if; l_stmt_fnc := replace (l_stmt_fnc, '$pk_cols_arg$', l_pk_cols_arg); l_stmt_fnc := replace (l_stmt_fnc, '$tbl_owner$', i_table_owner); l_stmt_fnc := replace (l_stmt_fnc, '$tbl_name$', i_table_name); l_stmt_fnc := replace (l_stmt_fnc, '$tbl_name_cf$', l_tbl_name_cf); l_stmt_fnc := replace (l_stmt_fnc, '$pk_cols_pred$', l_pk_cols_pred); l_stmt_fnc := replace (l_stmt_fnc, '$cols_pred$', l_cols_pred); l_stmt_fnc := replace (l_stmt_fnc, '$fnc_name_cf$', l_fnc_name_cf); --dbms_output.put_line(l_stmt_fnc); execute immediate l_stmt_fnc; l_stmt_ind_cf := 'create index "'||l_ind_name_cf||'" on "' ||i_table_owner||'"."'||i_table_name||'" (' ||l_cols||', "'||l_fnc_name_cf||'"('||l_pk_cols||'))'; execute immediate l_stmt_ind_cf; dbms_output.put_line('index created:'); dbms_output.put_line(l_stmt_ind_cf); dbms_stats.gather_table_stats('"'||i_table_owner||'"', '"'||i_table_name||'"', estimate_percent => 100, cascade => true); for rec in (select i.owner, i.index_name, i.clustering_factor from all_indexes i where i.index_name in (l_ind_name_cf, i_index_name)) loop dbms_output.put_line(rpad(rec.owner||'.'||rec.index_name, 62, ' ')||rec.clustering_factor); end loop; end; /Comment by Valentin Nikotin — January 20, 2012 @ 6:54 am UTC Jan 20,2012 |
For a bulk loaded table the CF for the 2-column table should be at least as good as the CF of the 3-column table.
If the data is not bulk loaded and the third column increases for each insert (for example insertion date) I would expect a better CF for the 3-column table.
Comment by Michael vR — January 20, 2012 @ 9:11 am UTC Jan 20,2012 |
Guess it’s all a question of math. 2 columns index, combination of two values. 3 Columns index, combination of 3 values, taken by three. There goes your clustering factor…the last column increases the number of combinations possible. Of course compression will save something,but compression is used (correct me if i’m wrong) for the prefix, not the sufix, meaning:
in a two column index, you can compress the first column
in a three column index (depending on the density of the second column) you can compress the first two columns.
Still, that will not be the deciding factor.
Fact is: third column value has to be stored in the index. Clustering factor increases due to the number of combinations.
Maybe i’m wrong, but that’s how i see it.
Alex
Comment by Alexandru Ersenie — January 24, 2012 @ 11:24 am UTC Jan 24,2012 |
Jonathan,
I’m working on a test case scenario (looks tricky!)
I posted something recently
http://srivenukadiyala.wordpress.com/2011/12/25/optimizer-might-ignore-a-more-suitable-superset-composite-index/
regards
srivenu
Comment by srivenu kadiyala — January 25, 2012 @ 11:16 am UTC Jan 25,2012 |
Srivenu,
Your blog highlights an interesting side-effect of the flaw in the optimizer’s model when it comes to using the clustering_factor. (It also fit quite nicely with a little posting I did recently about enhancing index statistics to include figures about every prefix combination.)
Comment by Jonathan Lewis — January 26, 2012 @ 11:00 pm UTC Jan 26,2012 |
I could think of a test case, where a 2 column global index on a partitioned table could have a higher clustering factor than a super set 3 column local index. (I dont have access to an oracle database right now to test it out).
srivenu
Comment by srivenu kadiyala — January 25, 2012 @ 12:59 pm UTC Jan 25,2012 |
Srivenu,
The mixture of local and global indexes is certainly a case where the side effects on clustering_factor can be very counter-intuitive.
On a completely different tack, creating a hash-partitioned index on a non-partitioned table could also result in less contention for popular blocks on insertion, which can avoid a bug that causes indexes to become much bigger than they need to. There’s lots of room for discusion when looking at partitioned indexes.
Comment by Jonathan Lewis — January 26, 2012 @ 10:56 pm UTC Jan 26,2012 |