(As copied by Oracle Corp. in Metalink note 989186.1)
rem
rem Script: index_est_proc_2.sql
rem Author: Jonathan Lewis
rem Dated: August 2005 (updated Apr 2009)
rem Purpose: Fast analysis of indexes to help identify
rem extreme degeneration.
rem
rem Last tested
rem 11.1.0.7
rem 10.2.0.3
rem 10.1.0.4
rem 9.2.0.8
rem 8.1.7.4
rem Not tested
rem 11.2.0.1
rem 10.2.0.4
rem
rem Usage:
rem Set the values in the "define" section
rem Log on with the privilege to see the "dba_" views
rem using SQL*Plus and run the script.
rem
rem Notes:
rem This script assumes that statistics have been collected in
rem the fairly recent past, and uses some approximations to
rem compare the number of leaf blocks with the number of leaf
rem blocks that ought to be needed to hold the data.
rem
rem There are various little oddities with the way that
rem (a) Oracle calculates average column lenght and
rem (b) I use the available data
rem that mean that at small sizes and in extreme cases the
rem numbers I produce can be wrong. In particular, for indexes
rem where a lot of the table data has nulls (so no entry in the
rem index), the estimated size can be significantly larger than
rem they finally turn out to be.
rem
rem
rem Targets
rem =======
rem Where the estimate is very much smaller than the actual, then
rem you may be looking at a "FIFO" index, emptying out in the past
rem and filling in the future. This type of index is a candidate for
rem a regular "coalesce" - although you may want to rebuild it once
rem to get it to the right starting size and release excess space
rem back to the tablespace.
rem
rem See http://jonathanlewis.wordpress.com/2008/09/26/index-analysis/
rem for an example and discussion on this type of index.
rem
rem Where the estimate is about half the size of the actual, then
rem it is worth checking whether there is any special treatment of
rem the data that is making this happen. 50% utilisation is fairly
rem common in RAC for indexes based on a sequence with a large cache
rem size, so it may be best to leave the indexes at that level.
rem However, you may find that rebuilding (perhaps just once) with
rem a pctfree in the region of 30% may give you a slightly more efficient
rem index in non-RAC systems.
rem
rem If your index is running at 50% and is not strongly sequence based
rem then you may be suffering from the concurrency/ITL bug and may want
rem to rebuild the index and force a maxtrans setting into the index.
rem
rem If the index is running at a fairly uniform 25%, it may be subject
rem to side effects of both sequencing and the concurrency effects.
rem
rem Usage:
rem ======
rem This script takes a username (table owner), percent usage, and
rem scaling factor. It reports the estimated leaf block count of
rem all simple indexes for that schema where the size of the index
rem would be smaller than the supplied fraction of the current size
rem when rebuilt at the supplied percentage utilisation. Current settings
rem are 90% (which equates to the default pctfree 10) and 0.6 which means
rem the index would be running at about 50% empty wastage - which is the
rem point at which it begins to be a possible target for investigation.
rem The script does not report any index smaller than 10,000 leaf blocks,
rem and assumes an 8KB block size.
rem
rem Technical notes:
rem ================
rem Don't need to add a length byte after using dbms_stats
rem Don't need a 'descending' byte because it's automatically included
rem Don't need to adjust for num_nulls because it's automatically included
rem Reverse key indexes don't affect column lengths
rem
rem Need to worry about COMPRESSED indexes. At present compression
rem may reduce the size of an index so that I don't notice it should
rem still be smaller than it is.
rem
rem Index types that can be used (with partitioned = 'NO')
rem NORMAL
rem NORMAL/REV
rem FUNCTION-BASED NORMAL
rem
rem Still needs enhancing for partitioned and subpartitioned indexes
rem Check dba_part_indexes for locality, partitioning_type, subpartitioning_type
rem But does handle global indexes on partitioned tables.
rem
rem To investigate
rem LOB
rem IOT - TOP
rem IOT - NESTED
rem SECONDARY
rem BITMAP (and BITMAP JOIN)
rem FUNCTION-BASED BITMAP
rem CLUSTER
rem ANSI ?
rem
rem Probably not possible
rem DOMAIN
rem FUNCTION-BASED DOMAIN
rem
rem Need to avoid partitioned, temporary, unusable and dropped indexes
rem
spool index_est_proc_2
set verify off
set serveroutput on size 1000000 format wrapped
define m_owner = '&m_schemaname'
define m_blocksize = 8192
define m_target_use = 90 -- equates to pctfree 10
define m_scale_factor = 0.6
define m_minimum = 10000
define m_overhead = 192 -- leaf block "lost" space in index_stats
declare
m_leaf_estimate number;
begin
for r in (
select
table_owner,
table_name,
owner index_owner,
index_name,
leaf_blocks
from
dba_indexes
where
owner = upper('&m_owner')
and index_type in (
'NORMAL',
'NORMAL/REV',
'FUNCTION-BASED NORMAL'
)
and partitioned = 'NO'
and temporary = 'N'
and dropped = 'NO'
and status = 'VALID'
and last_analyzed is not null
order by
owner, table_name, index_name
) loop
if r.leaf_blocks > &m_minimum then
select
round(
100 / &m_target_use * -- assumed packing efficiency
(
ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4) +
sum(
(tc.avg_col_len) *
(tab.num_rows)
) -- column data bytes
) / (&m_blocksize - &m_overhead)
) index_leaf_estimate
into m_leaf_estimate
from
(
select /*+ no_merge */
table_name,
num_rows,
decode(partitioned,'YES',10,6) rowid_length
from
dba_tables
where
table_name = r.table_name
and owner = r.table_owner
) tab,
(
select /*+ no_merge */
index_name,
index_type,
num_rows,
decode(uniqueness,'UNIQUE',0,1) uniq_ind
from
dba_indexes
where
table_owner = r.table_owner
and table_name = r.table_name
and owner = r.index_owner
and index_name = r.index_name
) ind,
(
select /*+ no_merge */
column_name
from
dba_ind_columns
where
table_owner = r.table_owner
and index_owner = r.index_owner
and table_name = r.table_name
and index_name = r.index_name
) ic,
(
select /*+ no_merge */
column_name,
avg_col_len
from
dba_tab_cols
where
owner = r.table_owner
and table_name = r.table_name
) tc
where
tc.column_name = ic.column_name
group by
ind.num_rows,
ind.uniq_ind,
tab.rowid_length
;
if m_leaf_estimate < &m_scale_factor * r.leaf_blocks then
dbms_output.put_line(
to_char(sysdate,'hh24:mi:ss') || ': ' ||
trim(r.table_name) || ' - ' ||
trim(r.index_name)
);
dbms_output.put_line(
'Current Leaf blocks: ' ||
to_char(r.leaf_blocks,'999,999,999') ||
' Target size: ' ||
to_char(m_leaf_estimate,'999,999,999')
);
dbms_output.new_line;
end if;
end if;
end loop;
end;
/
set verify on
spool off
set doc off
doc
A recent sample from my SYS schema (with a minimum of 100 blocks)
15:19:30: HISTGRM$ - I_H_OBJ#_COL#
Current Leaf blocks: 110 Target size: 51
15:19:30: OBJAUTH$ - I_OBJAUTH2
Current Leaf blocks: 107 Target size: 58
15:19:30: WRI$_OPTSTAT_HISTHEAD_HISTORY - I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST
Current Leaf blocks: 180 Target size: 99
#

[...] a script to describe the indexes on a single table – column and statistical information Oversized Indexes: a data dictionary scan for simple B-tree indexes that may be unreasonably large for the data [...]
Pingback by Simple scripts « Oracle Scratchpad — February 28, 2010 @ 3:38 pm UTC Feb 28,2010 |
[...] A little while ago, while applying the concept on a client site and implementing a mechanism that would allow the DBAs to do the same check as regularly as needed, I discovered a couple of flaws in the original code – so I had to spend a little time correcting it, and thought I’d pass on the latest version. [...]
Pingback by Index Efficiency 2 « Oracle Scratchpad — February 28, 2010 @ 3:40 pm UTC Feb 28,2010 |
Note that SELECT .. INTO deriving “m_leaf_estimate” appears to fail with ‘ORA-01403: No data found’ for indexes created on object type attributes because “COLUMN_NAME2 in “DBA_IND_COLUMNS” is not a simple identifier and hence does not join to column_name in “DBA_TAB_COLS”.
However I believe this identifier is reflected in “DBA_TAB_COLS”.”QUALIFIED_COL_NAME” and so indexes of this type might be supported by joining to “DBA_TAB_COLS” on the same – although I do not know for sure whether the “AVG_COL_LEN” would truly reflect the data contained and hence be valid for this purpose.
Having said that indexes on object type columns are relatively few and far between and those implemented in the data dictionary are unlikely to be considered by this report unless heavily using Workspace Manager (WM_SYS schema), XML (XDB schema) or the Data Pump (SYS schema) and running it against the schemas concerned.
As an aside I was surprised to note that the index on “SYS”.”AQ$_KUPC$DATAPUMP_QUETAB_P” (TREAT(“USER_DATA” AS “KUPC$_WORKER_LOG_ENTRY”).”WIP”) was regarded as a ‘NORMAL’ index and not a ‘FUNCTION-BASED’ one.
Comment by padders — March 1, 2010 @ 3:16 pm UTC Mar 1,2010 |
Padders,
Thanks for the extra information; it’s always useful to hear about the special cases. I’ll leave it as an exercise for someone else to test, though.
One thought, if anyone does want to follow this – my general strategy for handling “rare” cases is to write a separate script; it’s much safer and simpler than trying to write a “one script does everything … but no-one dare change it” type of thing.
Comment by Jonathan Lewis — March 2, 2010 @ 8:48 pm UTC Mar 2,2010 |
[...] explain plan for every index in your system – so in a future note I’ll be showing you a simple piece of code that you might feel happy to run against every (simple, B-tree) index in your database. Eco World [...]
Pingback by Index Size « Oracle Scratchpad — August 26, 2011 @ 6:31 am UTC Aug 26,2011 |
[...] article also includes some related links. The first link is to a script that checks all the indexes for a spesific schema. By the way – DO READ the notes in the [...]
Pingback by jcon.no: Oracle Blog - When to rebuild an Oracle index? — January 5, 2012 @ 10:27 am UTC Jan 5,2012 |
Jonathan,
I want to ask two questions related with B*TREE indexes :
1. What are the things/statistics which tells /indicate that an index need to be rebuild. There are rare cases as I have learned where index rebuild can actually help, can you please give few cases where it really helps.
2. What are the statistics using which optimizer decide to use one index over another in a table.
Thanks
Ajeet
Comment by Ajeet — January 28, 2012 @ 5:18 pm UTC Jan 28,2012 |
Ajeet,
1. I have an entire category on rebuilding indexes: http://jonathanlewis.wordpress.com/category/oracle/indexing/index-rebuilds/
2. The best answer to that question is in my book Cost Based Oracle – Fundamentals: I don’t have space to repeat it all in a comment, but as a starting point here’s a link to a short article I wrote several years ago:http://www.jlcomp.demon.co.uk/12_using_index_i.html
Comment by Jonathan Lewis — January 28, 2012 @ 8:18 pm UTC Jan 28,2012 |
Hello Jonathan,
On line 165 above, instead of tab.num_rows, I think we should have ind.num_rows. Rationale is to account for NULLs. Let’s say index is on single column and half the table rows have a null on this column, thus index rows would be half of those on table. Similar with a composite index when all columns have nulls. What do you think?
Thanks — Carlos
Comment by Carlos Sierra — March 30, 2012 @ 9:44 pm UTC Mar 30,2012 |
Carlos,
The choice of tab.num_rows was deliberate, and based on the fact the avg_col_len already makes allowance for nulls.
Every index entry has a rowid, lock byte, directory entry etc. and that’s why I have a component that uses ind.num_rows at line 162; but the “data” content of the index is dictated by the data content of the table, irrepective of how the columns are distributed across index entries. It’s probably best to see this through an example (engineered to be an extreme case):
create table t1(v1 char(6), v2 char(6)); insert into t1 select decode(mod(rownum,2),0,'xxxxxx'), 'xxxxxx' from all_objects where rownum <= 1000 ; execute dbms_stats.gather_table_stats(user,'t1') select column_name, avg_col_len from user_tab_cols where table_name = 'T1' ; COLUMN_NAME AVG_COL_LEN -------------------- ----------- V1 4 V2 7 3 rows selected.Although every populated column has a data content length of 6 bytes, the average (content) is 3 for v1 because half the rows are null. The reported length includes the one extra byte required for the length byte that precedes the column content which is why we see 4 and 7 rather than 3 and 6 in the column stats.
If I had an index on just v1, then ind.num_rows would be 50 because only half the table rows have a value for v1; so if I multiplied the average column length by the number of index entries I would have applied the “half the rows in the table” factor twice. By using tab.num_rows I scale the average column length back up to “total space taken by column” in both the table segment and index segment.
The choice does introduce an error, of course, in as the average column length is rounded up to an integer, leading to an over-estimate. There’s a further error due to the addition of the length byte which I could correct if I used (avg_col_len – 1) multipled tab.num_rows then re-introduced as ind.num_rows * “number of columns in index”.
Comment by Jonathan Lewis — March 31, 2012 @ 8:40 am UTC Mar 31,2012 |
Jonathan,
It makes perfect sense now. I had overlooked that avg_col_len already considered nulls. Thanks for the clarification!
Carlos
Comment by carlossierrausa — April 1, 2012 @ 11:57 am UTC Apr 1,2012 |
> ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4)
Why ind.uniq_ind is not multiplied by tab.rowid_length?
Comment by Morp — January 18, 2013 @ 12:27 pm UTC Jan 18,2013 |
Morp,
The extra byte appears only once per index entry (it’s the byte that says the rowid is 6, or 10, bytes long), so we add it to the accumulated length just once.
Comment by Jonathan Lewis — January 18, 2013 @ 12:59 pm UTC Jan 18,2013 |