Oracle Scratchpad

Index Sizing

(As copied by Oracle Corp. in Metalink note 989186.1)

rem
rem     Script:     index_est_proc_2.sql
rem     Author:     Jonathan Lewis
rem     Dated:      August 2005 (updated Apr 2009)
rem     Purpose:    Fast analysis of indexes to help identify
rem                 extreme degeneration.
rem
rem     Last tested
rem             11.1.0.7
rem             10.2.0.3
rem             10.1.0.4
rem              9.2.0.8
rem              8.1.7.4
rem     Not tested
rem             11.2.0.1
rem             10.2.0.4
rem
rem     Usage:
rem     Set the values in the "define" section
rem     Log on with the privilege to see the "dba_" views
rem     using SQL*Plus and run the script.
rem
rem     Notes:
rem     This script assumes that statistics have been collected in
rem     the fairly recent past, and uses some approximations to
rem     compare the number of leaf blocks with the number of leaf
rem     blocks that ought to be needed to hold the data.
rem
rem     There are various little oddities with the way that
rem         (a) Oracle calculates average column length and
rem         (b) I use the available data
rem     that mean that at small sizes and in extreme cases the
rem     numbers I produce can be wrong.  In particular, for indexes
rem     where a lot of the table data has nulls (so no entry in the
rem     index), the estimated size can be significantly larger than
rem     it finally turns out to be.
rem
rem
rem     Targets
rem     =======
rem     Where the estimate is very much smaller than the actual, then
rem     you may be looking at a "FIFO" index, emptying out in the past
rem     and filling in the future. This type of index is a candidate for
rem     a regular "coalesce" - although you may want to rebuild it once
rem     to get it to the right starting size and release excess space
rem     back to the tablespace.
rem
rem     See http://jonathanlewis.wordpress.com/2008/09/26/index-analysis/
rem     for an example and discussion on this type of index.
rem
rem     Where the estimate is about half the size of the actual, then
rem     it is worth checking whether there is any special treatment of
rem     the data that is making this happen. 50% utilisation is fairly
rem     common in RAC for indexes based on a sequence with a large cache
rem     size, so it may be best to leave the indexes at that level. 
rem     However, you may find that rebuilding (perhaps just once) with
rem     a pctfree in the region of 30% may give you a slightly more efficient
rem     index in non-RAC systems.
rem
rem     If your index is running at 50% and is not strongly sequence based
rem     then you may be suffering from the concurrency/ITL bug and may want
rem     to rebuild the index and force a maxtrans setting into the index.
rem
rem     If the index is running at a fairly uniform 25%, it may be subject
rem     to side effects of both sequencing and the concurrency effects.
rem
rem     Usage:
rem     ======
rem     This script takes a username (table owner), percent usage, and
rem     scaling factor.  It reports the estimated leaf block count of
rem     all simple indexes for that schema where the size of the index
rem     would be smaller than the supplied fraction of the current size
rem     when rebuilt at the supplied percentage utilisation. Current settings
rem     are 90% (which equates to the default pctfree 10) and 0.6, which means
rem     the index would be running at about 50% wasted space - the point
rem     at which it begins to be a possible target for investigation.
rem     The script does not report any index smaller than 10,000 leaf blocks,
rem     and assumes an 8KB block size.
rem
rem     Technical notes:
rem     ================
rem     Don't need to add a length byte after using dbms_stats
rem     Don't need a 'descending' byte because it's automatically included
rem     Don't need to adjust for num_nulls because it's automatically included
rem     Reverse key indexes don't affect column lengths
rem
rem     Need to worry about COMPRESSED indexes. At present compression
rem     may reduce the size of an index enough that I fail to notice
rem     that it should be smaller still.
rem
rem     Index types that can be used (with partitioned = 'NO')
rem         NORMAL
rem         NORMAL/REV
rem         FUNCTION-BASED NORMAL
rem
rem     Still needs enhancing for partitioned and subpartitioned indexes
rem     Check dba_part_indexes for locality, partitioning_type, subpartitioning_type
rem     But does handle global indexes on partitioned tables.
rem
rem     To investigate
rem         LOB
rem         IOT - TOP
rem         IOT - NESTED
rem         SECONDARY
rem         BITMAP  (and BITMAP JOIN)
rem         FUNCTION-BASED BITMAP
rem         CLUSTER
rem         ANSI ?
rem
rem     Probably not possible
rem         DOMAIN
rem         FUNCTION-BASED DOMAIN
rem
rem     Need to avoid partitioned, temporary, unusable and dropped indexes
rem

spool index_est_proc_2

set verify off
set serveroutput on size 1000000 format wrapped

define  m_owner         = '&m_schemaname'
define  m_blocksize     = 8192
define  m_target_use    = 90    -- equates to pctfree 10
define  m_scale_factor  = 0.6
define  m_minimum       = 10000
define  m_overhead      = 192   -- leaf block "lost" space in index_stats

declare
    m_leaf_estimate number;
begin
    for r in (
        select
            table_owner,
            table_name,
            owner       index_owner,
            index_name,
            leaf_blocks
        from
            dba_indexes
        where
            owner = upper('&m_owner')
        and index_type in (
                'NORMAL',
                'NORMAL/REV',
                'FUNCTION-BASED NORMAL'
            )
        and partitioned = 'NO'
        and temporary = 'N'
        and dropped = 'NO'
        and status = 'VALID'
        and last_analyzed is not null
        order by
            owner, table_name, index_name
    ) loop

        if r.leaf_blocks > &m_minimum then
            select
                round(
                    100 / &m_target_use *       -- assumed packing efficiency
                    (
                        ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4) +
                        sum(
                            (tc.avg_col_len) *
                            (tab.num_rows)
                        )           -- column data bytes
                    ) / (&m_blocksize - &m_overhead)
                )               index_leaf_estimate
                into    m_leaf_estimate
            from
                (
                select  /*+ no_merge */
                    table_name,
                    num_rows,
                    decode(partitioned,'YES',10,6) rowid_length
                from
                    dba_tables
                where
                    table_name  = r.table_name
                and owner       = r.table_owner
                )               tab,
                (
                select  /*+ no_merge */
                    index_name,
                    index_type,
                    num_rows,
                    decode(uniqueness,'UNIQUE',0,1) uniq_ind
                from
                    dba_indexes
                where
                    table_owner = r.table_owner
                and table_name  = r.table_name
                and owner       = r.index_owner
                and index_name  = r.index_name
                )               ind,
                (
                select  /*+ no_merge */
                    column_name
                from
                    dba_ind_columns
                where
                    table_owner = r.table_owner
                and index_owner = r.index_owner
                and table_name  = r.table_name
                and index_name  = r.index_name
                )               ic,
                (
                select  /*+ no_merge */
                    column_name,
                    avg_col_len
                from
                    dba_tab_cols
                where
                    owner       = r.table_owner
                and table_name  = r.table_name
                )               tc
            where
                tc.column_name = ic.column_name
            group by
                ind.num_rows,
                ind.uniq_ind,
                tab.rowid_length
            ;

            if m_leaf_estimate < &m_scale_factor * r.leaf_blocks then

                dbms_output.put_line(
                    to_char(sysdate,'hh24:mi:ss') || ': ' ||
                    trim(r.table_name) || ' - ' ||
                    trim(r.index_name)
                );

                dbms_output.put_line(
                    'Current Leaf blocks: ' ||
                    to_char(r.leaf_blocks,'999,999,999') ||
                    '         Target size: ' ||
                    to_char(m_leaf_estimate,'999,999,999')
                );

                dbms_output.new_line;

            end if;
        end if;
    end loop;
end;
/
set verify on

spool off

set doc off
doc
A recent sample from my SYS schema (with a minimum of 100 blocks)

15:19:30: HISTGRM$ - I_H_OBJ#_COL#
Current Leaf blocks:          110         Target size:           51

15:19:30: OBJAUTH$ - I_OBJAUTH2
Current Leaf blocks:          107         Target size:           58

15:19:30: WRI$_OPTSTAT_HISTHEAD_HISTORY - I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST
Current Leaf blocks:          180         Target size:           99
#
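As a sanity check on the arithmetic in the PL/SQL above, the core expression can be evaluated by hand. The figures below are purely hypothetical: a one-million-row heap table (6-byte rowids) with a single-column non-unique index whose avg_col_len is 12.

```sql
-- bytes per entry: 6 (heap rowid) + 1 (rowid length byte, non-unique)
--                  + 4 (per-entry overhead) + 12 (column data)
-- usable leaf space: 8192 - 192 = 8000 bytes, target packing 90%
select  round(
            100/90 * 1000000 * (6 + 1 + 4 + 12) / (8192 - 192)
        ) as estimated_leaf_blocks
from    dual
;
-- roughly 3,194 leaf blocks
```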

15 Comments

  1. [...] a script to describe the indexes on a single table – column and statistical information Oversized Indexes: a data dictionary scan for simple B-tree indexes that may be unreasonably large for the data [...]

    Pingback by Simple scripts « Oracle Scratchpad — February 28, 2010 @ 3:38 pm BST Feb 28,2010 | Reply

  2. [...] A little while ago, while applying the concept on a client site and implementing a mechanism that would allow the DBAs to do the same check as regularly as needed, I discovered a couple of flaws in the original code – so I had to spend a little time correcting it, and thought I’d pass on the latest version. [...]

    Pingback by Index Efficiency 2 « Oracle Scratchpad — February 28, 2010 @ 3:40 pm BST Feb 28,2010 | Reply

  3. Note that SELECT .. INTO deriving “m_leaf_estimate” appears to fail with ‘ORA-01403: No data found’ for indexes created on object type attributes because “COLUMN_NAME” in “DBA_IND_COLUMNS” is not a simple identifier and hence does not join to column_name in “DBA_TAB_COLS”.

    However I believe this identifier is reflected in “DBA_TAB_COLS”.”QUALIFIED_COL_NAME” and so indexes of this type might be supported by joining to “DBA_TAB_COLS” on the same – although I do not know for sure whether the “AVG_COL_LEN” would truly reflect the data contained and hence be valid for this purpose.

    Having said that indexes on object type columns are relatively few and far between and those implemented in the data dictionary are unlikely to be considered by this report unless heavily using Workspace Manager (WM_SYS schema), XML (XDB schema) or the Data Pump (SYS schema) and running it against the schemas concerned.

    As an aside I was surprised to note that the index on “SYS”.”AQ$_KUPC$DATAPUMP_QUETAB_P” (TREAT(“USER_DATA” AS “KUPC$_WORKER_LOG_ENTRY”).”WIP”) was regarded as a ‘NORMAL’ index and not a ‘FUNCTION-BASED’ one.

    Comment by padders — March 1, 2010 @ 3:16 pm BST Mar 1,2010 | Reply
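An untested sketch of padders’ suggestion — joining object-type attribute columns through qualified_col_name as well as column_name. The owner and index name are hypothetical, and (as padders notes) avg_col_len may not be meaningful for object-type attributes:

```sql
-- Include qualified_col_name in the join so that object-type
-- attribute columns can match an index column entry.
select  tc.column_name, tc.qualified_col_name, tc.avg_col_len
from    dba_ind_columns  ic,
        dba_tab_cols     tc
where   ic.index_owner  = 'SCOTT'          -- hypothetical
and     ic.index_name   = 'T1_TYPE_IDX'    -- hypothetical
and     tc.owner        = ic.table_owner
and     tc.table_name   = ic.table_name
and     ic.column_name  in (tc.column_name, tc.qualified_col_name)
;
```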

  4. Padders,

    Thanks for the extra information; it’s always useful to hear about the special cases. I’ll leave it as an exercise for someone else to test, though.

    One thought, if anyone does want to follow this – my general strategy for handling “rare” cases is to write a separate script; it’s much safer and simpler than trying to write a “one script does everything … but no-one dare change it” type of thing.

    Comment by Jonathan Lewis — March 2, 2010 @ 8:48 pm BST Mar 2,2010 | Reply

  5. [...] explain plan for every index in your system – so in a future note I’ll be showing you a simple piece of code that you might feel happy to run against every (simple, B-tree) index in your database. Eco World [...]

    Pingback by Index Size « Oracle Scratchpad — August 26, 2011 @ 6:31 am BST Aug 26,2011 | Reply

  6. [...] article also includes some related links. The first link is to a script that checks all the indexes for a specific schema. By the way – DO READ the notes in the [...]

    Pingback by jcon.no: Oracle Blog - When to rebuild an Oracle index? — January 5, 2012 @ 10:27 am BST Jan 5,2012 | Reply

  7. Jonathan,

    I want to ask two questions related with B*TREE indexes :

    1. What are the things/statistics which tell/indicate that an index needs to be rebuilt? There are rare cases, as I have learned, where an index rebuild can actually help — can you please give a few cases where it really helps?
    2. What are the statistics based on which the optimizer decides to use one index over another on a table?

    Thanks
    Ajeet

    Comment by Ajeet — January 28, 2012 @ 5:18 pm BST Jan 28,2012 | Reply

  8. Hello Jonathan,
    On line 165 above, instead of tab.num_rows, I think we should have ind.num_rows. Rationale is to account for NULLs. Let’s say index is on single column and half the table rows have a null on this column, thus index rows would be half of those on table. Similar with a composite index when all columns have nulls. What do you think?
    Thanks — Carlos

    Comment by Carlos Sierra — March 30, 2012 @ 9:44 pm BST Mar 30,2012 | Reply

    • Carlos,

      The choice of tab.num_rows was deliberate, and based on the fact the avg_col_len already makes allowance for nulls.

      Every index entry has a rowid, lock byte, directory entry etc. and that’s why I have a component that uses ind.num_rows at line 162; but the “data” content of the index is dictated by the data content of the table, irrespective of how the columns are distributed across index entries. It’s probably best to see this through an example (engineered to be an extreme case):

      create table t1(v1 char(6), v2 char(6));
      insert into t1
      select 
              decode(mod(rownum,2),0,'xxxxxx'),
              'xxxxxx'
      from
              all_objects
      where   rownum <= 1000
      ;
      
      execute dbms_stats.gather_table_stats(user,'t1')
      
      select  column_name, avg_col_len
      from    user_tab_cols
      where   table_name = 'T1'
      ;
      
      COLUMN_NAME          AVG_COL_LEN
      -------------------- -----------
      V1                             4
      V2                             7
      
      2 rows selected.
      

      Although every populated column has a data content length of 6 bytes, the average (content) is 3 for v1 because half the rows are null. The reported length includes the one extra byte required for the length byte that precedes the column content which is why we see 4 and 7 rather than 3 and 6 in the column stats.

      If I had an index on just v1, then ind.num_rows would be 500 because only half the table rows have a value for v1; so if I multiplied the average column length by the number of index entries I would have applied the “half the rows in the table” factor twice. By using tab.num_rows I scale the average column length back up to “total space taken by column” in both the table segment and index segment.

      The choice does introduce an error, of course, in that the average column length is rounded up to an integer, leading to an over-estimate. There’s a further error due to the addition of the length byte, which I could correct by multiplying (avg_col_len – 1) by tab.num_rows and then re-introducing the length bytes as ind.num_rows * “number of columns in index”.

      Comment by Jonathan Lewis — March 31, 2012 @ 8:40 am BST Mar 31,2012 | Reply
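To put numbers on that reply, a back-of-envelope check using the t1 figures above (500 populated v1 values, each 6 data bytes plus a length byte, i.e. 3,500 real bytes of v1 content in the index):

```sql
select  1000 * 4 as using_tab_num_rows,  -- 4,000 bytes: close to 3,500; the
                                         -- overshoot is only the rounding of
                                         -- avg_col_len up to an integer
        500 * 4  as using_ind_num_rows   -- 2,000 bytes: the "half the rows
                                         -- are null" factor applied twice
from    dual
;
```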

  9. Jonathan,

    It makes perfect sense now. I had overlooked that avg_col_len already considered nulls. Thanks for the clarification!

    Carlos

    Comment by carlossierrausa — April 1, 2012 @ 11:57 am BST Apr 1,2012 | Reply

  10. > ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4)

    Why is ind.uniq_ind not multiplied by tab.rowid_length?

    Comment by Morp — January 18, 2013 @ 12:27 pm BST Jan 18,2013 | Reply

    • Morp,

      The extra byte appears only once per index entry (it’s the byte that says the rowid is 6, or 10, bytes long), so we add it to the accumulated length just once.

      Comment by Jonathan Lewis — January 18, 2013 @ 12:59 pm BST Jan 18,2013 | Reply
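Putting that reply together with the script, one reading of the fixed part of the expression ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4) is as follows — note that the breakdown of the constant 4 is an assumption, not something the script spells out:

```sql
-- per-entry overhead for a non-unique index on a (non-partitioned) heap table:
--   6  rowid
--   1  length byte for the rowid column (the uniq_ind byte; 0 when unique)
--   4  remaining per-entry overhead (row header plus row directory entry)
select  6 + 1 + 4 as fixed_bytes_per_entry
from    dual
;
-- 11 bytes per entry before any column data
```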

  11. And what about partitioned indexes? Is there a way to estimate the size of an index partition?
    Thanks.

    Comment by morp — July 21, 2013 @ 6:30 am BST Jul 21,2013 | Reply

    • morp,

      Looking through my library I see I have a matching set of 4 scripts, one each for:
      * global indexes
      * globally partitioned indexes
      * local (simple) partitioned indexes
      * local composite partitioned indexes

      I’ll have to read through them (the notes are a bit sparse) to check what they do. In simple terms the local partitioning requires you to apply the same sort of logic to one (sub)partition of the table and the matching (sub)partition of the index at the same time. The globally partitioned index is the tricky one because you have to do some sort of approximation relating the relative sizes of the partitions to the total size of the table.

      Do you have a specific type that you’re interested in? I’ll see if I can find some time over the next week or two to start publishing the scripts.

      Comment by Jonathan Lewis — July 24, 2013 @ 2:32 pm BST Jul 24,2013 | Reply
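For anyone experimenting before those scripts appear, a starting point for the local (simple) partitioned case might be to pull the per-partition statistics and apply the same arithmetic one partition at a time. A minimal sketch — the owner and index name are hypothetical:

```sql
-- Per-partition leaf block and row counts for a LOCAL index;
-- the estimate formula would then be applied partition by partition.
select  ip.partition_name,
        ip.num_rows,
        ip.leaf_blocks
from    dba_ind_partitions  ip
where   ip.index_owner = 'SCOTT'          -- hypothetical owner
and     ip.index_name  = 'T1_LOCAL_I1'    -- hypothetical local index
and     ip.last_analyzed is not null
order by
        ip.partition_position
;
```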

