Oracle Scratchpad

Index Sizing

(As copied by Oracle Corp. in Metalink note 989186.1)

rem
rem     Script:     index_est_proc_2.sql
rem     Author:     Jonathan Lewis
rem     Dated:      August 2005 (updated Apr 2009)
rem     Purpose:    Fast analysis of indexes to help identify
rem                 extreme degeneration.
rem
rem     Last tested
rem             11.1.0.7
rem             10.2.0.3
rem             10.1.0.4
rem              9.2.0.8
rem              8.1.7.4
rem     Not tested
rem             11.2.0.1
rem             10.2.0.4
rem
rem     Usage:
rem     Set the values in the "define" section
rem     Log on with the privilege to see the "dba_" views
rem     using SQL*Plus and run the script.
rem
rem     Notes:
rem     This script assumes that statistics have been collected in
rem     the fairly recent past, and uses some approximations to
rem     compare the number of leaf blocks with the number of leaf
rem     blocks that ought to be needed to hold the data.
rem
rem     There are various little oddities with the way that
rem         (a) Oracle calculates average column length and
rem         (b) I use the available data
rem     that mean that at small sizes and in extreme cases the
rem     numbers I produce can be wrong.  In particular, for indexes
rem     where a lot of the table data has nulls (so no entry in the
rem     index), the estimated size can be significantly larger than
rem     it finally turns out to be.
rem
rem
rem     Targets
rem     =======
rem     Where the estimate is very much smaller than the actual, then
rem     you may be looking at a "FIFO" index, emptying out in the past
rem     and filling in the future. This type of index is a candidate for
rem     a regular "coalesce" - although you may want to rebuild it once
rem     to get it to the right starting size and release excess space
rem     back to the tablespace.
rem
rem     See http://jonathanlewis.wordpress.com/2008/09/26/index-analysis/
rem     for an example and discussion on this type of index.
rem
rem     Where the estimate is about half the size of the actual, then
rem     it is worth checking whether there is any special treatment of
rem     the data that is making this happen. 50% utilisation is fairly
rem     common in RAC for indexes based on a sequence with a large cache
rem     size, so it may be best to leave the indexes at that level. 
rem     However, you may find that rebuilding (perhaps just once) with
rem     a pctfree in the region of 30% may give you a slightly more efficient
rem     index in non-RAC systems.
rem
rem     If your index is running at 50% and is not strongly sequence based
rem     then you may be suffering from the concurrency/ITL bug and may want
rem     to rebuild the index and force a maxtrans setting into the index.
rem
rem     If the index is running at a fairly uniform 25%, it may be subject
rem     to side effects of both sequencing and the concurrency effects.
rem
rem     Usage:
rem     ======
rem     This script takes a username (table owner), percent usage, and
rem     scaling factor.  It reports the estimated leaf block count of
rem     all simple indexes for that schema where the size of the index
rem     would be smaller than the supplied fraction of the current size
rem     when rebuilt at the supplied percentage utilisation. Current settings
rem     are 90% (which equates to the default pctfree 10) and 0.6, which means
rem     the index would be running at about 50% wasted space - the point
rem     at which it begins to be a possible target for investigation.
rem     The script does not report any index smaller than 10,000 leaf blocks,
rem     and assumes an 8KB block size.
rem
rem     Technical notes:
rem     ================
rem     Don't need to add a length byte after using dbms_stats
rem     Don't need a 'descending' byte because it's automatically included
rem     Don't need to adjust for num_nulls because it's automatically included
rem     Reverse key indexes don't affect column lengths
rem
rem     Need to worry about COMPRESSED indexes. At present compression
rem     may reduce the size of an index enough that I fail to notice
rem     that it should be smaller still.
rem
rem     Index types that can be used (with partitioned = 'NO')
rem         NORMAL
rem         NORMAL/REV
rem         FUNCTION-BASED NORMAL
rem
rem     Still needs enhancing for partitioned and subpartitioned indexes
rem     Check dba_part_indexes for locality, partitioning_type, subpartitioning_type
rem     But does handle global indexes on partitioned tables.
rem
rem     To investigate
rem         LOB
rem         IOT - TOP
rem         IOT - NESTED
rem         SECONDARY
rem         BITMAP  (and BITMAP JOIN)
rem         FUNCTION-BASED BITMAP
rem         CLUSTER
rem         ANSI ?
rem
rem     Probably not possible
rem         DOMAIN
rem         FUNCTION-BASED DOMAIN
rem
rem     Need to avoid partitioned, temporary, unusable and dropped indexes
rem

spool index_est_proc_2

set verify off
set serveroutput on size 1000000 format wrapped

define  m_owner         = '&m_schemaname'
define  m_blocksize     = 8192
define  m_target_use    = 90    -- equates to pctfree 10
define  m_scale_factor  = 0.6
define  m_minimum       = 10000
define  m_overhead      = 192   -- leaf block "lost" space in index_stats

declare
    m_leaf_estimate number;
begin
    for r in (
        select
            table_owner,
            table_name,
            owner       index_owner,
            index_name,
            leaf_blocks
        from
            dba_indexes
        where
            owner = upper('&m_owner')
        and index_type in (
                'NORMAL',
                'NORMAL/REV',
                'FUNCTION-BASED NORMAL'
            )
        and partitioned = 'NO'
        and temporary = 'N'
        and dropped = 'NO'
        and status = 'VALID'
        and last_analyzed is not null
        order by
            owner, table_name, index_name
    ) loop

        if r.leaf_blocks > &m_minimum then
            select
                round(
                    100 / &m_target_use *       -- assumed packing efficiency
                    (
                        ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4) +
                        sum(
                            (tc.avg_col_len) *
                            (tab.num_rows)
                        )           -- column data bytes
                    ) / (&m_blocksize - &m_overhead)
                )               index_leaf_estimate
                into    m_leaf_estimate
            from
                (
                select  /*+ no_merge */
                    table_name,
                    num_rows,
                    decode(partitioned,'YES',10,6) rowid_length
                from
                    dba_tables
                where
                    table_name  = r.table_name
                and owner       = r.table_owner
                )               tab,
                (
                select  /*+ no_merge */
                    index_name,
                    index_type,
                    num_rows,
                    decode(uniqueness,'UNIQUE',0,1) uniq_ind
                from
                    dba_indexes
                where
                    table_owner = r.table_owner
                and table_name  = r.table_name
                and owner       = r.index_owner
                and index_name  = r.index_name
                )               ind,
                (
                select  /*+ no_merge */
                    column_name
                from
                    dba_ind_columns
                where
                    table_owner = r.table_owner
                and index_owner = r.index_owner
                and table_name  = r.table_name
                and index_name  = r.index_name
                )               ic,
                (
                select  /*+ no_merge */
                    column_name,
                    avg_col_len
                from
                    dba_tab_cols
                where
                    owner       = r.table_owner
                and table_name  = r.table_name
                )               tc
            where
                tc.column_name = ic.column_name
            group by
                ind.num_rows,
                ind.uniq_ind,
                tab.rowid_length
            ;

            if m_leaf_estimate < &m_scale_factor * r.leaf_blocks then

                dbms_output.put_line(
                    to_char(sysdate,'hh24:mi:ss') || ': ' ||
                    trim(r.table_name) || ' - ' ||
                    trim(r.index_name)
                );

                dbms_output.put_line(
                    'Current Leaf blocks: ' ||
                    to_char(r.leaf_blocks,'999,999,999') ||
                    '         Target size: ' ||
                    to_char(m_leaf_estimate,'999,999,999')
                );

                dbms_output.new_line;

            end if;
        end if;
    end loop;
end;
/
set verify on

spool off

set doc off
doc
A recent sample from my SYS schema (with a minimum of 100 blocks)

15:19:30: HISTGRM$ - I_H_OBJ#_COL#
Current Leaf blocks:          110         Target size:           51

15:19:30: OBJAUTH$ - I_OBJAUTH2
Current Leaf blocks:          107         Target size:           58

15:19:30: WRI$_OPTSTAT_HISTHEAD_HISTORY - I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST
Current Leaf blocks:          180         Target size:           99
#
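As a sanity check on the arithmetic in the PL/SQL above, the core expression can be evaluated by hand. The figures below are purely hypothetical: a one-million-row heap table (6-byte rowids) with a single-column non-unique index whose avg_col_len is 12.

```sql
-- bytes per entry: 6 (heap rowid) + 1 (rowid length byte, non-unique)
--                  + 4 (per-entry overhead) + 12 (column data)
-- usable leaf space: 8192 - 192 = 8000 bytes, target packing 90%
select  round(
            100/90 * 1000000 * (6 + 1 + 4 + 12) / (8192 - 192)
        ) as estimated_leaf_blocks
from    dual
;
-- roughly 3,194 leaf blocks
```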

15 Comments

  1. [...] a script to describe the indexes on a single table – column and statistical information Oversized Indexes: a data dictionary scan for simple B-tree indexes that may be unreasonably large for the data [...]

    Pingback by Simple scripts « Oracle Scratchpad — February 28, 2010 @ 3:38 pm BST Feb 28,2010 | Reply

  2. [...] A little while ago, while applying the concept on a client site and implementing a mechanism that would allow the DBAs to do the same check as regularly as needed, I discovered a couple of flaws in the original code – so I had to spend a little time correcting it, and thought I’d pass on the latest version. [...]

    Pingback by Index Efficiency 2 « Oracle Scratchpad — February 28, 2010 @ 3:40 pm BST Feb 28,2010 | Reply

  3. Note that SELECT .. INTO deriving “m_leaf_estimate” appears to fail with ‘ORA-01403: No data found’ for indexes created on object type attributes because “COLUMN_NAME” in “DBA_IND_COLUMNS” is not a simple identifier and hence does not join to column_name in “DBA_TAB_COLS”.

    However I believe this identifier is reflected in “DBA_TAB_COLS”.”QUALIFIED_COL_NAME” and so indexes of this type might be supported by joining to “DBA_TAB_COLS” on the same – although I do not know for sure whether the “AVG_COL_LEN” would truly reflect the data contained and hence be valid for this purpose.

    Having said that indexes on object type columns are relatively few and far between and those implemented in the data dictionary are unlikely to be considered by this report unless heavily using Workspace Manager (WM_SYS schema), XML (XDB schema) or the Data Pump (SYS schema) and running it against the schemas concerned.

    As an aside I was surprised to note that the index on “SYS”.”AQ$_KUPC$DATAPUMP_QUETAB_P” (TREAT(“USER_DATA” AS “KUPC$_WORKER_LOG_ENTRY”).”WIP”) was regarded as a ‘NORMAL’ index and not a ‘FUNCTION-BASED’ one.

    Comment by padders — March 1, 2010 @ 3:16 pm BST Mar 1,2010 | Reply
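An untested sketch of padders’ suggestion — joining object-type attribute columns through qualified_col_name as well as column_name. The owner and index name are hypothetical, and (as padders notes) avg_col_len may not be meaningful for object-type attributes:

```sql
-- Include qualified_col_name in the join so that object-type
-- attribute columns can match an index column entry.
select  tc.column_name, tc.qualified_col_name, tc.avg_col_len
from    dba_ind_columns  ic,
        dba_tab_cols     tc
where   ic.index_owner  = 'SCOTT'          -- hypothetical
and     ic.index_name   = 'T1_TYPE_IDX'    -- hypothetical
and     tc.owner        = ic.table_owner
and     tc.table_name   = ic.table_name
and     ic.column_name  in (tc.column_name, tc.qualified_col_name)
;
```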

  4. Padders,

    Thanks for the extra information; it’s always useful to hear about the special cases. I’ll leave it as an exercise for someone else to test, though.

    One thought, if anyone does want to follow this – my general strategy for handling “rare” cases is to write a separate script; it’s much safer and simpler than trying to write a “one script does everything … but no-one dare change it” type of thing.

    Comment by Jonathan Lewis — March 2, 2010 @ 8:48 pm BST Mar 2,2010 | Reply

  5. [...] explain plan for every index in your system – so in a future note I’ll be showing you a simple piece of code that you might feel happy to run against every (simple, B-tree) index in your database. Eco World [...]

    Pingback by Index Size « Oracle Scratchpad — August 26, 2011 @ 6:31 am BST Aug 26,2011 | Reply

  6. [...] article also includes some related links. The first link is to a script that checks all the indexes for a specific schema. By the way – DO READ the notes in the [...]

    Pingback by jcon.no: Oracle Blog - When to rebuild an Oracle index? — January 5, 2012 @ 10:27 am BST Jan 5,2012 | Reply

  7. Jonathan,

    I want to ask two questions related with B*TREE indexes :

    1. What are the things/statistics which tell/indicate that an index needs to be rebuilt? There are rare cases, as I have learned, where an index rebuild can actually help — can you please give a few cases where it really helps?
    2. What are the statistics based on which the optimizer decides to use one index over another on a table?

    Thanks
    Ajeet

    Comment by Ajeet — January 28, 2012 @ 5:18 pm BST Jan 28,2012 | Reply

  8. Hello Jonathan,
    On line 165 above, instead of tab.num_rows, I think we should have ind.num_rows. Rationale is to account for NULLs. Let’s say index is on single column and half the table rows have a null on this column, thus index rows would be half of those on table. Similar with a composite index when all columns have nulls. What do you think?
    Thanks — Carlos

    Comment by Carlos Sierra — March 30, 2012 @ 9:44 pm BST Mar 30,2012 | Reply

    • Carlos,

      The choice of tab.num_rows was deliberate, and based on the fact the avg_col_len already makes allowance for nulls.

      Every index entry has a rowid, lock byte, directory entry etc. and that’s why I have a component that uses ind.num_rows at line 162; but the “data” content of the index is dictated by the data content of the table, irrespective of how the columns are distributed across index entries. It’s probably best to see this through an example (engineered to be an extreme case):

      create table t1(v1 char(6), v2 char(6));
      insert into t1
      select 
              decode(mod(rownum,2),0,'xxxxxx'),
              'xxxxxx'
      from
              all_objects
      where   rownum <= 1000
      ;
      
      execute dbms_stats.gather_table_stats(user,'t1')
      
      select  column_name, avg_col_len
      from    user_tab_cols
      where   table_name = 'T1'
      ;
      
      COLUMN_NAME          AVG_COL_LEN
      -------------------- -----------
      V1                             4
      V2                             7
      
      2 rows selected.
      

      Although every populated column has a data content length of 6 bytes, the average (content) is 3 for v1 because half the rows are null. The reported length includes the one extra byte required for the length byte that precedes the column content which is why we see 4 and 7 rather than 3 and 6 in the column stats.

      If I had an index on just v1, then ind.num_rows would be 500 because only half the table rows have a value for v1; so if I multiplied the average column length by the number of index entries I would have applied the “half the rows in the table” factor twice. By using tab.num_rows I scale the average column length back up to “total space taken by column” in both the table segment and index segment.

      The choice does introduce an error, of course, in that the average column length is rounded up to an integer, leading to an over-estimate. There’s a further error due to the addition of the length byte, which I could correct by multiplying (avg_col_len – 1) by tab.num_rows and then re-introducing the length bytes as ind.num_rows * “number of columns in index”.

      Comment by Jonathan Lewis — March 31, 2012 @ 8:40 am BST Mar 31,2012 | Reply
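To put numbers on that reply, a back-of-envelope check using the t1 figures above (500 populated v1 values, each 6 data bytes plus a length byte, i.e. 3,500 real bytes of v1 content in the index):

```sql
select  1000 * 4 as using_tab_num_rows,  -- 4,000 bytes: close to 3,500; the
                                         -- overshoot is only the rounding of
                                         -- avg_col_len up to an integer
        500 * 4  as using_ind_num_rows   -- 2,000 bytes: the "half the rows
                                         -- are null" factor applied twice
from    dual
;
```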

  9. Jonathan,

    It makes perfect sense now. I had overlooked that avg_col_len already considered nulls. Thanks for the clarification!

    Carlos

    Comment by carlossierrausa — April 1, 2012 @ 11:57 am BST Apr 1,2012 | Reply

  10. > ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4)

    Why is ind.uniq_ind not multiplied by tab.rowid_length?

    Comment by Morp — January 18, 2013 @ 12:27 pm BST Jan 18,2013 | Reply

    • Morp,

      The extra byte appears only once per index entry (it’s the byte that says the rowid is 6, or 10, bytes long), so we add it to the accumulated length just once.

      Comment by Jonathan Lewis — January 18, 2013 @ 12:59 pm BST Jan 18,2013 | Reply
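Putting that reply together with the script, one reading of the fixed part of the expression ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4) is as follows — note that the breakdown of the constant 4 is an assumption, not something the script spells out:

```sql
-- per-entry overhead for a non-unique index on a (non-partitioned) heap table:
--   6  rowid
--   1  length byte for the rowid column (the uniq_ind byte; 0 when unique)
--   4  remaining per-entry overhead (row header plus row directory entry)
select  6 + 1 + 4 as fixed_bytes_per_entry
from    dual
;
-- 11 bytes per entry before any column data
```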

  11. And what about partitioned indexes? Is there a way to estimate the size of an index partition?
    Thanks.

    Comment by morp — July 21, 2013 @ 6:30 am BST Jul 21,2013 | Reply

    • morp,

      Looking through my library I see I have a matching set of 4 scripts, one each for:
      * global indexes
      * globally partitioned indexes
      * local (simple) partitioned indexes
      * local composite partitioned indexes

      I’ll have to read through them (the notes are a bit sparse) to check what they do. In simple terms the local partitioning requires you to apply the same sort of logic to one (sub)partition of the table and the matching (sub)partition of the index at the same time. The globally partitioned index is the tricky one because you have to do some sort of approximation relating the relative sizes of the partitions to the total size of the table.

      Do you have a specific type that you’re interested in? I’ll see if I can find some time over the next week or two to start publishing the scripts.

      Comment by Jonathan Lewis — July 24, 2013 @ 2:32 pm BST Jul 24,2013 | Reply
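For anyone experimenting before those scripts appear, a starting point for the local (simple) partitioned case might be to pull the per-partition statistics and apply the same arithmetic one partition at a time. A minimal sketch — the owner and index name are hypothetical:

```sql
-- Per-partition leaf block and row counts for a LOCAL index;
-- the estimate formula would then be applied partition by partition.
select  ip.partition_name,
        ip.num_rows,
        ip.leaf_blocks
from    dba_ind_partitions  ip
where   ip.index_owner = 'SCOTT'          -- hypothetical owner
and     ip.index_name  = 'T1_LOCAL_I1'    -- hypothetical local index
and     ip.last_analyzed is not null
order by
        ip.partition_position
;
```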

