Oracle Scratchpad

May 22, 2009

Index Size

Filed under: Indexing,Infrastructure,Oracle,Tuning — Jonathan Lewis @ 7:52 pm BST May 22,2009

How do you find indexes that might be worth the effort of rebuilding.

One option is to report indexes that take up much more space than they need to – and there are some fairly easy ways to find those indexes. 10g, for example, gave us the create_index_cost procedure in the dbms_space package so, for a single index, we can write a little routine that does something like the following example – running on a 10.2.0.3 database:


rem
rem     Script:         index_space.sql
rem     Dated:          Aug 2008
rem     Author:         J.P.Lewis
rem

declare
        m_used  number;
        m_alloc number;
begin
        dbms_space.create_index_cost(
                ddl             => 'create index t2_i1 on t2(n1, n2)',
                used_bytes      => m_used,
                alloc_bytes     => m_alloc
        );
        dbms_output.put_line(
                'Used bytes: ' || m_used ||
                '...allocated bytes: ' || m_alloc);
end;
/


Used bytes: 24000...allocated bytes: 131072

You may wonder why there is such a difference between used bytes and allocated bytes. There are two parts to the explanation, and to give you a clue to one part of the reason I’ll repeat the test but change the DDL slightly:

declare
        m_used  number;
        m_alloc number;
begin
        dbms_space.create_index_cost(
                ddl             => 'create index t2_i1 on t2(n1, n2) TABLESPACE TEST_8K',
                used_bytes      => m_used,
                alloc_bytes     => m_alloc
        );
        dbms_output.put_line(
                'Used bytes: ' || m_used ||
                '...allocated bytes: ' || m_alloc);
end;
/

Used bytes: 24000...allocated bytes: 1048576

After including the tablespace name in the create index statement the ‘allocated bytes’ has jumped from 128KB to 1MB.

Part of the explanation of the difference between allocated and used comes from tablespace extent sizing – after working out the space that the index needs Oracle then works out the extent allocation needed in the relevant tablespace to hold that index … and different tablespaces may have different space management definitions.

But here’s an unexpected detail – if you don’t specify a tablespace the code uses the default_tablespace of the sys schema as the target tablepace! This is a little odd; you might have expected the default to be the default_tablespace of the (nominal) owner of the index – especially since the dbms_space package is defined as authid current_user. This doesn’t change even if you include the index owner explicitly in the DDL. So make sure you specify the tablespace (and any other storage parameters you expect to use) in your create index statement or the allocated bytes may turn out to be a poor estimate of the final segment size.

But there’s more

My first test gave 24,000 bytes as the used bytes – and that should fit comfortably inside a single 64KB extent in the system tablespace – so why was the allocated space reported as 128KB / two extents?

Another little demonstration, using a relatively new feature of explain plan,  gives us a clue:

explain plan for
create index t2_i1 on t2(n1, n2)
;

select * from table(dbms_xplan.display(null,null,'-cost'));

PLAN_TABLE_OUTPUT
----------------------------
Plan hash value: 2685098699

-------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Time     |
-------------------------------------------------------------------
|   0 | CREATE INDEX STATEMENT |       |  3000 | 24000 | 00:00:01 |
|   1 |  INDEX BUILD NON UNIQUE| T2_I1 |       |       |          |
|   2 |   SORT CREATE INDEX    |       |  3000 | 24000 |          |
|   3 |    TABLE ACCESS FULL   | T2    |  3000 | 24000 | 00:00:01 |
-------------------------------------------------------------------

Note
-----
   - estimated index size: 131K bytes

14 rows selected.

Notice the value in the Bytes column of operation 3; this is the value reported as used bytes by the create_index_cost procedure – and if you enable sql_trace when running that procedure you will find that it really does call explain plan to generate the figures it reports. (There is a fourth optional parameter to the procedure that can be used to specify the plan_table that should be used to do this.)

This figure is simply: dba_tables.num_rows * sum(dba_tab_columns.avg_col_len). It doesn’t allow for the various overheads of an index entry – i.e. the stored rowid, the lock byte, the flag byte, the length byte(s), or the two byte pointer to each row in the leaf block’s row directory – but it looks as if this information is included as a step in the final extent calculations.

This arithmetic introduces another pair of errors: first, the sizing estimate doesn’t allow for index compression, [update: even in 19c] so the estimate can be much larger than it really should be; secondly the avg_col_len is calculated as a weighted average that allows for nulls then rounds up to the nearest whole number so in cases where there are lots of nulls in a column it’s easy for the final estimate to be too big.

Conclusion:

Oracle offers us a packaged procedure that makes it easy to find out roughly how big an index ought to be, and how much space it ought to take up – but it has a number of flaws and should be treated with a little caution.

Of course, it’s also a little heavy handed to call this procedure (and the implicit explain plan for every index in your system – so in a future note I’ll be showing you a simple piece of code that you might feel happy to run against every (simple, B-tree) index in your database.

9 Comments »

  1. Jonathan,

    I think the EXPLAIN PLAN on CREATE INDEX functionality is really quite limited. It starts with the dependency on the available statistics, not considering the number of nulls defined for the column (at least for the “ROWS” shown, may be indirectly for the size by using the AVG_COL_LEN) and doesn’t end with no reasonable support for function-based indexes and their expressions (which is challenging I have to admit).

    Not to mention the shortcomings you’ve pointed out: No compression considered and the “tablespace” oddity.

    Some details about my findings in this regard: http://oracle-randolf.blogspot.com/2009/02/explain-plan-on-ddls.html

    Regards,
    Randolf

    Comment by Randolf Geist — May 22, 2009 @ 8:45 pm BST May 22,2009 | Reply

  2. For estimating new indexes, I’d welcome a good guesstimate.

    But I am weary of any (space-, size-)esitmate.
    Reality tends to invalidate any size-estimate by roughly a factor of 10 (pi-square rounded to a whole number) before the year is over.

    Index-Compression (of suitable indexes) is an under-used feature and a bit of promotion for that would be good.

    But for existing indexes,
    what about the old-fashioned:

    SQL> analyze index abc_pk VALIDATE STRUCTURE ;

    Followed by a select from INDEX_STATS ?

    (also a bit of work, as it needs to be done one-at-a-time, and I’m fairly lazy myself)

    Comment by PdV — May 23, 2009 @ 7:06 pm BST May 23,2009 | Reply

    • The most obvious problem with that analyze call is that it tries to lock the table in share mode (mode 4) using the nowait option. If anyone is updating the table the analyze fails with Oracle error ORA-00054; if the lock is acquired then anyone attempting to update the table has to wait for the analyze to complete.

      Comment by Jonathan Lewis — May 26, 2009 @ 11:07 am BST May 26,2009 | Reply

  3. […] first is by Jonathan Lewis who discusses in a post called Index Size how one might determine which indexes to perhaps consider for an index rebuild by using the […]

    Pingback by Two Excellent Index Related Blog Posts « Richard Foote’s Oracle Blog — May 25, 2009 @ 11:15 am BST May 25,2009 | Reply

  4. unlike create_table_cost, create_index_cost doesn’t contain a parameter row_count , so how does estimation work for different row count?

    Comment by george — June 2, 2009 @ 5:03 pm BST Jun 2,2009 | Reply

    • George,

      The important difference between create_table_cost and create_index_cost is that you supply the correct DDL for creating an index on an existing table to create_index_cost – and Oracle uses the table-related stats in the data dictionary to do its arithmetic – whereas you simply provide a list of column types and sizes (or just a row size) with some other numbers to use create_table_cost.

      It shouldn’t be too difficult for Oracle to include a routine to take a DDL statement for ‘create table’ and do the same sort of thing. (In fact, I think I’ve seen one such routine but I can’t remember where). Of course, you could just take the top line output from dbms_xplan.display() for ‘explain for create table…’ and use the Rows and Bytes columns to work out the space requirement by hand.

      Comment by Jonathan Lewis — June 3, 2009 @ 6:44 pm BST Jun 3,2009 | Reply

  5. […] a follow-up to a post I did some time ago about estimating the size of an index before you create it. The note describes dbms_stats.create_index_cost() procedure, and how it […]

    Pingback by Index size bug « Oracle Scratchpad — January 4, 2012 @ 5:32 pm GMT Jan 4,2012 | Reply

  6. […] next following days). So the question is “What indexes should be rebuilt?”. I just read this article from Jonathan Lewis that might just give us a possible answer. In 10g Oracle introduced the […]

    Pingback by jcon.no: Oracle Blog - When to rebuild an Oracle index? — January 5, 2012 @ 10:26 am GMT Jan 5,2012 | Reply


RSS feed for comments on this post. TrackBack URI

Comments and related questions are welcome.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Website Powered by WordPress.com.