(As copied by Oracle Corp. in Metalink note 989186.1)
rem
rem     Script:         index_est_proc_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          August 2005 (updated Apr 2009)
rem     Purpose:        Fast analysis of indexes to help identify
rem                     extreme degeneration.
rem
rem     Last tested
rem             11.1.0.7
rem             10.2.0.3
rem             10.1.0.4
rem             9.2.0.8
rem             8.1.7.4
rem     Not tested
rem             11.2.0.1
rem             10.2.0.4
rem
rem     Usage:
rem             Set the values in the "define" section
rem             Log on with the privilege to see the "dba_" views
rem             using SQL*Plus and run the script.
rem
rem     Notes:
rem     This script assumes that statistics have been collected in
rem     the fairly recent past, and uses some approximations to
rem     compare the number of leaf blocks with the number of leaf
rem     blocks that ought to be needed to hold the data.
rem
rem     There are various little oddities with the way that
rem             (a) Oracle calculates average column length and
rem             (b) I use the available data
rem     that mean that at small sizes and in extreme cases the
rem     numbers I produce can be wrong. In particular, for indexes
rem     where a lot of the table data has nulls (so no entry in the
rem     index), the estimated size can be significantly larger than
rem     it finally turns out to be.
rem
rem
rem     Targets
rem     =======
rem     Where the estimate is very much smaller than the actual, then
rem     you may be looking at a "FIFO" index, emptying out in the past
rem     and filling in the future. This type of index is a candidate for
rem     a regular "coalesce" - although you may want to rebuild it once
rem     to get it to the right starting size and release excess space
rem     back to the tablespace.
rem
rem     See https://jonathanlewis.wordpress.com/2008/09/26/index-analysis/
rem     for an example and discussion on this type of index.
rem
rem     Where the estimate is about half the size of the actual, then
rem     it is worth checking whether there is any special treatment of
rem     the data that is making this happen. 50% utilisation is fairly
rem     common in RAC for indexes based on a sequence with a large cache
rem     size, so it may be best to leave the indexes at that level.
rem     However, you may find that rebuilding (perhaps just once) with
rem     a pctfree in the region of 30% may give you a slightly more efficient
rem     index in non-RAC systems.
rem
rem     If your index is running at 50% and is not strongly sequence based
rem     then you may be suffering from the concurrency/ITL bug and may want
rem     to rebuild the index and force a maxtrans setting into the index.
rem
rem     If the index is running at a fairly uniform 25%, it may be subject
rem     to side effects of both sequencing and the concurrency effects.
rem
rem     Usage:
rem     ======
rem     This script takes a username (table owner), percent usage, and
rem     scaling factor. It reports the estimated leaf block count of
rem     all simple indexes for that schema where the size of the index
rem     would be smaller than the supplied fraction of the current size
rem     when rebuilt at the supplied percentage utilisation. Current settings
rem     are 90% (which equates to the default pctfree 10) and 0.6, which means
rem     the index would be running at about 50% empty wastage - which is the
rem     point at which it begins to be a possible target for investigation.
rem     The script does not report any index smaller than 10,000 leaf blocks,
rem     and assumes an 8KB block size.
rem
rem     Technical notes:
rem     ================
rem     Don't need to add a length byte after using dbms_stats
rem     Don't need a 'descending' byte because it's automatically included
rem     Don't need to adjust for num_nulls because it's automatically included
rem     Reverse key indexes don't affect column lengths
rem
rem     Need to worry about COMPRESSED indexes. At present compression
rem     may reduce the size of an index so that I don't notice it should
rem     still be smaller than it is.
rem
rem     Index types that can be used (with partitioned = 'NO')
rem             NORMAL
rem             NORMAL/REV
rem             FUNCTION-BASED NORMAL
rem
rem     Still needs enhancing for partitioned and subpartitioned indexes
rem     Check dba_part_indexes for locality, partitioning_type, subpartitioning_type
rem     But does handle global indexes on partitioned tables.
rem
rem     To investigate
rem             LOB
rem             IOT - TOP
rem             IOT - NESTED
rem             SECONDARY
rem             BITMAP (and BITMAP JOIN)
rem             FUNCTION-BASED BITMAP
rem             CLUSTER
rem             ANSI ?
rem
rem     Probably not possible
rem             DOMAIN
rem             FUNCTION-BASED DOMAIN
rem
rem     Need to avoid partitioned, temporary, unusable and dropped indexes
rem

spool index_est_proc_2

set verify off
set serveroutput on size 1000000 format wrapped

define m_owner = '&m_schemaname'

define m_blocksize    = 8192
define m_target_use   = 90      -- equates to pctfree 10
define m_scale_factor = 0.6
define m_minimum      = 10000
define m_overhead     = 192     -- leaf block "lost" space in index_stats

declare
        m_leaf_estimate number;
begin
        for r in (
                select
                        table_owner, table_name,
                        owner index_owner, index_name,
                        leaf_blocks
                from
                        dba_indexes
                where
                        owner = upper('&m_owner')
                and     index_type in (
                                'NORMAL',
                                'NORMAL/REV',
                                'FUNCTION-BASED NORMAL'
                        )
                and     partitioned = 'NO'
                and     temporary = 'N'
                and     dropped = 'NO'
                and     status = 'VALID'
                and     last_analyzed is not null
                order by
                        owner, table_name, index_name
        ) loop

                if r.leaf_blocks > &m_minimum then

                        select
                                round(
                                        100 / &m_target_use *           -- assumed packing efficiency
                                        (
                                                ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4) +
                                                sum((tc.avg_col_len) * (tab.num_rows))  -- column data bytes
                                        ) / (&m_blocksize - &m_overhead)
                                )       index_leaf_estimate
                        into
                                m_leaf_estimate
                        from
                                (
                                select  /*+ no_merge */
                                        table_name,
                                        num_rows,
                                        decode(partitioned,'YES',10,6)  rowid_length
                                from
                                        dba_tables
                                where
                                        table_name = r.table_name
                                and     owner = r.table_owner
                                )       tab,
                                (
                                select  /*+ no_merge */
                                        index_name,
                                        index_type,
                                        num_rows,
                                        decode(uniqueness,'UNIQUE',0,1) uniq_ind
                                from
                                        dba_indexes
                                where
                                        table_owner = r.table_owner
                                and     table_name = r.table_name
                                and     owner = r.index_owner
                                and     index_name = r.index_name
                                )       ind,
                                (
                                select  /*+ no_merge */
                                        column_name
                                from
                                        dba_ind_columns
                                where
                                        table_owner = r.table_owner
                                and     index_owner = r.index_owner
                                and     table_name = r.table_name
                                and     index_name = r.index_name
                                )       ic,
                                (
                                select  /*+ no_merge */
                                        column_name,
                                        avg_col_len
                                from
                                        dba_tab_cols
                                where
                                        owner = r.table_owner
                                and     table_name = r.table_name
                                )       tc
                        where
                                tc.column_name = ic.column_name
                        group by
                                ind.num_rows,
                                ind.uniq_ind,
                                tab.rowid_length
                        ;

                        if m_leaf_estimate < &m_scale_factor * r.leaf_blocks then
                                dbms_output.put_line(
                                        to_char(sysdate,'hh24:mi:ss') || ': ' ||
                                        trim(r.table_name) || ' - ' ||
                                        trim(r.index_name)
                                );
                                dbms_output.put_line(
                                        'Current Leaf blocks: ' || to_char(r.leaf_blocks,'999,999,999') ||
                                        ' Target size: ' || to_char(m_leaf_estimate,'999,999,999')
                                );
                                dbms_output.new_line;
                        end if;

                end if;

        end loop;
end;
/

set verify on

spool off

set doc off
doc

A recent sample from my SYS schema (with a minimum of 100 blocks)

15:19:30: HISTGRM$ - I_H_OBJ#_COL#
Current Leaf blocks:          110 Target size:           51

15:19:30: OBJAUTH$ - I_OBJAUTH2
Current Leaf blocks:          107 Target size:           58

15:19:30: WRI$_OPTSTAT_HISTHEAD_HISTORY - I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST
Current Leaf blocks:          180 Target size:           99

#
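A typical run looks something like the following (the schema name here is just a placeholder; the account you use needs to be able to see the dba_ views):

        SQL> @index_est_proc_2
        Enter value for m_schemaname: test_user

Any index that qualifies is then listed in the "Current Leaf blocks / Target size" format shown in the sample above.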
[…] a script to describe the indexes on a single table – column and statistical information; Oversized Indexes: a data dictionary scan for simple B-tree indexes that may be unreasonably large for the data […]
Pingback by Simple scripts « Oracle Scratchpad — February 28, 2010 @ 3:38 pm GMT Feb 28,2010 |
[…] A little while ago, while applying the concept on a client site and implementing a mechanism that would allow the DBAs to do the same check as regularly as needed, I discovered a couple of flaws in the original code – so I had to spend a little time correcting it, and thought I’d pass on the latest version. […]
Pingback by Index Efficiency 2 « Oracle Scratchpad — February 28, 2010 @ 3:40 pm GMT Feb 28,2010 |
Note that the SELECT ... INTO deriving “m_leaf_estimate” appears to fail with ‘ORA-01403: No data found’ for indexes created on object type attributes because “COLUMN_NAME” in “DBA_IND_COLUMNS” is not a simple identifier and hence does not join to column_name in “DBA_TAB_COLS”.
However I believe this identifier is reflected in “DBA_TAB_COLS”.”QUALIFIED_COL_NAME” and so indexes of this type might be supported by joining to “DBA_TAB_COLS” on the same – although I do not know for sure whether the “AVG_COL_LEN” would truly reflect the data contained and hence be valid for this purpose.
Having said that, indexes on object type columns are relatively few and far between, and those implemented in the data dictionary are unlikely to be considered by this report unless you are making heavy use of Workspace Manager (WMSYS schema), XML (XDB schema) or Data Pump (SYS schema) and run it against the schemas concerned.
As an aside I was surprised to note that the index on “SYS”.”AQ$_KUPC$DATAPUMP_QUETAB_P” (TREAT(“USER_DATA” AS “KUPC$_WORKER_LOG_ENTRY”).”WIP”) was regarded as a ‘NORMAL’ index and not a ‘FUNCTION-BASED’ one.
Comment by padders — March 1, 2010 @ 3:16 pm GMT Mar 1,2010 |
Padders,
Thanks for the extra information; it’s always useful to hear about the special cases. I’ll leave it as an exercise for someone else to test, though.
One thought, if anyone does want to follow this – my general strategy for handling “rare” cases is to write a separate script; it’s much safer and simpler than trying to write a “one script does everything … but no-one dare change it” type of thing.
Comment by Jonathan Lewis — March 2, 2010 @ 8:48 pm GMT Mar 2,2010 |
[…] explain plan for every index in your system – so in a future note I’ll be showing you a simple piece of code that you might feel happy to run against every (simple, B-tree) index in your database. […]
Pingback by Index Size « Oracle Scratchpad — August 26, 2011 @ 6:31 am BST Aug 26,2011 |
[…] article also includes some related links. The first link is to a script that checks all the indexes for a specific schema. By the way – DO READ the notes in the […]
Pingback by jcon.no: Oracle Blog - When to rebuild an Oracle index? — January 5, 2012 @ 10:27 am GMT Jan 5,2012 |
Jonathan,
I want to ask two questions related to B*TREE indexes:
1. What are the things/statistics that tell/indicate that an index needs to be rebuilt? As I have learned, there are only rare cases where an index rebuild can actually help; can you please give a few cases where it really helps?
2. What are the statistics the optimizer uses to decide on one index over another for a table?
Thanks
Ajeet
Comment by Ajeet — January 28, 2012 @ 5:18 pm GMT Jan 28,2012 |
Ajeet,
1. I have an entire category on rebuilding indexes: https://jonathanlewis.wordpress.com/category/oracle/indexing/index-rebuilds/
2. The best answer to that question is in my book Cost Based Oracle – Fundamentals: I don’t have space to repeat it all in a comment, but as a starting point here’s a link to a short article I wrote several years ago: https://web.archive.org/web/20200509033755/www.jlcomp.demon.co.uk/12_using_index_i.html
Comment by Jonathan Lewis — January 28, 2012 @ 8:18 pm GMT Jan 28,2012 |
Hello Jonathan,
On line 165 above, instead of tab.num_rows, I think we should have ind.num_rows. The rationale is to account for NULLs. Let’s say the index is on a single column and half the table rows have a null in this column; thus the index rows would be half of those in the table. Similarly with a composite index when all its columns have nulls. What do you think?
Thanks — Carlos
Comment by Carlos Sierra — March 30, 2012 @ 9:44 pm BST Mar 30,2012 |
Carlos,
The choice of tab.num_rows was deliberate, and based on the fact that avg_col_len already makes allowance for nulls.
Every index entry has a rowid, lock byte, directory entry etc., and that’s why I have a component that uses ind.num_rows at line 162; but the “data” content of the index is dictated by the data content of the table, irrespective of how the columns are distributed across index entries. It’s probably best to see this through an example (engineered to be an extreme case):
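A table built along the following lines shows the effect (the second column v2 and the 100-row count are assumptions for illustration; only v1 is mentioned in the discussion):

        create table t1
        as
        select
                case
                        when mod(rownum,2) = 0
                        then rpad('x',6)
                end                     v1,     -- 6 bytes when present, null in half the rows
                rpad('x',6)             v2      -- 6 bytes in every row
        from
                all_objects
        where
                rownum <= 100
        ;

        begin
                dbms_stats.gather_table_stats(user, 't1');
        end;
        /

        select  column_name, avg_col_len
        from    user_tab_cols
        where   table_name = 'T1'
        order by
                column_name
        ;

With stats gathered by dbms_stats you would expect avg_col_len to show 4 for v1 and 7 for v2, which is the pattern described below.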
Although every populated column has a data content length of 6 bytes, the average (content) is 3 for v1 because half the rows are null. The reported length includes the one extra byte required for the length byte that precedes the column content, which is why we see 4 and 7 rather than 3 and 6 in the column stats.
If I had an index on just v1, then ind.num_rows would be 50 because only half the table rows have a value for v1; so if I multiplied the average column length by the number of index entries I would have applied the “half the rows in the table” factor twice. By using tab.num_rows I scale the average column length back up to “total space taken by column” in both the table segment and index segment.
The choice does introduce an error, of course, in that the average column length is rounded up to an integer, leading to an over-estimate. There’s a further error due to the addition of the length byte, which I could correct if I used (avg_col_len – 1) multiplied by tab.num_rows and then re-introduced the length bytes as ind.num_rows * “number of columns in index”.
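As a sketch of that correction (not part of the published script), using the assumed 100-row example above with an index on just v1 (avg_col_len = 4, tab.num_rows = 100, ind.num_rows = 50):

        select
                4 * 100                         original_column_bytes,  -- avg_col_len * tab.num_rows = 400
                (4 - 1) * 100  +  50 * 1        corrected_column_bytes  -- content bytes, plus one length byte
                                                                        -- per index column per index entry = 350
        from
                dual
        ;

The corrected figure (350) matches the true cost of 50 index entries at 6 data bytes plus 1 length byte each.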
Comment by Jonathan Lewis — March 31, 2012 @ 8:40 am BST Mar 31,2012 |
Jonathan,
It makes perfect sense now. I had overlooked that avg_col_len already considered nulls. Thanks for the clarification!
Carlos
Comment by carlossierrausa — April 1, 2012 @ 11:57 am BST Apr 1,2012 |
> ind.num_rows * (tab.rowid_length + ind.uniq_ind + 4)
Why is ind.uniq_ind not multiplied by tab.rowid_length?
Comment by Morp — January 18, 2013 @ 12:27 pm GMT Jan 18,2013 |
Morp,
The extra byte appears only once per index entry (it’s the byte that says the rowid is 6, or 10, bytes long), so we add it to the accumulated length just once.
Comment by Jonathan Lewis — January 18, 2013 @ 12:59 pm GMT Jan 18,2013 |
And what about partitioned indexes? Is there a way to estimate column sizes for an index partition?
Thnx.
Comment by morp — July 21, 2013 @ 6:30 am BST Jul 21,2013 |
morp,
Looking through my library I see I have a matching set of 4 scripts, one each for:
* global indexes
* globally partitioned indexes
* local (simple) partitioned indexes
* local composite partitioned indexes
I’ll have to read through them (the notes are a bit sparse) to check what they do. In simple terms, the local partitioning requires you to apply the same sort of logic to one (sub)partition of the table and the matching (sub)partition of the index at the same time. The globally partitioned index is the tricky one because you have to do some sort of approximation relating the relative sizes of the partitions to the total size of the table.
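For the local (simple) partitioned case, a rough sketch of that logic might look like the following - pairing each index partition with its table partition by partition_position and using partition-level column statistics. The choice of views, the schema/index names, and the 6-byte rowid are assumptions for the sketch, not taken from those scripts:

        rem     Sketch only: per-partition estimate for one local (non-composite)
        rem     partitioned index, assuming recent stats and an 8KB block size.

        define m_owner      = 'TEST_USER'       -- assumed schema
        define m_index_name = 'T1_LOCAL_I1'     -- assumed index name
        define m_blocksize  = 8192
        define m_target_use = 90
        define m_overhead   = 192

        select
                ip.partition_name,
                ip.leaf_blocks          current_leaf_blocks,
                round(
                        100 / &m_target_use *
                        (
                                ip.num_rows * (6 + decode(i.uniqueness,'UNIQUE',0,1) + 4) +     -- 6-byte rowid for a local index
                                sum(pcs.avg_col_len * tp.num_rows)                              -- column data bytes per partition
                        ) / (&m_blocksize - &m_overhead)
                )                       estimated_leaf_blocks
        from
                dba_indexes             i,
                dba_ind_partitions      ip,
                dba_tab_partitions      tp,
                dba_ind_columns         ic,
                dba_part_col_statistics pcs
        where
                i.owner                 = upper('&m_owner')
        and     i.index_name            = upper('&m_index_name')
        and     ip.index_owner          = i.owner
        and     ip.index_name           = i.index_name
        and     tp.table_owner          = i.table_owner
        and     tp.table_name           = i.table_name
        and     tp.partition_position   = ip.partition_position
        and     ic.index_owner          = i.owner
        and     ic.index_name           = i.index_name
        and     pcs.owner               = i.table_owner
        and     pcs.table_name          = i.table_name
        and     pcs.partition_name      = tp.partition_name
        and     pcs.column_name         = ic.column_name
        group by
                ip.partition_name, ip.leaf_blocks, ip.num_rows, i.uniqueness
        order by
                ip.partition_name
        ;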
Do you have a specific type that you’re interested in? I’ll see if I can find some time over the next week or two to start publishing the scripts.
Comment by Jonathan Lewis — July 24, 2013 @ 2:32 pm BST Jul 24,2013 |
[…] used Jonathan Lewis’ script to locate degenerated indexes (or indexes that are occupying more space than they should). […]
Pingback by Index Efficiency | Mohamed Houri’s Oracle Notes — May 12, 2015 @ 8:03 am BST May 12,2015 |
I can’t understand this.
A unique index differs from a non-unique one by the presence of an additional rowid, right?
Why are you adding the rowid length _in bytes_ to the uniq_ind _flag_? Why are you not multiplying rowid_length by (uniq_ind + 1)?
And what is the magic constant “4”?
Thank you.
Comment by teamer — July 13, 2016 @ 7:30 am BST Jul 13,2016 |
Teamer,
The 4 is the row overhead made up of two bytes for the row header plus 2 bytes for the row directory entry.
Plus 1 because the rowid appears in the index entry whether the index is unique or non-unique; for unique indexes it’s “carried” as a fixed (known) size data item at the start of the entry, while for non-unique indexes it appears as if it were an extra column in the index and is therefore preceded by a “length byte” like every other column in the index.
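To put rough numbers on that (all assumed, purely for illustration): a non-unique index on a single column with avg_col_len = 7, on a non-partitioned table of 1,000,000 rows, with an 8KB block, the 192-byte overhead and the 90% packing target works out like this:

        select
                round(
                        100 / 90 *                      -- packing at the pctfree 10 target
                        (
                                1000000 * (6 + 1 + 4)   -- rowid + its length byte + 4 bytes row overhead
                              + 1000000 * 7             -- column data (avg_col_len already includes a length byte)
                        ) / (8192 - 192)                -- usable space per leaf block
                )       estimated_leaf_blocks           -- about 2,500
        from
                dual
        ;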
Regards
Jonathan Lewis
Comment by Jonathan Lewis — July 13, 2016 @ 6:05 pm BST Jul 13,2016 |
Sorry.
… non-unique differs from unique …
Comment by teamer — July 13, 2016 @ 7:31 am BST Jul 13,2016 |
[…] you don’t want to go through this by hand there’s a script on my blog which will do the work for all the indexes in a schema assuming you’ve recently collected […]
Pingback by Investigating Indexes – All Things Oracle — September 28, 2016 @ 3:15 pm BST Sep 28,2016 |
[…] ought to. Jonathan Lewis has helpfully anticipated this need by creating and sharing his Index Efficiency script, which I have been using extensively this last couple of years. This article aims to […]
Pingback by Oracle Partitioned Index Efficiency – All Things Oracle — February 9, 2017 @ 1:39 pm GMT Feb 9,2017 |
[…] That’s probably an indication of an exploded index due to the delete and its pattern. Let’s check this using a slightly enhanced version of Jonathan Lewis’ script […]
Pingback by Exploded index causes a wrong execution plan | Mohamed Houri’s Oracle Notes — September 11, 2018 @ 7:57 pm BST Sep 11,2018 |