Here’s a follow-up to a post I did some time ago about estimating the size of an index before you create it. The note describes dbms_stats.create_index_cost() procedure, and how it depends on the results of a call to explain plan. A recent question on the OTN database forum highlighted a bug in explain plan, however, which I can demonstrate very easily. I’ll start with a small amount of data to demonstrate the basic content that is used to calculate the index cost.
rem rem Script: index_create_cost.sql rem Author: Jonathan Lewis rem Dated: Jan 2012 rem create table t1 as with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 -- > comment to avoid wordpress format issue ) select rownum n1, case when mod(rownum,2) = 0 then rownum end n2, lpad(rownum,10,'0') v1, case when mod(rownum,2) = 0 then rpad('x',10) end v2, rpad('x',100) padding from generator v1, generator v2 where rownum <= 1e5 -- > comment to avoid wordpress format issue ; begin dbms_stats.gather_table_stats( ownname => user, tabname =>'T1', method_opt =>'for all columns size 1' ); end; / explain plan for create index t1_v1 on t1(v1); select * from table(dbms_xplan.display);
You’re likely to get some variation on the results below, particularly in the note about the estimated index size, but here’s the result I got last time I ran the test:
Explained. PLAN_TABLE_OUTPUT -------------------------------------------------------------------- Plan hash value: 2170349088 ---------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost | ---------------------------------------------------------------- | 0 | CREATE INDEX STATEMENT | | 100K| 1074K| 286 | | 1 | INDEX BUILD NON UNIQUE| T1_V1 | | | | | 2 | SORT CREATE INDEX | | 100K| 1074K| | | 3 | TABLE ACCESS FULL | T1 | 100K| 1074K| 278 | ---------------------------------------------------------------- Note ----- - cpu costing is off (consider enabling it) - estimated index size: 3145K bytes 15 rows selected.
As you can see, I’ve got an accurate estimate of the rows that will go into the index, with a reasonably accurate estimate of the total volume of data in the rows (column lengths only) that will go into the index. The estimated index size then allows for things like rowids, row overhead, pctfree, space management blocks, and extent sizes, resulting in a larger figure (significantly larger in this case given the relatively small length of the column used in the index).
Let’s repeat the experiment a few times with increasing numbers of rows in the table. Rather than give you the full output for each test, I’ll just produced the Rows and Bytes figure and the estimated size – the last three sets of figures are a little clue about what’s gone wrong:
Rows Bytes Est. Size ------ ------ --------- 1M 10MB 24MB 10M 104MB 243MB 100M 1049MB 2415MB 1000M 10GB 2483MB 10000M 102GB 2818MB ------ ------ --------- 176M 1846MB 4227MB 177M 1856MB 0B 180M 1888MB 67MB ------ ------ ---------
It looks like the estimated size is captured as a 32 bit number, so it rolls over to zero at roughly 4.3 billion. The bug is still there in 11.2.0.2, I haven’t yet checked 10.2.0.5 or 11.2.0.3 [But see comments below.]
Footnote:
Just in case you’re wondering, I didn’t actually create a table with 10 billion rows in it, I just used dbms_stats.set_table_stats() to tell Oracle that the table has 10 billion rows.
wow
great catch
larry and the team owe you $ for doing QA on their RDBMS product
especially in the age or “big data” and operational DW DBs, this really could be a problem, not just trivia for internals-DBAs
Comment by oracleman consulting — January 4, 2012 @ 6:37 pm GMT Jan 4,2012 |
I tried it on 11.2.0.3.0 64bit Linux.. it seems to be fixed.
Comment by Radoslav Golian — January 7, 2012 @ 8:12 am GMT Jan 7,2012 |
looks like it has been fixed in 10.2.0.3
Comment by raova — January 25, 2012 @ 2:28 pm GMT Jan 25,2012 |
Comment by raova — January 25, 2012 @ 2:34 pm GMT Jan 25,2012 |
Thanks for all the updates;
There’s a very good reason for that – the test I ran was on a 32-bit version of Oracle. Although it ought to be possible to write code that copes with working out and reporting larger numbers on a 32-bit system, the code doesn’t cope.
Comment by Jonathan Lewis — January 26, 2012 @ 9:09 pm GMT Jan 26,2012 |