If you’ve run the scripts from Analyze This, I hope you found that the query gave you two different execution plans. This is my output from the test (with a little cosmetic tidying):
=============
Using Analyze
=============
Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 3964K| | 344 |
|* 1 | HASH JOIN | | 10000 | 3964K| 3506K| 344 |
| 2 | TABLE ACCESS FULL| T1 | 10000 | 1982K| | 74 |
| 3 | TABLE ACCESS FULL| T2 | 10000 | 1982K| | 70 |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
================
Using dbms_stats
================
Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835
-------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 4189K| | 360 |
|* 1 | HASH JOIN | | 10000 | 4189K| 3662K| 360 |
| 2 | TABLE ACCESS FULL| T2 | 10000 | 2001K| | 70 |
| 3 | TABLE ACCESS FULL| T1 | 10000 | 2187K| | 74 |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T1"."ID"="T2"."ID")
The order of the tables in the hash join has changed, and the cost has changed because I switched from using the analyze command to the dbms_stats package. And this was on an example that had no indexes, and used a compute / 100% sample. So we can’t blame the change in plan on differences in the statistics the two calls were supposed to collect.
What’s happened ? Take a close look at the column and row statistics we get from the two commands.
Using Analyze:
select
table_name, avg_row_len
from user_tables
where table_name in ('T1','T2')
order by
table_name
;
TABLE_NAME AVG_ROW_LEN
-------------------- -----------
T1 332
T2 313
select
table_name, column_name, avg_col_len
from
user_tab_columns
where
table_name in ('T1','T2')
order by
table_name, column_id
;
TABLE_NAME COLUMN_NAME AVG_COL_LEN
-------------------- -------------------- -----------
T1 ID 3
N1 3
V00 10
V01 10
V02 10
V03 10
V04 10
V05 10
V06 10
V07 10
V08 10
V09 10
V10 10
V11 10
V12 10
V13 10
V14 10
V15 10
V16 10
V17 10
V18 10
V19 10
PADDING 100
T2 ID 3
N1 3
V100 200
PADDING 100
And the results of the queries after calling dbms_stats:
TABLE_NAME AVG_ROW_LEN
-------------------- -----------
T1 328
T2 309
TABLE_NAME COLUMN_NAME AVG_COL_LEN
-------------------- -------------------- -----------
T1 ID 4
N1 4
V00 11
V01 11
V02 11
V03 11
V04 11
V05 11
V06 11
V07 11
V08 11
V09 11
V10 11
V11 11
V12 11
V13 11
V14 11
V15 11
V16 11
V17 11
V18 11
V19 11
PADDING 101
T2 ID 4
N1 4
V100 201
PADDING 101
The avg_col_len includes the length byte(s) for dbms_stats, but not for analyze.
Conversely, the avg_row_len includes the in-line row overhead for analyze, but not for dbms_stats.
Since some of the optimizer arithmetic uses the avg_row_len, and some of it uses the avg_col_len, you can find that changing your statistics collection method can change some “borderline” execution plans.
Of course, you should have switched to using the dbms_stats package to collect statistics a few years ago – but not everyone did, and some people found they had to switch back fairly promptly because too many things seemed to go wrong … for no obvious reason.
Footnote: I’ve just rediscovered another note I’ve written about the difference between dbms_stats and analyze, and the effect that migrated or chained rows can have on plans if you vary your usage.

Nice post about that difference.
Just some feedback for you Jonathan. It just so happens I was presenting on “gathering stats” at a UKOUG SIG earlier this week and I asked the audience some questions, including who still use ANALYZE.
The audience was small, only 25 or so people, but of them 2 still used Oracle 9 as their main DB, about 4 or 5 used V10.1 and everyone else used 10.2, so about 70%. No one used 11 yet as their main platform but about 50% had some 11g systems.
As for gathering stats, no one was using ANALYZE anymore, which is the first time I’ve found no one still using ANALYZE as a matter of course. Even the V9 people had swapped to DBMS_STATS I am glad to say.
Several of us did admit to using ANALYZE for “quick tests” though :-)
Martin
Comment by mwidlake — April 2, 2010 @ 12:55 pm UTC Apr 2,2010 |
Martin,
Glad to hear the results of your poll. (And sorry to read about the impact of your presentation on stats)
Comment by Jonathan Lewis — April 3, 2010 @ 10:25 am UTC Apr 3,2010 |
Jonathan
Is there a difference between (or is there any possibility second one can behave different) ;
create index ix_t1_n1 on t1(n1)
and
create index ix_t1_n1 on t1(n1) compute statistics
I could not find additional/different work in 10046 trace for both operation and also data in user_indexes are same.
Comment by coskan — May 4, 2010 @ 11:08 pm UTC May 4,2010 |
Starting with 10g CREATE INDEX automatically collects optimizer statistics. The behavior is controlled with _optimizer_compute_index_stats parameter which is true by default in 10g.
Comment by Timur Akhmadeev — May 5, 2010 @ 6:46 am UTC May 5,2010 |
Timur,
Actually that change is my reason to ask. What I want to be sure is if compute stats any different than dbms_stats.gathar_index_stats and if it is which one is used default for index creation ?
Comment by coskan — May 5, 2010 @ 7:49 am UTC May 5,2010 |
I think it’s best to ask Oracle support for clarification of what parameters are used for gathering index statistics automatically on CREATE INDEX. I don’t know, but I’d guess it is probably the same as calling gather_index_stats without additional parameters specified. I prefer not to trust automatic statistics gathering and run dbms_stats.gather_table_stats with cascade=>true and parameters I want such as no_invalidate=>false, etc.
Comment by Timur Akhmadeev — May 5, 2010 @ 8:33 am UTC May 5,2010
I did another test
1- disable the _optimizer_compute_index_stats by setting false
2- create the index
3- gather stats with analyze index compute statistics
4- drop index
5- create index again
6- gather statistics with dbms_stats.gather_index_stats(username,indexname)
What I see on trace is dbms_stats using different sqls then compute stats but populate dba_indexes with same values at the end. Documentation of dbms_stats also says “estimate_percent : Percentage of rows to estimate (NULL means compute).”
I think dbms_stats might differ when it comes to partitioned indexes but it needs more testing which I am not able to do now.
Comment by coskan — May 5, 2010 @ 8:40 am UTC May 5,2010 |
[...] 6-Why to use DBMS_STATS instead of analyze ? Jonathan Lewis-Analyze this Jonathan Lewis-Analyze this 2 [...]
Pingback by Blogroll Report 26/03 /2010 – 02/04/2010 « Coskan’s Approach to Oracle — May 23, 2010 @ 10:21 pm UTC May 23,2010 |
[...] Another difference between analyze and dbms_stats Anomalies with counting continued row activity Share this:TwitterLike this:LikeBe the first to like this post. Leave a Comment [...]
Pingback by I Wish « Oracle Scratchpad — December 16, 2011 @ 6:32 pm UTC Dec 16,2011 |