If you’ve run the scripts from Analyze This, I hope you found that the query gave you two different execution plans. This is my output from the test (with a little cosmetic tidying):
============= Using Analyze =============

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |  3964K|       |   344 |
|*  1 |  HASH JOIN         |      | 10000 |  3964K|  3506K|   344 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |  1982K|       |    74 |
|   3 |   TABLE ACCESS FULL| T2   | 10000 |  1982K|       |    70 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="T2"."ID")

================ Using dbms_stats ================

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |  4189K|       |   360 |
|*  1 |  HASH JOIN         |      | 10000 |  4189K|  3662K|   360 |
|   2 |   TABLE ACCESS FULL| T2   | 10000 |  2001K|       |    70 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |  2187K|       |    74 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="T2"."ID")
The order in which the tables appear in the execution plan has changed, and the cost of the plan has changed, simply because I switched from using the analyze command to the dbms_stats package. And this was on an example that had no indexes and used a compute / 100% sample, so we can’t blame the change in plan on differences in the statistics the two calls were supposed to collect.
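For reference, here’s a minimal sketch of the two collection calls being compared – the exact calls are in the “Analyze This” script, so treat this as an outline rather than the original code:

analyze table t1 compute statistics;
analyze table t2 compute statistics;

begin
        dbms_stats.gather_table_stats(user, 'T1', estimate_percent => 100);
        dbms_stats.gather_table_stats(user, 'T2', estimate_percent => 100);
end;
/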
What’s happened? Take a close look at the column and row statistics we get from the two commands.
Using Analyze:
select
        table_name, avg_row_len
from    user_tables
where   table_name in ('T1','T2')
order by
        table_name
;

TABLE_NAME           AVG_ROW_LEN
-------------------- -----------
T1                           332
T2                           313

select
        table_name, column_name, avg_col_len
from    user_tab_columns
where   table_name in ('T1','T2')
order by
        table_name, column_id
;

TABLE_NAME           COLUMN_NAME          AVG_COL_LEN
-------------------- -------------------- -----------
T1                   ID                             3
                     N1                             3
                     V00                           10
                     V01                           10
                     V02                           10
                     V03                           10
                     V04                           10
                     V05                           10
                     V06                           10
                     V07                           10
                     V08                           10
                     V09                           10
                     V10                           10
                     V11                           10
                     V12                           10
                     V13                           10
                     V14                           10
                     V15                           10
                     V16                           10
                     V17                           10
                     V18                           10
                     V19                           10
                     PADDING                      100
T2                   ID                             3
                     N1                             3
                     V100                         200
                     PADDING                      100
Using dbms_stats:
TABLE_NAME           AVG_ROW_LEN
-------------------- -----------
T1                           328
T2                           309

TABLE_NAME           COLUMN_NAME          AVG_COL_LEN
-------------------- -------------------- -----------
T1                   ID                             4
                     N1                             4
                     V00                           11
                     V01                           11
                     V02                           11
                     V03                           11
                     V04                           11
                     V05                           11
                     V06                           11
                     V07                           11
                     V08                           11
                     V09                           11
                     V10                           11
                     V11                           11
                     V12                           11
                     V13                           11
                     V14                           11
                     V15                           11
                     V16                           11
                     V17                           11
                     V18                           11
                     V19                           11
                     PADDING                      101
T2                   ID                             4
                     N1                             4
                     V100                         201
                     PADDING                      101
The avg_col_len includes the length byte(s) for dbms_stats, but not for analyze.
Conversely, the avg_row_len includes the in-line row overhead (flag byte, lock byte, column count) for analyze but not for dbms_stats. (The row lengths for dbms_stats are one less than they should be in this report, but that’s a detail that’s been corrected in 11g.)
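If you want to check the arithmetic for yourself, sys_op_opnsize() reports the internal column size, so – as a rough cross-check, and this is my approximation rather than a documented formula – the dbms_stats avg_col_len for a short column should come out close to the rounded average internal size plus one length byte:

select
        round(avg(sys_op_opnsize(id)))  + 1     id_len,         -- should be close to 4
        round(avg(sys_op_opnsize(v00))) + 1     v00_len         -- should be close to 11
from
        t1
;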
Since some of the optimizer arithmetic uses the avg_row_len, and some of it uses the avg_col_len, you can find that changing your statistics collection method can actually result in changes of execution plans in borderline cases.
Of course, you should have switched to using the dbms_stats package to collect statistics a few years ago – but not everyone did, and some people found they had to switch back fairly promptly because too many things seemed to go wrong … for no obvious reason.
Footnote:
I’ve just rediscovered another note I’ve written about the difference between dbms_stats and analyze, and the effect that migrated or chained rows can have on plans if you vary your usage.
Footnote 2 (hash join table order):
A side point in this particular case is the demonstration that in a hash join the optimizer chooses the table that returns the smaller data volume (i.e. it’s looking at bytes, not rows) as the build (first child) table, even if the cost of acquiring that data is larger than the cost of acquiring the data from the probe (second child) table. This would have been a little more obvious if, for example, I’d generated the v00 column in table t1 as lpad(rownum,5,'0') – in which case the plan from the analyze call would have shown t1 returning 1933K bytes instead of 1982K bytes – or if I’d omitted one of the small columns (e.g. v19) from the query.
Nice post about that difference.
Just some feedback for you Jonathan. It just so happens I was presenting on “gathering stats” at a UKOUG SIG earlier this week, and I asked the audience some questions, including who still used ANALYZE.
The audience was small, only 25 or so people, but of them 2 still used Oracle 9 as their main DB, about 4 or 5 used V10.1, and everyone else (about 70%) used 10.2. No one used 11 as their main platform yet, but about 50% had some 11g systems.
As for gathering stats, no one was using ANALYZE any more, which is the first time I’ve found no one still using ANALYZE as a matter of course. Even the V9 people had swapped to DBMS_STATS, I am glad to say.
Several of us did admit to using ANALYZE for “quick tests” though :-)
Martin
Comment by mwidlake — April 2, 2010 @ 12:55 pm BST Apr 2,2010
Martin,
Glad to hear the results of your poll. (And sorry to read about the impact of your presentation on stats)
Comment by Jonathan Lewis — April 3, 2010 @ 10:25 am BST Apr 3,2010
Jonathan
Is there a difference between these two (or is there any possibility that the second one can behave differently):
create index ix_t1_n1 on t1(n1)
and
create index ix_t1_n1 on t1(n1) compute statistics
I could not find any additional or different work in the 10046 trace for either operation, and the data in user_indexes is the same in both cases.
Comment by coskan — May 4, 2010 @ 11:08 pm BST May 4,2010
Starting with 10g, CREATE INDEX automatically collects optimizer statistics. The behavior is controlled by the _optimizer_compute_index_stats parameter, which is true by default in 10g.
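For anyone who wants to check the setting, the traditional (and unsupported) way to look at a hidden parameter is an x$ query run as SYS, something like:

select
        p.ksppinm       name,
        v.ksppstvl      value
from
        x$ksppi         p,
        x$ksppcv        v
where
        v.indx = p.indx
and     p.ksppinm = '_optimizer_compute_index_stats'
;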
Comment by Timur Akhmadeev — May 5, 2010 @ 6:46 am BST May 5,2010
Timur,
Actually that change is my reason for asking. What I want to be sure of is whether compute statistics is any different from dbms_stats.gather_index_stats, and if it is, which one is used by default for index creation?
Comment by coskan — May 5, 2010 @ 7:49 am BST May 5,2010
I think it’s best to ask Oracle support for clarification of which parameters are used for gathering index statistics automatically on CREATE INDEX. I don’t know, but I’d guess it is probably the same as calling gather_index_stats without any additional parameters specified. I prefer not to trust automatic statistics gathering, and instead run dbms_stats.gather_table_stats with cascade=>true and the parameters I want, such as no_invalidate=>false.
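In outline, the sort of call I mean looks like this (the table name is just an example):

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                estimate_percent => null,      -- null means compute
                cascade          => true,      -- gather stats on the indexes as well
                no_invalidate    => false      -- invalidate dependent cursors immediately
        );
end;
/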
Comment by Timur Akhmadeev — May 5, 2010 @ 8:33 am BST May 5,2010
I did another test (a sketch of the script follows the list):
1- disable _optimizer_compute_index_stats by setting it to false
2- create the index
3- gather stats with analyze index … compute statistics
4- drop the index
5- create the index again
6- gather statistics with dbms_stats.gather_index_stats(username, indexname)
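A sketch of the script (using the ix_t1_n1 index from my earlier comment; note that the hidden parameter can be set at session level for a test like this):

alter session set "_optimizer_compute_index_stats" = false;

create index ix_t1_n1 on t1(n1);
analyze index ix_t1_n1 compute statistics;

drop index ix_t1_n1;

create index ix_t1_n1 on t1(n1);

begin
        dbms_stats.gather_index_stats(user, 'IX_T1_N1');
end;
/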
What I see in the trace is that dbms_stats uses different SQL from compute statistics, but populates dba_indexes with the same values at the end. The documentation for dbms_stats also says: “estimate_percent : Percentage of rows to estimate (NULL means compute).”
I think dbms_stats might differ when it comes to partitioned indexes, but that needs more testing, which I am not able to do right now.
Comment by coskan — May 5, 2010 @ 8:40 am BST May 5,2010
[…] 6-Why to use DBMS_STATS instead of analyze ? Jonathan Lewis-Analyze this Jonathan Lewis-Analyze this 2 […]
Pingback by Blogroll Report 26/03 /2010 – 02/04/2010 « Coskan’s Approach to Oracle — May 23, 2010 @ 10:21 pm BST May 23,2010
[…] Another difference between analyze and dbms_stats Anomalies with counting continued row activity […]
Pingback by I Wish « Oracle Scratchpad — December 16, 2011 @ 6:32 pm GMT Dec 16,2011
[…] neither piece of code (yet) allows for the fact that the column is virtual. (If you use the analyze command it, I think it takes the row length from the block by comparing address offsets within the […]
Pingback by Virtual Stats | Oracle Scratchpad — January 16, 2014 @ 8:34 am GMT Jan 16,2014
[…] command to gather stats on the two tables you’ll find that the avg_row_len of table t1 then allows for the LOB locator […]
Pingback by Quiz Night | Oracle Scratchpad — February 12, 2016 @ 8:52 am GMT Feb 12,2016
[…] sys_op_opnsize() values rounded up plus 1 (for the “column count” byte); and – because the number of columns in the table is small – the avg_row_len is very similar to the sum of the […]
Pingback by LOB length | Oracle Scratchpad — May 2, 2019 @ 11:32 am BST May 2,2019
[…] correctly, which results in a massive under-estimate for the avg_row_len (which is basically the sum of avg_col_len for the table). Here’s an example (run on 12.1.0.2, based on some code to create an SDO […]
Pingback by Spatial space | Oracle Scratchpad — October 14, 2019 @ 11:10 am BST Oct 14,2019
[…] the average length of the varchar2() column is 21 – remember that the avg_col_len includes the length byte(s) when we collect stats with the dbms_stats […]
Pingback by In-row LOBs | Oracle Scratchpad — November 30, 2020 @ 11:27 am GMT Nov 30,2020 |