Oracle Scratchpad

April 1, 2010

Analyze This – 2

Filed under: Execution plans,Infrastructure,Statistics — Jonathan Lewis @ 7:58 pm BST Apr 1,2010

If you’ve run the scripts from Analyze This, I hope you found that the query gave you two different execution plans. This is my output from the test (with a little cosmetic tidying):

=============
Using Analyze
=============

Execution Plan
----------------------------------------------------------
Plan hash value: 1838229974

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |  3964K|       |   344 |
|*  1 |  HASH JOIN         |      | 10000 |  3964K|  3506K|   344 |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |  1982K|       |    74 |
|   3 |   TABLE ACCESS FULL| T2   | 10000 |  1982K|       |    70 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="T2"."ID")

================
Using dbms_stats
================

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 10000 |  4189K|       |   360 |
|*  1 |  HASH JOIN         |      | 10000 |  4189K|  3662K|   360 |
|   2 |   TABLE ACCESS FULL| T2   | 10000 |  2001K|       |    70 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |  2187K|       |    74 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="T2"."ID")

The order of the tables in the hash join has changed, and so has the cost, simply because I switched from the analyze command to the dbms_stats package. And this was on an example that had no indexes and used a compute / 100% sample, so we can’t blame the change in plan on differences in the statistics the two calls were supposed to collect.
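For reference, the two collection calls were along these lines (a sketch only — the exact parameters are in the scripts from the previous Analyze This post):

```sql
-- Old style: analyze, with a full computation
analyze table t1 compute statistics;

-- New style: dbms_stats, also at 100%
begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                estimate_percent => 100,
                method_opt       => 'for all columns size 1'
        );
end;
/
```

Both calls read every row of the table, so any difference in the resulting plans has to come from how the two mechanisms record what they saw, not from sampling.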

What’s happened? Take a close look at the column and row statistics we get from the two commands.
Using Analyze:

select
	table_name, avg_row_len
from	user_tables
where	table_name in ('T1','T2')
order by
	table_name
;

TABLE_NAME           AVG_ROW_LEN
-------------------- -----------
T1                           332
T2                           313

select
	table_name, column_name, avg_col_len
from
	user_tab_columns
where
	table_name in ('T1','T2')
order by
	table_name, column_id
;

TABLE_NAME           COLUMN_NAME          AVG_COL_LEN
-------------------- -------------------- -----------
T1                   ID                             3
                     N1                             3
                     V00                           10
                     V01                           10
                     V02                           10
                     V03                           10
                     V04                           10
                     V05                           10
                     V06                           10
                     V07                           10
                     V08                           10
                     V09                           10
                     V10                           10
                     V11                           10
                     V12                           10
                     V13                           10
                     V14                           10
                     V15                           10
                     V16                           10
                     V17                           10
                     V18                           10
                     V19                           10
                     PADDING                      100

T2                   ID                             3
                     N1                             3
                     V100                         200
                     PADDING                      100

And the results of the queries after calling dbms_stats:

TABLE_NAME           AVG_ROW_LEN
-------------------- -----------
T1                           328
T2                           309

TABLE_NAME           COLUMN_NAME          AVG_COL_LEN
-------------------- -------------------- -----------
T1                   ID                             4
                     N1                             4
                     V00                           11
                     V01                           11
                     V02                           11
                     V03                           11
                     V04                           11
                     V05                           11
                     V06                           11
                     V07                           11
                     V08                           11
                     V09                           11
                     V10                           11
                     V11                           11
                     V12                           11
                     V13                           11
                     V14                           11
                     V15                           11
                     V16                           11
                     V17                           11
                     V18                           11
                     V19                           11
                     PADDING                      101

T2                   ID                             4
                     N1                             4
                     V100                         201
                     PADDING                      101

The avg_col_len includes the length byte(s) for dbms_stats, but not for analyze.
Conversely, the avg_row_len includes the in-line row overhead for analyze, but not for dbms_stats.
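The figures above are consistent with that description. Here’s a quick back-of-envelope check — the 3-byte in-line row header (flag byte, lock byte, column count) is my assumption about what analyze counts as overhead, and the one-byte shortfall in the dbms_stats row lengths looks like per-column rounding:

```python
# Sanity-check of the reported figures. Column lists are taken from the
# avg_col_len listings: T1 = ID, N1, V00..V19, PADDING; T2 = ID, N1, V100, PADDING.

t1_analyze = [3, 3] + [10] * 20 + [100]   # avg_col_len as reported by analyze
t2_analyze = [3, 3, 200, 100]

# dbms_stats counts one extra (length) byte per column, and its avg_row_len
# is roughly just the sum of its avg_col_len values ...
print(sum(c + 1 for c in t1_analyze))     # 329 -- reported avg_row_len was 328
print(sum(c + 1 for c in t2_analyze))     # 310 -- reported avg_row_len was 309

# ... while analyze's avg_row_len adds the length bytes plus the in-line
# row overhead (assumed here to be the 3-byte row header):
print(sum(t1_analyze) + len(t1_analyze) + 3)   # 332 -- matches the report
print(sum(t2_analyze) + len(t2_analyze) + 3)   # 313 -- matches the report
```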

Since some of the optimizer arithmetic uses the avg_row_len, and some of it uses the avg_col_len, you can find that changing your statistics collection method can change some “borderline” execution plans.

Of course, you should have switched to using the dbms_stats package to collect statistics a few years ago – but not everyone did, and some people found they had to switch back fairly promptly because too many things seemed to go wrong … for no obvious reason.

Footnote: I’ve just rediscovered another note I’ve written about the difference between dbms_stats and analyze, and the effect that migrated or chained rows can have on plans if you vary your usage.

10 Comments »

  1. Nice post about that difference.

    Just some feedback for you Jonathan. It just so happens I was presenting on “gathering stats” at a UKOUG SIG earlier this week and I asked the audience some questions, including who still used ANALYZE.

    The audience was small, only 25 or so people, but of them 2 still used Oracle 9 as their main DB, about 4 or 5 used V10.1 and everyone else used 10.2, so about 70%. No one used 11 yet as their main platform but about 50% had some 11g systems.

    As for gathering stats, no one was using ANALYZE anymore, which is the first time I’ve found no one still using ANALYZE as a matter of course. Even the V9 people had swapped to DBMS_STATS I am glad to say.

    Several of us did admit to using ANALYZE for “quick tests” though :-)

    Martin

    Comment by mwidlake — April 2, 2010 @ 12:55 pm BST Apr 2,2010 | Reply

  2. Martin,

    Glad to hear the results of your poll. (And sorry to read about the impact of your presentation on stats)

    Comment by Jonathan Lewis — April 3, 2010 @ 10:25 am BST Apr 3,2010 | Reply

  3. Jonathan

    Is there a difference between (or is there any possibility the second one can behave differently):

    create index ix_t1_n1 on t1(n1)

    and

    create index ix_t1_n1 on t1(n1) compute statistics

    I could not find any additional/different work in the 10046 trace for either operation, and the data in user_indexes is the same.

    Comment by coskan — May 4, 2010 @ 11:08 pm BST May 4,2010 | Reply

    • Starting with 10g, CREATE INDEX automatically collects optimizer statistics. The behaviour is controlled by the _optimizer_compute_index_stats parameter, which is true by default in 10g.

      Comment by Timur Akhmadeev — May 5, 2010 @ 6:46 am BST May 5,2010 | Reply

      • Timur,

        Actually that change is my reason for asking. What I want to be sure of is whether compute statistics is any different from dbms_stats.gather_index_stats, and if it is, which one is used by default for index creation?

        Comment by coskan — May 5, 2010 @ 7:49 am BST May 5,2010 | Reply

        • I think it’s best to ask Oracle support for clarification of what parameters are used for gathering index statistics automatically on CREATE INDEX. I don’t know, but I’d guess it is probably the same as calling gather_index_stats without additional parameters specified. I prefer not to trust automatic statistics gathering and run dbms_stats.gather_table_stats with cascade=>true and parameters I want such as no_invalidate=>false, etc.

          Comment by Timur Akhmadeev — May 5, 2010 @ 8:33 am BST May 5,2010

  4. I did another test

    1- disable _optimizer_compute_index_stats by setting it to false
    2- create the index
    3- gather stats with analyze index compute statistics
    4- drop index
    5- create index again
    6- gather statistics with dbms_stats.gather_index_stats(username,indexname)

    What I see in the trace is dbms_stats using different SQL than compute statistics, but populating dba_indexes with the same values at the end. The documentation of dbms_stats also says “estimate_percent : Percentage of rows to estimate (NULL means compute).”

    I think dbms_stats might differ when it comes to partitioned indexes but it needs more testing which I am not able to do now.

    Comment by coskan — May 5, 2010 @ 8:40 am BST May 5,2010 | Reply

  5. [...] 6-Why to use DBMS_STATS instead of analyze ? Jonathan Lewis-Analyze this Jonathan Lewis-Analyze this 2 [...]

    Pingback by Blogroll Report 26/03 /2010 – 02/04/2010 « Coskan’s Approach to Oracle — May 23, 2010 @ 10:21 pm BST May 23,2010 | Reply

  6. [...] Another difference between analyze and dbms_stats Anomalies with counting continued row activity [...]

    Pingback by I Wish « Oracle Scratchpad — December 16, 2011 @ 6:32 pm BST Dec 16,2011 | Reply

  7. […] neither piece of code (yet) allows for the fact that the column is virtual.  (If you use the analyze command it, I think it takes the row length from the block by comparing address offsets within the […]

    Pingback by Virtual Stats | Oracle Scratchpad — January 16, 2014 @ 8:34 am BST Jan 16,2014 | Reply

