Oracle Scratchpad

March 31, 2010

Analyze this

Filed under: Statistics,Troubleshooting,Tuning — Jonathan Lewis @ 7:50 pm BST Mar 31,2010

Here’s a little script I wrote a few years ago to make a point about using the dbms_stats package. I’ve just re-run it on 10.2.0.3 to see if it still behaves the way it used to – and it does. If you want to be just a little bit baffled, set up a database with an 8KB blocks size, a tablespace that is locally managed, uniform extent size of 1MB, using freelist management, then run the script:


drop table t2;
drop table t1;

begin
	begin		execute immediate 'purge recyclebin';
	exception	when others then null;
	end;

	begin		execute immediate 'execute dbms_stats.delete_system_stats';
	exception	when others then null;
	end;

	begin		execute immediate 'alter session set "_optimizer_cost_model"=io';
	exception	when others then null;
	end;

end;
/

alter session set workarea_size_policy = manual;
alter session set hash_area_size = 1048576;

create table t1
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 5000
)
select
	rownum			id,
	rownum			n1,
	lpad(rownum,10,'0')	v00,
	lpad(rownum,10,'0')	v01,
	lpad(rownum,10,'0')	v02,
	lpad(rownum,10,'0')	v03,
	lpad(rownum,10,'0')	v04,
	lpad(rownum,10,'0')	v05,
	lpad(rownum,10,'0')	v06,
	lpad(rownum,10,'0')	v07,
	lpad(rownum,10,'0')	v08,
	lpad(rownum,10,'0')	v09,
	lpad(rownum,10,'0')	v10,
	lpad(rownum,10,'0')	v11,
	lpad(rownum,10,'0')	v12,
	lpad(rownum,10,'0')	v13,
	lpad(rownum,10,'0')	v14,
	lpad(rownum,10,'0')	v15,
	lpad(rownum,10,'0')	v16,
	lpad(rownum,10,'0')	v17,
	lpad(rownum,10,'0')	v18,
	lpad(rownum,10,'0')	v19,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

create table t2
as
with generator as (
	select	--+ materialize
		rownum 	id
	from	all_objects
	where	rownum <= 5000
)
select
	rownum			id,
	rownum			n1,
	lpad(rownum,200,'0')	v100,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 10000
;

analyze table t1 compute statistics;
analyze table t2 compute statistics;

set autotrace traceonly explain

prompt	=============
prompt	Using Analyze
prompt	=============

select
	t1.v00,
	t1.v01,
	t1.v02,
	t1.v03,
	t1.v04,
	t1.v05,
	t1.v06,
	t1.v07,
	t1.v08,
	t1.v09,
	t1.v10,
	t1.v11,
	t1.v12,
	t1.v13,
	t1.v14,
	t1.v15,
	t1.v16,
	t1.v17,
	t1.v18,
	t1.v19,
	t2.v100
from
	t1,
	t2
where
	t1.id = t2.id
;

set autotrace off

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'t1',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'t2',
		estimate_percent => 100,
		method_opt 	 => 'for all columns size 1'
	);
end;
/

set autotrace traceonly explain

prompt	================
prompt	Using dbms_stats
prompt	================

select
	t1.v00,
	t1.v01,
	t1.v02,
	t1.v03,
	t1.v04,
	t1.v05,
	t1.v06,
	t1.v07,
	t1.v08,
	t1.v09,
	t1.v10,
	t1.v11,
	t1.v12,
	t1.v13,
	t1.v14,
	t1.v15,
	t1.v16,
	t1.v17,
	t1.v18,
	t1.v19,
	t2.v100
from
	t1,
	t2
where
	t1.id = t2.id
;

set autotrace off

Note that the script runs the same query twice – check the two execution plans carefully, they may not be the same.

7 Comments »

  1. Isn’t the analyze command deprecated for gathering stats for the CBO?

    Comment by Colin 't Hart — March 31, 2010 @ 9:01 pm BST Mar 31,2010 | Reply

  2. I tested on my machine:

    10.1.0.2.0 on Windows 32bit:

    Analyze
    ----------------------------------------------------------                      
       0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=349 Card=10000 Bytes=4060000) 
       1    0   HASH JOIN (Cost=349 Card=10000 Bytes=4060000)
       2    1     TABLE ACCESS (FULL) OF 'T1' (TABLE) (Cost=76 Card=10000 Bytes=2030000) 
       3    1     TABLE ACCESS (FULL) OF 'T2' (TABLE) (Cost=73 Card=10000 Bytes=2030000)
                     
    DBMS_STATS
    ----------------------------------------------------------                      
       0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=365 Card=10000 Bytes=4290000)
       1    0   HASH JOIN (Cost=365 Card=10000 Bytes=4290000)
       2    1     TABLE ACCESS (FULL) OF 'T2' (TABLE) (Cost=73 Card=10000 Bytes=2050000)
       3    1     TABLE ACCESS (FULL) OF 'T1' (TABLE) (Cost=76 Card=10000 Bytes=2240000)
    
    

    So looks like for analyze CBO will think the row length in the two tables are same but for DBMS_STATS will also consider the columns count, right?

    Comment by dhhb — April 1, 2010 @ 6:08 am BST Apr 1,2010 | Reply

  3. ANALYZE is not counting the length bytes?
    The point being, bad statistics can have real, detrimental impact (in this case, slightly wrong hash join order) on execution plan quality even if we don’t use CPU costing?
    By the way, extent sizing policy (auto vs. 1M) and free space management (ASSM vs. freelist) didn’t make any difference in my test (10.2.0.3 under RedHat).

    Cheers,
    Flado

    Comment by Flado — April 1, 2010 @ 8:50 am BST Apr 1,2010 | Reply

  4. [...] Statistics — Jonathan Lewis @ 7:58 pm UTC Apr 1,2010  If you’ve run the scripts from Analyze This, I hope you found that the query gave you two different execution plans. This is my output from the [...]

    Pingback by Analyze This – 2 « Oracle Scratchpad — April 1, 2010 @ 8:00 pm BST Apr 1,2010 | Reply

  5. I do this test on my test db.

    os: Red Hat Enterprise Linux Server release 5.2 (Tikanga)
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 – Production

    --After analyze
    -------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost  |
    -------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      | 10000 |  3964K|       |   292 |
    |*  1 |  HASH JOIN         |      | 10000 |  3964K|  3506K|   292 |
    |   2 |   TABLE ACCESS FULL| T1   | 10000 |  1982K|       |    47 |
    |   3 |   TABLE ACCESS FULL| T2   | 10000 |  1982K|       |    45 |
    -------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("T1"."ID"="T2"."ID")
    
    Note
    -----
       - cpu costing is off (consider enabling it)
    
    --After using dbms_stats
    -------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes |TempSpc| Cost  |
    -------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      | 10000 |  4189K|       |   308 |
    |*  1 |  HASH JOIN         |      | 10000 |  4189K|  3662K|   308 |
    |   2 |   TABLE ACCESS FULL| T2   | 10000 |  2001K|       |    45 |
    |   3 |   TABLE ACCESS FULL| T1   | 10000 |  2187K|       |    47 |
    -------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - access("T1"."ID"="T2"."ID")
    
    Note
    -----
       - cpu costing is off (consider enabling it)
    

    Comment by jametong — April 3, 2010 @ 1:18 pm BST Apr 3,2010 | Reply

  6. [...] to use DBMS_STATS instead of analyze ? Jonathan Lewis-Analyze this Jonathan Lewis-Analyze this [...]

    Pingback by Blogroll Report 26/03 /2010 – 02/04/2010 « Coskan’s Approach to Oracle — May 23, 2010 @ 10:21 pm BST May 23,2010 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,876 other followers