Oracle Scratchpad

July 27, 2012

Compression Units – 3

Filed under: Exadata,HCC,Oracle — Jonathan Lewis @ 5:08 pm BST Jul 27,2012

For those who have read the previous post describing how I engineered an Exadata disaster and want to reproduce it, here’s the script I used to generate the data.

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 1e4
)
select
	rownum					id,
	lpad(rownum,10,'0')			v1,
	mod(rownum,10000)			n1,
	trunc(rownum/100)			n2,
	trunc(dbms_random.value(0,262144))	n_256K,
	trunc(dbms_random.value(0,131072))	n_128K,
	trunc(dbms_random.value(0,8192))	n_8k,
	rpad(dbms_random.string('U',3),60)	padding
from
	generator	v1,
	generator	v2
where
	rownum <= 1048576
;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

create table t1_qh
compress for query high
as
select * from t1
;

create table t1_ah
compress for archive high
as
select * from t1
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1_QH',
		method_opt	 => 'for all columns size 1'
	);

	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1_AH',
		method_opt	 => 'for all columns size 1'
	);
end;
/

create bitmap index t1_bi on t1(n_128K);
create bitmap index t1_qh_bi on t1_qh(n_128K);
create bitmap index t1_ah_bi on t1_ah(n_128K);

select
        max(padding)
from
        t1_ah
where
        n_128k between 1000 and 1999
;

You’ll need about 5 GB of space to create all these objects – but that shouldn’t be too difficult on the typical Exadata system. As you can see, I’ve created the data to cover a variety of tests and experiments, including comparing the side effects of different levels of compression (or not).
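If you want a quick check of how much of that 5 GB each copy is using (and, therefore, how effective the two HCC levels have been), a simple query against user_segments is enough. The following is just a sketch, assuming the three tables are in your own schema:

select
        segment_name, blocks, round(bytes/1048576) size_mb
from
        user_segments
where
        segment_name in ('T1','T1_QH','T1_AH')
order by
        segment_name
;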

When you run the query you may find that the optimizer arithmetic picks the tablescan option automatically – the choice will depend largely on your setting for db_file_multiblock_read_count and/or the settings for the various system statistics (sreadtim, mreadtim, MBRC and CPUspeed, or the equivalent no-workload settings) – so, to see the performance impact of the indexed access path, you may have to hint the query. (It’s slightly worrying that when you set up a data warehouse system on Exadata it might be a good idea, at least in the short term, to set db_file_multiblock_read_count to 128 because you really do want Oracle to think that tablescans are, in general, a good, fast option – and we’ve only just got over persuading people that they shouldn’t set this parameter at all in any version from 10g onwards!)
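If the optimizer does pick the tablescan and you want to force the indexed access path to see the effect for yourself, an index hint is the easy option; and if you want to experiment in the other direction the multiblock read count is the obvious first knob to turn. Both of the following are sketches rather than recommendations:

select
        /*+ index(t1_ah t1_ah_bi) */
        max(padding)
from
        t1_ah
where
        n_128k between 1000 and 1999
;

alter session set db_file_multiblock_read_count = 128;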

After running the query twice on each table (to get all the data cached somewhere, if possible) the most recent timings I got were as follows for the indexed access path (the 10% difference between this test and the time I reported in my last test could possibly be explained by the fact that this machine had seen some serious volume testing while the other machine was so new that I had probably been using the outer edge of every disc):

  • No compression – 0.7 seconds
  • Query high – 77.24 seconds
  • Archive high – 3022.83 seconds
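If you want to repeat the test, the harness needn’t be anything more sophisticated than SQL*Plus timing with each query run twice in a row (hinted if necessary, as sketched above) – for example:

set timing on

select max(padding) from t1    where n_128k between 1000 and 1999;
select max(padding) from t1    where n_128k between 1000 and 1999;

select max(padding) from t1_qh where n_128k between 1000 and 1999;
select max(padding) from t1_qh where n_128k between 1000 and 1999;

select max(padding) from t1_ah where n_128k between 1000 and 1999;
select max(padding) from t1_ah where n_128k between 1000 and 1999;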

The number of different ways you can test against just this set of data is quite surprising – so don’t feel you have to stop with just the one demonstration.

Footnote – prompted by Kerry Osborne’s note, I’ve realised that my comment about disks and outer edges etc. is completely irrelevant. The entire time spent was CPU time, and the data was all cached when I ran these tests. I should have checked the machine types more closely – the slower machine was a V2, the faster was an X2.
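If you want to confirm for yourself that the time really is all CPU, a before/after snapshot of the session statistics is the easy check – run something like the following immediately before and after the query and take the difference (the figures are reported in centiseconds):

select
        sn.name, ms.value
from
        v$mystat        ms,
        v$statname      sn
where
        sn.statistic# = ms.statistic#
and     sn.name in ('CPU used by this session', 'DB time')
;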

 

6 Comments »

  1. Interesting that you mention using the MBRC setting as a way to encourage full scans. The fact that the optimizer doesn’t know a few things about how Exadata does its business is indeed a challenge at the moment. Maria was not particularly happy when I suggested at Hotsos that changing parameters to encourage full scans might be a reasonable thing to do in some cases. I’ll be very interested to hear what she has to say on the subject at E4 in a couple of weeks.

    I’d be interested to know what percentage of the time was actually due to decompression vs. the fact that every row was basically spread across a 32 block CU. Also, it might be interesting to show your work on how you determined it was a 32 block CU. ;)

    Comment by Kerry Osborne — July 28, 2012 @ 3:10 pm BST

    • Kerry,
      The optimizer is going to have problems with Exadata, and I think it’s going to require some faking (at least in the short term): there are so many “dirty tricks” at the back end allowing the results from a very large number of scanned blocks to be returned faster than Oracle can acquire the data from a single block that any “reasonable” approach – short of creating indexes only for very biased cases, perhaps – is not likely to work. I’m also very interested to hear what Maria’s solution is.

      As far as I could tell, the entire time was CPU spent decompressing a couple of columns from the CU of every row I needed. I’m just about to write a short note about determining the CU size and row limits.

      Comment by Jonathan Lewis — August 7, 2012 @ 4:35 pm BST

  2. Apart from setting the multiblock read count, IIRC Oracle recommends (or suggests) using optimizer_index_cost_adj=200 on Exadata. I tried to check MOS about this, but it looks like it’s down right now, so I might be wrong.

    Comment by Timur Akhmadeev — July 31, 2012 @ 4:24 pm BST

    • Timur,

      I can see the argument for doing that – but I really would try to avoid it because of the other inconsistencies it would introduce.

      Comment by Jonathan Lewis — August 7, 2012 @ 4:36 pm BST

  3. [...] rows into a single block ! Using a slightly less extreme data set (see, for example, the code from my previous CU posting) I ran the same query to show that rather than every reporting rowids in every consecutive block I [...]

    Pingback by Compression Units – 4 « Oracle Scratchpad — August 7, 2012 @ 5:24 pm BST

  4. [...] The patch that is currently available introduces a new option for dbms_stats.gather_system_stats(). Pass in the parameter value ‘Exadata’ and (along with any other steps) the code will set the MBRC stored statistic to 128.  If this looks like a familiar idea, check this comment on an earlier post: [...]

    Pingback by Compression Units – 5 « Oracle Scratchpad — August 19, 2012 @ 6:03 pm BST
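For reference, the call described in that last pingback would simply be the following (a sketch – the option is only available once the patch mentioned in the pingback has been installed):

execute dbms_stats.gather_system_stats('EXADATA')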

