Oracle Scratchpad

March 30, 2011

ASSM wreck

Filed under: ASSM,Bugs,Index Rebuilds,Indexing,Oracle,Troubleshooting — Jonathan Lewis @ 5:25 pm BST Mar 30,2011

Yesterday I introduced a little framework I use to avoid the traps inherent in writing PL/SQL loops when modelling a session that does lots of simple calls to the database. I decided to publish the framework because I had recently come across an example where a series of SQL statements gives a very different result from a single PL/SQL block.

 

The model starts with a simple data set – which in this case is created in a tablespace using ASSM (automatic segment space management), an 8KB block size, and 1MB uniform extents in a locally managed tablespace.
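For reference, the tablespace definition would look something like this (a sketch: the datafile name and size are hypothetical; the clauses that matter are the 1MB uniform extents and "segment space management auto", which is what makes it ASSM):

create tablespace test_8k_assm
	datafile '/u01/oradata/test_8k_assm_01.dbf' size 100m
	blocksize 8k
	extent management local uniform size 1m
	segment space management auto
;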


create table t1
tablespace test_8k_assm
as
select
	trunc((rownum-1)/100)	n1,
	lpad('x',40)		v1,
	rpad('x',100)		padding
from
	dual
connect by
	rownum <= 20000
;

create index t1_i1 on t1(n1, v1)
tablespace test_8k_assm
;

validate index t1_i1;
execute print_table('select * from index_stats');

You can see that the n1 column is defined to have 100 rows for each of 200 different values, and that each set of one hundred rows is stored (at least initially) in a very small cluster of table blocks.

With the data set in place I am now going to pick one of the 200 key values at random, delete its 100 rows, re-insert them, and commit; and I’m going to repeat that process 1,000 times.

declare
	rand	number(3);
begin
	for i in 1..1000 loop

		rand := trunc(dbms_random.value(0,200));

		delete from t1
		where n1 = rand
		;

		insert into t1
		select
			rand,
			lpad('x',40),
			rpad('x',100)
		from
			dual
		connect by
			rownum <= 100
		;

		commit;

	end loop;
end;
/

validate index t1_i1;
execute print_table('select * from index_stats');

You might think that this piece of code is a little strange – but it models a process that I saw recently on a client site, and it has crossed my mind that it might appear in a number of systems hidden under the covers of dbms_job. So what does it do to the index?
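For anyone checking their own systems, a delete/insert cycle like this could easily be lurking behind a dbms_job submission along these lines (a sketch only: refresh_t1 is a hypothetical procedure assumed to wrap the delete, insert, and commit for one key value):

variable jobno number

begin
	dbms_job.submit(
		job       => :jobno,
		what      => 'refresh_t1;',
		next_date => sysdate,
		interval  => 'sysdate + 1/1440'		-- resubmit every minute
	);
	commit;
end;
/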

Given the delay that usually appears between the time an index entry is marked as deleted and the time that the space can be reused, and given the way I’ve engineered my data so that the space needed for the 100 entries for each key value is little more than half an index leaf block (an important feature of this case – because it means the first delete/insert is likely to cause a leaf block split), I wouldn’t have been surprised if the index had stabilised at roughly twice its original size. But that’s not what happened to my example running under ASSM. Here are the “before and after” results from my test:


                       Before         After
                    ---------     ---------
LF_ROWS                20,000        70,327
LF_BLKS                   156           811
LF_ROWS_LEN         1,109,800     3,877,785
BR_ROWS                   155           810
BR_BLKS                     3            10
BR_ROWS_LEN             8,903        45,732
DEL_LF_ROWS                 0        50,327
DEL_LF_ROWS_LEN             0     2,767,985
DISTINCT_KEYS             200           190
MOST_REPEATED_KEY         100         1,685
BTREE_SPACE         1,272,096     6,568,320
USED_SPACE          1,118,703     3,923,517
PCT_USED                   88            60
ROWS_PER_KEY              100           370
BLKS_GETS_PER_ACCESS       54           189

It’s a small disaster – our index leaf block count has grown by a factor of about five and we have more deleted rows than “real” rows. (Note, by the way, that the awful state of the index is NOT reliably indicated by the PCT_USED figure – a statistic that is often proposed as an indicator of the state of an index.)
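If you want to do the arithmetic yourself, a couple of ratios from index_stats are more informative than pct_used (run this immediately after the validate, since index_stats holds a single row for the current session only):

select
	round(lf_rows_len / lf_rows, 1)		avg_entry_len,
	round(100 * del_lf_rows / lf_rows, 1)	pct_deleted,
	lf_blk_len
from
	index_stats
;

The avg_entry_len figure (about 55 bytes before the test) multiplied by 100 entries per key shows why each key value needs a little over half of an (approximately) 8,000 byte leaf block; pct_deleted is the figure that really flags the damage.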

Unfortunately this is the type of problem that doesn’t surprise me when using ASSM; it’s supposed to help with highly concurrent OLTP activity (typified by a large number of very small transactions) but runs into odd timing problems updating free space bitmaps whenever you get into “batch-like” activity.

However, there is a special consideration in play here – I’ve run the entire operation as a single pl/sql loop. Would the same problem appear if I ran each delete/insert cycle as a completely independent SQL script using the “start_1000.sql” script from my previous note?

To test the effect of running 1,000 separate tasks, rather than executing a single pl/sql loop, I wrote the following code into the start_1.sql script that I described in the article before running start_1000.sql:


declare
	rand	number(3);
begin

	rand := trunc(dbms_random.value(0,200));

	delete from t1
	where n1 = rand
	;

	insert into t1
	select
		rand,
		lpad('x',40),
		rpad('x',100)
	from
		dual
	connect by
		rownum <= 100
	;

	commit;

end;
/

The impact was dramatically different. (Still very wasteful, but quite a lot closer to the scale of the results that you might expect from freelist management).


                       Before         After
                    ---------     ---------
LF_ROWS                20,000        39,571
LF_BLKS                   156           479
LF_ROWS_LEN         1,109,800     2,196,047
BR_ROWS                   155           478
BR_BLKS                     3             6
BR_ROWS_LEN             8,903        26,654
DEL_LF_ROWS                 0        19,571
DEL_LF_ROWS_LEN             0     1,086,247
DISTINCT_KEYS             200           199
MOST_REPEATED_KEY         100           422
BTREE_SPACE         1,272,096     3,880,192
USED_SPACE          1,118,703     2,222,701
PCT_USED                   88            58
ROWS_PER_KEY              100           199
BLKS_GETS_PER_ACCESS       54           102

I haven’t yet investigated why the pl/sql loop should have produced such a damaging effect – although I suspect that it might be a side effect of the pinning of bitmap space management blocks that takes place within a single database call. It’s possible that the repeated database calls from SQL*Plus keep “rediscovering” bitmap blocks that show free space while the pinning effects stop the pl/sql from “going back” to bitmap blocks that have just been modified to identify recently freed space.

Interestingly, the impact of using ASSM was dramatically reduced if one object used freelist management and the other used ASSM – and with my specific example the combination of a freelist table with an ASSM index even did better than the expected 50% usage from the “traditional” option of using freelists for both the table and index.
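The freelist-managed tablespace for that comparison differs only in its segment space management clause (again a sketch, with a hypothetical datafile name; "manual" is the clause that selects freelist management):

create tablespace test_8k_free
	datafile '/u01/oradata/test_8k_free_01.dbf' size 100m
	blocksize 8k
	extent management local uniform size 1m
	segment space management manual
;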

Note – the purpose of this note is NOT to suggest that you should avoid using ASSM in general; but if you can identify code in your system that is doing something similar to the model then it’s worth checking the related indexes (see my index efficiency note) to see if any of them are displaying the same problem as this test case. If they are you may want to do one of two things: think about a schedule for coalescing or even rebuilding problem indexes on a regular basis, or see if you can move the table, index, or both, into a tablespace using freelist management.
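In SQL terms the two options look like this (test_8k_free is a hypothetical freelist-managed tablespace; coalesce is usually the cheaper choice because it tidies the index in place without allocating a new segment):

rem	Option 1: tidy the index in place on a regular schedule
alter index t1_i1 coalesce;

rem	Option 2: rebuild and/or relocate to freelist-managed storage.
rem	Moving the table marks its indexes unusable, so rebuild after.
alter index t1_i1 rebuild online tablespace test_8k_free;

alter table t1 move tablespace test_8k_free;
alter index t1_i1 rebuild;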

5 Comments »

  1. Interesting: I wonder whether that excessive space waste can also be connected to bug 8286901 “SPACE SEARCH INEFFICIENCY DURING INDEX SPLIT.” We’ve stumbled upon that bug in an ASSM-enabled database, mostly on “FIFO” (i.e. queue-like) tables. The bug manifests itself in long “enq: TX – index contention” waits.
    The bug fix sounds interesting: “Whenever the fix is enabled, root block splits examine no more than 5 empty blocks for reclamation before attempting a segment extension.”

    Comment by Laimis N — March 31, 2011 @ 6:02 am BST Mar 31,2011 | Reply

    • Laimis N,

      I think it depends on what you mean by “connected”. The bug you’re referring to relates to a problem Oracle had trying to find an appropriate empty block on a block split so it’s not directly connected.

      However the excessive waste of space does do two things – first it means that the index gets much bigger than it should, so you are more likely to see a root block split; secondly you do get a lot of empty leaf blocks, so Oracle is more likely to attempt to reuse an empty block which is not a legal choice.

      It’s possible, by the way, that one of my clients was the source of that bug report (though it may have been a very similar one) because we uncovered exactly the equivalent of your “enq: TX – index contention” type of problem on a very high speed FIFO index a few years ago. (Some related details here: https://jonathanlewis.wordpress.com/2008/09/29/index-analysis-2/ )

      Comment by Jonathan Lewis — March 31, 2011 @ 4:02 pm BST Mar 31,2011 | Reply

  2. [...] Jonathan Lewis introduces a framework that helps avoiding the traps inherent in writing PL/SQL loops when modelling a session that does lots of simple calls to the database. [...]

    Pingback by Log Buffer #215, A Carnival of the Vanities for DBAs | The Pythian Blog — April 1, 2011 @ 10:13 pm BST Apr 1,2011 | Reply

  3. Jonathan,

    Can you also let us know what the database version was?

    I saw worse output from 11.2.0.2, with a factor of 8, which is a very big difference to be honest.

    My “after” results were as below, with the same “before” results (I don’t think it can affect things, but I am running on OEL6, which is not certified yet):

    HEIGHT                        : 3
    BLOCKS                        : 1152
    NAME                          : T1_I1
    PARTITION_NAME                :
    LF_ROWS                       : 82501
    LF_BLKS                       : 1008
    LF_ROWS_LEN                   : 4577764
    LF_BLK_LEN                    : 7996
    BR_ROWS                       : 1007
    BR_BLKS                       : 11
    BR_ROWS_LEN                   : 57369
    BR_BLK_LEN                    : 8028
    DEL_LF_ROWS                   : 62501
    DEL_LF_ROWS_LEN               : 3467964
    DISTINCT_KEYS                 : 178
    MOST_REPEATED_KEY             : 1936
    BTREE_SPACE                   : 8148276
    USED_SPACE                    : 4635133
    PCT_USED                      : 57
    ROWS_PER_KEY                  : 463.488764044943820224719101123595505618
    BLKS_GETS_PER_ACCESS          : 235.244382022471910112359550561797752809
    PRE_ROWS                      : 0
    PRE_ROWS_LEN                  : 0
    OPT_CMPR_COUNT                : 2
    OPT_CMPR_PCTSAVE              : 79
    
    

    Comment by coskan gundogar — April 26, 2011 @ 11:43 pm BST Apr 26,2011 | Reply

    • Coskan,

      Thanks for that – I thought I’d tried it on 11.2.0.2 but the note in my script header says I’ve only tested it on 10.2.0.3. (That’s not 100% guaranteed, but I’m usually quite careful about keeping the records up to date.)

      Comment by Jonathan Lewis — April 27, 2011 @ 4:43 pm BST Apr 27,2011 | Reply

