I’m still a bit twitchy (i.e. nervous) about using the automatic SGA tuning – which means setting the sga_target and letting Oracle decide when to move memory between (typically) the db cache and the shared pool. I’ve come across four systems so far – two by direct contact, two by indirect report – which have had hanging problems which magically disappeared when the sga_target was disabled. It has to be said that all four systems had large SGAs – 8GB to 64GB – and they were all busy with lots of active sessions, so small office systems are probably quite safe using the sga_target.
Whilst it’s hard to be specific about why the problems occur, it seems possible that some sort of race condition arises, where the code decides that the db cache needs to grow moments after it has started to shrink it to allow the shared pool to grow. (Or vice versa, of course.)
If you think it’s worth keeping an eye on what the sga_target is doing to your system, you might want to run a quick query against the v$sga_resize_ops view from time to time, to see if there are times when a couple of the heaps swap memory back and forth in rapid sequence. The following SQL is a quick script I hacked together to do this on the last site where random hangs were occurring:
column component    format a20
column parameter    format a30
column initial_size format 99,999,999,999
column target_size  format 99,999,999,999
column final_size   format 99,999,999,999

set linesize 90
set pagesize 60

spool sga_ops

select
        component,
        oper_type,
        oper_mode,
        parameter,
        initial_size,
        target_size,
        final_size,
        status,
        to_char(start_time,'dd-mon hh24:mi:ss')         start_time,
        to_char(end_time,'dd-mon hh24:mi:ss')           end_time
from
        v$sga_resize_ops
order by
        start_time
;

spool off
Note: I only realised some days after I wrote this note that my output was only appearing in the correct order because of lucky timing. The order by clause is using my column alias of start_time not the underlying table (view) column of the same name. If I had had any data from the previous month in v$sga_resize_ops then my choice of character conversion would have made it appear at the end of the output.
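One way to avoid the alias ambiguity (a sketch of the fix, not the script I actually ran) is to qualify the column in the order by clause with a table alias; a qualified name always resolves to the underlying DATE column of the view rather than to the character-string alias in the select list:

```sql
select
        component,
        oper_type,
        oper_mode,
        parameter,
        initial_size,
        target_size,
        final_size,
        status,
        to_char(o.start_time,'dd-mon hh24:mi:ss')       start_time,
        to_char(o.end_time,'dd-mon hh24:mi:ss')         end_time
from
        v$sga_resize_ops        o
order by
        o.start_time            -- qualified, so this is the DATE column, not the alias
;
```

With this version the output stays in true chronological order even when the view holds data from more than one month.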
At the linesize of 90 I set, this produced a two-line per row output – the following is from my laptop, not from one of the big systems:
COMPONENT            OPER_TYPE     OPER_MODE PARAMETER
-------------------- ------------- --------- ------------------------------
   INITIAL_SIZE     TARGET_SIZE      FINAL_SIZE STATUS    START_TIME      END_TIME
--------------- --------------- --------------- --------- --------------- ---------------
java pool            GROW          IMMEDIATE java_pool_size
      4,194,304       8,388,608       8,388,608 COMPLETE  04-dec 04:05:22 04-dec 04:05:22

DEFAULT buffer cache SHRINK        IMMEDIATE db_cache_size
    402,653,184     398,458,880     398,458,880 COMPLETE  04-dec 04:05:22 04-dec 04:05:22

DEFAULT buffer cache SHRINK        IMMEDIATE db_cache_size
    398,458,880     394,264,576     394,264,576 COMPLETE  04-dec 04:05:24 04-dec 04:05:24

java pool            GROW          IMMEDIATE java_pool_size
      8,388,608      12,582,912      12,582,912 COMPLETE  04-dec 04:05:24 04-dec 04:05:24
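If you would rather have the database flag the rapid back-and-forth for you than eyeball the full listing, a simple aggregate along the following lines (my sketch, not part of the original script) counts resize operations per component per minute – any minute where the same component shows several grow/shrink operations is worth a closer look:

```sql
select
        component,
        oper_type,
        to_char(start_time,'dd-mon hh24:mi')    minute,
        count(*)                                ops
from
        v$sga_resize_ops
group by
        component,
        oper_type,
        to_char(start_time,'dd-mon hh24:mi')
having
        count(*) > 1
order by
        min(start_time), component
;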
Another symptom that you might watch out for at the same time is memory in the shared pool being used for data blocks. As Tanel Poder has pointed out, when v$sgastat shows memory in the class “KGH: NO ACCESS”, this is a case of shared pool granules that have been partially freed for use by the db cache and are waiting for the rest of the granule content to be unpinned so that the entire granule can become a cache granule.
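A minimal check for this symptom (a sketch – the class name string is as Tanel reports it) would be:

```sql
select
        pool, name, bytes
from
        v$sgastat
where
        name = 'KGH: NO ACCESS'
;
```

If this returns a row with a large and persistent bytes value, the shared pool and db cache are probably trading granules at the time you look.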