PGA leaks

June 7, 2009

PGA leaks

Filed under: Infrastructure,trace files,Troubleshooting — Jonathan Lewis @ 7:53 pm BST Jun 7,2009

Jump to 2019 update

Here’s a simple script that I created a short time ago while investigating a memory problem on a client site. The purpose of writing the script was, as always, to create a model that stripped the client’s code to the bare minimum in an attempt to work out the root cause of a problem. (Warning: if you want to run this script, your Oracle shadow process will grab about 1GB of PGA memory)

rem
rem     Script:         pga_leak_02.sql
rem     Author:         J P Lewis
rem     Dated@          June 2009
rem
rem     Last tested
rem             11.1.0.6        Mostly fixed
rem             10.2.0.3        Huge memory impact
rem

define m_string_length = 200

drop table t1 purge;
create table t1(v1 varchar2( &m_string_length ));

create or replace procedure p1 (
        i_rowcount      in number       default 1000000,
        i_bulk_pause    in number       default 0,
        i_forall_pause  in number       default 0,
        i_free_pause    in number       default 0
)
as

        type w_type is table of varchar2( &m_string_length );
        w_list          w_type := w_type();
        w_free          w_type := w_type();

begin
        for i in 1..i_rowcount loop
                w_list.extend;
                w_list(i) := rpad('x', &m_string_length );
        end loop;

        dbms_lock.sleep(i_bulk_pause);

        forall i in 1..w_list.count
        insert into t1 values(w_list(i));

        dbms_lock.sleep(i_forall_pause);

        commit;

        w_list := w_free;
        dbms_session.free_unused_user_memory;

        dbms_lock.sleep(i_free_pause);

end;
/

The procedure simply fills a pl/sql array with data then uses the forall insert syntax to write the whole lot into a table. An earlier version of my code used a select with bulk collect to load the array (emulating the job run by the client more closely), but I wanted to make the code even simpler which is why it’s just a simple loop.

After creating the table and procedure I simply called the procedure and, in a slightly more complex version of the code, introduced some long wait times (using dbms_lock.sleep) between steps. Then I used another session (logged on as SYS) to monitor what demands this procedure made PGA memory as it ran.

This is the code I ran from the monitoring session (supplying the SID from the first session as the input parameter):


rem
rem     Script:         pga_check.sql
rem     Author:         J P Lewis
rem     Dated:          June 2009
rem
rem     Last tested
rem             11.1.0.6
rem             10.2.0.3
rem
rem     Simple script to cross check for a given SID
rem             v$process
rem             v$process_memory
rem             v$sesstat
rem

column name format a40

column  value           format  999,999,999,999

column  category        format  a10
column  allocated       format  999,999,999,999
column  used            format  999,999,999,999
column  max_allocated   format  999,999,999,999

column  pga_used_mem            format  999,999,999,999
column  pga_alloc_mem           format  999,999,999,999
column  pga_freeable_mem        format  999,999,999,999
column  pga_max_mem             format  999,999,999,999

define m_sid = &1

select
        name, value
from
        v$sesstat ss,
        v$statname sn
where
        sn.name like '%ga memory%'
and     ss.statistic# = sn.statistic#
and     ss.sid = &m_sid
;


select 
        pid,
        category,
        allocated,
        used,
        max_allocated
from 
        v$process_memory 
where
        pid = (
                select  pid 
                from    v$process 
                where   addr = (
                                select  paddr 
                                from    V$session 
                                where   sid = &m_sid
                                )
                )
;


select
        pid,
        pga_used_mem,
        pga_alloc_mem,
        pga_freeable_mem,
        pga_max_mem
from
        v$process 
where   addr = (
                select  paddr 
                from    V$session 
                where   sid = &m_sid
                )
;

The figures I’m after are just the session and process memory from v$sesstat, and the two different ways of reporting process memory from v$process and v$process_memory. These are the results I got on a database running 10.2.0.3. The set immediately below reports the state of PGA memory just after after the creation of the array, the set of results further down the page reports the same information just after the call to the procedure has ended.

NAME                                                VALUE
---------------------------------------- ----------------
session uga memory                                221,824
session uga memory max                            221,824
session pga memory                            313,843,284
session pga memory max                        313,843,284


CATEGORY          ALLOCATED             USED    MAX_ALLOCATED
---------- ---------------- ---------------- ----------------
SQL                   1,008              104           55,276
PL/SQL               22,412           17,848           22,412
Other           627,269,497                       627,269,497


    PGA_USED_MEM    PGA_ALLOC_MEM PGA_FREEABLE_MEM      PGA_MAX_MEM
---------------- ---------------- ---------------- ----------------
     235,870,025      627,292,917                0      627,292,917

You’ll notice several key points.

First (obviously) v$sesstat tells us that we’ve used a lot of memory – which isn’t too surprising since we generated 1,000,000 character strings of length 200, so we know we’re going to see at least a couple of hundred megabytes of memory being used.

But there’s much more to see in these three result sets. The report from v$process_memory says we’ve allocated more than 600MB in the “Other” category – which is a little surprising since (a) we know that we’ve used the memory for a pl/sql array – so why is it in “Other” – and (b) it’s twice as much memory as reported in v$sesstat and (c) we don’t see any of it in the “Used” column.

In fact, a dump of the pga and uga heaps (oradebug dump heapdump 5) shows that we have 313mb of RAM in the pga heap, of which 77MB is marked as free (but not yet released from the heap). From this we we can probably infer that the code maintaining v$process_memory is broken. I suspect that we really ought to see the 313MB in both the “Allocated” and “Used” columns with, quite possibly, the 77MB appearing in a separate row (not visible in this report) for the category “Freeable”.

Note that the report from v$process is closer to the actual state given by the heap dump. It shows pga_used_mem at 236M – which means that the 77M free from the heap dump is correctly subtracted from the 313MB total, but possibly that 77MB should be reported as “PGA_Freeable_Mem”, rather than disappearing completely. This view still has a problem, though: the pga_alloc_mem and pga_max_mem are both displaying the same doubling effect as v$process_memory.

Bottom line: v$sesstat seems to show you most of the truth, with v$process helping you to seperate the freeable memory out from the currently allocated memory, but both v$process and v$process_memory are not to be trusted. (Some of these anomalies are still present in 11.1.0.6 by the way).

The next bit of output shows you the report results after the procedure call had completed:


NAME                                                VALUE
---------------------------------------- ----------------
session uga memory                            498,681,064
session uga memory max                        498,681,064
session pga memory                            598,662,740
session pga memory max                        911,924,820


CATEGORY          ALLOCATED             USED    MAX_ALLOCATED
---------- ---------------- ---------------- ----------------
SQL                   3,024              516           55,276
PL/SQL          488,711,516        4,222,224      488,711,516
Other           110,135,753                       736,607,661


    PGA_USED_MEM    PGA_ALLOC_MEM PGA_FREEABLE_MEM      PGA_MAX_MEM
---------------- ---------------- ---------------- ----------------
     498,885,505      598,850,293                0    1,225,374,453

This is extraordinary. The session has allocated – and is still holding – more than 488MB in the PL/SQL category just because of that one million row insert. (My client was processing 3.8 million rows, and they “lost” 7.7GB of memory to this operation – the memory loss gives the appearance of growing geometrically with the number of rows).

In fact the penality is more than that 488MB; the memory that had been allocated for the bulk collect in category “Other” has been released and the 110MB still remaining in that categiry is also a side effect of the forall insert.

Note, by the way, that the pga_max_mem from v$process_memory is still 313M higher than it should be, as is the max_allocated in the “Other” category in v$process_memory.

When I first saw this behaviour I decided it was obviously a bug and spent a good 20 minutes searching Metalink for possible matches without success (I was looking for “forall insert”, if I’d tried “bulk insert” I would have had more luck) – so I forwarded my test case to the rest of the Oak Table to see if they had any thoughts about it and got a fairly prompt reply from Tanel Poder that this was bug 5866410. (He’d done a pga heapdump, and searched Metalink for the label of the memory type that was using most of the space – like all good strategies it was so obvious after it had been explained !)

The bug is a pl/sql memory leak in “forall insert…” It’s fixed in 11.1.0.6 and 10.2.0.5, with backports available to 10.2.0.3 and 10.2.0.4 on a couple of platforms so far (Solaris and IBM ZLinux as I write).

So if your code does very large “forall insert” calls and you see Oracle error ORA-04030 from time to time, or workarea operations dumping to disc unexpectedly, you may temporarily be losing large amounts of memory to this bug – in which case check Metalink for the bug number and see if there’s a patch that’s good for your platform. (And if you’re on 10.2.0.3 or 10.2.0.4 and there isn’t a patch for your platform yet, there’s a reasonable chance you can get one created since it’s already been done on a couple of other platforms.)

Update Sept 2019

I’ve just discovered a nice looking note on MoS describing how to populate a dynamic performance view called v$process_memory_detail, which will allow you to break down the details of where in your PGA Oracle is allocating (and, possibly, leaking) memory. The document is: How To Find Where The Memory Is Growing For A Process (Doc ID 822527.1), the basic command to populate the view for a process is:

alter session set events'immediate trace name PGA_DETAIL_GET level {PID}';

The data is dumped into a structure that’s stored in the SGA, not into a local structure so can be seen across all sessions if different users start dumping the data for different process IDs. The data for the process can be updated by further calls of the same event, and to clear the data from the view you can finally issue:

alter session set events'immediate trace name PGA_DETAIL_CANCEL level {PID}';

Comments (37)

37 Comments »

Tanel’s heapdump_analyzer is really useful in such cases.

Sometimes PGA leaks may not be so obvious, but result in performance problems. Here is recent case where PGA memory leak led to performance problems with DML on a table with trigger.

Comment by Timur Akhmadeev — June 8, 2009 @ 6:49 am BST Jun 8,2009 | Reply
- Timur,
  
  Thanks, that’s a pair of interesting links.
  
  Comment by Jonathan Lewis — June 9, 2009 @ 10:44 pm BST Jun 9,2009 | Reply
Interesting stuff as always.

One thing to comment.

If we do pga heap dump after the procedure completes, all we get would be a bunch of “free chunks”, not certain type of memory. This coincides with this part of the metalink note.

A heapdump will show lot of free memory in the free lists which is not used but instead fresh allocations are made.

Comment by Dion Cho — June 8, 2009 @ 8:55 am BST Jun 8,2009 | Reply
- Dion,
  
  You’re right abou the memory usage.
  
  I didn’t say where, or how often, in the process Tanel dumped the pga – but his first suggestion to me addressed a related problem in the same example which he had identified from the allocation type.
  
  When he sent me the second suggestion (which is the bug number above) I simply assumed that he had followed the same trail a little further.
  
  Comment by Jonathan Lewis — June 9, 2009 @ 6:25 am BST Jun 9,2009 | Reply
Thanks a lot for your post, the last week I has been worked on a ORA-4030 problem. I think that is not related to this bug but your information was very useful for make me understand the problem. I think that my problem was related to a PGA+SGA configuration vs the 2gb limits on a 32bits windows.

regards!

Comment by carcasco — June 8, 2009 @ 3:13 pm BST Jun 8,2009 | Reply
- Carcasco,
  
  It’s becoming more common for people running 32-bit Windows to run into ORA-04030 errors as their databases get bigger and (more importantly) busier – the single Oracle process grows beyond the basic 2GB limit imposed by Windows implementation on 32 bits.
  
  You’ve probably read up on things like the 3GB switch, AWE for memory windowing, and PAE for the extensions – but none of the options for getting past the 32 bit limit are really ideal for a busy system.
  
  Even though the bug may not be relevant, some code to take regular snapshots of the v$sessstat figures may help you track down memory hogs before they cause other sessions to fail.
  
  Comment by Jonathan Lewis — June 9, 2009 @ 6:31 am BST Jun 9,2009 | Reply
[…] a comment » Jonathan Lewis shared interesting and practical experience on PGA memory leak here. Here he mentions on the memory leaks by forall batch insert […]

Pingback by Dion Cho – Oracle Performance Storyteller — June 9, 2009 @ 5:35 am BST Jun 9,2009 | Reply
Hi Jonathan,

I don’t know whether it is related an interesting enough.
I noticed that if you do a dbms_sql.parse followed by a dbms_sql.describe_columns in a loop your pga increases all the time.

declare statement varchar2(100) := 'select * from dba_source'; l_theCursor integer default dbms_sql.open_cursor; l_colCnt number default 0; l_descTbl dbms_sql.desc_tab; begin for i in 1 .. 2000 loop dbms_sql.parse( l_theCursor, statement, dbms_sql.native ); dbms_sql.describe_columns( l_theCursor, l_colCnt, l_descTbl ); end loop; end; /

I know it is not a useful script but it was just to generate a lot of parses.

regards Hans-Peter

Comment by Hans-Peter Sloot — June 9, 2009 @ 11:18 am BST Jun 9,2009 | Reply
- Hans-Peter,
  
  That’s a nice example – but on the plus side, the memory does get released when the call ends.
  
  I quite understand the point of creating a script that demonstrates the threat in the minimum number of lines even if the script itself looks unreal – and totally approve – you should send this one to Oracle to see if this is necessary, or whether it’s a leak.
  
  One of the things I like about the blog mechanism is how it can pull related bits of information together – here’s another example of potentially nasty PGA leakage that I can link to because Tom Kyte pointed one of his threads to this blog.
  
  Comment by Jonathan Lewis — June 9, 2009 @ 10:08 pm BST Jun 9,2009 | Reply
- Maybe you are hitting bug #6011182. I wasn’t able to reproduce your case in 11.1.0.7.
  
  Comment by Timur Akhmadeev — June 10, 2009 @ 8:17 pm BST Jun 10,2009 | Reply
Hi Jonathan,

A colleague of mine pointed out your article to me re. PGA leaks. Since it’s something that I’ve done quite a bit of work on – and was quite interesting work, I thought I would add some comments.

We are running Oracle 10g AIX and have had quite a few issues with memory leaks involving processes selecting into PL/SQL memory objects with Bulk Collect (We are running a high speed Trading System).

Under load test conditions we could only run for a few hours before consuming all the server memory (in the region of 18 gb!).

What I discovered was that the problem was exacerbated hugely by having hash joins enabled and workarea_size_policy set to AUTO. We had to go live with hash joins disabled and manually setting our memory work areas.

I performed some in-depth memory analysis on this problem. AIX is terrific in this case, since it provides the ‘svmon -P’ command that allows you to get a detailed process memory heap dump.

I ended up writing some code that performed a time series analysis of all the processes we had running on our server – it took ‘svmon -P’ reports for each process, and compared the memory usage for each type of memory heap for that process over time. The results were very interesting…. .

What it showed was that Oracle was allocating large shared memory areas and then ‘sub-allocating’ blocks of memory to processes as required. Over time (and because of a bug in the memory heap management code) this shared block of memory would fill, and Oracle would then need to allocate a new block.

This had an interesting implication – which we noticed on our systems. That is, killing the sessions (in our case, a Java process which created a number of sessions onto the database) did not always return all of the RAM – and, in fact, we could never be quite sure how much RAM we would recover from a Java process restart.

This makes sense, since Oracle is ‘sub-allocating’ from a large shared memory segment, there would be other processes, which had not been killed, which would still be ‘registering’ an ‘interest’ in that memory block.

We found that the best way to retrieve all our RAM in a clean way was to bounce the instance.

As you discussed, the v$ views did not report on the situation correctly, but I guess that comes hand-in-hand with the bug in the memory heap management code.

In the end, Oracle provided a patch for us (on 10.2.0.3). They included the fix as part of 10.2.0.4 release (well, for AIX anyway).

As an aside, we have also noticed slow but gradual memory leaks + performance degradation with long lived dedicated connections executing large numbers of ‘Select *’ type SQL’s, even when the resulting datasets are fairly small. We have consequently banned the use of ‘Select*’ from our codebase. ;)

Hope this was interesting.

Cheers,

Adrian.

Comment by Adrian Nakon — June 10, 2009 @ 10:00 am BST Jun 10,2009 | Reply
- Adrian,
  
  Thanks, very interesting.
  
  Two things struck me in particular. It sounds as if AIX has some very specific ways of using memory that don’t appear with other Unix flavours. (Or possibly what you’re describing is a consequence of the type of connection pooling that you used.)
  
  The other thing is that you’ve turned Hans-Peter’s “artificial” demonstration into a live problem – his leak was on the “describe”, and that’s what Oracle has to do if you give it a ‘select *’.
  
  I am wondering about the “kill the java process” having variable success – if your code populated the program, application, or module columns of v$session (with the Java equivalent of dbms_application_info) you may have been able to identify all the sessions associated with the Java and done an ‘alter system kill session’ on them to release the memory.
  
  Comment by Jonathan Lewis — June 10, 2009 @ 7:22 pm BST Jun 10,2009 | Reply
Regarding Dion’s comment about having free chunks only after db call ends:

It may not always be the case. There are cases of memory leaks where all memory is not freed in the end of db call. But yes I took the PGA heapdump with oradebug when the problem was ongoing – the db call was still running. And that should be obvious anyway – you can’t systematically diagnose a problem if you don’t gather evidence DURING its occurrence.

Regarding the AIX memory allocation: It should work like other standard unixes. I havent used svmon for such troubleshooting but procmap instead, it also shows you the breakdown of memory segments in a process address space. The PGA memory allocated shouldn’t really be shared but rather its mmap()’ed from /dev/zero (with private flags) but svmon may think its shared.

AIX should take care of releasing all process memory once the process exists – are you using shared servers perhaps? This would explain why killing a session/or java client wouldn’t necessarily release all memory as the server processes (with their leaked PGAs) still remain running.

Comment by Tanel Poder — June 12, 2009 @ 6:53 pm BST Jun 12,2009 | Reply
- Hi Guys,
  
  Thanks for your replies on this.
  
  From the ‘svmon’ information, and the behaviour we saw when stop/starting
  Java processes (ie. not retrieving all the RAM), my impression was that
  memory was being taken from ‘shared memory’ segments by the processes.
  
  Tanel – in answer your question – no, we are not using shared servers.
  You are quite right – I would not expect process private PGA memory to
  be shared either. In this case, perhaps this ‘operation’ is not
  intended to be a ‘private’ one.
  
  If that were the case, then perhaps this had something to do with an error
  in the management of cursor memory in a region where Oracle needs that
  information to be ‘viewable’ by all processes (shared data structures etc).
  That would explain some of the behaviour, but whether that’s the answer is
  another question!
  
  Cheers,
  
  Adrian.
  
  Comment by Adrian Nakon — June 16, 2009 @ 9:48 am BST Jun 16,2009 | Reply
[…] to deal with memory leaks by using memory dumps. (PGA leak bug) Jonathan Lewis – PGA Leaks Dion Cho – Troubleshooting PGA leak with PGA heap […]

Pingback by Blogroll Report 06/06/09 – 12/06/09 « Coskan’s Approach to Oracle — June 12, 2009 @ 7:33 pm BST Jun 12,2009 | Reply
Hi Jonathan,

Do you think that this bug could be hit when using impdp?
Perhaps it makes use of pl/sql arrays too?

Regards Hans-Peter

Comment by Hans-Peter Sloot — August 20, 2009 @ 8:51 am BST Aug 20,2009 | Reply
Hans-Peter,
There are two or three different mechanisms that impdp uses for the bulk of the data – but I wouldn’t expect them to use pl/sql. On the other hand it might use pl/sql for some of the metadata processing.

It’s probably worth checking on a small system with system level tracing enabled. Try using expdp/impdp on a complicated schema with only a little data.

If you’ve got what seems to be a memory leak on an import, though, perhaps it’s simply some OCI leak on the data load.

Comment by Jonathan Lewis — August 20, 2009 @ 9:31 pm BST Aug 20,2009 | Reply
Hello Jonathan,

I faced memory leak from HASH JOINS year ago. Oracle Support didn’t want accept this bug and they told my proces allocated extra space for different operations (for example my PL/SQL AREA etc.). We are using 1GB “ulimit” for private memory for each process on OS level.
The DUMP od PGA+UGA (alter session set events ‘4030 trace name heapdump level 5’) showed as only there was was something like 900GB of memory allocated to kxs-heap-w SUBHEAP. So to show more I’d to start dumping of subheaps (with the help of book of Steve Adams) and it showed that 890GB was allocated in “QERHJ hash-joi” subheap and Oracle Support accpted bug.

The “PLS non-lib hp ” subheap can be dumped of course in your example. The addres of the subheap can be found in PGA+UGA HEAPDUMP TRACE

(EXTENT 0 addr=14d34fea0
Chunk 14d34feb0 sz= 448704 freeable “PLS non-lib hp ” ds=11047e998)

So the addres is in HEXA “11047e998″ (of course handler of my session, will be different on differnet systems). It can be also querried from x$ view from KERNEL SERVICES LEYER, from x$ksmup, the column KSMCHPAR shows the result

SELECT KSMCHCOM, KSMCHPAR FROM x$ksmup WHERE KSMCHCOM = ‘PLS non-lib hp’

You can dump (setting 4030 trace => alter session set events ‘4030 trace name heapdump_addr level 1, addr 0x11047e998’ or directly from ORADEBUG if unlimit is not set and you don’t suffer ORA-04030 directly) this subheap after running your procedure. And dump of subheap will appear in trace (extra lines):

HEAP DUMP heap name=”PLS extent sz=0x840 alt=32767 parent=110465e38 owner=0 EXTENT 0 addr=14d34fe48
Chunk 14d34fe58 sz= 448792 EXTENT 1 addr=14d2dfec8
Chunk 14d2dfed8 sz= 447256 Chunk 14d34d1f0 sz= Chunk 14d34d248 sz= Chunk 14d34d2a0 sz= Chunk 14d34d2f8 sz= Chunk 14d34d350 sz= Chunk 14d34d3a8 sz= Chunk 14d34d400 sz= Chunk 14d34d458 sz= Chunk 14d34d4b0 sz= Chunk 14d34d508 sz= Chunk 14d34d560 sz= Chunk 14d34d5b8 sz= Chunk 14d34d610 sz= Chunk 14d34d668 sz= Chunk 14d34d6c0 sz= Chunk 14d34d718 sz= EXTENT 2 addr=14d26ff48
Chunk 14d26ff58 sz= 447128 Chunk 14d2dd1f0 sz= Chunk 14d2dd248 sz= …. non-lib hp” desc=11047e998
het=32767 rec=0 flg=2 opc=2
nex=0 xsz=0x0
freeable “DARWIN ”
free ” ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”
free ” ”
88 freeable “DPAGE ”
88 freeable “DPAGE ”

And there is a bug, Oracle does not free this subheap. It is marked as free, but not returned to OS. The Tanel’s analyzer won’t work satisfactorily with subheaps dumped, so you can suggest him to write version 1.10 :)

Comment by Pavol Babel — August 23, 2009 @ 3:16 pm BST Aug 23,2009 | Reply
Hi Sir,

We are having 2 databases on a solaris box.The RAM is 16GB and the sum of the sga+pga for both the database totals to ~11Gb(approx).Now,vmstat is showing 2Gb free memory which is a concern here.I wanted to know is what all are the areas of concern and where can i look for it.

Comment by Anand — February 16, 2010 @ 12:49 pm GMT Feb 16,2010 | Reply
- Anand,
  
  Is your concern that vmstat is not reporting ~5GB of free memory, or that you aren’t making use of the “last” 2GB available ?
  
  I would check very carefull what vmstat means by “free” memory on your platform, and whether this has to be interpreted differently if you are using file system I/O mechanisms
  
  Comment by Jonathan Lewis — February 16, 2010 @ 6:59 pm GMT Feb 16,2010 | Reply
Hi Sir,
The concern is that vmstat is not reporting ~5GB of free space.Its just showing ~1.5GB free.How do i analyze it?

Anand

Comment by Anand — February 17, 2010 @ 4:16 am GMT Feb 17,2010 | Reply
- Anand,
  
  By “pga + sga” are you indicating that you are running 11g and using the memory target, or are you possibly using 10g and setting the sga target and pga target separately.
  
  Is this memory report from the moments immediately after the database starts up when nothing more than the background processes are running, or is it when both systems are running with their full quota of processes.
  
  You haven’t mentioned your operating system, but you might want to read Christo Kutrovski’s presentation on memory management – here’s a link from my blog, which includes a comment where Chen Shapira makes a point about the tcp buffer settings.
  
  Just as a starting point, though: remember that each process that connects to the SGA will need some memory, including memory to buld a memory map of the SGA. The more processes you attach and the larger the SGA the more memory “disappears” even if none of the sessions do anything.
  
  Comment by Jonathan Lewis — February 17, 2010 @ 8:41 am GMT Feb 17,2010 | Reply
Jonathan,

This looks a little like something we experience, but the symptoms are slightly different. Have you any advice as to what we could do to try to debug.

Symptoms: over time the virtual memory used by the oracle process (10.1.0.5 on Windows 2003R2) increases until it hits the 3Gb limit. SGA and PGA info show massively less memory in use than the VM shown by sysinternals.

Base Data:

From sysinternals using pslist -m oracle
```
Name                Pid      VM      WS    Priv Priv Pk   Faults   NonP Page
oracle             6548 1996236 1210708 1203804 1596496 18076609    103  324
```
From Oracle using data dictionary queries

SGA
```
Total System Global Area 1048576000 bytes
Fixed Size                   792728 bytes
Variable Size             568584040 bytes
Database Buffers          478150656 bytes
Redo Buffers                1048576 bytes
```
Process Memory
```
  PID SPID  USED_MB ALLOC_MB FREEABLE_MB MAX_MB      
----- ----- ------- -------- ----------- ------      
    1             0        0           0      0      
    2 4704        0        1           0      1      
    4 1356        0        1           0      1      
    6 2784        0        1           0      4      
    8 9104        0        1           0      1      
   10 4364        4       12           0     12      
   12 6340        0        1           0      4      
   14 5428        0        1           0      3      
   16 7800        0        1           0      1      
   18 1664        1        1           0      2      
   20 3776        2        4           0      7      
   22 10096       4        9           0      9      
   24 3616        4        9           0      9      
   26 2036        0        1           0      1      
   28 8168        1        2           0      2      
   30 3724        0        1           0      1      
   32 6972        3        3           0      8      
   34 2628        2        3           0      9      
   36 4528        2        3           0     13      
   38 8236        2        3           0      7      
   42 5628        1        3           0      3      
   44 9796        3        5           0      7      
   46 1704        0        1           0      1      
   48 3984        2        2           0      2      
   52 6660        2        3           0      5      
   54 8600        2        4           0      4      
   56 5872        2        2           0      7      
   58 1776        1        2           0      2      
   60 3368        2        2           0      6      
   88 4324        3        3           0      6      
```
So I’m using 2Gb of virtual memory for a 1Gb SGA and maximum 150Mb or so of PGA memory and this keeps increasing. Is it possible for a PGA memory leak not to be cleared when the process exits?

Any suggestions for queries / dumps to run that might point me in the right direction?

Comment by Tom — July 17, 2010 @ 11:26 am BST Jul 17,2010 | Reply
- Tom,
  
  Since Oracle operates on Windows as a single process with multiple threads it is perhaps a little more likely that a memory leak from an Oracle “process” might not be released.
  
  Since you’ve got a fairly elderly copy of Oracle there, you may have a bug that has been fixed in a later release so it’s worth checking metalink and patch release “bugs fixed” listing to see if you can find a match.
  
  The only little oddity that stands out is the gap in process IDs (particulary the jump from 60 to 88) – it’s not terribly important, but might be a hint that something odd is happening with process (thread) re-use.
  
  The only thing I can think of is to run up a little task that takes a snapshot of the differing view points every minute or two (don’t bother capturing the SGA, do check the different sources for PGA memory) to see if you spot any oddity (e.g. a timed pattern of growth, or sudden jumps).
  
  Comment by Jonathan Lewis — July 24, 2010 @ 9:47 am BST Jul 24,2010 | Reply
- Hi Tom & Jonathan,
  
  I am facing the same situation and the same problem.
  On one windows 2003 server (32bits), an oracle database (10.2.0.4 32bits) used to crash with ORA-4030 error. First, I have increased the sga_max and sga_target, giving the database more memory after I noticed that the virtual memory was continuously increasing. So I used the /3GD parameter on the server and allowed the database to use about 2,8 Gb of memory. It was not enough… I have a small program to monitor the virtual memory (using pslist, using windows registry …) and I saw that from time to time in a month, the virtual memory was increasing suddenly (about 300 M° in two or three minutes) but it was never released. Lately, I have upgraded the database because this was the only thing I could do to solve the problem (the 11g upgrade is not my customer’s priority …). So I moved the database to a 64bits Windows 2003 server, EE R2 with about 45 Gb of RAM . I upgraded the database to a Oracle 10.2.0.4 64bits and continued monitoring the virtual memory. It is still increasing … Now the memory is about 3,8 Gb and it’s growing .. slowly but surely…
  I also tried to add the “secret parameter” to the init.ora file, but it did nothing. I tried to use dead connection detection but it was useless either …
  I would be pleased to paste a small snapshot of the monitoring but I don’t know how to do it here … If you want it, mail me and I’ll send it back…
  
  Any idea would be much appreciated ;-)
  
  Regards,
  
  Joel
  
  Comment by Joël — September 27, 2012 @ 12:36 pm BST Sep 27,2012 | Reply
  - Joel,
    
    Sometimes people ask questions like this and happen to get lucky that I recognise the problem, or have some very specific suggestion that may help – but I’m not trying to be a replacement for Oracle Support. In your case, all I can say is that it sounds like a slow memory leak (on x64, the 32-bit “big jump” bit sounded like a runaway pl/sql process or a violent explosion of logons); the only suggestion I have to make to track it down is to run some code in a fairly tight loop that keeps calculating and reporting changes in v$sysstat in case that shows you any sudden strange changes. Apart from that, there’s the VMMAP utility that MOS references – could that give you some clue that (for example) connects the constant growth with one or two specific processes such as m000 or DBWR ?
    
    Comment by Jonathan Lewis — September 27, 2012 @ 1:00 pm BST Sep 27,2012 | Reply
Hi Jonathan,

Thanks so much for responding. I’ve checked through metalink and there are a fair few memory leak bugs in our version, so an upgrade is definitely on the cards in the future!

In the mean time, I think I may have found a lead to what could be causing this. I happened to take a statspack this afternoon as part of debugging this and found…

session pga memory 361,307,828
session pga memory max 491,724,468
session uga memory 764,521,322,824
session uga memory max 393,031,272

Now that 700Mb looks suspiciously like the difference between the expected size and the actual VM size windows shows. What is odd is that a query against v$sesstat shows 39Mb being used for UGA.

Somewhere to start at least.

Thanks for your insights

Comment by Tom — July 28, 2010 @ 5:19 pm BST Jul 28,2010 | Reply
Jonathan,

Since you were kind enough to offer some suggestions on resolving this, I thought I’d follow up with the solution I eventually tracked down

It turns out there is a known bug in Oracle 10gR2 (and we have now confirmed in in 10gR1 as well) on x64 chips whereby threads are not cleaned up correctly in the Oracle JVM after the call completes leading to “zombie” threads which take up memory but never get cleaned out. This means that on an x64 windows platform the Virtual Memory of the Oracle.exe process increases continuously
if you use the JVM.

What do you need to hit this bug
——————————–
1. 64 bit chip (note: not 64 bit OS – we hit this on 32 bit)
2. 10gR1 or 10gR2 (note: supposed to be fixed in 11gr1)
3. Windows OS (2003 in our case)
4. Run Java in the database

Symptoms
——–
1. Virtual memory of Oracle.exe increases continuously
2. If you run VMMap (sysinternals) you will see a massive allocation for Thread Stack space (we had 600Mb in thread stack with 20 active sessions!)
3. The thread stack breakdown will show lots of threads without an associated thread id

Solution
——–
1. Go onto metalink and read 1062406.1

The solution is an undocumented parameter and I am therefore not about to publish it in case somebody decides to set it without being advised to by Oracle Support.

I’m guessing a simple test case would be easy to do – x64 box, set up some java stored procedures, run them and keep watch on the virtual memory!

Comment by Tom — August 9, 2010 @ 2:06 pm BST Aug 9,2010 | Reply
- Tom,
  
  Thanks for the follow-up information.
  
  Comment by Jonathan Lewis — August 9, 2010 @ 3:32 pm BST Aug 9,2010 | Reply
Hi Jonathan,
One of the AWR reports of an application shows following statistics
```
Statistic	                                  Begin Value	 End Value
session pga memory max	709,926,624	764,105,568
session cursor cache count	11,892	                14,633
session uga memory	433,981,789,904	562,878,155,584
opened cursors current	208	                 251
logons current	                100	                104
session uga memory max	1,726,565,024	2,172,072,408
session pga memory	518,379,264	570,019,360
```
session uga memory seems to be very high . I have following questions :

1. what could be the reason of so high session uga memory , what does it really mean ?
2. session uga memory max is smaller then session uga memory – does this indicate any thing which we should fix or analyze.

Thanks !
Ajeet

Comment by Ajeet o — September 12, 2012 @ 7:31 am BST Sep 12,2012 | Reply
- Ajeet,
  
  Sorry I didn’t notice this one when it arrived.
  You don’t say which version of Oracle – not that that would actually have made any difference in this case, but it’s generally a useful detail to include.
  Given that the MAX is less than the current value I would be inclined to assume it’s wrong. As a general guideline, when the SYSTEM figures don’t seem to make sense I look at the session figures to see if there are any clues there.
  
  Comment by Jonathan Lewis — September 27, 2012 @ 12:48 pm BST Sep 27,2012 | Reply
[…] And Jonathan Lewis provides a script you can run if you are concerned about the potantial of Oracle PGA leaks. Over at Oraclue, Miladin Modrakovic shows how to discover memory “leaks and other problems […]

Pingback by Log Buffer #150: A Carnival of the Vanities for DBA’s — February 21, 2013 @ 1:37 pm GMT Feb 21,2013 | Reply
[…] https://jonathanlewis.wordpress.com/2009/06/07/pga-leaks/ […]

Pingback by Useful Tweets, blog posts etc I want to look at… « Oracle Mine…. — October 19, 2014 @ 10:18 am BST Oct 19,2014 | Reply
[…] 这两天看了下jonathan lewis大师的https://jonathanlewis.wordpress.com/2009/06/07/pga-leaks/的博客，然后延伸的一下读了下其他与Oracle内存相关的blog和MOS文章，做了一些实验，这里记录一下比较好的文章链接，后续google到其他文章进行后续更新。 […]

Pingback by Oracle内存相关文章 | leonotes — July 1, 2016 @ 6:12 am BST Jul 1,2016 | Reply
[…] lat temu Jonathan Lewis sugerował, aby większym zaufaniem obdarzać v$sysstat aniżeli v$process (https://jonathanlewis.wordpress.com/2009/06/07/pga-leaks/). Perspektywa v$process uwzględnia np. procesy, które mogą nie obsługiwać żadnej sesji (Dnnn, […]

Pingback by Podstawy: alokacja pamięci PGA – ExplainIT — August 9, 2016 @ 5:52 pm BST Aug 9,2016 | Reply
[…] PGA Memory (June 2009): checking process memory and process memory detail. […]

Pingback by Infrastructure Catalogue | Oracle Scratchpad — July 23, 2022 @ 8:35 pm BST Jul 23,2022 | Reply
[…] PGA Memory (June 2009): checking process memory and process memory detail. […]

Pingback by Troubleshooting catalogue | Oracle Scratchpad — July 23, 2022 @ 8:36 pm BST Jul 23,2022 | Reply