SUMMARY: Swap space leak on Clustered E450's with Solaris 8

From: <SmithBD_at_crane.navy.mil>
Date: Tue Jan 06 2004 - 14:36:46 EST
Thank you to both Val Popa and Mr. Krenzischek for their help on this issue.  I've set up a script to capture the output of the swap -l command at regular intervals to determine whether we are actually seeing "shrinkage" or not.
After I look at that data, I'll see whether I need to try some of the tools that Ryan mentions.
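
For illustration only, here is a minimal sketch of that kind of periodic capture. It is not the script actually in use: the function name log_swap_once and the SWAP_CMD override are hypothetical, and the only real command assumed is swap -l.

```shell
# Hypothetical helper (not the poster's actual script): append one
# timestamped snapshot of `swap -l` to a log file.
# SWAP_CMD is an assumed override, present only so the function can be
# tried on a machine that lacks the Solaris swap command.
log_swap_once() {
    logfile=$1
    {
        # Marker line so individual snapshots can be told apart later.
        echo "=== `date '+%Y-%m-%d %H:%M:%S'`"
        ${SWAP_CMD:-swap -l}
    } >> "$logfile"
}
```

Wrapped in a small script (say, at a hypothetical path such as /usr/local/bin/swaplog.sh), this could be run from cron with an entry like "0,10,20,30,40,50 * * * * /usr/local/bin/swaplog.sh". The minutes are listed explicitly because the stock Solaris cron does not accept the */10 step syntax.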

Thanks!

Here is Val Popa's reply:

To see the actual swap, the correct command is: swap -l. Only if this command
shows zero free swap have you actually run out of swap; otherwise, read below.
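
As a rough sketch of how that output could be reduced to two numbers: swap -l prints one line per swap device with the columns swapfile, dev, swaplo, blocks and free, where blocks and free are counted in 512-byte blocks. The function name swap_totals is hypothetical, and nawk or another POSIX awk is assumed.

```shell
# Hypothetical helper: read `swap -l` output on stdin and print the
# configured and free swap in kilobytes, summed over all devices.
# Columns assumed: swapfile dev swaplo blocks free (512-byte blocks).
swap_totals() {
    awk 'NR > 1 { blocks += $4; free += $5 }
         END { printf "total_kb=%d free_kb=%d\n", blocks / 2, free / 2 }'
}
```

Used as "swap -l | swap_totals". If total_kb stays constant from one snapshot to the next while only free_kb drops, the configured swap devices are not shrinking; something is consuming them.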

If df -k shows that /tmp is getting full, that does not mean you're running
out of swap. Rather, /tmp is being accessed by someone or something else, and
perhaps a log or some other file is being created there, which will cause
df -k to show /tmp at 100% or something along those lines.
To verify do this:
cd /tmp
du -sk *
See the sizes and you have found where the bottleneck is.
Go there and trace it back to what caused it.
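
A small sketch of the same check with the biggest entries sorted to the top. The function name largest_entries and its defaults are hypothetical; only du, sort and head are assumed. Note that the * glob skips dot files, so hidden files in /tmp would need a separate look.

```shell
# Hypothetical helper: list the N largest entries (in KB) directly
# under a directory, biggest first. Defaults: /tmp and 10 entries.
largest_entries() {
    dir=${1:-/tmp}
    count=${2:-10}
    # Subshell so the caller's working directory is left untouched.
    ( cd "$dir" && du -sk * 2>/dev/null | sort -rn | head -n "$count" )
}
```

For example, "largest_entries /tmp 5" would show the five biggest files or directories in /tmp.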

V

---------------

And the one from Mr. Krenzischek:

Check out memtool at http://playground.sun.com/pub/memtool

Also try running the BSD ps under /usr/ucb.  Pay particular attention to
the %MEM and RSS columns.  RSS is the resident set size, the amount of a
process's memory currently held in RAM.  Make sure these numbers are
reasonable for each process.

The other item you might want to take a look at is which programs access
file systems mounted as type tmpfs.  It might not be a memory leak.  A
program writing to /tmp might be unlinking a file without first closing
an open read/write fd, in which case the space stays allocated until the
fd is closed.
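
To make the unlink-while-open case concrete, here is a small self-contained demonstration. The function name and the scratch file path are hypothetical, and the choice of descriptor 3 is arbitrary. It shows that a file's name can be gone (so du no longer counts it) while a process can still write to it through an open descriptor.

```shell
# Hypothetical demo: unlink a file while a descriptor is still open on it.
demo_unlinked_open_file() {
    f=${1:-/tmp/unlink_demo.$$}
    exec 3> "$f"                 # hold the file open on descriptor 3
    echo "first write" >&3
    rm -f "$f"                   # remove the name; fd 3 is still open
    if [ -f "$f" ]; then echo "name: present"; else echo "name: gone"; fi
    if echo "second write" >&3; then echo "fd 3: still writable"; fi
    exec 3>&-                    # only now can the space be released
}
```

On a live system, "fuser -c /tmp" lists the processes that have files open on that file system, and "pfiles <pid>" shows the descriptors each one holds, which may help narrow down a culprit.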

Have you considered running sar?  You can record events and then play them
back to pinpoint the time (e.g. when a certain batch process runs) at which
the most swap pages are requested.  Eventually, those pages should be
returned after a process finishes.
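
As a sketch of how recorded data might be scanned for the low point: sar -r reports freemem (pages) and freeswap (disk blocks) per sample. The function name min_freeswap is hypothetical, and nawk or another POSIX awk is assumed.

```shell
# Hypothetical helper: read `sar -r` output on stdin and print the
# sample time with the lowest freeswap value, followed by that value.
# Header, blank and "Average" lines are skipped by the pattern tests.
min_freeswap() {
    awk '$1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $3 ~ /^[0-9]+$/ {
             if (min == "" || $3 + 0 < min) { min = $3 + 0; at = $1 }
         }
         END { if (at != "") print at, min }'
}
```

For example, "sar -r -f /var/adm/sa/sa06 | min_freeswap" would scan the data file for the 6th of the month, assuming system activity collection is already enabled on the node. The reported time can then be compared with the crontabs and batch schedules.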

And of course, those pesky developers always have a tendency to forget
that they implemented a change.  Have you verified with your
development/applications group if anything has recently changed?  For
example, I manage certain boxes but the DBAs manage sybase/oracle.  They
can install a new version of ASE or Oracle RDBMS without my assistance.

Check your crontabs.  I have had instances where I wrote scripts to
monitor a process and the script just kept re-spawning itself.  Unfortunately,
it took 6-8 hours to become noticeable, so it was not apparent at
first that a small script was not exiting properly and releasing its
memory.  Over time, that adds up.
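
One way to spot that kind of pile-up is to count running processes per command name. This is only a sketch: the function name count_commands is hypothetical, and it simply tallies whatever names it is fed on stdin.

```shell
# Hypothetical helper: read one command name per line on stdin (for
# example from `ps -e -o comm`) and print the N most frequent names
# with their counts, most frequent first. Default N is 10.
count_commands() {
    grep -v '^COMMAND$' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}
```

Run as "ps -e -o comm | count_commands 5" a few hours apart; a count that keeps climbing for one name would suggest a script that is not exiting.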

I hope this helps.  Good Luck.

Ryan


Brian D. Smith


-----Original Message-----
From: Smith Brian D CONT CNIN 
Sent: Tuesday, January 06, 2004 1:35 PM
To: sunmanagers@sunmanagers.org
Subject: Swap space leak on Clustered E450's with Solaris 8


We have noticed the following problem on nearly every one of our Sun Cluster 2.2 clusters.
Each cluster is a three-node cluster, with each node being an E450 running Solaris 8.  They have been running in this configuration for several years.
We have recently noticed that the swap space shrinks over time.  By this, I mean that when you do a 'df -k', the total space for swap gets smaller and smaller.  Even though Sun support doesn't believe us, we ARE NOT seeing a process USING all of the swap space; we are seeing the actual total amount of swap space shrinking.
The swap space will eventually shrink to the point that no swapping or writing to /tmp can be done at all.
I've looked through every FAQ, manual and website that I can find on the subject, but find nothing on the shrinking swap space.  Thus far Sun support has been of no help.

I will summarize after I have received replies.

Thanks,

Brian D. Smith
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Tue Jan 6 14:36:39 2004