SUMMARY: Bizarre error messages

From: Grant Lowe (grant@doctord.com)
Date: Thu Dec 18 1997 - 09:32:52 CST


Hi everybody.

I got a number of responses. I want to thank everybody who responded.
First of all, I forgot to mention that I'm running Solaris 2.5.1 on an
UltraSPARC I. Now the original question:

> Yesterday, my main server stopped responding to commands. Entering any
> command gave one of two messages: "No space left" and "No more
> processes". I tried telnetting in from another machine, rlogging in,
> su'ing from another user, and even logging in on the console. Nothing
> worked. Now this machine had been up for 75 days without being rebooted,
> which may or may not be part of the problem. Does anybody know what
> causes these messages, and what I can do to avoid this in the future?
> Sun tech support said to reboot a machine about once a month to clear
> things up, but I would like to avoid that if possible.

The following people responded:

turoff@disaster.com
jearnest@lasmx.tinker.af.mil
kevin.inscoe@cbis.com
vogelke@c17mis.region2.wpafb.af.mi
lee@mailhost.sju.edu
chad@sequana.com
reynolds@illustra.com
igor@andrew.air-boston.com
tpenning@fedworld.gov
brion@dia.state.ma.us
rsk@itw.com
Glenn.Satchell@uniq.com.au
Kevin.Sheehan@uniq.com.au

There are several possibilities:

Some people suggested that I check how full my file systems are (such as
/, /var, or /tmp). The / file system is 94 percent full (the whole
partition is 1 GB), so some cleanup is in order. /var and / share the same
partition, but I don't think this is a problem (or is it?).
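
For anyone who wants to script this check, here is a minimal sketch in
Python. It is not one of the scripts anybody sent me; the function names
and the 90 percent threshold are my own assumptions, and the list of paths
is just the three file systems mentioned above.

```python
import shutil

def percent_used(path):
    """Return how full the file system holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo file systems report a size of zero
    return 100.0 * usage.used / usage.total

def over_threshold(paths, threshold=90.0, probe=percent_used):
    """Return (path, percent) pairs for file systems at or above `threshold`.

    `probe` is the function that measures one path; it is a parameter only
    so that the check can be tried out with made-up numbers.
    """
    full = []
    for path in paths:
        pct = probe(path)
        if pct >= threshold:
            full.append((path, round(pct, 1)))
    return full

def report(paths=("/", "/var", "/tmp"), threshold=90.0):
    """Print one warning line per file system that is too full."""
    for path, pct in over_threshold(paths, threshold):
        print(f"WARNING: {path} is {pct}% full")
```

Calling report() prints nothing when everything is below the threshold, so
it is quiet enough to run from cron. The percentage may differ by a few
points from the capacity column of df, since df can account for reserved
blocks.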

Another possibility is that the maximum number of allowable processes has
been reached. I'll be looking into this further. But considering how long
the system was up, I'm thinking that there were some runaway processes. I
would like to say a special thanks to Michael Hill
(Michael_Hill@csgsystems.com) for his help. He's writing me a C program
that will help me monitor processes!
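
In the meantime, a rough way to see who owns the most processes is to
count the lines of ps output per user. The sketch below is in Python and is
not Michael's program; the function names are hypothetical, and it assumes
a ps that accepts the POSIX options -e and -o user= (one user name per
line, no header).

```python
import subprocess
from collections import Counter

def count_by_user(ps_output):
    """Count processes per user from text holding one user name per line."""
    return Counter(line.strip() for line in ps_output.splitlines() if line.strip())

def busiest_users(ps_output, limit=5):
    """Return the `limit` users owning the most processes, largest first."""
    return count_by_user(ps_output).most_common(limit)

def current_ps_output():
    """Run ps and return its output.

    Assumption: this ps understands `-e -o user=`; adjust for other systems.
    """
    result = subprocess.run(
        ["ps", "-e", "-o", "user="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

busiest_users(current_ps_output()) gives a list of (user, count) pairs. A
user whose count keeps climbing from one run to the next is a good
candidate for the runaway processes.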

One thing that might be a problem is nscd. Apparently nscd can be
problematic. I'll be looking into this as well.

Another suggestion was to run a cron job that checks disk space every
hour. I'm using a script from Karl Vogel
(vogelke@c17mis.region2.wpafb.af.mil) that is now in my cron file. Thanks,
Karl.

My original thought was that swap space and/or RAM had gone to zip.
Several people responded that that was a possibility. I don't know, but
I'll be keeping an eye on things.

A memory leak is another possibility. This doesn't seem likely given our
environment. A while back, when I took my Sun Advanced Sys Admin course,
the instructor said to use /usr/ucb/ps -aux (a throwback to BSD), which
I've been using periodically. I haven't seen any signs of memory leakage,
but it doesn't hurt to keep it in mind.
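
One way to make the periodic ps check less of an eyeball job is to save
two listings some time apart and compare the RSS column. The Python sketch
below does that. The function names are hypothetical, and it assumes the
listing has a header line with PID and RSS columns that come before any
column containing spaces; rows whose columns have run together are
skipped.

```python
def rss_by_pid(ps_text):
    """Map PID to resident set size, taken from a ps listing with a header."""
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    pid_col, rss_col = header.index("PID"), header.index("RSS")
    sizes = {}
    for line in lines[1:]:
        fields = line.split()
        try:
            sizes[int(fields[pid_col])] = int(fields[rss_col])
        except (IndexError, ValueError):
            continue  # skip rows whose columns ran together
    return sizes

def rss_growth(before, after):
    """Return (pid, growth) for processes whose RSS grew, largest first."""
    old, new = rss_by_pid(before), rss_by_pid(after)
    grown = [
        (pid, new[pid] - old[pid])
        for pid in new
        if pid in old and new[pid] > old[pid]
    ]
    return sorted(grown, key=lambda item: item[1], reverse=True)
```

A process that shows up in rss_growth() once is not necessarily leaking;
one that shows up in every comparison over several days deserves a closer
look.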

Well, that wraps this up. Thanks again to everybody!

grant


