My original posting was:
> We have occasionally had processes lock on disk access -
> a couple of times it was nemacs, a couple of times it was
> an NFS disk read. Each time the STAT field of ps was 'D'
> (Process in disk (or other short term) waits). But the
> process never seems to come out of it. It is also unkillable.
> So far, the only way I've found to get rid of such processes
> is to reboot. Is there any other solution?
> Vital statistics:
> System: Sun4/370 server + several sparcstation1's
> O/S: SunOS 4.0.3c
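For anyone who wants to spot such processes before users complain, here is a rough sketch that picks the 'D' entries out of ps output. It assumes a listing with a header line containing PID and STAT columns and a command in the last column, as `ps ax` prints on many systems; the function name disk_wait_processes is made up for this example. A process sitting in 'D' briefly is normal, so this only tells you where to look, not what is hung.

```python
import subprocess
from typing import List, Tuple


def disk_wait_processes(ps_output: str) -> List[Tuple[int, str]]:
    """Return (pid, command) for every process whose STAT starts with 'D'.

    Assumes whitespace-separated columns and a header line that names
    the PID and STAT columns; raises ValueError if either is missing.
    """
    lines = ps_output.splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    stuck = []
    for line in lines[1:]:
        # Split into at most len(header) fields so the command,
        # which may contain spaces, stays in one piece at the end.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip short or blank lines
        if fields[stat_col].startswith("D"):
            stuck.append((int(fields[pid_col]), fields[-1]))
    return stuck


if __name__ == "__main__":
    # "ps ax" is an assumption; adjust the flags for your ps.
    listing = subprocess.run(["ps", "ax"], capture_output=True,
                             text=True, check=True).stdout
    for pid, command in disk_wait_processes(listing):
        print(pid, command)
```

Running it a few times a minute apart and comparing the PIDs gives a fair idea of which processes are really stuck.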
Well, just as I feared, the only way to get rid of the process
seems to be to reboot. Someone suggested trying to kill the
rpc.lockd's on both machines. Haven't had a chance to try it yet.
I think I did get the cause of the problem, though.
from email@example.com :
> Oftentimes, especially with emacs flavours, we've found it is because
> the partition is full, or the account using it is full. If that's so,
> then freeing up space works very well.
and from firstname.lastname@example.org :
> there are many NFS bugs in 4.0.3[c] that cause processes
> to hang -- most of them have to do with the NFS client code
> going to sleep waiting for a page that was already freed up.
> - take much greater care to keep filesystems < 90% full.
> (this may be worth checking. I cannot remember seeing the
> nfsd's run into disk-wait except if there was very full
It is very probable that the problem is related to the disk being full.
I think the locked processes coincided with the disk going to >98% full.
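Since the advice is to keep filesystems under 90% full, a small check along these lines could be run from cron. This is only a sketch: the function names and the 90% default are my own choices, and the percentage is plain used/total from Python's shutil.disk_usage, which can differ a little from the capacity figure df reports because df usually allows for reserved space.

```python
import shutil
from typing import Callable, Iterable, List, Tuple


def percent_full(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 if total is 0)."""
    if total <= 0:
        return 0.0
    return 100.0 * used / total


def filesystems_over(paths: Iterable[str],
                     limit: float = 90.0,
                     usage: Callable = shutil.disk_usage
                     ) -> List[Tuple[str, float]]:
    """Return (path, percent) for each path at or above the limit.

    'usage' must return (total, used, free) in bytes for a path;
    it is a parameter only so the check can be tried without a disk.
    """
    over = []
    for path in paths:
        total, used, _free = usage(path)
        pct = percent_full(total, used)
        if pct >= limit:
            over.append((path, pct))
    return over


if __name__ == "__main__":
    # The path list is a placeholder; put your own mount points here.
    for path, pct in filesystems_over(["/"]):
        print(f"{path}: {pct:.1f}% full")
```

Anything it prints is a candidate for cleaning up before the disk gets near the >98% mark.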
Solutions suggested are:
- Upgrading to 4.1.1
- Getting the "NFS Jumbo Patch for 4.0.3" from Sun, which
  includes about 17 different bug fixes for this
  and related problems.
- Getting the lockd-patch
- Freeing up disk space.
I'm looking into these options now.
Upgrading to 4.1.1 may help, although someone seems to be having
similar problems with it:
> Hi Sandra. We just installed three new Sparc2 fileservers running 4.1.1
> and have begun to experience the same problem. Occasionally, NFS reads
> will cause the nfsd processes to go into disk wait; one by one, all 8
> of our nfsd's succumb. Our installation is about as vanilla as it
> comes - pre-installed 4.1.1B. I don't have any solutions for you,
> except to say that we haven't seen the problem in a couple of days...
> We never experienced any of these troubles before we started playing with
> automount -- possibly connected? I don't know.
Thanks to the following people for responding:
email: email@example.com (We are not DEC!)