SUMMARY: Killing processes locked on disk accesses.

From: Sandra Harimoto (kddlab!!
Date: Fri Aug 09 1991 - 01:13:13 CDT

My original posting was:
> We have occasionally had processes lock on disk access -
> a couple of times it was nemacs, a couple of times it was
> an NFS disk read. Each time the STAT field of ps was 'D'
> (Process in disk (or other short term) waits). But the
> process never seems to come out of it. It is also unkillable.
> So far, the only way I've found to get rid of such processes
> is to reboot. Is there any other solution?
> Vital statistics:
> System: Sun4/370 server + several sparcstation1's
> O/S: SunOS 4.0.3c

Well, just as I feared, the only way to get rid of the process
seems to be to reboot. Someone suggested trying to kill the
rpc.lockd's on both machines. Haven't had a chance to try it yet.
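
To spot processes stuck in this state, a minimal Python sketch like the
following could scan ps output for a STAT field beginning with 'D'. It
assumes the output has a header line with PID and STAT columns and that the
fields up to STAT are separated by whitespace; the name find_disk_wait is
hypothetical, and this only reports the processes, it cannot kill them.

```python
def find_disk_wait(ps_text):
    """Return (pid, line) pairs for processes whose STAT starts with 'D'.

    ps_text is the captured text output of ps (for example 'ps aux').
    Assumption: the first line is a header naming PID and STAT columns.
    """
    lines = ps_text.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    stat_col = header.index("STAT")
    stuck = []
    for line in lines[1:]:
        fields = line.split()
        # Skip short or malformed lines rather than guessing at columns.
        if len(fields) <= max(pid_col, stat_col):
            continue
        if fields[stat_col].startswith("D") and fields[pid_col].isdigit():
            stuck.append((int(fields[pid_col]), line.strip()))
    return stuck
```

A process that shows up here on several runs in a row, minutes apart, is a
likely candidate for the hang described above; a single 'D' sighting is
normal for a process doing ordinary disk I/O.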

I think I did get the cause of the problem, though.
from :
> Oftentimes, especially with emacs flavours, we've found it is because
> the partition is full, or the account using it is full. If that's so,
> then freeing up space works very well.

and from
> there are many NFS bugs in 4.0.3[c] that cause processes
> to hang -- most of them have to do with the NFS client code
> going to sleep waiting for a page that was already freed up.

> - take much greater care to keep filesystems < 90% full.
> (this may be worth checking. I cannot remember seeing the
> nfsd's run into disk-wait except if there were very full
> filesystems.)

It is very probable that the problem is related to the disk being full.
I think the locked processes coincided with the disk going to >98% full.
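
The advice above to keep filesystems under 90% full could be checked
with a small Python sketch along these lines. It assumes df output in
which the last two fields of each data line are the capacity (such as
"98%") and the mount point; the name full_filesystems and the default
threshold are illustrative choices, not anything supplied by Sun.

```python
def full_filesystems(df_text, threshold=90):
    """Return (mount_point, percent) pairs at or above threshold percent.

    df_text is the captured text output of df. Lines that do not end in
    a 'NN%' capacity field followed by a mount point are skipped, which
    covers the header and the first half of entries that df wraps onto
    two lines when the filesystem name is long.
    """
    flagged = []
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[-2].endswith("%"):
            continue
        try:
            percent = int(fields[-2].rstrip("%"))
        except ValueError:
            continue
        if percent >= threshold:
            flagged.append((fields[-1], percent))
    return flagged
```

One way to feed it, on a machine with a reasonably recent Python, would
be subprocess.run(["df"], capture_output=True, text=True).stdout, run
periodically from cron so that a filling disk is noticed before
processes start to hang.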

Solutions suggested are:
        Upgrading to 4.1.1
        Getting the "NFS Jumbo Patch for 4.0.3" from Sun, which
          includes about 17 different bug fixes for this
          and related problems.
        Getting the lockd-patch
        Freeing up disk space.
I'm looking into these options now.

Upgrading to 4.1.1 may help, although someone seems to be having
similar problems with it:

> Hi Sandra. We just installed three new Sparc2 fileservers running 4.1.1
> and have begun to experience the same problem. Occasionally, NFS reads
> will cause the nfsd processes to go into disk wait; one by one, all 8
> of our nfsd's succumb. Our installation is about as vanilla as it
> comes - pre-installed 4.1.1B. I don't have any solutions for you,
> except to say that we haven't seen the problem in a couple of days...
> We never experienced any of these troubles before we started playing with
> automount -- possibly connected? I don't know.

Thanks to the following people for responding:

Thanks, sandra
email: (We are not DEC!)

