SUMMARY: Stale NFS file handle Problem

From: Hong Trac (hongt@sa-cgy.valmet.com)
Date: Tue Jun 20 1995 - 14:20:42 CDT


Hi everyone,

First I would like to thank the following people for helping me
solve the problem. Once again, "sun-managers" has come through:

        Nico Garcia, raoul@mit.edu
        Yves Hardy, yves@suntech.abcomp.be
        Melissa Metz, melissa@columbia.edu

and others who might be sending responses after this summary.

* The Problem: On a SPARC 20 file server (SunOS 4.1.3_U1), the console
  =========== gives the following messages every few seconds:

        fcntl: Stale NFS file handle
        rpc.lockd: unable to do cnvt

* Solutions: (replies are included at end)
  =========

  I followed Yves's suggestion to run rpc.lockd in debug mode (-d) and
  was able to identify 2 Sparc5 clients (Solaris 2.4) that were causing
  the "Stale NFS" errors. The problem went away after rebooting these
  2 clients. I will get patch #100075 later.
 
  I haven't tried Nico's and Melissa's suggestions yet, but I will
  next time. Melissa has a nice script that helps find the
  client (if Melissa doesn't mind, I can send it to you on request).
  
  Thanks again Melissa, Yves and Nico.

Hong Trac

======================================================================
Hong Trac Valmet Automation (Canada) Ltd.
Phone: (403)-253-8848 10333 Southport Road S.W.
Fax: (403)-253-2926 Calgary, Alberta T2W 3X6
Email: Hong.Trac@sa-cgy.valmet.com Canada
======================================================================

* Replies:
  =======

1. From Nico Garcia:
   ----------------

Try, on the machines exporting the directories in question, running
"exportfs -uav; exportfs -av". This flushes the state of NFS interactions
by turning *off* NFS exporting, then turning it back on (I think this
is how I fixed it last time).

Fair warning: NFS does not work well

You can also unmount the NFS imported directories on the client machines
by hand: this may also help.

                                Nico Garcia
                                raoul@mit.edu
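
The manual unmount mentioned above can be narrowed to the mounts that
come from the affected server. The following is a minimal sketch and is
not part of the original reply: the function name nfs_mounts_from and
the server name "fileserver" are hypothetical, and it assumes the
client's mount table (/etc/mtab on SunOS 4.x, /etc/mnttab on Solaris 2.x)
lists the device, mount point and file system type in its first three
whitespace-separated fields.

```shell
#!/bin/sh
# Hypothetical helper: list local mount points whose NFS source is a
# given server. Reads mount-table lines on stdin and assumes the fields
# are: device, mount point, type. Older awk versions may lack -v; nawk
# may be needed on systems of that era.
nfs_mounts_from() {
    awk -v srv="$1" '
        # keep entries of type nfs whose device field starts with "server:"
        $3 == "nfs" && index($1, srv ":") == 1 { print $2 }
    '
}

# Example use on a client (review the list before unmounting anything):
#   nfs_mounts_from fileserver < /etc/mtab
#   umount /mount/point/from/the/list
```

Printing the mount points rather than unmounting them directly leaves
the decision about each mount to the administrator.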

2. From Yves Hardy:
   ---------------

   First verify that the appropriate lockd patch is installed on all
   clients and servers on the network.

   SunOS 4.1.X Patch# 100075
 
   On the system reporting the errors, kill and restart rpc.lockd:
        
        #rpc.lockd -d 3

   The "-d" option will put rpc.lockd in debug mode. This may pinpoint a
   client on the network which is trying to access a nonexistent file
   or directory on the file server. Once a client is identified, correct
   the mounts on the client so that it properly accesses the file server.

   Regards,

   Yves Hardy from Belgium.
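
The kill-and-restart step can be sketched as follows. This is an
illustration under assumptions, not part of the original reply: the
helper name lockd_pids is hypothetical, it assumes BSD-style ps output
with the PID in the first column, and the debug level 3 is simply the
value given in the reply above.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of running rpc.lockd processes.
# Reads ps output on stdin and assumes the PID is the first column.
# The header line is skipped because its first field is not numeric.
lockd_pids() {
    awk '$1 ~ /^[0-9]+$/ && /rpc\.lockd/ { print $1 }'
}

# Example use on the server, as root (SunOS 4.x "ps -ax" assumed):
#   pids=`ps -ax | lockd_pids`
#   [ -n "$pids" ] && kill $pids
#   rpc.lockd -d 3
```

Checking that the PID list is non-empty before calling kill avoids an
error when rpc.lockd is not running.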

3. From Melissa Metz:
   -----------------
 
   Here is our internal note about the "unable to do cnvt" errors:

    Problem: spewing errors about: fcntl: Stale NFS file handle
                                    rpc.lockd: unable to do cnvt.

    Diagnosis:
    A client of this NFS server has a stale file handle (one which no
    longer matches the state of the disk) open and locked.

    Solution:
    kill the offending client process, or reboot the client.

    Procedure:

    on server:

    - /sh/sy/subsys/scripts/efindlockmgr

    This will run rpcinfo -p, find the lockmgr processes/ports, and then
    run etherfind on those ports.

    Look for the host that shows up again and again; this is the culprit
    client.

    Try to find and kill a process on that client which would be accessing
    this NFS server. Or reboot the client.

    I will include our "efindlockmgr" script below.

                                        Melissa Metz
                                        Unix Systems Group
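
The efindlockmgr script itself is not reproduced in this summary. As a
rough sketch of the two steps the note describes (this is not the
script from the note), the helpers below pull the lock manager ports
out of "rpcinfo -p" output and tally how often each host appears in a
list of packet sources. The function names are hypothetical. The sketch
assumes the usual "rpcinfo -p" columns (program, version, protocol,
port, service) and one host name per input line for the tally. Running
etherfind on the ports is left out because its invocation depends on
the site's network interface and setup.

```shell
#!/bin/sh
# Hypothetical helper: print the unique ports registered for the lock
# manager. Reads "rpcinfo -p" output on stdin and matches the service
# name "nlockmgr" or its RPC program number 100021.
lockmgr_ports() {
    awk '$5 == "nlockmgr" || $1 == "100021" { print $4 }' | sort -u -n
}

# Hypothetical helper: count how often each host appears, busiest first.
# Reads one host name per line on stdin.
count_hosts() {
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Example use on the server:
#   rpcinfo -p | lockmgr_ports
#   count_hosts < file_with_one_source_host_per_line
```

The host at the top of the count_hosts output is the candidate for the
"shows up again and again" client.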

---------------------------------------------------------------------------


