SUMMARY: lockd log messages

From: Curt Peterson (curt@mrbill.is.lmsc.lockheed.com)
Date: Sat Sep 02 1995 - 06:45:27 CDT


Sorry it took so long for me to summarise.

Thanks to the 3 that responded:

1. Irana Whitaker-Patel
2. miket@Seagate.COM
3. Melissa Metz

**************** START of my original question ****************

I'm running Solaris 2.3 on my SPARCserver 1000.
I'm getting the following 2 lines in my /var/adm/messages file
every 15 seconds:

Aug 29 13:15:32 trinity lockd[1024]: _nfssys: error Stale NFS file handle
Aug 29 13:15:32 trinity lockd[1024]: lockd: unable to do cnvt.

This started to happen after the last reboot (yesterday). I have
no entries in /etc/vfstab to mount any NFS file systems, but I
do export/share many file systems. I don't
even know if lockd should be running. Could someone explain to me
what lockd is for, and suggest whether it should be running on my
file server (trinity) or not? If it should be, then why is it all
of a sudden complaining? If it shouldn't be, then why did it get
started? I did run "/etc/rc2.d/S73nfs.client stop" and the messages
stopped. Then I ran "/etc/rc2.d/S73nfs.client start" and the
messages started again.

***************** END of my original question******************
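To check how often the messages appear, and whether they have stopped, something like the following may help. It is only a sketch: the function name cnvt_errors_per_hour is made up for illustration, and it assumes the syslog line format shown in the question above.

```shell
#!/bin/sh
# Count lockd "unable to do cnvt" messages per hour.
# Reads syslog-format lines (e.g. /var/adm/messages) on standard input.
cnvt_errors_per_hour() {
    awk '/lockd.*unable to do cnvt/ {
            split($3, t, ":")            # $3 is the HH:MM:SS timestamp
            print $1, $2, t[1] ":00"     # month, day, hour bucket
         }' | uniq -c | awk '{ print $2, $3, $4, $1 }'
}
```

Usage would be "cnvt_errors_per_hour < /var/adm/messages"; each output line is a month, day and hour followed by the number of matching messages in that hour.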

My problem cleared itself up (the offending machines must have been
rebooted) before I got any responses, but
the responses did explain what was happening, and I'll be
prepared if this happens again.

The responses are short enough that I'll list them all:

********************** START OF RESPONSE #1 ****************************

Hi there,

Yes, you certainly should have lockd running on your SPARCserver.

root 127 1 80 Aug 10 ? 0:03 /usr/lib/nfs/lockd

You may need to apply the kernel jumbo patch 101318, as there are various
bugs in the lock daemon.

You may simply need to reboot your server and client; it really depends
on what has happened.

rpc.statd and rpc.lockd manage advisory locks on NFS file systems.
These daemons are started automatically at startup.

Hope this helps.

Irana Whitaker-Patel

********************** END OF RESPONSE #1 ******************************
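A quick way to confirm that both daemons are up is to scan the "ps -ef" output for them. The sketch below is illustrative only: the function name check_lock_daemons is made up, the lockd path is taken from the ps line in response #1, and the statd path /usr/lib/nfs/statd is an assumption that should be checked on the system in question.

```shell
#!/bin/sh
# Report whether lockd and statd appear in `ps -ef` output read from stdin.
# The /usr/lib/nfs/statd path is an assumption; the lockd path comes from
# the ps listing in response #1.
check_lock_daemons() {
    ps_output=`cat`
    for daemon in lockd statd; do
        if echo "$ps_output" | grep "/usr/lib/nfs/$daemon" >/dev/null; then
            echo "$daemon: running"
        else
            echo "$daemon: not running"
        fi
    done
}
```

It would be run as "ps -ef | check_lock_daemons".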

********************** START OF RESPONSE #2 ****************************

lockd is required on NFS servers to manage NFS client file locking. If
the server happens to be rebooted while the client has an open and
locked (NFS) file, the new lockd on the server (after the
reboot) will complain. It is trying to convert the client's NFS file
handle to a local file descriptor. In SunOS 4.x the only workaround was to
have the client remount the file system. To identify the client, kill
lockd and restart it with the "-d 3" command line options. This will spew a
bunch of debugging output which will make a reference to the client.
In SunOS 5.3 the kernel jumbo patch supposedly addresses this problem. Here
is part of the README for the patch:

Patch-ID# 101318-70
Keywords: security kernel kadb libc lockd libaio automountd sockmod automounter ypbind
Synopsis: SunOS 5.3: Jumbo patch for kernel (includes libc, lockd)

********************** END OF RESPONSE #2 ******************************
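The kill-and-restart step from response #2 could be scripted roughly as follows. This is a sketch under assumptions: the function names lockd_pid and restart_lockd_debug are made up, the /usr/lib/nfs/lockd path comes from the ps line in response #1, and the "-d 3" options are as quoted in response #2 and have not been checked against any particular release.

```shell
#!/bin/sh
# Print the PID (second column of `ps -ef` output on stdin) of any
# /usr/lib/nfs/lockd process.
lockd_pid() {
    awk '/\/usr\/lib\/nfs\/lockd/ { print $2 }'
}

# Stop the running lockd and start a new one with the "-d 3" debug
# options quoted in response #2. Only defined here, not called; it
# would have to be run as root.
restart_lockd_debug() {
    pid=`ps -ef | lockd_pid`
    if [ -n "$pid" ]; then
        kill $pid
    fi
    /usr/lib/nfs/lockd -d 3
}
```

Calling restart_lockd_debug briefly interrupts lock service for all clients, so it is probably best kept for a quiet moment.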

********************** START OF RESPONSE #3 ****************************

Below is our local writeup for solving these "cnvt" errors, plus a
short script we use. It was written for SunOS 4.1.3, which has
slightly different error messages, but should be applicable to
Solaris.

Yes, you do want lockd on your NFS server, so that your NFS clients
can lock files.

                                        Melissa Metz
                                        Unix Systems Group

========================================
lockd-cnvt-error.txt:

Problem: spewing errors about: fcntl: Stale NFS file handle
                                rpc.lockd: unable to do cnvt.

Diagnosis:
A client of this NFS server has a stale file handle (one which no
longer matches the state of the disk) open and locked.

Solution:
Kill the offending client process, or reboot the client.

Procedure:

On the server:

- /sh/sy/subsys/scripts/efindlockmgr

This will run rpcinfo -p, find the lockmgr processes/ports, and then
run etherfind on those ports.

Look for the host that shows up again and again; this is the culprit
client.

Try to find and kill a process on that client which would be accessing
this NFS server. Or reboot the client.
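Spotting the host that "shows up again and again" can be done by counting rather than by eye. The helper below is a generic sketch: the name tally_hosts is made up, and the column holding the source host depends on the capture tool's output format, so the column number is an assumption to check against real output first.

```shell
#!/bin/sh
# Count how often each value of one whitespace-separated column appears
# on stdin, most frequent first. $1 is the column number assumed to
# hold the source host.
tally_hosts() {
    awk '{ print $'"$1"' }' | sort | uniq -c | sort -rn
}
```

Since the capture runs until interrupted, it is probably easiest to save a minute or so of output to a file and then run, for example, "tally_hosts 2 < /tmp/lockmgr.out" (both the file name and the column number here are placeholders).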
========================================
efindlockmgr:
#!/bin/sh
# run etherfind for lockmgr processes on the specified host (or local host)

if [ $# = 1 ]; then
    host=$1
else
    host=`hostname`
fi

# get the port/protocol via rpcinfo, set up as etherfind args
eargs=`rpcinfo -p "$host" | awk '/lockmgr/ { print "or -host HOSTNAME -proto " $3 " -dstport " $4 }'`

# stop if rpcinfo reported no lockmgr entries (nothing to watch)
if [ -z "$eargs" ]; then
    echo "no lockmgr entries found for $host" >&2
    exit 1
fi

# replace the hostname, remove the extra leading "or" (fencepost error)
eargs=`echo $eargs | sed -e "s/HOSTNAME/$host/g" -e 's/^or //'`

# run etherfind
echo "running etherfind for lockmgr processes on $host"
etherfind $eargs

********************** END OF RESPONSE #3 ******************************
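To see what arguments the efindlockmgr script in response #3 ends up passing to etherfind, its argument-building step can be tried on its own. The sketch below follows the script's awk and sed steps, with the removal of the extra "or" anchored to the start of the string. The function name build_eargs is made up, and the rpcinfo lines in the usage example are invented sample data, not output from a real machine.

```shell
#!/bin/sh
# Turn `rpcinfo -p` output (on stdin) into etherfind arguments for the
# host named in $1. Column 3 of rpcinfo -p is the protocol, column 4
# the port.
build_eargs() {
    eargs=`awk '/lockmgr/ { print "or -host HOSTNAME -proto " $3 " -dstport " $4 }'`
    # unquoted $eargs joins the lines with spaces; then fill in the host
    # and drop the leading "or"
    echo $eargs | sed -e "s/HOSTNAME/$1/g" -e 's/^or //'
}
```

For example, feeding it two made-up nlockmgr lines (udp and tcp, both on port 4045) with "build_eargs trinity" should give one "-host trinity -proto ... -dstport 4045" clause per line, joined by "or".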

Thanks again

-Curt Peterson- (curt@lmsc.lockheed.com)


