SUMMARY: Slow NFS performance between subnets

From: Timothy P. Peterman (timothy.p.peterman@lmco.com)
Date: Tue Apr 20 1999 - 07:43:10 CDT


The problem turned out to be a network switch that needed to be
reset. Thanks to Vincent Campbell <campbell@tell.ascom.ch>
who provided the information below that helped alleviate some of the
symptoms until our Network Group was able to diagnose the root
cause of the problem. Thanks also to Len Rose <len@NETSYS.COM>
who suggested using snoop and nfsstat to further investigate the
problem.
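
As a rough sketch of how the nfsstat numbers might be read: on the
client, "nfsstat -rc" reports RPC "calls" and "retrans" counters, and
a retransmission rate above a few percent is commonly taken as a hint
that requests or replies are being lost on the way. The 5% threshold,
the function names, and the sample counts below are assumptions for
illustration only; they are not figures from this thread.

```python
def retrans_percent(calls, retrans):
    """Return retransmissions as a percentage of client RPC calls."""
    if calls <= 0:
        return 0.0
    return 100.0 * retrans / calls


def looks_lossy(calls, retrans, threshold_percent=5.0):
    """True if the retransmission rate exceeds the (assumed) threshold."""
    return retrans_percent(calls, retrans) > threshold_percent


# Hypothetical counters, as one might copy them from "nfsstat -rc".
sample_calls = 20000
sample_retrans = 1500
print(retrans_percent(sample_calls, sample_retrans))
print(looks_lossy(sample_calls, sample_retrans))
```

Comparing the rate from a client on the same subnet as the server with
one on a remote subnet would show whether the loss is tied to the path
between subnets.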

Vincent Campbell wrote:
> We had a similar problem in a mixed Sun/HP environment. It
> would take ages to copy files over NFS from Sun to HP. The
> solution was found in the SunService Tip Sheet for Sun NFS
> ... a side effect for us was occasional "NFS Server not
> responding" messages.
>
> Q: Why do I get the following error message:
>
> NFS Server <server> not responding
> NFS Server ok
>
> A4: Try cutting down the NFS read and write size with the NFS mount
> options: rsize=1024,wsize=1024. This will eliminate problems with
> packet fragmentation across WANs, routers, hubs, and switches in a
> multivendor environment, until the root cause can be pin-pointed.
> THIS IS THE MOST COMMON RESOLUTION TO THIS PROBLEM.
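
To illustrate why the smaller sizes help, here is a minimal sketch of
the fragmentation arithmetic for NFS over UDP. It assumes a 1500-byte
Ethernet MTU, a 20-byte IPv4 header, an 8-byte UDP header, and a rough
128-byte allowance for RPC and NFS headers (that last figure is a
guess for illustration). The 8192-byte transfer size is assumed as a
typical larger setting, and the 5% fragment loss rate is hypothetical.
If any one fragment is lost, the whole datagram is discarded and the
request has to be retransmitted after a timeout.

```python
import math

ETHERNET_MTU = 1500      # bytes; assumed standard Ethernet MTU
IP_HEADER = 20           # bytes; IPv4 header without options
UDP_HEADER = 8           # bytes
RPC_NFS_OVERHEAD = 128   # bytes; assumed allowance for RPC + NFS headers


def ip_fragments(transfer_size, mtu=ETHERNET_MTU):
    """Return how many IP fragments one NFS-over-UDP transfer needs."""
    datagram = UDP_HEADER + RPC_NFS_OVERHEAD + transfer_size
    # Fragment payloads are limited to (mtu - IP header) bytes,
    # rounded down to a multiple of 8.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)


def delivery_probability(transfer_size, fragment_loss):
    """Probability that every fragment of one transfer arrives."""
    return (1.0 - fragment_loss) ** ip_fragments(transfer_size)


# Compare an assumed 8192-byte transfer with the suggested 1024 bytes,
# using a hypothetical 5% per-fragment loss rate.
for size in (8192, 1024):
    print(size, ip_fragments(size), round(delivery_probability(size, 0.05), 3))
```

Under these assumptions an 8192-byte transfer is split into 6
fragments, while a 1024-byte transfer fits in a single unfragmented
packet, so a device that drops or mishandles fragments hurts the
larger transfers far more.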

Original Post:
> We are experiencing some unusual slowness during
> NFS file access between subnets, that is when the
> Workstation is on one subnet, and the NFS server where
> the file resides is on another. For example, to grep through
> a 500KB file within the same subnet typically takes less
> than one second, but from a host on another subnet it
> can take up to 10 minutes. Normally it is a bit slower, but
> would still take only a second or two. Other protocols
> do not seem to be affected (ftp, http, X, etc.). We have
> both SUN and HP workstations on the network in an NIS
> environment. I have tried disabling NIS on a client and
> directly mounting the remote file system (rather than using
> automount) but the results are the same.
>
> Another symptom is that "ping -sRv" to a host on another
> subnet hangs. "ping -sRv" within a subnet or even out to
> our corporate intranet works fine. Here is part of a truss
> output from the ping command (repeats until ping is
> interrupted):
>
> alarm(1) = 0
> setcontext(0xEFFFF7D0)
> Received signal #14, SIGALRM, in getmsg() [caught]
> getmsg(4, 0xEFFFFC48, 0xEFFFFC54, 0xEFFFFC34) Err#4 EINTR
> putmsg(4, 0xEFFFF5A4, 0xEFFFF508, 0) = 0
> alarm(1)
>
> Running plain old "ping -s <host>", however, shows normal
> transit times, typically around 1-2 ms. Netstat does not show
> unusually high collisions, and the default route is what it
> should be.
>
> According to our Network Admins, there have been no
> configuration changes to the router. The router has
> been rebooted to see if that would fix the problem, but
> it did not.
>
> Any help will be appreciated.

-- 
Tim Peterman - Unix & Web Server Administration
Lockheed Martin GES/EIS, Moorestown, NJ
mailto:timothy.p.peterman@lmco.com


