Summary of responses to: NFS activity; ethernet problem

From: Mike Hannon; UCD Physics; (916 (MIKE@ucdphy.ucdavis.edu)
Date: Thu Oct 18 1990 - 16:17:55 CDT


On 4 Oct. 1990 I wrote to this list, requesting help with two problems:
(1) NFS daemons on a Sun 4/280 server consumed abnormal amounts of CPU time.
(2) Same server had numerous ``ie0: no carrier'' messages.

I received a number of responses, the gist of which seemed to be as follows.
(1) Abnormal NFS activity.
    This can be caused by a process which goes into a ``paging'' loop over
    NFS. This happens when an executable image is recompiled while the
    previous version of the executable is still running (typically in the
    background). Look for a huge value in the PAGEIN field displayed by
    ``ps -gavx'', for example. This appeared to be the cause of my problem.

    It was also mentioned that there are various utilities which help with
    NFS problems, including Sun's ``traffic'' (a Sunview utility, rumored
    to be somewhat undependable), and NFSwatch, written by Dave Curry at
    SRI. I was told that NFSwatch is available for anonymous ftp from
    syn-gate-gw.synoptics.com, in either ~ftp/pub or ~ftp/tmp. I was unable
    to reach that machine, so I can't comment on the utility.

(2) ie0: no carrier
    Someone pointed out that this error message is mentioned in the manual
    entry for ``ie''. Most people noted that this can be caused by flaky
    connections, typically (but *not necessarily*) on the machine where the
    error is reported. Several people recommended using some kind of network
    analyzer, which I think is a good idea, but which I haven't yet done.

    Other people mentioned that this message can be caused simply by an
    overloaded network.
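The PAGEIN check in (1) is easy to script. What follows is a minimal Python
sketch, not something any respondent supplied. It assumes the text printed by
``ps -gavx'' is passed in as a string whose first line names the columns
(including PID and PAGEIN), with COMMAND as the last column. It finds those
columns by name because the exact layout differs between ps versions. The
function name high_pagein and the default threshold of 10000 are
hypothetical; a sensible cutoff depends on what is normal for your own server.

```python
def high_pagein(ps_text, threshold=10000):
    """Return (pid, pagein, command) tuples for processes whose PAGEIN
    count exceeds threshold, largest first.

    Assumes `ps -gavx`-style text: a header line naming the columns
    (including PID and PAGEIN) with COMMAND last, then one line per process.
    The default threshold is a hypothetical example value.
    """
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    pagein_col = header.index("PAGEIN")
    ncols = len(header)
    hits = []
    for line in lines[1:]:
        # Split at most ncols - 1 times so a command with arguments stays whole.
        fields = line.split(None, ncols - 1)
        if len(fields) < ncols:
            continue  # truncated or malformed line
        try:
            pid = int(fields[pid_col])
            pagein = int(fields[pagein_col])
        except ValueError:
            continue  # non-numeric field, e.g. a repeated header line
        if pagein > threshold:
            hits.append((pid, pagein, fields[-1]))
    # Worst offenders first.
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

The process at the top of the returned list is the first candidate to
investigate; a runaway image left over from a recompile will usually stand
out by orders of magnitude.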

Below I've listed the individuals who responded, along with a very short
description of each response. Please contact me if you want the unexpurgated
version of any of these messages. Also, if you responded, and you feel I've
badly misrepresented your position, please send me a note.

(1) Steven Blair (sblair@synoptics.com)
        Get NFSwatch from syn-gate-gw.synoptics.com.

(2) Ron Vasey (uunet!mcc.com!vasey)
        Maybe a leak in the network caused by poor connections or grounding
        problems; might want to do TDR (time-domain reflectometer) testing.

(3) Laura Pearlman (pearlman@rand.org)
        UFS uses lots of the server's memory; problem is related to file
        caching in SunOS; use vmstat and look for a large ``sr'' (page scan
        rate) number.

        File reference count goes to zero after *each* NFS write, so SunOS
        will examine every page in the cache; this is bad with large files.
        Use vmstat and look for a large ``at'' value (~10,000).

        SunOS is very inefficient in file lookups if any single component
        of the pathname is more than 15 characters in length.

(4) Bill Eshelman (wde@agen.ufl.edu)
        DB-15 connector prongs are spaced wrong, preventing full contact.

(5) Daniel Trinkle (trinkle@cs.purdue.edu)
        Could be a loose transceiver cable or just a busy network. A Network
        General Sniffer does a good job in such cases.

        Can also try ``traffic'', ``emon'' (a program used at Purdue
        and elsewhere), or ``nfswatch'' to monitor network via software.

(6) Roland Schneider (sch@eeserv.ee.umanitoba.ca)
        Probably caused by excessive paging; look at ``ps -gavx'', find the
        process with a large PAGEIN, then ``kill -9'' it. Caused by
        overwriting an executable while it's running.

(7) Joe Pruett (tessi!joey@nosun.West.Sun.COM)
        Check for cable problems with ``netstat -i''; look for an error
        count greater than 1%.

(8) Ed Morin (mdisea!fh20c!edm@uunet.UU.NET)
        Transceiver cables don't fit well on Sun servers; remove washers
        under binding posts on the cable for a better fit.

(9) Bala Vasireddi (bala@sysopsys.com)
        Problem may be caused by recompiling an executable while the previous
        version is still running. Use ``traffic'', with ``src'' and ``dest''
        in different windows; then run ``ps -aux'' and compare the MEM and RSS
        columns. RSS will be small (600k) compared to MEM (10MB).

(10) Colin Alison (colin@cs.st-andrews.ac.uk)
        The ``no carrier'' message can be caused by aggressive network traffic.
        They stopped running rwhod to get rid of it.

(11) Alastair Young (alastair@es2.co.uk)
        ``ie0: no carrier'' can be indicative of dodgy hardware causing
        dropped packets. Analyze using a network monitor/reflectometer.

(12) Ron Madurski (ron@DRD.Com)
        Make sure there are no gettys trying to get a login on a nonexistent
        terminal. Try to isolate the section of the network that's giving
        errors; maybe use a Lanalyzer.

(13) Debbie Eckel (deckel@relay.nswc.navy.mil)
        She has similar problems, and points out that the man page for ie
        describes the problem.
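Laura Pearlman's vmstat checks in (3) can be sketched in Python as follows.
This is an illustration under stated assumptions, not her procedure. It
assumes vmstat output whose column-name line contains ``sr'' and ``at'' (as
in a SunOS-style layout such as ``r b w avm fre re at pi po fr de sr ...''),
locates those two columns by name, and skips any line that is not entirely
numeric. The ~10,000 limit for ``at'' comes from her note; the ``sr'' limit of
200 is a hypothetical placeholder, since "large" was not quantified. On many
systems the first sample vmstat prints is an average since boot, so judge
the later samples.

```python
def flag_vmstat(vmstat_text, sr_limit=200, at_limit=10000):
    """Return (sample_index, sr, at) for vmstat samples over either limit.

    Assumes a column-name header line containing 'sr' and 'at', followed
    by whitespace-separated numeric sample lines of the same width.
    sr_limit is a hypothetical default; at_limit mirrors the ~10,000 quoted.
    """
    header = None
    flagged = []
    sample = 0
    for line in vmstat_text.splitlines():
        tokens = line.split()
        if "sr" in tokens and "at" in tokens:
            header = tokens  # column-name line; vmstat may repeat it
            continue
        if header is None or len(tokens) != len(header):
            continue  # e.g. the 'procs memory page ...' banner line
        if not all(tok.isdigit() for tok in tokens):
            continue  # anything that is not a numeric sample
        sr = int(tokens[header.index("sr")])
        at = int(tokens[header.index("at")])
        if sr > sr_limit or at > at_limit:
            flagged.append((sample, sr, at))
        sample += 1
    return flagged
```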
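Her third point, about pathname components longer than 15 characters, can be
checked with a few lines of Python. The sketch below is hypothetical tooling,
not anything she described: long_components flags the over-long components of
a single path, and scan_tree walks a directory tree (for instance an exported
filesystem) and yields every file or directory whose own name exceeds the
limit. The 15-character default simply mirrors the figure she quoted; it is
not read from the running system.

```python
import os
from pathlib import PurePosixPath


def long_components(path, limit=15):
    """Return the components of a POSIX path longer than `limit` characters."""
    return [part for part in PurePosixPath(path).parts
            if part != "/" and len(part) > limit]


def scan_tree(root, limit=15):
    """Yield paths under `root` whose final name is longer than `limit`."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if len(name) > limit:
                yield os.path.join(dirpath, name)
```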
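Joe Pruett's ``netstat -i'' rule in (7) amounts to a simple ratio per
interface. The Python sketch below assumes BSD-style ``netstat -i'' text
whose header line names the Ipkts, Ierrs, Opkts and Oerrs columns. It reads
``error count greater than 1%'' as errors divided by packets in the same
direction; that interpretation, and the function names, are assumptions
rather than his wording. These counters typically accumulate from boot, so a
recent burst of cable trouble can be diluted by a long clean uptime.

```python
def pct(errors, packets):
    """Errors as a percentage of packets; 0.0 when no packets were seen."""
    return 100.0 * errors / packets if packets else 0.0


def error_rates(netstat_text, limit_pct=1.0):
    """Map interface name -> (input_err_pct, output_err_pct, over_limit).

    Assumes a header line naming Name, Ipkts, Ierrs, Opkts and Oerrs.
    """
    rows = [ln.split() for ln in netstat_text.splitlines() if ln.strip()]
    header = rows[0]
    idx = {name: header.index(name)
           for name in ("Name", "Ipkts", "Ierrs", "Opkts", "Oerrs")}
    report = {}
    for fields in rows[1:]:
        try:
            ipkts, ierrs, opkts, oerrs = (
                int(fields[idx[key]])
                for key in ("Ipkts", "Ierrs", "Opkts", "Oerrs"))
        except (IndexError, ValueError):
            continue  # short or non-numeric line, e.g. an unconfigured interface
        in_pct = pct(ierrs, ipkts)
        out_pct = pct(oerrs, opkts)
        report[fields[idx["Name"]]] = (
            in_pct, out_pct, in_pct > limit_pct or out_pct > limit_pct)
    return report
```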

In addition, there were a few other people who responded that they had
experienced similar problems. I hope that this summary has answered some of
their questions.

                                        - Mike
-----------
Mike Hannon mike@ucdhep (Bitnet)
ucdhep::mike (HEPnet) 42385::mike (HEPnet)
mike@ucdphy.ucdavis.edu (Internet) 916-752-4966 (Telephone)


