SUMMARY: Problems with SunOS 4.1.4 Networking

From: Jim Harmon (jharmon@telecnnct.com)
Date: Mon Jan 12 1998 - 15:35:41 CST


Thank you to all who have responded. I haven't solved the problem yet (no
repeat crashes since I posted), but the suggestions given were excellent.

Here's the basic set of answers, followed by the respondents' names,
then the original question. If we resolve the problem itself, I'll
forward that in a followup Summary.

(in no particular order)
--------------------------suggestions---------------------------------

[Use system logging to check for anomalies]
[Use extended tools to monitor system condition]

Maybe you can look into syslog, dmesg, and /var/adm/messages
for anything unusual happening with your system (disk
errors or le0 errors). Check the daemons (using top or
similar software tools) for any process occupying a
good amount of CPU or memory. (Try out vmstat, iostat and
nfsstat.)

--

[Verify LAN condition, connections.] [Observe changes by moving suspect to new/different subnet]

Initial checks would probably be to count the workstations on this subnet and see how any other WS on the same subnet performs (its collision rate, for example). If there are more than 50-60 WS, then try moving this system to a different subnet and see how it performs.

As per the data shown here, the collision rate is something like 5%, which is OK. A saturated network is like 8% or more.
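For reference, a collision rate like the one quoted above is commonly taken as collisions divided by output packets. Here is a minimal Python sketch of that arithmetic, using the cumulative counters from the first line of the netstat -i output in the original question. The function name collision_pct is made up for illustration, and the 8% saturation figure is this respondent's rule of thumb, not a fixed standard.

```python
def collision_pct(collisions, output_packets):
    """Return collisions as a percentage of output packets."""
    if output_packets <= 0:
        raise ValueError("output_packets must be positive")
    return 100.0 * collisions / output_packets

# Cumulative counters from the first netstat -i line in the original question.
le0_pct = collision_pct(8688, 144505)    # le0 output packets
total_pct = collision_pct(8688, 158214)  # all interfaces combined

print(f"le0:   {le0_pct:.2f}%")
print(f"total: {total_pct:.2f}%")
```

Both figures fall between 5% and 6.1%, in line with the "something like 5%" estimate.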

--

[Verify LAN condition] [Review CERT warnings for potential compromise signatures/responses]

Have you investigated the condition of your network? Excessive collisions are usually an indication of network hardware problems, and/or some of the denial-of-service attacks that have been mentioned in recent CERT advisory messages.

--

[Error Rate is more important than Collision Rate] [Investigate possible external sources of error count/collisions.]

More important than the collisions are the errors. These generally indicate a bad network card though not necessarily on the Solaris system. Under normal circumstances, you should not see any errors. Hopefully someone else will have a suggestion as to how to isolate which system.

--

[R&R (remove/replace) I/F Card]

This sounds suspiciously like your network card is failing. I'd replace that if I were you. I bet it fixes the problem.

--

[External influence, killing system.]

It looks like someone else on your network is behaving badly; that many errors, in surges, looks like some other node is periodically dropping a whole lot of packets on the net and not caring about congestion.

Suggestion: use etherfind to look for a node sending out broadcasts. If you can get a sniffer or a copy of 'etherman', start it up and watch for surges.

--

[Verify/install PATCH]

Perhaps patch 102430-02 might help.

Patch-ID# 102430-02
Keywords: macio le hard hang FSBE ss5 ss10 rdump sun4m
Synopsis: SunOS 4.1.4: le patch that fixes sun4m ethernet hang problems
Date: Jul/26/95

--

[Observe error rate.] [Suspect induced problem, not inherent problem]

The collisions may or may not be a problem. Don't get fixated on that. However your errs count is very, very scary. You should not be within a factor of 100 of that!
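To put a number on the errs column, one can express output errors as a share of output packets for each interval sample. The sketch below is in Python and assumes the ten-column layout shown in the original question (five le0 columns followed by five Total columns). The helper name output_error_pct is hypothetical, and the sample lines are copied from that netstat output.

```python
def output_error_pct(line):
    """Parse one `netstat -i <interval>` sample line and return le0 output
    errors as a percentage of le0 output packets (0.0 if none were sent)."""
    fields = [int(f) for f in line.split()]
    if len(fields) != 10:
        raise ValueError("expected 10 numeric columns")
    # First five columns are the le0 counters; the last five are totals.
    _ipkts, _ierrs, opkts, oerrs, _colls = fields[:5]
    return 100.0 * oerrs / opkts if opkts else 0.0

# Interval samples copied from the original question.
samples = [
    "117 2 79 9 30 117 2 79 9 30",
    "105 0 82 16 56 111 0 88 16 56",
    "198 0 267 0 68 198 0 267 0 68",
]
for s in samples:
    print(f"{output_error_pct(s):6.2f}%  {s}")
```

Printing the rate next to each raw sample makes the surges stand out against the intervals with no errors at all.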

I don't know anything about what else is on your network, but it is unlikely that there is anything wrong with your SS20. Far more likely that you have a problem with cabling, or with hubs, or with a bad NIC on a PC somewhere. What's changed recently?

Try etherfind on a couple of different hosts (snoop is better if you have it), look for unexpected traffic. Pull and re-seat everything. Is the problem isolated to this one host? Swap all networking HW related to it.

--

Additional note: UNIX GURU Mailing List posted a "Tip of the Day" last week that applies here:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

UNIX GURU UNIVERSE UNIX HOT TIP

Unix Tip #372- January 7, 1998

http://www.ugu.com/sui/ugu/show?tip.today =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

TALKING NFS 3 WHEN I ONLY TALK NFS 2

Most new versions of NFS are now talking NFS 3. With many systems still talking NFS 2, the newer system will eventually time out if a mount from an NFS 2 system is attempted.

The NFS 3 system will eventually fall back to the NFS 2 protocol, but to make life easier and quicker, especially when booting and mounting these types of filesystems, create your mount points with the "nfs2" entry added to the /etc/fstab or the /etc/vfstab file:

#=============================================================
# filesystem   directory   type   options           frequency   pass
#=============================================================
foo:/usr3      /usr3       nfs2   rw,bg,hard,intr   0           0

From a shell the line would read:

# mount -t nfs2 foo:/usr3 /usr3

------------------------------------------------------------------------
To unsubscribe to this list, mail to tips@ugu.com
Subject: unsubscribe tips
========================================================================

---------------------------respondents------------------------------

Thanks again all!

Jerome A Joseph    j_alphonse@hotmail.com
Chenthil Kumar     chenthil@lucent.com
Ronald Loftin      reloftin@mailbox.syr.edu
Harry Levinson     levinson@ll.mit.edu
Erwin Fritz        efritz@glja.com  http://www.glja.com
John Reynolds      reynolds@informix.com
Mark Henderson     mch@squirrel.com
Jay Lessert        jay_lessert@latticesemi.com

---------------------------Original question:------------------------

Jim Harmon wrote:
>
> Running SunOS 4.1.4 on SPARC20, 2 CPU, 128MB Memory.
> DNS/NIS Host.
>
> 3 times in the last 2 days our network has crawled to a stop.
>
> The first two times it seemed to take a reboot to clear the network
> problems, the last time, with some additional research in the Solaris
> 2.6 Answer Book, we found a few things to check that led us to
>
>       ifconfig le0 down
>       ifconfig le0 up
>
> Which cleared the net and allowed rebooting.
>
> Now, watching with
>
>       netstat -i <#>    (using # = 5 sec interval)
>
> I'm seeing collisions appearing at a larger rate than typical.
> (Normally we only see about 1-2% collisions over 2-3 weeks.)
>
> Now I'm getting:
>
>     input   (le0)     output            input  (Total)    output
> packets  errs  packets  errs  colls  packets  errs  packets  errs  colls
> 105649   8     144505   618   8688   119358   8     158214   618   8688
> 117      2     79       9     30     117      2     79       9     30
> 67       0     51       7     35     67       0     51       7     35
> 48       2     34       10    29     48       2     34       10    29
> 35       0     31       10    25     35       0     31       10    25
> 105      0     82       16    56     111      0     88       16    56
> 154      0     149      3     52     160      0     155      3     52
> 103      0     129      2     67     124      0     150      2     67
> 277      0     220      1     94     461      0     404      1     94
> 91       0     54       2     24     97       0     60       2     24
> 198      0     267      0     68     198      0     267      0     68
> 113      0     71       0     19     119      0     77       0     19
> 122      0     144      0     28     128      0     150      0     28
> 209      0     185      0     48     211      0     187      0     48
> 183      0     155      0     48     193      0     165      0     48
> 164      0     206      1     34     166      0     208      1     34
> 123      0     149      0     28     127      0     153      0     28
> 184      0     180      0     40     190      0     186      0     40
> 92       0     58       3     13     98       0     64       3     13
> 96       0     65       8     47     98       0     67       8     47
>
> every 2-3 screens, where the rest of the time I'm getting 1 or 2
> collisions per screen.
>
> With this going on, my access to search the archives is limited (When
> this host hangs, our entire net hangs).
>
> Here's my kernel info:
>
> *************** showrev version 1.15 *****************
> * Hostname: "<system>"
> * Hostid: "<xxxxxxxx>"
> * Kernel Arch: "sun4m"
> * Application Arch: "sun4"
> * Kernel Revision:
> 4.1.4_DBE1.4 (<SYSTEM>) #5: Mon Mar 24 15:43:35 EST 1997
> * Release: 4.1.4_DBE1.4
> *******************************************************
>
> Can anyone suggest any patches I may need or what else I can check to
> find the source of this problem? I'm looking everywhere, but some
> pointers would be greatly appreciated!
>
> TIA
>
> --
> Jim Harmon                The Telephone Connection
> jim@telecnnct.com         Rockville, Maryland

--
Jim Harmon                The Telephone Connection
jim@telecnnct.com         Rockville, Maryland


