SUMMARY: Major Network Trouble

From: Leo Crombach (lcrombach@tropel.com)
Date: Sat Jun 14 1997 - 08:57:13 CDT


My Original Posting:

This has been a bizarre week so far. First, last week (Friday) our senior
sysadmin quit for a job in another state leaving me (a relative rookie) to
take care of our network. Wouldn't you know I come in Monday morning to two
crashed systems - one a server for our CAD workstations. Then, today, we
have been having major network difficulties. Everything is very slow and
some of the servers are generating the infamous "le0: No carrier - cable
disconnected or hub link test disabled?" message. In addition, on one of
our Sparc 330 (SunOS 4.1.1) servers I have noticed the following message:

        le0: Receive: giant packet from 0:a0:40:3:c6:ca
        le0: Receive: STP in rmd cleared

The ethernet address is not constant. I actually noticed this message first
a few weeks ago but it did not seem to be a problem at the time - and it is
infrequent. I was able to determine which machine corresponded to the above
ethernet address (a Macintosh) but removing it from the network did not help.
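
Since the sending address varies, it can help to tally which addresses show
up most often in the logged messages. The following is only a minimal sketch
in Python, assuming the console messages have been saved to a text file
(on SunOS this is typically /var/adm/messages, but that path is an assumption
here). The function names are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Matches the le driver message quoted above, for example:
#   le0: Receive: giant packet from 0:a0:40:3:c6:ca
GIANT_RE = re.compile(r"le\d+: Receive: giant packet from ([0-9a-fA-F:]+)")


def normalize_mac(addr):
    """Zero-pad each octet so 0:a0:40:3:c6:ca becomes 00:a0:40:03:c6:ca."""
    return ":".join(part.zfill(2).lower() for part in addr.split(":"))


def count_giant_senders(lines):
    """Count 'giant packet' messages per source ethernet address."""
    counts = Counter()
    for line in lines:
        match = GIANT_RE.search(line)
        if match:
            counts[normalize_mac(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python giants.py /var/adm/messages
    with open(sys.argv[1]) as log:
        for mac, n in count_giant_senders(log).most_common():
            print(n, mac)
```

An address that dominates the tally is a reasonable first candidate to
unplug, although a bad cable or transceiver can corrupt frames from many
different senders.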

Using snoop with no options I have noticed the following message appearing
frequently:

        hrisc34 -> odin RPC R XID=1 Program unavailable (low=1, high=2)
        odin -> hrisc34 RPC R XID=1 Program unavailable (low=2, high=1)
        hrisc34 -> odin RPC R XID=1
        odin -> hrisc34 RPC R XID=1 Program unavailable (low=1, high=1)
        hrisc34 -> odin RPC R XID=1 Program unavailable (low=2, high=2)

The constant here is odin. Also, if I run snoop -v broadcast it will die
out with either "Segmentation Fault(coredump)" or "Bus Error(coredump)"
depending on which machine I run it on. However, it always dies at the
following point:

        RIP: ----- Routing Information Protocol -----
        RIP:
        RIP: Opcode = 2 (route response)
        RIP: Version = 1
        RIP:
        RIP: Address Port Metric
        RIP: 192.10.10.0 metnet 0 1

        Bus Error(coredump)

I searched the Sun Manager archives and found many similar postings;
however, it seemed that in most cases the problem was solved in different
ways. I do not have a network sniffer - all I have is snoop. Any
suggestions for tracking this problem down would be greatly appreciated.

********************************************************************************

The general consensus was that there is a problem at the physical layer: a
bad transceiver, ethernet card, wire/cable, hub, etc. The responses I
received are supported by many previous postings in the Sun Managers
archives (www.latech.edu/sunman.html). Some other suggestions were to
contact www.cert.org and to apply patch 101954-07 on the SunOS 4.1.1 machine
to correct the "giant packet" error message.

After everyone went home for the day, I began my search to try and determine
where the culprit was located by first removing entire segments from the
network and then working my way down to individual machines. I ended up
replacing one transceiver, two data cables, resetting the main hubs, and
rebooting several servers. Also, I found a reference in the archives about
SQE (Signal Quality Error) switches on the transceivers. We have a few
transceivers from ACSYS with this feature so I checked to verify that the
switch was set correctly. Everything seems to be o.k. right now, but I
really don't know whether any of the above items were responsible. I'm
sure I'll find out if the troublemaker is still out there.
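
For anyone repeating the segment-by-segment search, the procedure can be
written down roughly as follows. This is only a sketch in Python, not a
tool I actually ran: `network_ok` is a hypothetical callback standing in
for whatever manual check is used (watching the console for le0 errors,
timing a ping, and so on) after the given machines are disconnected, and
the segment and machine names are placeholders.

```python
def isolate(segments, network_ok):
    """Narrow a fault down to a segment, then to one machine if possible.

    segments   -- dict mapping a segment name to the list of machines on it
    network_ok -- hypothetical callback: given the set of machines currently
                  disconnected, returns True if the network behaves normally
    Returns (segment, machine). machine is None when unplugging the whole
    segment helps but no single machine does, which points at the segment's
    cabling, transceivers, or hub. Returns (None, None) if nothing helps.
    """
    for name, machines in segments.items():
        # Step 1: unplug the entire segment and see whether things recover.
        if not network_ok(set(machines)):
            continue
        # Step 2: reconnect the segment and unplug one machine at a time.
        for machine in machines:
            if network_ok({machine}):
                return name, machine
        return name, None
    return None, None
```

This assumes a single fault. With several bad components at once, which
may well have been my situation, removing any one of them would not
restore the network and the search would come back empty.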

Special thanks to all the following individuals for their suggestions:

David Fetrow
Erwin Fritz
Bruce Cheng
Aggeliki Karabas
Gnuchev Fedor
robin.landis@imail.exim.gov
Bismark Espinoza
Raymond Fagnon
K.Ravi
****************************************************************

Leo Crombach
System Administrator
Tropel Corporation
60 O'Connor Road
Fairport, New York 14450
(716)388-3566
lcrombach@tropel.com

****************************************************************


