My original message:
Our networks people just got a sniffer and now tell me that the four
4/60s on the segment they are looking at show _lots_ of CRC errors and
misaligns. These units connect to a Synoptics hub, at which the errors
are being detected. Apparently, Synoptics feels it is inherent to the Suns,
although I haven't yet talked to them.
The configuration is:
Four 4/60s: three are diskless with 8 MB, served by the fourth (16 MB). All are
running the 4.0.3 DL60 kernel.
Does anyone know of a problem associated with packet errors on earlier 4/60s?
We won't know whether the problem shows on other models until other Suns
move to the same building, but they suspect that the problem, if it exists,
is fixed in later SPARC models. These machines are heavily loaded (CPU) with
Document Prep (Interleaf) activities, and thus do _lots_ of I/O to the server.
I'll summarize shortly after responses quit coming in.
Thanks,
Jim Hendrickson
UNIX Systems Manager and Network Epidemologist
FMC - Minneapolis
=============================
Thanks to all who responded. The responses follow.
Let me pass on the context of the problem; it illustrates why solutions are
difficult to develop:
The segment generating the interest has four Sun 4/60s, a few Macintoshes
and maybe a terminal server of some kind. The network is changing to a fiber
backbone with 10BASE-T to most users. Because of this, we do not have a feel
for what is "normal", but we have lots of network tools that bring us data on
this and that - misaligns and CRC errors are included.
Currently, the problem is that the workstations are "slower than they used
to be". From my standpoint, there are several things worth noting:
- They added staff who use the workstations heavily
- A new project started that generated 40-75 MB of new or modified files
per day (from backup stats)
- The packet counts (from netstat -i) seemed relatively high for the type
of activity.
- When a typical workstation is running the publishing software, pstat
reports 8.5 to 11 MB of swap space in use (on an 8 MB machine).
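One way to judge whether the counts from netstat -i are really "high" is to
take two readings some time apart and look at the change between them rather
than the totals since boot. The following is only a minimal sketch of that
arithmetic in Python: the counter names mirror the usual netstat -i columns
(Ipkts, Ierrs, Opkts, Oerrs, Collis), the class and function names are
hypothetical, and the sample numbers are invented for illustration, not
measured on this segment.

```python
from dataclasses import dataclass


@dataclass
class IfCounters:
    """Cumulative interface counters, copied by hand from `netstat -i`."""
    ipkts: int   # input packets
    ierrs: int   # input errors
    opkts: int   # output packets
    oerrs: int   # output errors
    collis: int  # collisions


def rates(before: IfCounters, after: IfCounters) -> dict:
    """Return error and collision percentages for the interval between two samples."""
    d_in = after.ipkts - before.ipkts
    d_out = after.opkts - before.opkts

    def pct(part: int, whole: int) -> float:
        # Guard against an idle interface (no packets in the interval).
        return 100.0 * part / whole if whole > 0 else 0.0

    return {
        "input_error_pct": pct(after.ierrs - before.ierrs, d_in),
        "output_error_pct": pct(after.oerrs - before.oerrs, d_out),
        "collision_pct": pct(after.collis - before.collis, d_out),
    }


# Hypothetical readings taken some time apart; the numbers are made up.
t0 = IfCounters(ipkts=1_000_000, ierrs=200, opkts=800_000, oerrs=10, collis=4_000)
t1 = IfCounters(ipkts=1_100_000, ierrs=700, opkts=900_000, oerrs=10, collis=9_000)
result = rates(t0, t1)
print(result)
```

Comparing these percentages across the four 4/60s, and later across machines
in the other building, would give a rough baseline for what is "normal".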
I suspect that most of the perceived problems will go away with the addition of
more memory, which is in process. My concern was that I would be moving another
dozen or so machines to that building and don't want to discover major problems
then. From your responses, I feel assured that no disaster will occur.
Sun has been sympathetic and concerned, but reports that they have no knowledge
of a problem in that area. I have been reluctant to swap out the hardware
since I can't believe all four units would fail the same way at the same time,
and unless the design has changed, the next ones should work the same way.
I'll re-summarize if any new data comes to light.
Jim
==========================================
----- Begin Included Message -----
From: dpk@fid.Morgan.COM (Doug Kingston)
We see lots of those errors here at Morgan Stanley and it doesn't matter
what vintage of Sparc you have. We get it from all of them. (All the
ones connected with Synoptics anyways...)
Keep me informed on your investigations. Maybe if we all yell...
-------
From: mike@jupiter.nmt.edu (Michael Ames)
Sorry this letter sounds so vague, but if you don't get
any better answers, this may be of help. It seems to me that some
older sun sparc 1 workstations had an ethernet clock or other part
that tended to run on the "ragged edge" of the ethernet spec. Sun
came out with a hardware fix for it. As I recall, it wasn't something
that can be fixed in software. I seem to remember reading about it
when they first came out, and several people sent motherboards in
for the fix. I don't remember if sun charged for it or not.
--------------
From: tim@prism.nersc.gov (Tim Voss)
Is it the Sun-4/60s OR the Synoptics boxes? I mean, who's on first?
----------
From: Tom Conroy <trc@ESD.3Com.COM>
We are currently running somewhere around 130 assorted Suns:
4/60, 4/65, 4/75, 4/20, 3/50, 3/60, 3/260, 3/280, 4/470 ...
Almost all are tied into Synoptics 3000 series 10BaseT hubs.
In most cases, when we see a whole bunch of assorted network errors on
our sniffers (sounds like a similar setup, no?) it can eventually be traced
to a single bad patch cord.
A very high number of the little RJ45-RJ45 Twisted Pair patch cords tend
to have at least two conductors in the wrong places. Both ends should look
exactly the same.
Hope this is a little help.
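The "both ends should look exactly the same" check can be written down as a
small comparison. This is only a sketch under stated assumptions: the function
names are hypothetical, the colour names are illustrative labels for whatever
you read off each plug (held the same way up), and the only outside fact used
is that 10BASE-T carries signal on pins 1, 2, 3 and 6.

```python
# Pins that carry signal on a 10BASE-T link (transmit pair 1/2, receive pair 3/6).
TEN_BASE_T_PINS = {1, 2, 3, 6}


def miswired_pins(end_a, end_b):
    """Return the 1-based pin positions where the two plugs disagree."""
    if len(end_a) != 8 or len(end_b) != 8:
        raise ValueError("an RJ45 plug has 8 positions")
    return [pin for pin, (a, b) in enumerate(zip(end_a, end_b), start=1) if a != b]


def affects_10baset(bad_pins):
    """True if any of the mismatched pins is one that 10BASE-T actually uses."""
    return any(pin in TEN_BASE_T_PINS for pin in bad_pins)


# Illustrative colour order read from plug A, pin 1 through pin 8.
plug_a = ["white-orange", "orange", "white-green", "blue",
          "white-blue", "green", "white-brown", "brown"]
# Plug B with the conductors in positions 3 and 4 swapped (a made-up fault).
plug_b = ["white-orange", "orange", "blue", "white-green",
          "white-blue", "green", "white-brown", "brown"]

bad = miswired_pins(plug_a, plug_b)
print(bad, affects_10baset(bad))
```

A cord that passes this check can still be bad (a broken conductor, for
instance), so it only rules out the swapped-conductor case described above.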
----------
From: ho@la.tis.com (Hilarie K. Orman)
One of the two 4/60's that I got had a faulty ethernet connection that
required replacement of the board. I noticed it because NFS performance
was so bad.
---------------
From: wolff@duteca.et.tudelft.nl (Rogier Wolff)
On the side I'd like to say that our experience with sparcstations tends
to lead us to the conclusion that the OS needs almost all of the standard
8M. The IO is probably paging.
(I agree - we're adding more memory JEH)
----------
From: marke@Solbourne.COM (Marke Clinger)
What did you find out about the misaligns and CRC errors? We had (have) a
major problem with these here. We connected a Plexcom UTP box to our network,
then all of our hosts started seeing CRC errors. We connected a Cabletron UTP
box to our network and saw the same thing (they fixed their box by putting
current rev boards in it). We also put a Synoptics box on the network, which
didn't "cause" these problems. How old is your Synoptics UTP box? Plexcom
brought out a new card and the problem went away after installing this card.
Did you ever find out what was causing your problems? If the Synoptics UTP
hardware is old I would ask for a current rev. Seems to have fixed the
problems around here.
PS. I think you spelt Epidemologist incorrectly. My dict. says: epidemiology
----------
From: name deleted by request
Well... I agreed not to broadcast the fact, but I guess a specific response
isn't a broadcast... Let's just say that you should pursue the problem with
your hardware support folks and keep on them. Sun has experience resolving
this problem. I hope you have a maintenance agreement on your machines.
(I'm pursuing it-no answers yet -JEH)
----------
From: lewis@vuse.vanderbilt.edu (Lewis Saettel)
We have similar problems here at Vanderbilt University School of Engineering
with 4/280 and other SPARCs that are *heavily* loaded. It seems that the more
clients they support or services they provide, the worse the misaligned
frames etc. get. There was a report a few weeks ago about someone else with
4/60s who had the same kinds of network errors. They claimed that they talked
to SUN about the problem and SUN said it was a known problem and that fixing
it required replacing the ethernet card with some 3rd party ethernet card.
This of course made no sense, since SUN's ethernet card is part of the CPU.
Later it was decided that the problem had to be a bad transceiver or cable.
I think the problem is real and may have been around as long as SUNOS 4.x.x,
but I'm not certain of that. If you find any more sensible solution to the
problem, please let me know what you did. I am certain that a significant
portion of the bandwidth is being used up in retransmissions, etc.
----------
From: robinson@porter.geo.brown.edu (Darrin Robinson)
We are having similar problems with Sun SPARCstation 1+s and Sun SPARCsystem
4/370s. We are also using the SYNOPTICS 10BASE-T (twisted pair UTP). Our NFS
network is really slow and bad. I don't know if it's the wiring, or if it's
just the Suns and the SYNOPTICS. We also have VAXen and IBM RTs connected via
the SYNOPTICS. I'll let you know how it goes with us. Likewise, if you find
out anything more, can you drop me a line?
----- End Included Message -----