SUMMARY: Ethernet bridges for Sun networks

From: Steve Groom (stevo@elroy.Jpl.Nasa.Gov)
Date: Thu Feb 13 1992 - 23:58:17 CST


Sorry this took so long to summarize.

To summarize the question:
>
> Looking for comments about Ethernet bridges, especially from Network
> Application Technologies (NAT). Also looking for specific information
> about the legend/myth of Sun Ethernet hardware violating Ethernet
> timing specs (by emitting "back to back" packets or similar).

To summarize the answers:

- Almost nobody acknowledged having NAT bridges. One guy talked about how
    nice they were, but from the description of functions that he gave,
    his is obviously a much more full-featured model than the ones
    we have.

- Lots of people said that heavy network load, and in particular,
    minimally-spaced packet trains, have been known to make bridges
    go bonkers. Nobody was able to suggest a likely cure; the advice
    was simply to dump the bridges and use routers to subnet things.
    One person said that when they got a new Auspex everything went to
    hell because of the long stream of packets it put out; their (the
    customer's) bridges just couldn't keep up. Auspex would not recommend
    any bridges suitable for putting between their box and its NFS clients.

- As for the "blame the Suns" rumors, the answers I got all confirmed
    that since it is the Suns' AMD (or Intel on older machines) ethernet
    chips that enforce the timing, cheating the spec is not possible
    unless the chips are broken. Lots of very good support for this claim.
    HOWEVER...

- One person said that they discovered that some of their Suns DID
    violate the specs. The problem was apparently that the crystal clock
    signal being supplied to the ethernet chip was off. This is
    easily classified as a hardware failure, but it still makes one
    wonder "hey, I wonder what the crystals on MY machines look like?"
    The positive thing to be said is that it's not a design flaw, but
    a hardware failure.

- A couple of folks suggested maybe checking the hardware of the network:
    termination, transceiver spacing, etc.

I had heard the long explanations about Sun packet timings before, but
what I needed was the complete story in its original uncut form.
In other words, if it's just me repeating it, it doesn't carry as much
weight as it does when 30 people send me email saying so, including
some very respected folks.

We're now looking into subnetting. This may be tricky to do because
of our facilities layout (several buildings, some trailers), but
I think that the arguments to subnet things have been convincing.
Also, the bridge meltdown explanations tie in with experiences we've
had where adding one new bridge somewhere caused all kinds of NFS
timeouts.

I was extremely interested to hear about the experiences with the
ethernet chip's clock inputs that were bad. Maybe we'll take
a look at this just for kicks, and to learn how to check it as a
diagnostic procedure.

Many thanks to all who responded. I didn't think that compiling
a list of names would be real useful to anyone, but I think that some
excerpts might be interesting, especially the one about the bad clocks.

-steve

**************************************************
Steve Miller:

Let me repeat once again: to the absolute best of my knowledge, Suns do
*not* violate the 9.6 usec retransmit time restriction. If I remember
correctly, during Van Jacobson's experiments (y'all know, the ones where
he got 8-9 megabits per second (!) over TCP between two Sun-3/60s), he
actually put a scope on the wire and watched to be sure that the 9.6 usec
interpacket gap was respected. It was. Given that SunOS can't ordinarily
deal with pushing bits that fast, I'd say that in any less stressful
situation it's quite unlikely that the interpacket gap restrictions will
be violated.

***************
David Boyd:

        In general, I have given up on bridges because of these problems.
Since almost all the networks we work with are TCP/IP I go almost exclusively
with IP Routers (CISCO is my preferred brand).
***************
Steve Miller (again):

  Sun does not violate Ethernet specs. I don't know that it is possible to
make the LANCE chip (I know less about the Intel) do things outside the
spec, though I suppose that I don't know for sure that making the LANCE
do something odd is impossible. I am fairly certain, however, that in the
process of doing all the performance tuning he's done, Van Jacobson at
LBL put an oscilloscope on his Ethernet, and even his massively-cranked-up
kernels did not violate any specs. What they *did* do is crank so many
back-to-back packets out that some DEC Ethernet hardware and/or operating
system goo (my definition of VMS (-: ) started to exercise some interesting
boundary conditions, shall we say...
**************
Hal Stern:

certain vendors insist on stating that sun violates the ethernet
spec. this is simply not true. those vendors historically have
made poor ethernet interfaces that were not capable of handling
more than 2 packets with minimum interpacket gap spacing.

all sun systems transmit packets with at least the minimum
interpacket gap of 9.6 usec. you can put the analyzer of
your choice on the wire and see this. the issue with
most bridges is how well they buffer back-to-back packets.
if you're running NFS you will get bursts of 6 back-to-back
packets, in groups of 5 bursts at a time, every time you
do read/write operations.
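
As a rough illustration of the numbers above: at 10 Mbit/s the 9.6 usec
gap is 96 bit times, and an 8 KB NFS transfer over UDP fragments into 6
frames on a 1500-byte MTU. The sketch below works through that arithmetic.
The 8192-byte transfer size and the 128 bytes of RPC/NFS header overhead
are assumptions for illustration, not figures from the replies.

```python
BIT_RATE = 10_000_000   # 10 Mbit/s Ethernet
IPG_BITS = 96           # minimum interpacket gap: 9.6 usec at 10 Mbit/s
PREAMBLE_BYTES = 8      # preamble plus start-of-frame delimiter
FRAME_OVERHEAD = 18     # 14-byte Ethernet header + 4-byte CRC
IP_HEADER = 20          # IP header without options
UDP_HEADER = 8
MTU = 1500              # maximum IP packet size per Ethernet frame


def max_frames_per_second(frame_bytes):
    """Frame rate when frames (header through CRC) are separated only by the minimum gap."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES) * 8 + IPG_BITS
    return BIT_RATE / bits_on_wire


def fragment_sizes(udp_payload_bytes):
    """Ethernet frame sizes (header through CRC) for one fragmented UDP datagram."""
    remaining = udp_payload_bytes + UDP_HEADER
    per_fragment = MTU - IP_HEADER   # 1480 bytes of IP payload per fragment
    frames = []
    while remaining > 0:
        chunk = min(remaining, per_fragment)
        frames.append(chunk + IP_HEADER + FRAME_OVERHEAD)
        remaining -= chunk
    return frames


def burst_time_ms(frames):
    """Wire time for back-to-back frames, counting one minimum gap per frame."""
    bits = sum((f + PREAMBLE_BYTES) * 8 + IPG_BITS for f in frames)
    return bits / BIT_RATE * 1000


# Assumed: 8192 bytes of NFS data plus a hypothetical 128 bytes of RPC/NFS headers.
burst = fragment_sizes(8192 + 128)
print(len(burst), "frames,", round(burst_time_ms(burst), 2), "ms on the wire")
print(int(max_frames_per_second(64)), "frames/s at 64 bytes")
print(int(max_frames_per_second(1518)), "frames/s at 1518 bytes")
```

A bridge that cannot buffer or forward six maximum-rate frames in a row
will drop part of every such burst, even though the sender is within spec.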
*************
Ed Arnold:

We got *royally* burnt here when we installed an Auspex. Auspex
is even more aggressive than Sun in the speed at which it transmits
packets, being able to push out long packet trains with the minimum
interpacket gap between them. They don't violate the spec, they
just push it to the limit.

This caused our bridges to fall apart. At first, everyone blamed
it on the Auspex. Now, we know the truth: most bridges are
worthless in any net that includes fairly new machines, because
they can and will drop packets when you feed them long trains with
the minimum interpacket spacing allowed by the spec. As of two
months ago, there was NO model of bridge which Auspex would recommend
to be placed between their server and one of its clients.
*************
Thys Antoine (thys@dsy-srv3.cern.ch):

Some time ago we had quite a few problems with Sun SPARCstation 2s on the
network here at CERN. The network is composed of all sorts of bridges,
repeaters and computers.
At first we blamed the multiport repeaters (from DEC). But a very clever
technician here didn't believe it and investigated somewhat further. He
discovered that some of the Suns have an ethernet transmission frequency well
out of spec.
AMD specifies a crystal of 20 MHz +/- 0.005% (SIA Am7992B manual), which results
in a transmission frequency of 10 MHz +/- 0.01%.
We measured: WS1 9997.918 kHz
             WS2 9998.063 kHz
and some others that were out of spec.
What happens is that long packets (NFS!!) coming from the workstations
that are out of limits get desynchronised in the repeaters (the packets
have CRC errors). If the frequency is low this can happen for packets as short
as 650 bytes. We could also see that some repeaters are much more tolerant
than others.
It is quite easy to see if you have the same kind of problem by using ping
with long packets (2000 bytes). If you are losing packets, try with short ones;
if it works with the short packets you might have the same problem. You should
measure the transmission frequency on the pin of the ethernet chip.
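
To put the measurements above in perspective, here is a minimal sketch
that converts a measured frequency into a deviation in parts per million,
checks it against the +/- 0.01% figure quoted above, and estimates how many
bit times the transmitter slips over one packet. The slip is computed
against an ideal 10 MHz clock, which is an assumption; a real repeater has
its own clock error and buffer depth, so the point at which CRC errors
appear will vary.

```python
NOMINAL_KHZ = 10_000.0    # nominal 10 MHz transmission frequency
TOLERANCE_PPM = 100.0     # +/- 0.01% expressed in parts per million


def deviation_ppm(measured_khz):
    """Signed deviation from nominal, in parts per million."""
    return (measured_khz - NOMINAL_KHZ) / NOMINAL_KHZ * 1e6


def in_spec(measured_khz):
    """True if the measured frequency is within the quoted tolerance."""
    return abs(deviation_ppm(measured_khz)) <= TOLERANCE_PPM


def drift_bits(measured_khz, packet_bytes):
    """Bit times of slip against an ideal 10 MHz clock over one packet."""
    return abs(deviation_ppm(measured_khz)) * 1e-6 * packet_bytes * 8


# Measurements from the message above.
for name, khz in [("WS1", 9997.918), ("WS2", 9998.063)]:
    print(name,
          round(deviation_ppm(khz), 1), "ppm,",
          "in spec" if in_spec(khz) else "OUT OF SPEC,",
          round(drift_bits(khz, 650), 2), "bit times slip over 650 bytes")
```

Both workstations come out at roughly twice the allowed deviation, and
the slip grows linearly with packet length, which fits the observation
that long NFS packets fail while short ones get through.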
*************


