SUMMARY (Interim #2) Mysterious Sun Hangs

From: James D. Watson (JW1675A@american.edu)
Date: Thu Jan 06 1994 - 22:42:00 CST


Hi folks -- a while back (I don't know how long ago) I posted about a
problem with our machines hanging. I posted one interim report, and
this is the second. We are getting somewhere. The problem is actually
being worked by other folks at work, but since I posted the original
problem I am following up for you. Also, Sun is working with us, in
the form of Tim Smith. He reads the list, and Tim - if you want to
add more to this, please feel free, as you are more familiar with
the details than I am. I got this secondhand and I'm not in the
office this week to check the details, but here goes:

What our problem *looks* like is an infinite loop in the socket
buffer routines which appears to be caused by what looks like
a broadcast packet sent from a Solaris Compartmented Mode
Workstation (CMW). If it sounds like I'm hedging this
post with "looks like" and "appears", well I guess I am <:-)

Sun is doing some additional research into this. I will
keep you posted.

Jim Watson
Acting Technical Officer, System Programming Shop
Defense Intelligence Agency
Bolling Air Force Base
Washington, D.C. 20340
--- Standard disclaimers apply ---

------------- Original note follows ----------------
To: sun-managers@eecs.nwu.edu

Hi folks. Mixed environment:
Suns (4.1.x)
DECs (ULTRIX v?)
IBM 3090s
DOS/Novell
Macs (A/UX v?)
IBM RISC 6000 (v ?)
ethernet, with some routing and subnetting

Our problems started with a Sun 690MP running SunOS 4.1.3, no patches. Often
(sometimes twice or more a day) the machine just _locks up_ so hard that a
keystroke interrupt (L1-A) will not clear it. We normally generate an
interrupt, then, by unplugging and replugging the keyboard cable. We've taken
several dumps, and portmapper is the last beast running.

We recently started transitioning our massive DOS/Novell installed base to
UNIX (Sun SPARC2s 4.1.3, mostly, some DECs, some Macs). Some of our new
SPARC2s are also hanging. (We've stopped taking dumps on them, but saw
portmapper in some instances there as well.)

Interestingly, one group of machines on the same subnet as the 690MP, running
SunOS 4.1.2, did _not_ hang, at least until last week when we upgraded them to
SunOS 4.1.3. Since then, three out of four upgraded machines have hung.
(No dumps taken.)

We see a message about "giant packet received from xx:yy:zz:...:"
and the STP cleared message. These are often the last messages on a hung
machine's console. We're seeing lots of these giant packets around the
net, but not always generating a crash.

Some folks think the following: Novell's sending out some IPX packets
which look like TCP/IP packets so the Suns are picking them up and trying
to do something with them and end up crashing.
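
For anyone who wants to check that theory against captured frames, here
is a rough sketch of how the frame types can be told apart. It is an
illustration only, not what the SunOS driver actually does: the function
name classify_frame and its labels are made up for this example, the
input is assumed to be a raw Ethernet frame without preamble or FCS, and
the 1514-byte limit is the standard Ethernet maximum, which may not be
the threshold behind the "giant packet" console message.

```python
import struct

# Well-known EtherType values (Ethernet II framing).
ETHERTYPE_IP = 0x0800
ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPX = 0x8137

# 14-byte header plus 1500-byte payload; FCS not counted (assumption:
# the capture tool has already stripped it).
MAX_FRAME_NO_FCS = 1514


def classify_frame(frame: bytes) -> str:
    """Return a rough label for one raw Ethernet frame (hypothetical helper)."""
    if len(frame) < 14:
        return "truncated"            # not even a full Ethernet header
    if len(frame) > MAX_FRAME_NO_FCS:
        return "giant"                # longer than standard Ethernet allows

    # Bytes 12-13 are either an EtherType or an 802.3 length field.
    (type_or_len,) = struct.unpack("!H", frame[12:14])
    payload = frame[14:]

    if type_or_len >= 0x0600:
        # Ethernet II: the field names the protocol directly.
        known = {
            ETHERTYPE_IP: "ethernet2-ip",
            ETHERTYPE_ARP: "ethernet2-arp",
            ETHERTYPE_IPX: "ethernet2-ipx",
        }
        return known.get(type_or_len, "ethernet2-other")

    # Otherwise treat the field as an 802.3 length and peek at the payload.
    if payload[:2] == b"\xff\xff":
        return "802.3-raw-ipx"        # Novell "raw" framing: IPX checksum FFFF
    if payload[:2] == b"\xe0\xe0":
        return "802.2-ipx"            # LLC header with the IPX SAP
    if payload[:2] == b"\xaa\xaa":
        return "802.2-snap"           # LLC/SNAP header follows
    return "802.3-other"
```

A broadcast frame whose type/length field holds a length and whose
payload begins with FF FF comes back as "802.3-raw-ipx", while one
carrying type 0800 comes back as "ethernet2-ip". A host that only looks
at the type field should never confuse the two, so if the theory above
is right, something else about the frames is probably also off.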

Sun doesn't know _what_ the deal is. But, they are looking.

We've also got some developers on the net, maybe doing something with
RPCs. We've traced RPC connections and found nothing out of the ordinary.

Can anyone speak to this problem? Much obliged. Summary, as usual.

Thanks, Jim
---------------------------------------------------------------
Acting Chief, Network Operating Systems Shop
Defense Intelligence Agency
Bolling AFB
Washington, DC 20340
---------------------------------------------------------------
Standard disclaimers apply.
------------- End Original Note ----------------


