SUMMARY -- Crashing Sparc IPC

From: John Hasley (hasley@dad.bgsu.edu)
Date: Thu Jul 15 1993 - 13:10:25 CDT


Sorry for the delay, things have been hectic around here.

First, my original description of the problem.

> I've been having some trouble with a SPARCstation IPC (SunOS 4.1.2).
>
> The system in question has been crashing repeatedly (about once
> a night). A call to Sun gave the OK to open up the system and
> reseat the SIMMs; a couple moved into place, but it crashed the
> next night. Sun's next suggestion was to run a REAL "open system"
> (someone decided that it's cheaper to shut off the air conditioning
> at night, so the temperature swings between 68 & 82 F.) With the
> improved ventilation from removing the top, the crashes decreased,
> and the system will now reboot itself, but this is hardly what I
> would call a solution. Besides, it's still crashing.
>
> Turning on the AC isn't an option. Someone else controls it and
> even my boss hasn't been able to make enough noise for that. The
> stress may have damaged the system, but the crashes don't happen at
> any particular time.
>
> I ran diagnostics for more than 30 minutes with no complaints. But
> replacing the motherboard seems like the next step.
>
> Would anyone care to comment on the following tracebacks?
>
> Thanks for your help.
>
> [[ tracebacks deleted because they were irrelevant ]]

Thanks to all those who responded (I think I sent thank-yous to everyone):

adam%bwnmr4@harvard.harvard.edu (Adam Shostack)
Andy Mitchell <afm@ufnmr1.health.ufl.edu>
paulo@dcc.unicamp.br (Paulo Licio de Geus)
steve@seattle.avcom.com (Steve Lee)
cfulmer@pnc-pimc.com (Catherine Fulmer)
vsh%etnibsd@uunet.uu.net (Steve Harris)
stern@sunne.East.Sun.COM (Hal Stern - NE Area Systems Engineer)
Mike Raffety <miker@il.us.swissbank.com>

HEATING --
As was pointed out, the temperature fluctuations were within the
approved range. [ My fear was that temperatures at the high end of
the range, plus the daily fluctuations, were placing additional
stress on the system and shortening its life. What I was really
looking for was a stick to beat some people over the head with.
The Sun service rep called the practice "dumb", but nothing has
had any effect on those who control the air conditioning. (It's
like playing Russian roulette: "Hey, we haven't died yet.") ]

A couple of people suggested putting a small fan next
to the system to help the computer's own fan. I'm going to
look into that.

OTHER POSSIBILITIES --
It was suggested that a loose serial-line connection (with login
enabled on that line) could cause crashes. [But the system has
no serial connections.]

Bad memory was another suggestion. [Since crashes were happening
frequently, it was easy to pull out a bank of SIMMs and wait for
the system to crash. All the RAM checked out OK.]

ANSWER --
The final solution came from Sun. (Once we got past some confusion
over who was supposed to call whom, plus some additional problems
on this end, Sun's response was quite good.)

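I was told to fire up "adb -k vmunix.20 vmcore.20" (the kernel image
and crash dump saved by savecore), then give it the command "$<msgbuf"
to print the kernel message buffer. Pick out the PC (program counter)
value from the panic output, in this case "0xf80abbec". Then run the
following command, which disassembles the instruction at that address: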

    0xf80abbec?ai
    _idle:
    _idle: _idle: sethi %hi(0xf8107800), %g1

When I sent that in, I was told that only one problem could cause
that, and it was a busted chip. (I forget which one; it doesn't matter
anyway. It was some chip that the diagnostics don't test.) The
motherboard was replaced and the system hasn't crashed since.

John Hasley Internet: hasley@dad.bgsu.edu
University Computer Services UUCP: ...andy.bgsu.edu!hasley
Bowling Green State University BITNET: hasley@BGSUOPIE
Bowling Green, OH 43403-0125 MaBell: (419) 372-9989
