SUMMARY: Sunblade 100 freeze with Solaris 9

From: Tim Kirby <trk_at_cray.com>
Date: Fri Mar 18 2005 - 16:17:01 EST

The original posting:

> We have a number of Sunblade 100s and Sunblade 150s that were originally
> running SunOS 5.8. Some of the early 100s exhibited an annoying lock-up
> bug that eventually seemed to be fixed by an OBP upgrade.
> 
> All was fine until we finally got around to upgrading all the machines to
> Solaris 9. Since then we have maybe four of the 100s again exhibiting
> this lock-up - frozen hard enough that only holding down the power key
> will get its attention.
> 
> I've tried all combinations of up to the minute patch levels, OBP and
> OS, to no avail.

I got some useful commentary that I will note below for reference. When I
posted the question it was 00:30 on a Sunday morning; I did not do a good
job of describing the problem.

I will note that these are all headed boxes; when I say "frozen hard" I
mean exactly that - completely dead to <stop>-A or anything other than
turning off the power. Suspecting a disk problem, I went as far as booting
one from the network and leaving it running a shell script looping in
memory with no disk I/O at all, and it still locked up.

Thanks to Ray Brownrigg, Francisco, JV, Adam Tomkinson and Tim Longo
for suggestions. Thanks to everyone else who told me they were on vacation.
(*Sigh*)

Suggestions were:

- Power Management
         (unlikely - happens without power management enabled)
- Environmental concerns
         (power/heat - not an issue in this case)
- Bad installation
         (all look good; actually built with the same jumpstart as
          a couple of hundred other boxes, but checked logs anyway)
- 3rd party software
         (happened without any 3PS installed)
- Set up a deadman kernel
         (google for "deadman solaris kernel" if you want to know more)
- Possible hardware, especially memory
         (try running SunVTS, moving memory around, checking for dodgy
          extra PCI cards)
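
For anyone who wants to try the deadman suggestion, here is a minimal
sketch of the /etc/system entries usually described for it. It comes from
general Solaris documentation, not from anything tested on these boxes, and
the interval value shown is only an illustrative assumption. The idea is
that a high-level interrupt checks that the system clock is still advancing
and forces a panic (and so a crash dump) if it is not.

```
* /etc/system fragment - lines starting with "*" are comments.
* Enable the deadman timer so a hung kernel panics instead of
* sitting frozen.
set snooping=1

* Optional: how long the clock may stall before the panic, in
* microseconds. The value below is an assumption for illustration;
* check the tunables documentation for your release before using it.
* set snoop_interval=90000000
```

Changes to /etc/system only take effect after a reboot. If the machine is
wedged at the hardware level the deadman may never fire, which would itself
point back towards hardware.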

Most of these are actually discounted in my case because the box would go
awry with a minimal kernel and no apps to speak of, but they are good things
to watch for as a matter of course. The deadman kernel I might yet try. There
are no extra bits of hardware in these machines, but I happen to know all of
them have third-party memory installed, so I asked the hardware guy who has
these things sitting on his desk to try swapping memory around. I have yet
to hear any results from that...

Still, I thought I'd summarize what I heard in case folks are looking for
suggestions.

Tim
-- 
Tim Kirby                                       651-605-9074
trk at cray.com                Cray Inc. Information Systems
Received on Fri Mar 18 16:17:29 2005