SUMMARY: Overheated Sun servers?

From: Saurabh Jang (
Date: Tue Aug 18 1998 - 17:54:15 CDT

I had posted a query about how to determine if environmental conditions
such as temperature might be causing some Sun servers to "freeze".
Here is the update. We were getting kernel panics even upon booting from
CD-ROM. This most definitely pointed to a hardware problem. However, the POST
diagnostics seemed to suggest everything was OK. We called Sun and requested
some hardware techs to come in. The temperature of our lab was around
75F, and initially they didn't think that overheating was the cause of our
woes. They decided to replace the CPU referenced in the kernel panic, and
upon replacing the CPU, the machine booted up like a charm. There are
other machines in the vicinity of this E3000 which haven't had any hardware
failures, but that could be because they are newer than this machine. There
is a section of the lab which has better air flow and is noticeably cooler,
so I have moved the E3000 machine to that area for now. Eventually we are
going to be moving our lab to a new room with better air conditioning and
more space, so hopefully we won't have to go through such a
frustrating exercise again.

Thanks to the following people for responding:
Leif Ericksen
Grant Schoep
Heidi Burgiel
Jeff Kennedy
Matti Siltanen

Based on input from the people above, the following tips are useful:
- Don't let the lab ambient temperature exceed 72F
- Buy cans of compressed air and use them to blow dust out of the machine
  enclosure after opening it up
- Power Supply vents are a particular source of dust buildup



I have noticed an intermittent problem with some of the Enterprise 3000 and
SPARCserver 1000 systems in our lab: they fail to come up
after a reboot via "init 6", and even a hardware-level power cycle
with the key in the diagnostics position does not show us any
output from POST, boot-up, etc., and we can't even get a PROM prompt.
The machine just "freezes" up!

This has only happened 2 or 3 times in the last few months, and invariably
the machines have rebooted and come up by themselves within a few hours.
Since these are development/testing machines and not mission critical, I have
never investigated the problem further. This lab has a lot of
servers and other computer hardware in a fairly cramped room, and is often
quite warm. This leads me to suspect that overheating might have something
to do with our problem. In the past few weeks, this problem has become
more severe with one of our older Enterprise 3000 systems. We have a
Sun technician coming in tomorrow to assess the causes of these problems,
but I was wondering if someone has had such experiences and how
can one pinpoint environmental factors as the cause for the machine

I checked the hardware manual for the E3000 and it says that
the temperature requirements are between 41F and 104F, and while our
lab is warm, it certainly isn't close to the 100F range.
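
For anyone comparing a lab reading against these numbers, here is a minimal
Python sketch. It assumes only the figures quoted in this thread: the 41F to
104F range from the E3000 manual and the 72F ceiling suggested by the list.
The function and constant names are hypothetical, and the sketch does not
read any real sensor; it only classifies a reading you supply.

```python
# Thresholds quoted in this thread; all names here are hypothetical.
MANUAL_MIN_F = 41.0      # lower limit from the E3000 hardware manual
MANUAL_MAX_F = 104.0     # upper limit from the E3000 hardware manual
SUGGESTED_MAX_F = 72.0   # ambient ceiling suggested by list respondents


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def classify(temp_f):
    """Classify an ambient reading (in F) against the two thresholds."""
    if temp_f < MANUAL_MIN_F or temp_f > MANUAL_MAX_F:
        return "outside manual range"
    if temp_f > SUGGESTED_MAX_F:
        return "within manual range, above suggested 72F"
    return "ok"


if __name__ == "__main__":
    # 75F is the lab temperature mentioned in the summary.
    for reading in (72.0, 75.0, 104.0, 110.0):
        print("%.0fF (%.1fC): %s" % (reading, f_to_c(reading), classify(reading)))
```

A 75F lab falls inside the manual's range but above the suggested ceiling,
which matches what we saw: within spec on paper, yet warm enough to be
worth improving the air flow.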


