>We've been experiencing a lot of bad CPUs in our systems lately, causing crashes and then
>down time again (scheduled) to replace them. I'm hearing through the grapevine that Sun is
>having problems with 400MHz/8MB cache CPUs, which may be what we're seeing. But that's not my
>We're exploring the possibility of offlining the bad cpus after the system comes back up from
>the crash, so we don't have to schedule more down time right away to replace them and don't
>risk the crash again. Unfortunately it always seems to be a production machine that this
>happens to, and I don't want to try it there for the first time. I'm working on getting some
>tests going on a development machine, but even after I've tested it in development I'm not
>sure how comfortable I am with doing this. I've read the man pages and it looks like psradm
>is intelligent enough to know not to offline something if it's busy, but I still don't have a
>warm fuzzy about it. Especially when seeing references to it causing panics under certain
>conditions in Sunsolve!
>I'd be interested to hear people's experiences with offlining cpus, good or bad.
>Thanks in advance!
Of the eleven responses I've seen so far, none gave any reason NOT to take the CPUs offline when we need to.
That's good to know, thanks everyone. A couple of people also said they were having trouble with the
400MHz CPUs. Hopefully there will be a fix for it soon!
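
For anyone who wants to rehearse this on a development machine first, here is a minimal sketch of the sequence, assuming the stock Solaris psrinfo(1M) and psradm(1M) commands and root privileges. The function name offline_cpu, the DRY_RUN switch, and the processor ID 3 are made up for illustration; none of them come from the responses. With DRY_RUN=1 the commands are only printed, not run.

```shell
#!/bin/sh
# Sketch only: take one suspect CPU offline and confirm the result.
# Assumes Solaris psrinfo(1M) and psradm(1M); must be run as root.
# offline_cpu and DRY_RUN are hypothetical names used for this example.

offline_cpu() {
    cpu_id=$1
    run=""
    # With DRY_RUN=1, print each command instead of executing it.
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"

    # Show the processor's current state before touching it.
    $run psrinfo "$cpu_id" || return 1

    # -f asks for the processor to be taken offline; stop here if psradm refuses.
    $run psradm -f "$cpu_id" || return 1

    # Confirm the new state.
    $run psrinfo "$cpu_id"
}

# Example (processor ID 3 is an assumption):
#   DRY_RUN=1; offline_cpu 3
```

To bring the processor back online afterwards, the matching command would be psradm -n with the same processor ID.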
Many thanks to:
Arthur Darren Dunh
John T. Douglass
Kruse, Jason K.
This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:14:09 CDT