Hi Sun Managers!
Here is the summary for the above problem:
The original question:
>I have an E450 server with 4 x 400MHz/4MB cache CPUs installed. The server
>also has 1GB RAM (2x X7005), 4 x 9.1 GB/10000 rpm drives, and one external
>A1000 with several Dual Differential SCSI controllers.
>Solaris 2.5.1 with 103640-32. Also running Volume Manager 3.0.2, Veritas
>File System 3.2.6 and Raid Manager 6.1.1 Update 2. The use of this
>system is as a server for Catia 5 and Euclid binaries.
>This system has been experiencing panics for some time (several per week).
>I would like a suggestion to solve this problem. Some colleagues of mine
>think we have a faulty CPU, but as you can see from message below,
>different CPUs are involved each time.
>In my opinion, it could be the motherboard, the memory (more probable) or
>Solaris 2.5.1 (even if the latest patches are installed).
>Please let me know what you think is causing the problem.
>Without advice, all I can do is enter an endless test
>procedure, replacing CPUs one by one, memory, etc.
We noticed that most panics occurred when the hme2 interface was plumbed.
So we took out the hme2 PCI network interface, and since then the server
has not crashed again.
Strangely enough, the server was not reporting any network errors and that
interface was used to transfer large files without problems.
Many thanks to:
First of all, enable savecore in /etc/init.d/sysetup
This will enable use of tools such as iscda after the next crash.
Second, check if powerd is running. If it is, make the necessary config
changes to make sure you don't run it. I have seen odd crashes with
running on 450. You don't want power management on a server anyway.
I have a system with almost exactly the same configuration, which
very similar problems about one year ago. Sun replaced every component
system including the motherboard. What it came down to was a CPU, but
needed to replace both of them at the same time as well as the
DC/DC converters. That is critical. One thing that helped me in the
was that I could take one cpu off-line and the system would not crash. I
it sounds fluky that I could take either cpu off-line and it would run
is what worked for me.
You should enable savecore and send the core files to Sun for cause
analysis. This should be covered by most of the support contracts.
Your machine has a pretty complex set of drivers and your panics
could be caused by software incompatibilities.
I believe it has something to do with the model of 400MHz cpu's your
Sun has identified that there are some that are known to have problems.
should probably check with them.
I've just experienced a similar problem with 1x400MHz E250. Our machine
would panic at bootup with a similar message to the one you experienced.
unix: BAD TRAP: cpu=1 type=0x31 rp=0x3047d578....
This occurred whether we booted from disk, cdrom or net. Occasionally it
would boot OK and then it would run fine until the next reboot. We had
CPU replaced and this appeared to fix it for a while. It started
again and more frequently so we had the main board replaced. It now
to be working OK. We've had Sun study the diagnostics but they just
to replace the CPU followed by the motherboard.
Hope that helps.
"Kulp, Scott (Scott)** CTR **"
Privileged UE errors to me have always been memory. Reseat and rearrange
memory so a new bank is in the 0 position and see if it exposes the
Sometimes non-Sun and Sun memory together will cause UE problems.
I faced a similar problem. Mine is a one-cpu with
1GB RAM, Solaris 2.6, Veritas 3.0.2. I emailed the
sun-managers, no very specific response. Messages I
get at the panic sound like CPU/Cache problem, but I'm
not sure. The last thing that happened (I do not
know if this is related or not) was that the power
supply burned out and the system just went down forever,
I'm not saying this will happen to you, but just keep
it somewhere in your mind that maybe you experience
I called Sun and we're in the process of changing power
supplies right now, so I have no newer info about the
Anything specific you'd like to know about my case?
Please keep me informed with your progress, maybe we
both have the same problem...
We had a 450 with this same panic error that you have:
panic[cpu0]/thread=0x30023ec0: CPU0 Priv. UE Error <misc
The box would panic and reboot every 4-6 hours. That was 5
months ago, and the machine has been up ever since. Call Sun
support and see if this is the case on your box.
type=0x31 means Data access MMU miss. Faulty CPU?
> May 9 03:25:29 arges unix: panic[cpu0]/thread=0x30023ec0: CPU0 Priv. UE
> Error: AFSR 0x00000000 80200000 AFAR 0x00000000 0dd89f08 SIMM 190x
This looks like a faulty 400MHz CPU. Log a Sun support call.
Sun have had more than their fair share of problems with their 400mhz+
The problems have been sorted, but there are still broken chips out there.
I suspect you have one or more bad CPUs. I think I heard there is a
problem with a series of 400MHz/4MB CPUs. Call your Sun support.
Fernando Nantes de Souza
Start with the memory. We had the same problem and after a long and
painful process where we replaced everything, including the mother
board and CPUs, the problem finally disappeared when the memory was
"Balfour, Scott (Eurosoft)"
Looks like a memory problem. Sometimes it shows up as a CPU
error and moves around because all CPUs access the same memory.
Sun should be able to tell you which simm using the AFSR and AFAR.
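As a rough illustration of reading the AFSR value from the panic message, here is a minimal Python sketch. The bit positions (PRIV at bit 31, UE at bit 21, CE at bit 20) are an assumption based on the UltraSPARC-I/II register layout as commonly documented, not something stated in this thread, and the function name decode_afsr is made up for the example. Mapping the AFAR to a physical SIMM slot is board-specific and is not attempted here.

```python
# Assumed UltraSPARC-I/II AFSR flag positions (not taken from this thread).
AFSR_FLAGS = {
    31: "PRIV",  # error happened while running privileged (kernel) code
    21: "UE",    # uncorrectable ECC error
    20: "CE",    # correctable ECC error
}


def decode_afsr(high, low):
    """Return the names of the known flags set in a 64-bit AFSR.

    The panic message prints the register as two 32-bit hex words,
    so the caller passes them separately (high word first).
    """
    value = (high << 32) | low
    return [
        name
        for bit, name in sorted(AFSR_FLAGS.items(), reverse=True)
        if value & (1 << bit)
    ]


# Words taken from the panic line "AFSR 0x00000000 80200000".
print(decode_afsr(0x00000000, 0x80200000))
```

Under these assumptions the value decodes to PRIV plus UE, which agrees with the "Priv. UE Error" text in the panic string itself.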
>>May 9 03:25:29 arges unix: panic[cpu0]/thread=0x30023ec0: CPU0 Priv. UE
>>Error: AFSR 0x00000000 80200000 AFAR 0x00000000 0dd89f08 SIMM 190x
This error could be cpu1 getting bad data from ram
>>May 8 09:18:30 arges unix: BAD TRAP: cpu=1 type=0x31 rp=0x3047d578
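To check the "different CPUs each time" observation against a longer messages file, something like the following Python sketch could tally which CPU each panic or bad trap names. The two sample lines are the ones quoted in this summary (the first joined back onto one line); the regular expressions and the function name cpus_by_event are assumptions written for this example and may need adjusting for other message formats.

```python
import re
from collections import Counter

# Sample lines copied from this thread; the first was wrapped in the email.
LOG = """\
May 9 03:25:29 arges unix: panic[cpu0]/thread=0x30023ec0: CPU0 Priv. UE Error: AFSR 0x00000000 80200000 AFAR 0x00000000 0dd89f08 SIMM 190x
May 8 09:18:30 arges unix: BAD TRAP: cpu=1 type=0x31 rp=0x3047d578
"""

# Patterns are guesses based only on the two lines above.
PANIC_RE = re.compile(r"panic\[cpu(\d+)\]")
TRAP_RE = re.compile(r"BAD TRAP: cpu=(\d+) type=(0x[0-9a-fA-F]+)")


def cpus_by_event(text):
    """Count panic/bad-trap events per CPU number found in the log text."""
    counts = Counter()
    for line in text.splitlines():
        match = PANIC_RE.search(line) or TRAP_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


for cpu, n in sorted(cpus_by_event(LOG).items()):
    print(f"cpu{cpu}: {n} event(s)")
```

If the events spread evenly across CPUs, that is at least consistent with a shared component (memory, bus, or an I/O card) rather than a single bad processor.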
Do you not have a Sun contract? The field engineers usually figure this
Have you got the crash dumps? If you do, send them to Sun. If not, there is
a book on the market entitled Panic! which will teach crash dump
I will be buying it soon.