SUMMARY: Panic: Asynchronous memory fault

From: Daniel Hurtubise (daniel@sar3.CANR.Hydro.Qc.CA)
Date: Wed Feb 15 1995 - 15:27:43 CST

Sun managers,

Original question:
> I need help on this one! ...please. I'm running Solaris 2.4
> on a SparcClassic with 48 Mbytes of RAM. When I run a certain
> program the system panics after a while. The message is
> unix:panic: asynchronous memory fault: MSFR=80002020 MFAR=823ef0
> unix:syncing file systems ... WARNING
> unix:/iommu@0,10000000/sbus@0,10001000/esp.....(esp0):
> unix:Unrecoverable DMA error on dma
> unix:WARNING:
> /iomm........(sd1)
> unix:SCSI transport failed: reason 'tran_err'
> unix:retrying command
> ....
> ....
> /iommu@0 ..... (sd1)
> unix:incomplete write - retrying (This message repeats twice)
> unix:panic: panic sync timeout
> 2743 static and sysmap kernel pages
> etc, etc.
> And then the system reboots.
> Sometimes the program runs successfully, but only rarely. I've
> had the system board replaced, thinking that was the problem, but
> it wasn't. I played around with the system memory variables to
> no avail.

Many different suggestions came out of this one.

1) Check the motherboard.
2) Check SCSI chain termination.
3) Check the disks.
4) Check the SIMMs. It seems that the Dataram 32 MB SIMM can have
   problems on SPARC.
5) Use savecore and the crash utility to pinpoint a software problem.
6) A bug in the MicroSPARC I that shows up when a timeout occurs on
   the SCSI bus. Although the bug is supposed to be fixed in Solaris
   2.4, it is still occurring on one person's LX machine. A C program
   was provided to test for this bug.
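
For point 5), here is a rough sketch of how one might locate the most
recent saved dump and build the matching crash command line. It assumes
savecore has been enabled and writes unix.N / vmcore.N pairs into a
directory such as /var/crash/<hostname>. That directory, the host name
"myhost", and the helper names crash_cmd and latest_dump are all
placeholders of mine, not something taken from the replies.

```shell
#!/bin/sh
# Sketch only. Assumes savecore saved unix.N and vmcore.N pairs in the
# directory you pass in (for example /var/crash/myhost, a placeholder).

# Print the crash invocation for dump number $2 stored in directory $1.
# -d names the dump file, -n names the matching kernel namelist.
crash_cmd() {
    echo "crash -d $1/vmcore.$2 -n $1/unix.$2"
}

# Print the highest dump number found in directory $1.
# Prints nothing if the directory holds no vmcore.N files.
latest_dump() {
    ls "$1" 2>/dev/null |
        sed -n 's/^vmcore\.\([0-9][0-9]*\)$/\1/p' |
        sort -n |
        sed -n '$p'
}
```

After a panic and reboot, "latest_dump /var/crash/myhost" would give the
number of the newest dump, and "crash_cmd /var/crash/myhost 0" prints the
command to run by hand against dump 0. The functions only print; they do
not start crash themselves.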

I have changed the motherboard and the SCSI cables, checked termination,
and checked the SIMMs with system tests, and the problem is still present.
The most plausible cause seems to be point 6). I will run tests with
the C program that causes the timeout and do more work with crash.

Another SUMMARY will follow with the results of further testing.

Daniel Hurtubise

