SUMMARY: SPARC 5 crashing randomly

From: Derek_Schatz@amat.com
Date: Wed Oct 14 1998 - 12:02:47 CDT


Hello all-

*** Original Question ***

I have a SPARC5-110 (64MB RAM, 1.0GB disk) that crashes at random,
roughly once every few weeks, and always while I'm away. It runs
only standard services and is used as a light-duty workstation for
telnet sessions and the like (nothing fancy, just CDE).

Below is a snippet from /var/adm/messages with what I think are the
relevant bits. Is this a hardware bus or memory error, or something
a patch would fix?

Oct 12 13:17:52 sundeck unix: panic: asynchronous memory fault: MFSR=80802820 MFAR=1e883a0
Oct 12 13:17:52 sundeck unix: syncing file systems... done
[snip]
Oct 12 13:17:52 sundeck unix: dumping to vp f5ba2e1c, offset 112504
Oct 12 13:17:52 sundeck unix: WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000 (esp0):
Oct 12 13:17:52 sundeck unix: dma error: current esp state:
Oct 12 13:17:52 sundeck unix: esp: State=DATA_DONE Last State=DATA
Oct 12 13:17:52 sundeck unix: esp: Latched stat=0x0 intr=0x0 fifo 0x80
Oct 12 13:17:52 sundeck unix: esp: last msg out: IDENTIFY; last msg in: COMMAND COMPLETE
Oct 12 13:17:52 sundeck unix: esp: DMA csr=0xa4240212<EN,INTEN,ERRPEND>
Oct 12 13:17:52 sundeck unix: esp: addr=fc00f384 dmacnt=10000 last=fc007000 last_cnt=10000
Oct 12 13:17:52 sundeck unix: esp: Cmd dump for Target 3 Lun 0:
Oct 12 13:17:52 sundeck unix: esp: cdblen=10, cdb=[ 0x2a 0x0 0x0 0x8 0x2f 0x98 0x0 0x0 0x80 0x0 ]
Oct 12 13:17:52 sundeck unix: esp: pkt_state=0x7<CMD,SEL,ARB> pkt_flags=0x1 pkt_statistics=0x0
Oct 12 13:17:52 sundeck unix: esp: cmd_flags=0xc22 cmd_timeout=60
Oct 12 13:17:53 sundeck unix: WARNING: /iommu@0,10000000/sbus@0,10001000/espdma@5,8400000/esp@5,8800000 (esp0):
Oct 12 13:17:53 sundeck unix: Unrecoverable DMA error on dma
Oct 12 13:17:53 sundeck unix: panic: asynchronous memory fault: MFSR=81002040 MFAR=1e883a0
Oct 12 13:17:53 sundeck unix: SunOS Release 5.5.1 Version Generic_103640-20 [UNIX(R) System V Release 4.0]
Oct 12 13:17:53 sundeck unix: Copyright (c) 1983-1996, Sun Microsystems, Inc.
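
For anyone who wants to pick the log apart, here is a rough Python
sketch (not part of the original question) that pulls the MFSR/MFAR
pairs out of the panic lines and decodes the 10-byte SCSI command
block from the esp dump. The function names and the LOG sample are
made up for illustration, and the 512-byte block size mentioned in
the note afterwards is an assumption, not something the log states.

```python
import re

# Sample lines copied from the log above. The regexes tolerate the
# line wrapping that mail clients add.
LOG = """\
Oct 12 13:17:52 sundeck unix: panic: asynchronous memory fault:
MFSR=80802820 MFAR=1e883a0
Oct 12 13:17:52 sundeck unix: esp: cdblen=10, cdb=[ 0x2a 0x0 0x0 0x8
0x2f 0x98 0x0 0x0 0x80 0x0 ]
Oct 12 13:17:53 sundeck unix: panic: asynchronous memory fault:
MFSR=81002040 MFAR=1e883a0
"""

PANIC_RE = re.compile(
    r"asynchronous memory fault:\s*MFSR=([0-9a-f]+)\s+MFAR=([0-9a-f]+)"
)
CDB_RE = re.compile(r"cdb=\[([^\]]*)\]")

# Only the two 10-byte opcodes needed here; anything else is shown raw.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def memory_faults(text):
    """Return a list of (MFSR, MFAR) integer pairs found in the text."""
    return [(int(s, 16), int(a, 16)) for s, a in PANIC_RE.findall(text)]


def decode_cdb10(text):
    """Decode the first 10-byte CDB in the text as (name, lba, blocks)."""
    match = CDB_RE.search(text)
    cdb = [int(tok, 16) for tok in match.group(1).split()]
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    name = OPCODES.get(cdb[0], hex(cdb[0]))
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2-5
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8
    return name, lba, blocks


for mfsr, mfar in memory_faults(LOG):
    print(f"MFSR={mfsr:#x} MFAR={mfar:#x}")
print(decode_cdb10(LOG))
```

Both panics report the same MFAR (1e883a0). The command block
decodes as a WRITE(10) of 128 blocks; if the disk uses 512-byte
blocks, that is 0x10000 bytes, which would match dmacnt=10000 if
that value is hexadecimal.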

*** Thanks to: ***

Mark Mellman
Kevin Sheehan
Ron Spillane
Tim Carlson
Michael Wang
Stephen Harris
Movva Mohan Kumar
Kenneth Ash
Derek Terveer
John Bradley
Michael Hill
Douglas Carr
Kun Li

*** Answers ***

The overwhelming consensus was that this is a memory module (DIMM)
problem. Most people suggested pulling the modules and reseating
them, and some thought one of them had gone bad. A couple of people
also suggested running 'test /memory' from the PROM, which I did
after a reset. It produced many messages like this one, at varying
addresses:

    ERROR: Physical Address = 0x3da52
                    Expected = 0x5a5a5a5a
                    Observed = 0xffd3c598
                    U-Number = J0300

So I plan to replace the memory module reported at J0300. It's just
strange that the problem is so intermittent.
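
As a side note, the Expected and Observed values in a PROM error
like the one above can be XORed to see which data bits came back
wrong. The Python sketch below is only an illustration; the helper
name differing_bits is made up, and it assumes the test compares
32-bit words.

```python
def differing_bits(expected, observed, width=32):
    """Return (mask, positions) of the bits that differ, LSB = bit 0."""
    diff = (expected ^ observed) & ((1 << width) - 1)
    positions = [bit for bit in range(width) if (diff >> bit) & 1]
    return diff, positions


# Values taken from the PROM error message quoted above.
diff, bits = differing_bits(0x5A5A5A5A, 0xFFD3C598)
print(f"diff mask = {diff:#010x}, {len(bits)} of 32 bits differ")
# prints: diff mask = 0xa5899fc2, 16 of 32 bits differ
```

A single stuck data line would show up as the same one bit differing
in every report; here half the word is wrong in this one sample.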

Regards,
Derek

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Derek Schatz Voice: 408-563-4198
Senior Systems Analyst Fax: 408-986-2822
Applied Materials, Inc. E-mail: derek_schatz@amat.com


