SUMMARY: Diagnosing Possibly Bad SIMM--Sparc 1+

From: Tim Evans (tkevans@eplrx7.es.duPont.com)
Date: Thu Apr 08 1993 - 20:49:30 CDT


On Monday I wrote:

>A Sparc 1+ attempts to boot, but part way through the loading
>of vmunix (the spinning wheel appears, and the 'NNNNNN+XXXXX+
>YYYYY' shows), it crashes with a "watchdog reset: instruction
>access exception". There is a clearly visible "blip" to the
>workstation monitor just as this happens.
>
>This is after the memory self-test has (apparently)
>passed. However, after the failed boot, running 'test-memory'
>from the boot prom results in another watchdog reset and
>the error "memory address not aligned." There is, however,
>no reference to any slot number to indicate which SIMM might
>be bad.
>
>Powering off the system and restarting it, I can interrupt the
>startup self-test with L1-A and then run 'test-memory'
>from the boot prom, with no errors shown. Attempting
>to boot results in the same failure as above, after which
>'test-memory' again reports the alignment error.
>
>There are two questions: (1) is the boot error shown above
>an indication of a memory problem or something else? and
>(2) if this is an indication of a bad SIMM, how to identify
>it?
>
>And, of course, the real question is what to do now? Thanks.
>
First, thanks to all who responded (and to any others whose
responses come in after I send this):

adam%bwnmr4@harvard.harvard.edu (Adam Shostack)
Dale Houston <dhouston@bio.ri.ccf.org>
trdlnk!mike@uunet.UU.NET (Michael Sullivan)
PHIL_LORGAN@NYMCS.Prime.COM
root@cas.ds.boeing.com (Operator)
Ted Walsky <walsky@asel.udel.edu>
davis@ee.udel.edu
eeikhey@eeiua.ericsson.se (Kevin Heagney)
Gary Marazita (Melbourne Aust. Eng. ) <GAM.GARY@MELPN1.Prime.COM>

Several people suggested that this might not be a memory problem at
all, but rather a corrupted kernel. While this turned out not to be
the case here, the suggestion to boot another kernel (some
suggested the miniroot from CD-ROM or tape) is a good one, since
it can rule out the kernel as the problem.

Adam provided a good description of a systematic way to swap
and test SIMMs, which is worth repeating:

>Binary swapping refers to yanking out 1/2 of the suspect
>components, and moving them. If the problem moves, move 1/2 the moved
>components back. This allows you to quickly narrow down the source of
>a problem.

In fact, I managed to find the bad SIMM using this method.
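
For anyone who wants the procedure spelled out, here is a minimal
Python sketch of the halving idea Adam describes. It is an
illustration only, not something from any of the responses. It
assumes exactly one bad part, and it assumes you can tell after
each swap whether the problem went with the parts you moved. The
names find_bad_component, problem_follows, and make_test, the
SIMM labels, and the count of 16 are all made up for the example.

```python
from typing import Callable, List, Sequence


def find_bad_component(
    suspects: Sequence[str],
    problem_follows: Callable[[Sequence[str]], bool],
) -> str:
    """Narrow a single fault down by repeatedly halving the suspect set."""
    remaining = list(suspects)
    if not remaining:
        raise ValueError("need at least one suspect")
    while len(remaining) > 1:
        middle = len(remaining) // 2
        moved, left_in_place = remaining[:middle], remaining[middle:]
        # Move one half; if the problem moves with it, the bad part is
        # in that half, otherwise it is among the parts left in place.
        remaining = moved if problem_follows(moved) else left_in_place
    return remaining[0]


def make_test(bad: str, log: List[List[str]]) -> Callable[[Sequence[str]], bool]:
    """Stand-in for the manual step of swapping parts and trying to boot."""
    def problem_follows(moved: Sequence[str]) -> bool:
        log.append(list(moved))
        return bad in moved
    return problem_follows


simms = [f"SIMM-{n}" for n in range(1, 17)]  # hypothetical labels
trials: List[List[str]] = []
found = find_bad_component(simms, make_test("SIMM-11", trials))
print(found, "found after", len(trials), "swaps")
```

With 16 suspects this takes four swaps rather than up to fifteen
one-at-a-time tests. On real hardware the "test" is a swap and a
boot attempt, and any rule about filling memory in groups limits
which halves can actually be moved.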

With respect to the confusing behavior of the system memory
test, Michael noted:

>My understanding (based on a call to Sun Tech Support several years
>ago) is that the SPARCstation 1's memory test doesn't work properly
>after a boot has been attempted; it has something to do with it
>expecting the caching and/or memory management unit to be in the
>initial state, rather than the configuration which the boot program and
>SunOS put it.

Kevin passed along a list of patch IDs related to watchdog
resets. None of them is relevant to my situation, but they might
be to someone else's:

Patch      | Description
-----------+------------------------------------------------------------
100232-01  | SunOS 4.1.1: Sparcstation 2 crashes or watchdog resets
100017-01  | Breakpoints in kadb cause watchdog resets
100319-04  | SunOS 4.1.1 Watchdog Reset in Sun4-490 FDDI->Ethernet router

Dale suggested a video problem. Considering the video 'blip'
I reported in my original post, I pursued this by swapping both
the frame buffer and the monitor. Neither helped here, but it was
something else to try.

-- 
Tim Evans                     |    E.I. du Pont de Nemours & Co.
tkevans@eplrx7.es.dupont.com  |    Experimental Station
(302) 695-9353/7395           |    P.O. Box 80357
EVANSTK AT A1 AT ESVAX        |    Wilmington, Delaware 19880-0357


