Dear fellow Sun Managers,
First of all, I would like to thank the following Sun Managers who
took the time to reply to my post (attached below):
Barry Gamblin <firstname.lastname@example.org>
Kulp, Scott <email@example.com>
John McIntire <firstname.lastname@example.org>
Ray Buckley <email@example.com>
Barry pointed me to a very helpful Sun infodoc, which I quote here in
full. It is exactly what I was looking for:
INFODOC ID: 15510
SYNOPSIS: mfsr mfar numbers & troubleshooting
EXAMPLE:(COULD BE ANY SPARC SYSTEM)
ss5 with 4 32MB simms installed.
and you get the following type of error message...
Aug 9 01:22:10 wsplcp7 unix: panic: asynchronous memory fault:
The useful bit here is the Memory Fault Address Register, MFAR. This
register contains the physical address of the faulting location. Open
your FE Handbook, Volume I to CONFIGURATIONS, CPU, and then the SS5 page
and refer to the SIMM sockets for the SPARCstation 5 CPU board. Compare
the address in the MFAR, 0x06190710, to the address ranges depicted there
and you'll see that it falls in the range of 0x06000000 to 0x07ffffff which
corresponds to J0303.
example of SS5 memory slot layout:
J0403 SIMM7 RAS 7 0e000000 - 0fffffff
J0402 SIMM6 RAS 6 0c000000 - 0dffffff
J0401 SIMM5 RAS 5 0a000000 - 0bffffff
J0400 SIMM4 RAS 4 08000000 - 09ffffff
J0303 SIMM3 RAS 3 06000000 - 07ffffff ** 0x06190710 falls here
J0302 SIMM2 RAS 2 04000000 - 05ffffff
J0301 SIMM1 RAS 1 02000000 - 03ffffff
J0300 SIMM0 RAS 0 00000000 - 01ffffff
PRODUCT AREA: System Administration
PRODUCT: Device config
SUNOS RELEASE: n/a
HARDWARE: Sun 4m
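The lookup described in the infodoc can also be done with a few lines of
code instead of reading the table by eye. What follows is only a minimal
sketch of my own, not something from Sun. It assumes that each of the
eight SS5 sockets owns a contiguous 32 MB (0x02000000) window starting at
physical address 0, in the order J0300 ... J0303, J0400 ... J0403, as the
layout above suggests. The names SLOT_SIZE, SS5_SLOTS and mfar_to_slot are
made up for this illustration.

```python
# Assumption: each SS5 SIMM socket maps a 32 MB window of physical
# address space, starting at 0x00000000 for SIMM0 (J0300).
SLOT_SIZE = 0x02000000

# Socket labels in ascending address order (SIMM0 .. SIMM7).
SS5_SLOTS = ["J0300", "J0301", "J0302", "J0303",
             "J0400", "J0401", "J0402", "J0403"]


def mfar_to_slot(mfar):
    """Return (socket label, SIMM/RAS number) for a physical address."""
    index = mfar // SLOT_SIZE
    if not 0 <= index < len(SS5_SLOTS):
        raise ValueError("address 0x%08x is outside the SS5 SIMM ranges" % mfar)
    return SS5_SLOTS[index], index


if __name__ == "__main__":
    # The MFAR value from the infodoc example.
    slot, simm = mfar_to_slot(0x06190710)
    print("MFAR 0x06190710 -> %s (SIMM%d, RAS %d)" % (slot, simm, simm))
```

For the infodoc's example address this picks J0303, the same socket the
infodoc arrives at. Other SPARC systems have different socket labels and
window sizes, so the two constants would need to be changed to match the
FE Handbook page for that machine.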
Ray took the trouble to send me a PDF document, which further
improved my understanding of the issue with its clear illustrations.
John suggested the memconf utility. I will get and install it later
on. Finally, Scott Kulp suggested that I search http://docs.sun.com/
Well, I did. But it was not that helpful, especially since I knew it
didn't have the Field Engineer's Handbook.
Again, I would like to thank the above four Sun Managers.
------------------------------ original post ---------------------------
Dear fellow Sun Managers,
This morning, when I walked into my office, I noticed that one of our
Sun SS5 170 MHz machines was showing an ok prompt. Looking at the
display, I realized immediately that a memory module had gone bad - it's
the infamous asynchronous memory fault - BAD TRAP type variety. No
biggie: I turned off the machine, opened the case, and the other spare
machine's case, swapped memory, and turned it back on. No problems.
I then took my time to try to figure out which memory module went bad
on the spare machine. I did the following:
ok> setenv diag-switch? true
ok> test /memory
It then showed something like Misaligned memory, watchdog reset, and
the memory slot etc.
By swapping memory in and out in sequence, I finally nailed two modules
that needed replacement. After taking them out, the test /memory had
a clean run. This took me a while and was quite a drag.
When I was doing it, I really wished I had remembered how to identify
memory slots on SS5 system board or had a handy reference (like the FE
Handbook) on such matters so that I didn't need to waste time swapping
modules.
I checked Sun's web site, but it requires a lame My Sun registration and
a two-day wait to access some parts of the FE Handbook. I went to
the Microelectronics section, looking for something like the very nice
PDF OEM Manual for AXi system boards (which identifies all memory
slots using J0001 J0401... J0404 etc.), but couldn't find anything for
SS5 system board.
If anyone knows where I can find such memory slot id info for the
SS5, I would be very grateful for a pointer. Once I have the
slot id info, I will probably do a preventive audit of all our
SS5s (we still have quite a few) and replace memory modules in
advance should anything else fail to pass the test /memory.
This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:13:25 CDT