Well, I think the problem is solved. Thanks for all your help and comments.
It turned out to be a bad FDDI board. I ran FDDI diagnostics, and the board failed
test 0x2A (the DPC NMI Test) with error code 18. So we replaced the board
and it hasn't crashed since. But I noticed that even with the new board
I still get failures on test 0x2A of the diagnostics, so I am still not
100% convinced the problem will not return.
Most people who responded suggested that I make sure the latest NFS
Jumbo patch is installed (we're running rev 10), upgrade to SunOS 4.1.3
(which we'll be doing shortly, along with an upgrade to FDDI 2.0), and
check the jumpers on the backplane. I had removed a Xylogics IPI
disk controller during all my testing and forgot to put the jumpers
back on. When I remembered to do it, I thought for sure it would
solve my problems, and we did stay up for over 3 hours with heavy
NFS traffic. But it crashed after that, and that's when I started
looking into other possible hardware failures. Finally, others reported
similar problems with 4/490s crashing with data faults, but from
random processes (mine was always nfsd). I don't think I have much
help for them - sorry!
Thanks go to the following people:
fabrice@cisk.ATMOS.Ucla.EDU (Fabrice Cuq)
kevin@uniq.com.au (Kevin Sheehan {Consulting Poster Child})
Michael Stevens <mjs@biostat.mc.duke.edu>
stern@sunne.East.Sun.COM (Hal Stern - NE Area Systems Engineer)
Mike Raffety <miker@il.us.swissbank.com>
jeff
--
Jeff Kays
Minnesota Supercomputer Center     E-Mail: jkays@msc.edu
1200 Washington Avenue South       Phone: (612) 337-3422
Minneapolis, Minnesota 55415       Fax: (612) 337-3400
"May fortune favor the foolish"