Forcing crash dumps


Date: Wed Nov 20 1991

A couple of weeks ago I mailed the following:

> Situation : Two Sun SparcStation II's running 4.1.1 rev B, 16Mb, with Ariel
> dsp32c card in have suddenly (in the last two weeks) started crashing
> regularly. Note that we have had the dsp cards installed for quite some time
> with no noticeable problems. The message on the console is
> Watchdog Reset
> Window Underflow
> Now I have some idea what the first line means, and I've seen mention of the
> second. The real problem we have is that the system won't produce a crash
> dump which makes it rather difficult to pin down what's causing the problem.
> On occasions savecore silently fails to do anything, but on the most recent
> occasion, the following occurred :
> Oct 30 14:43:23 truth vmunix: dump on sd0b fstype spec size 32712K
> Oct 30 14:43:23 truth savecore: Warning: vmunix version mismatch: SunOS
> Release 4.1.1 (SS2) #2: Thu Aug 8 19:13:58 BST 1991
> Oct 30 14:43:23 truth savecore: reboot after panic: Illegal instruction
> I would be very grateful for any information about undocumented monitor
> commands on a sparcII that will force a crashdump. g0 appears not to be
> valid here.
> Any hints, pointers or commiserations will be greatly welcomed.
> I will summarise any useful information.
> -patrick

Apologies for the slight delay in summarising. I wanted to wait until all
replies had come in, and the link from the US to the UK was very dodgy for
about a week and a half (12000 queued mail messages apparently ...)

To divide the problem up into its component parts :

Forcing Crash dumps:
> "sync" will force a crash dump on open boot prom machines.
I was gently chided for not RTFM-ing on this one - I can only plead a
combination of old manuals (complete) and new manuals (incomplete - I'm
still trying to work out if they've gone wandering or came that way) 8-) ...
Also, out of interest, am I missing something, or is the man page for
monitor(8) misleading here ?
Anyway, the place to look is the Prom User's Guide.

Forcing crash dumps after a watchdog reset:
The consensus was that a watchdog reset is not going to leave you in a state
where a crash dump is going to work. This is matched by our experience -
attempting to force a crash dump just triggered another watchdog ...
One respondent mentioned some magic that they had been given to try and
retrieve some information as to the cause of the w.r., but said that they
hadn't had any joy with this themselves.

savecore: Warning: vmunix version mismatch:
The explanation of this is that at the time the crash occurred, the in-core
kernel appeared to be a different version from the copy in /vmunix. This
problem is probably completely orthogonal to our other ones ...
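
As a rough illustration of the kind of check behind that warning, here is a
minimal sketch in Python. It is not savecore's actual code. The function
names are hypothetical, the on-disk string is invented for the example, and
treating the string from the log above as the in-core one is an assumption
(the log does not say which side it came from).

```python
def normalise(version: str) -> str:
    """Collapse runs of whitespace so line wrapping cannot cause a false mismatch."""
    return " ".join(version.split())


def version_mismatch(core_version: str, file_version: str) -> bool:
    """Return True when the two kernel version strings differ (hypothetical helper)."""
    return normalise(core_version) != normalise(file_version)


# Version string quoted in the log above; assumed here to be the in-core one.
in_core = "SunOS Release 4.1.1 (SS2) #2: Thu Aug 8 19:13:58 BST 1991"
# Invented string standing in for a /vmunix rebuilt after the machine booted.
on_disk = "SunOS Release 4.1.1 (SS2) #3: Mon Oct 28 10:02:11 GMT 1991"

if version_mismatch(in_core, on_disk):
    print("Warning: vmunix version mismatch")
```

If the sketch is anywhere near the truth, rebuilding or replacing /vmunix
without rebooting would be enough to trigger the warning, which fits with it
being unrelated to the crashes themselves.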

The actual fix:

This turned out to be (almost certainly) bugId 1050558, according to the
README from the patch (one person mentioned bugId 1059617). I say "almost"
because the only evidence is the absence of a crash in the best part of
three weeks, although that does seem like fairly good evidence that the
"intermittent" fault has gone away ... 8-) .
I had been aware of the relevant patch (ID: 100232-01), but had assumed
that the symptoms described (screen goes black, cpu light goes out, L1-A
does not work, user typically has to power the machine off and restart
machine again, watchdog reset) all occurred rather than just one of them
... In fact only the last of these (watchdog reset) was occurring to us.

Anyway, prompted by Mary-Helen Donnelly, I installed the patch and,
indeed, everything seems to be happiness and light now ...

Many many thanks to the following people who replied. I may have missed
other people, since as mentioned above, email between everywhere and the UK
has been in a very tangled state for a significant part of the last few
weeks. I trust that the above is a good enough summary ...

tgsmith@com.sun.east.spdev (Timothy G. Smith - Special Projects)
stern@com.sun.east.sunne (Hal Stern - NE Area Tactical Engineering)
rk@com.att.bartok (Ravi Kagalavadi - 59114)
wallen@EDU.UCSD.cogsci (Mark R. Wallen)
kevins@com.sun.aus (Kevin Sheehan {Consulting Poster Child})
mike@uucp.trdlnk (Michael Sullivan)
guy%auspex@NET.UU.uunet (Guy Harris)
(Mary-Helen Donnelly)
maree@au.oz.uq.cs (Maree Hegarty)
Mike Raffety <miker@com.sbcoc>

