SUMMARY: How to find the cause of my problem?

From: Bernt Christandl <beb_at_MPA-Garching.MPG.DE>
Date: Wed Jul 02 2003 - 05:18:46 EDT

Dear managers,

i asked how to find the hidden problem that causes my ultra-60
to "hang"(?)/"crash"(?) from time to time, so that the only thing
left to do is a power-cycle reboot (see my original question
attached below).

I got several helpful answers and wish to thank again all those
who tried to help me. Nevertheless, yesterday my ultra60 showed
this strange behaviour 3 times and i still have no hints/data/messages
from that machine :(

In the meantime i have attached a console and have a script running
that runs "top" and "ps -ef" every 3 seconds and saves the output
to disk, but i can't find anything abnormal in that data...
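
A minimal sketch of such a logging script, for anyone who wants to set
up something similar. It is not the script actually running on the
machine: the log path, the function names "snapshot" and "monitor" and
the list of sampled commands are assumptions. It writes each command's
output with a timestamp and calls "sync" after every round, so that the
last sample has a chance to reach the disk before a hang.

```shell
#!/bin/sh
# Sketch only: LOG, snapshot, monitor and the sampled commands are
# illustrative assumptions, not the script used on the ultra-60.

LOG=${LOG:-/var/tmp/hangwatch.log}

# Append one timestamped snapshot of a command's output to the log.
snapshot() {
    {
        echo "=== `date '+%Y-%m-%d %H:%M:%S'` : $*"
        "$@" 2>&1
        echo
    } >> "$LOG"
}

# Take COUNT rounds of samples, INTERVAL seconds apart.
monitor() {
    count=$1
    interval=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        snapshot ps -ef
        snapshot vmstat 1 2
        # Flush buffered writes so the newest sample survives a power-cycle.
        sync
        i=`expr $i + 1`
        sleep "$interval"
    done
}
```

With these definitions loaded, "monitor 28800 3" would sample every
3 seconds for roughly 24 hours; further commands (for example "top" in
its batch mode) can be added as extra "snapshot" lines.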

I have again patched that machine (7_Recommended) this morning
(3 new patches, including a kernel patch) and at the moment it is running.

Those who answered my question suggested

-> to attach a console to possibly catch the "last" console messages
   (that failed, nothing was printed...)

-> to set up a script that saves potentially useful parameters
   (so far this has revealed nothing to me...)

-> joe.fletcher@btconnect.com said:
   The problem sounds like a watchdog reset
   which is generally hardware related.

   Do i have such a "watchdog"? I don't know how to tell...
   (Is this a SUN-Default?)

-> to get a core dump
   (so far this has failed too:
    "limit" says "coredumpsize  unlimited" and i have enough disk space
    available, but no core file shows up.)

-> Dominic Clarke <dominicc@foe.co.uk> said:
   I wonder if you have power saving inadvertently configured -
   have a look at the manual page for powerd and for power.conf

   Yes, i have a power.conf and powerd is running.
   But why should only that machine suffer from power problems?
   (Unless that machine's power supply itself is the problem.)

-> "Williams, Mario" <mw180013@exchange.DAYTONOH.NCR.com>
   said (among other ideas)
   > check your network table

   and yes, to me the output of "netstat -rn" looks normal/as it should
   and not essentially different from my other ultra60...
   
-> that i may have a failing network interface...


With best regards,

Bernt Christandl


--------------------------------------------------------------------

My original question:

Dear managers,

i have an ultra 60 running solaris-7, with all recommended+security
patches from 2 weeks ago.
( SunOS sun-5 5.7 Generic_106541-24 sun4u sparc SUNW,Ultra-60 )

The machine normally runs fine, but about once a month, like this
morning, it does not "communicate" at all when i come in:
no answers to ssh, ping or nfs requests, and no output or
response on the console either. (My console is connected to a
terminal server, so i can't see the last screen of messages...)

Then my only idea is a power-cycle and this reboots the machine
without problems.

Afterwards i'm not able to find anything that gives me a hint
about what may have happened: no messages in /var/adm/messages,
no crash dumps, no core files, nothing that i can find.

The boot itself says, when checking the filesystems, that all(!)
are stable, despite my power off without shutdown.

Not being a sun/solaris guru myself, what can i try in order to find
out what kind of problem i have on this machine? (And we don't have
a service contract with sun.)

With best regards,

Bernt Christandl
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Wed Jul 2 05:18:40 2003
