SUMMARY: Cause of crash on Ultra-5?

From: Kevin R. Tyle <ktyle_at_atmos.albany.edu>
Date: Wed May 16 2001 - 10:06:24 EDT
Well, I still have not isolated the cause of the crash.
Thanks, though, to the following for responding:

Kevin Buterbaugh:  advised me to make sure that the latest kernel
patch (108528-07) was installed (it was).

Joseph Herpers:  informed me of "Sleuth", proprietary software that
may help detect intermittent crashes and problems.

and especially "Sun Man" Andy Townsend, who warned that some Ultra 5s
shipped in late 2000 with bad CPUs (serial numbers starting with
FW04011x).  My system was obtained earlier in 2000 and has a different
serial number sequence.

Since each crash has occurred while I was running IP Filter (3.4.10
up through the current 3.4.17) with my state table filled to capacity,
I suspect that a full state table may have contributed to the crash.
I have made some changes that will prevent the state table from
filling up.
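
To keep an eye on how full the state table gets, something like the
following might help.  It is only a sketch: the function name
state_active is made up for illustration, and it assumes that
"ipfstat -s" prints a line of the form "<count> active", which should
be checked against the installed IP Filter version.

```shell
# Print the number of active IP Filter state entries.
# Reads "ipfstat -s" output on stdin.
# ASSUMPTION: the output contains a line of the form "<count> active";
# verify this against your IP Filter version before relying on it.
state_active() {
    awk '$2 == "active" { print $1 }'
}
```

Typical use would be "ipfstat -s | state_active", for example from a
cron job that logs the count, so that the value just before a crash can
be compared with the configured table size.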

I have also recompiled IP Filter with a new version of gcc that
was built on a Solaris 8 platform.  Previous versions were built
with a gcc that had itself been built on Solaris 2.6.  I have read,
though, that gcc versions 2.8 and up no longer need to be rebuilt
for a new OS version.

I plan to "stress test" the CPU by running some complex numerical
models on it next.

--Kevin

______________________________________________________________________
Kevin Tyle, Systems Administrator               **********************
Dept. of Earth & Atmospheric Sciences           ktyle@atmos.albany.edu
University at Albany, ES-235                    518-442-4571 (voice)
1400 Washington Avenue                          518-442-5825 (fax)
Albany, NY 12222                                **********************
______________________________________________________________________

Original Post:

Hi all,

I am trying to isolate the cause of periodic crashes on
one of our Ultra 5 machines (333 MHz).

The latest instance:

May  9 11:04:23 fire ^Mpanic[cpu0]/thread=4005fe60:
May  9 11:04:23 fire unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=4005fab8 addr=10 mmu_fsr=0 occurred in module "arp" due to a NULL pointer dereference
May  9 11:04:23 fire unix: [ID 100000 kern.notice]
May  9 11:04:23 fire unix: [ID 839527 kern.notice] sched:
May  9 11:04:23 fire unix: [ID 520581 kern.notice] trap type = 0x31
May  9 11:04:23 fire unix: [ID 381800 kern.notice] addr=0x10

Other instances have had the same "BAD TRAP" type, although the module is
not always "arp"; it has been "ipf" too (we are running IP Filter 3.4.17).
The machine is only in testing and is serving just a couple of machines
as a firewall.  It has two NICs.
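
To see how the panics are distributed across modules, the logged
messages can be tallied.  A minimal sketch, assuming the panic messages
were kept in the syslog files and that each one names the module as
module "NAME", as in the excerpt above (the function name
tally_trap_modules is made up for illustration):

```shell
# Count how many BAD TRAP panics were attributed to each kernel module.
# Reads syslog text on stdin and prints "count module", most frequent first.
# Assumes each panic message contains: module "NAME"
tally_trap_modules() {
    sed -n 's/.*module "\([^"]*\)".*/\1/p' | sort | uniq -c | sort -rn
}
```

For example: cat /var/adm/messages* | tally_trap_modules
If one module dominates, that points more toward software; panics
scattered across unrelated modules would be more suggestive of hardware.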

Analysis of the crash dump with adb is not too helpful, for me at least:

echo '$c' | adb -k unix.3 vmcore.3
physmem 79fa
panicsys(0x104166e0,0x4005f900,0x1004e48c,0x70002000,0x0,0x10410db8) + 44
vpanic(0x1004e48c,0x4005f900,0x23,0x8,0x8,0x8) + cc
panic(0x1004e48c,0x31,0x4005fab8,0x10,0x0,0x703f1c80) + 1c
die(0x1040c9ac,0x4005fab8,0x10,0x0,0x4005fab8,0x31) + 80
trap(0x0,0x1,0x0,0x104169f0,0x5,0x0) + 8a0
sfmmu_tsb_miss(0x1041b214,0x0,0x0,0x70057fb8,0x70057fb8,0x0) + 5fc
prom_rtt(0x10,0x0,0x0,0x7007e260,0x704f994c,0x0)
ar_ce_walk(0x1045e464,0x1045e200,0x1023b900,0x0,0x0,0x104118c8) + 3c
ar_wsrv(0x1045e000,0x7007e260,0x1045da30,0x704f994c,0x0,0x70465ce0) + 88
runservice(0x2000,0x2200,0x20000,0x1043b43c,0x704f9980,0x704f994c) + 3c
background(0x704f994c,0x0,0x10000,0x104169f0,0x0,0x0) + d4
thread_start(0x0,0x0,0x0,0x0,0x0,0x0) + 4
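
When comparing several dumps, it can help to reduce each trace to just
the function names.  A minimal sketch, assuming every frame line has
the form name(args) optionally followed by "+ offset", as in the output
above (the function name trace_functions is made up for illustration):

```shell
# Print the function name of each frame in an adb '$c' stack trace,
# one per line, in the order adb lists them (most recent call first).
# Reads the trace on stdin; lines without "name(" (such as "physmem")
# are skipped.
trace_functions() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)(.*/\1/p'
}
```

For example: echo '$c' | adb -k unix.3 vmcore.3 | trace_functions
In the trace above, the frames from panicsys down to trap look like the
panic and trap-handling path, and the first frames below them are
ar_ce_walk called from ar_wsrv, which is consistent with the panic
message naming module "arp".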

Is there any way to tell if this is a hardware problem or a problem with
IP Filter?

Thanks.
Received on Wed May 16 15:06:24 2001
