Summary: E3500 Reboots

From: sa_venkatesan@chennai.tcs.co.in
Date: Tue Sep 19 2000 - 23:18:31 CDT


Thanks to all respondents.

I took a complete backup of my system and applied the 2.6 Recommended Patch
cluster. I held my breath for a full 20-25 minutes while the patch cluster was
applied and the machine rebooted. The system is stable now (at least for the
last two days).

Thanks again

regards
venkat

sa_venkatesan@chennai.tcs.co.in on 09/17/2000 06:40:40 PM

To: sun-managers@sunmanagers.ececs.uc.edu
cc: (bcc: Saranathan Venkatesan/Operations/TCSCHENNAI)

Subject: Update: E3500 Reboots

Hi all,

Thanks for your inputs.

Thanks to

Kevin Buterbaugh
Merell Vince
Jamie A Lawrence
Rakesh
Reggie Stuart

Most of them suggested upgrading the patches to the latest level.

I took a full backup of my filesystems and downloaded the 2.6 Recommended
patch cluster from Sun.

Before applying this patch cluster, as a precautionary measure, I want to know
whether anybody has ever faced any problem with it.

Thanks in advance. I will summarize.

regards
venkat

---------------------- Forwarded by Saranathan Venkatesan/Operations/TCSCHENNAI
on 09/15/2000 12:35 PM ---------------------------

sa_venkatesan@chennai.tcs.co.in on 09/15/2000 12:16:39 PM

To: sun-managers@sunmanagers.ececs.uc.edu
cc: (bcc: Saranathan Venkatesan/Operations/TCSCHENNAI)

Subject: E3500 Reboots

Hi Sysadmins,

I have an E3500 (dual processor, 256MB RAM, Solaris 2.6 patch 105580-01) running
quite a lot of software (db2, oracle, mqseries, apache, samba, mastercraft
etc.). It serves about 30-40 users. This machine is used primarily for
development (C++); the database load is light, and it is not used as a web
server. The machine has plenty of CPU time (average idle time is 70-80%) and
memory is not much of a constraint (it has about 900 pages of freemem at peak).

Since yesterday, it has been rebooting at random intervals (twice so far). I
traced it a bit and found that it was due to a BAD TRAP exception. The
following message is logged on the system console (but it is not in syslog).

Sep 15 10:01:04 GSTPASNR SWIFT Time Service[20338]: Shared memory ID problem (No
such file or directory)
Sep 15 10:01:04 GSTPASNR SWIFT Time Service[20338]: Semaphore ID problem (No
such file or directory)
Sep 15 10:01:04 GSTPASNR SWIFT Time Service[20338]: Stats file uncorrect (No
such file or directory)
Sep 15 10:01:04 GSTPASNR SWIFT Time Service[20338]: EID:SNLLIB064, APPNM:SNLUGW,
PNAME:tuxsrv1, FILE:../ugw.cpp, LINE:700, PID:20338, Init Of the Swift Time
Component Failed:kPreInitComponents
Sep 15 10:02:04 GSTPASNR SWIFT Time Service[20362]: Shared memory ID problem (No
such file or directory)
Sep 15 10:02:04 GSTPASNR SWIFT Time Service[20362]: Semaphore ID problem (No
such file or directory)
Sep 15 10:02:04 GSTPASNR SWIFT Time Service[20362]: Stats file uncorrect (No
such file or directory)
Sep 15 10:02:04 GSTPASNR SWIFT Time Service[20362]: EID:SNLLIB064, APPNM:SNLUGW,
PNAME:tuxsrv1, FILE:../ugw.cpp, LINE:700, PID:20362, Init Of the Swift Time
Component Failed:kPreInitComponents
Sep 15 10:03:04 GSTPASNR SWIFT Time Service[20421]: Shared memory ID problem (No
such file or directory)
Sep 15 10:03:04 GSTPASNR SWIFT Time Service[20421]: Semaphore ID problem (No
such file or directory)
Sep 15 10:03:04 GSTPASNR SWIFT Time Service[20421]: Stats file uncorrect (No
such file or directory)
Sep 15 10:03:04 GSTPASNR SWIFT Time Service[20421]: EID:SNLLIB064, APPNM:SNLUGW,
PNAME:tuxsrv1, FILE:../ugw.cpp, LINE:700, PID:20421, Init Of the Swift Time
Component Failed:kPreInitComponents

The dump message starts here
----------------------------

BAD TRAP: cpu=7 type=0x31 rp=0x3052b980 addr=0x5a16e2a0 mmu_fsr=0x0
db2bp: trap type = 0x31
addr=0x5a16e2a0
pid=20549, pc=0x1002c7c0, sp=0x3052ba10, tstate=0x88f0001e03, context=0x1e48
g1-g7: 11, ffff0000, 0, 0, 20, 1, 61c2ede0
Begin traceback... sp = 3052ba10
Called from 1002dfa4, fp=3052ba80, args=11971 6141d570 6141d570 0 0 0
Called from ef56996c, fp=effeaa88, args=31 3 11971 86b0c cd6 0
End traceback...
panic[cpu7]/thread=0x61c2ede0: trap
syncing file systems...BAD TRAP: cpu=7 type=0x31 rp=0x3052b1d8 addr=0x10 mmu_fsr
=0x0
db2bp: trap type = 0x31
addr=0x10
pid=20549, pc=0x10058208, sp=0x3052b268, tstate=0x4480001e03, context=0x1e48
g1-g7: 258, 1042ffbc, 8500000002d96636, 600b7208, 800014660990115a, 10438488, 61
c2ede0
panic[cBAD TRAP: cpu=7 type=0x31 rp=0x301adac8 addr=0x0 mmu_fsr=0x0
BAD TRAP occurred in module "sd" due to an illegal access to a user address.
sched: trap type = 0x31
pid=0, pc=0x601ab118, sp=0x301adb58, tstate=0x1e03, context=0x1e48
g1-g7: c64, 20, 1, 0, 10438000, 1c, 301ade80
panic[cpu7]/thread=0x301ade80: trap
 5942 static and sysmap kernel pages
   49 dynamic kernel data pages
  472 kernel-pageable pages
    0 segkmap kernel pages
    0 segvn kernel pages
    0 current user process pages
 6463 total pages (6463 chunks)

dumping to vp 60192444, offset 1470944
6463 total pages, dump succeeded
rebooting...
Resetting...

Apart from this, users are running SWIFT NET LINK software, which is used to
connect to the SWIFT Network. At present SWIFT net is not active and we are
using loopback, so the Swift net TIME Service complains frequently about the
unavailability of SWIFT NET, which is normal when we use loopback.

I am seeking the following info.

     1. Is there any way to find out the process name for pid 20549 after
reboot? (Is db2bp, named on the second line of the dump, the process causing
the failure? It is nothing but a db2 interactive session.)
     2. Is there any patch suggested for this kind of problem?
     3. Your suggestions.
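
On question 1, the console output itself appears to carry the answer: each
"<name>: trap type = ..." line is followed by a "pid=..." line, so the name and
pid can be paired up from a saved copy of the console text. Below is a minimal
Python sketch of that pairing. The function name parse_bad_traps and the two
regular expressions are my own assumptions, written against the lines shown in
the dump above, not a Sun-documented format. It only reports which process was
named in each trap; it does not show that the process caused the panic.

```python
import re

# Assumed line shapes, taken from the console dump in this post:
#   "db2bp: trap type = 0x31"
#   "pid=20549, pc=0x1002c7c0, sp=..., tstate=..., context=..."
PROC_RE = re.compile(r"^(?P<name>\S+): trap type = (?P<type>0x[0-9a-fA-F]+)")
PID_RE = re.compile(r"^pid=(?P<pid>\d+), pc=(?P<pc>0x[0-9a-fA-F]+)")


def parse_bad_traps(console_text):
    """Return a list of (process_name, trap_type, pid, pc) tuples."""
    traps = []
    current = None  # (name, trap_type) waiting for its pid line
    for line in console_text.splitlines():
        line = line.strip()
        m = PROC_RE.match(line)
        if m:
            current = (m.group("name"), m.group("type"))
            continue
        m = PID_RE.match(line)
        if m and current is not None:
            traps.append((current[0], current[1],
                          int(m.group("pid")), m.group("pc")))
            current = None
    return traps


# A few lines copied from the dump above.
sample = """\
BAD TRAP: cpu=7 type=0x31 rp=0x3052b980 addr=0x5a16e2a0 mmu_fsr=0x0
db2bp: trap type = 0x31
addr=0x5a16e2a0
pid=20549, pc=0x1002c7c0, sp=0x3052ba10, tstate=0x88f0001e03, context=0x1e48
sched: trap type = 0x31
pid=0, pc=0x601ab118, sp=0x301adb58, tstate=0x1e03, context=0x1e48
"""

for name, trap_type, pid, pc in parse_bad_traps(sample):
    print(name, trap_type, pid, pc)
```

Run against the sample, this pairs pid 20549 with db2bp and pid 0 with sched.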

Thanks in advance. I will summarise as always.

regards
venkat

S
U BEFORE POSTING please READ the FAQ located at
N ftp://ftp.cs.toronto.edu/pub/jdd/sun-managers/faq
. and the list POLICY statement located at
M ftp://ftp.cs.toronto.edu/pub/jdd/sun-managers/policy
A To submit questions/summaries to this list send your email message to:
N sun-managers@sunmanagers.ececs.uc.edu
A To unsubscribe from this list please send an email message to:
G majordomo@sunmanagers.ececs.uc.edu
E and in the BODY type:
R unsubscribe sun-managers
S Or
. unsubscribe sun-managers original@subscription.address
L To view an archive of this list please visit:
I http://www.latech.edu/sunman.html
S
T




This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:14:18 CDT