SUMMARY: rpc.ttdbserverd runaway

From: Johnie Stafford (js@cctechnol.com)
Date: Mon Apr 14 1997 - 19:00:17 CDT


>>> On 14 Apr 1997 15:01:47 -0500, js@cctechnol.com (Johnie Stafford) said:

 js> I've noticed on one machine in the office (a Sparc 5 running Solaris
 js> 2.5/CDE 1.0.1), that rpc.ttdbserverd runs away, taking up 80%-90% of
 js> the CPU. We kill it and it behaves for a while, but eventually it
 js> will happen again.

Thanks to "Rick von Richter" <rickv@mwh.com> for the answer.

 rvr> I have almost the same setup (Sparc 5, 2.5.1, CDE 1.0.2) and have had the
 rvr> same problems. Here is a bug report from Sun on the issue. What I did was
 rvr> bring the system down to single user and then mount the filesystems. At the
 rvr> top of each of your filesystems is a TT_DB directory. Remove these
 rvr> directories and everything underneath them. They will be recreated by the
 rvr> system. Then reboot and continue as normal. Sometimes it will run away
 rvr> again. Sun knows about this, but I haven't heard whether they are doing
 rvr> anything about it.
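
The cleanup described above can be sketched in shell. This is only a sketch under stated assumptions: the helper names list_tt_db_dirs and clean_tt_db_dirs and the DO_REMOVE switch are made up for illustration, the mount points are passed in by hand, and a POSIX-style shell is assumed (the ${var%/} expansion is not understood by the old Bourne /bin/sh). Nothing is deleted unless DO_REMOVE=1 is set.

```shell
# Hypothetical helper: print the TT_DB directory at the top of each
# mount point given as an argument, if one exists.
list_tt_db_dirs() {
    for mnt in "$@"; do
        # "${mnt%/}" drops one trailing slash, so "/" yields "/TT_DB".
        dir="${mnt%/}/TT_DB"
        if [ -d "$dir" ]; then
            echo "$dir"
        fi
    done
}

# Hypothetical helper: remove the directories found above. By default it
# only prints what it would do; set DO_REMOVE=1 to really delete.
clean_tt_db_dirs() {
    list_tt_db_dirs "$@" | while IFS= read -r dir; do
        if [ "${DO_REMOVE:-0}" = "1" ]; then
            rm -rf "$dir"
        else
            echo "would remove: $dir"
        fi
    done
}
```

For example, in single-user mode with the filesystems mounted, "clean_tt_db_dirs / /usr /var" lists the directories that would be removed; running it again with DO_REMOVE=1 set removes them.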

Here is the attached bug report:

                        Bug Reports document 4017415

----------------------------------------------------------------------------

 Bug Id: 4017415
 Category: tooltalk
 Subcategory: dbserver
 State: evaluated
 Synopsis: rpc.ttdbserverd spinning, consuming nearly all cpu time
 Description:

Customer reports an error situation with rpc.ttdbserverd consuming nearly
all CPU time on an Ultra-2 running 2.5.1. CDE is not installed, only SUNWdtcor.

Unfortunately, I could not reproduce the error, but I at least found a
workaround that could help to understand the error situation after the fact
and to take measures against it.

# uname -a
SunOS kora2 5.5.1 Generic_103640-02 sun4u sparc SUNW,Ultra-2

Local ToolTalk databases:

/usr/TT_DB
/var/TT_DB
/export/root/TT_DB
/export/home/kora2/ac-home/TT_DB
/export/home/kora2/bv-home/TT_DB
/export/home/kora2/km-home/TT_DB
/export/home/kora2/inf-home/TT_DB
/export/home/kora2/nv-home/TT_DB
/TT_DB

# more /etc/inetd.conf | grep ttdb
100083/1 stream rpc/tcp wait root /usr/dt/bin/rpc.ttdbserverd rpc.ttdbserverd

Some time (from 5 minutes up to 48 hours) after the TT_DB databases had
been cleaned out and the machine rebooted, rpc.ttdbserverd started
spinning:

# w
  8:34am up 58 min(s), 3 users, load average: 1.20, 1.07, 1.02
User tty login@ idle JCPU PCPU what
root pts/0 7:36am 57 -sh
root pts/1 7:43am 7 4 truss -p 619
root pts/3 8:29am 1 w
# ps -ef|grep rpc.ttdb
    root 2006 1967 0 08:34:41 pts/3 0:00 grep rpc.ttdb
    root 619 216 91 07:42:20 ? 41:05 rpc.ttdbserverd

# kill -ABRT <ttdbserverd-pid> yielded the following stacktrace:

Reading symbolic information for /usr/dt/bin/rpc.ttdbserverd
warning: core object name "rpc.ttdbserver" matches object name "rpc.ttdbserverd" within the limit of 14. assuming they match
core file header read successfully
core file read error: address 0x5050c not in data space
core file read error: address 0x5050c not in data space
core file read error: address 0x5050c not in data space
Reading symbolic information for rtld /usr/lib/ld.so.1
core file read error: address 0x5050c not in data space
warning: cannot get address of PLT for "/usr/dt/bin/rpc.ttdbserverd"
detected a multi-LWP program
(l@1) terminated by signal ABRT (Abort)
(debugger) where
=>[1] 0xef70cdcc(0x5bdb9, 0xefffda84, 0x9, 0xfff898c3, 0x29de18, 0xefffdb44), at 0xef70cdcb
  [2] 0xef7039f0(0x14f4e0, 0xefffdb24, 0xefffdb40, 0xefffdb3c, 0xd24ff, 0x14f4e0), at 0xef7039ef
  [3] 0xef7037f0(0xefffdbd7, 0xefffdbcc, 0x0, 0xef716710, 0xef71670c, 0xefffdb24), at 0xef7037ef
  [4] isamfatalerror(0xefffdc60, 0xefffdc70, 0xefffdc78, 0xefffdc68, 0x7cda0, 0x7cda0), at 0x23c5c
  [5] _tt_create_obj_1(0xefffdcec, 0xcfe80, 0x1, 0x454, 0xef5fec08, 0x0), at 0x1e46c
  [6] db_server_svc_C:__sti(0x77658, 0xcfe80, 0x77658, 0x547a8, 0x548a4, 0x1e438), at 0x25690
  [7] 0xef5be1e4(0xcd0e8, 0x77658, 0xcff28, 0xcfe88, 0xef5ff210, 0xcfe80), at 0xef5be1e3
  [8] 0xef5be104(0xefffdee0, 0x0, 0xef5fec60, 0xef5ff210, 0xef773e90, 0x16), at 0xef5be103
  [9] 0xef5bffac(0x0, 0xffffe000, 0xef5f46ec, 0xef5fec60, 0xef5ff210, 0x17), at 0xef5bffab
  [10] _tt_process_transaction(0x71a30, 0x71a20, 0x796b8, 0x71a28, 0x71a2c, 0x71a18), at 0x246cc
(debugger)

 Work around:

Clearing out the databases did not help. At least ttdbck did not find
any problems.

Creating a partition map and mapping all the ToolTalk databases to
a single TT_DB did not prevent the error situation either, but it did
help in that there was only one TT_DB to clear out.

Starting rpc.ttdbserverd from a shell with an increased number of
file descriptors (128 instead of 64) avoided the problem
permanently.
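
The last workaround amounts to raising the per-process file descriptor limit in the shell that starts the daemon. A minimal sketch, assuming a shell whose ulimit builtin accepts -n; the helper name run_with_fd_limit is hypothetical, and 128 is the value quoted in the report:

```shell
# Hypothetical helper: run a command with the file descriptor limit set
# to the given value. The subshell keeps the change from leaking into
# the calling shell; exec replaces the subshell with the command.
run_with_fd_limit() {
    limit=$1
    shift
    (
        ulimit -n "$limit" || exit 1
        exec "$@"
    )
}

# Example (as root, after killing the running daemon):
#   run_with_fd_limit 128 /usr/dt/bin/rpc.ttdbserverd
```

The report does not say how the higher limit was applied when inetd starts the daemon, so this sketch covers only the manual start from a shell.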

 Integrated in releases:
 Duplicate of:
 Patch id:
 See also:
 Summary:
The dbserver can run out of file descriptors for the connections between
it and the libtt instances of the various clients that connect to it. The
dbserver should raise its file descriptor limit from 64 to some larger
number (probably 1024).

----------------------------------------------------------------------------

     Copyright 1997 Sun Microsystems, Inc. 2550 Garcia Ave., Mt. View, CA
     94043-1100 USA. All rights reserved.


