SUMMARY: talk and ntalk

From: Keh-Wei Lih (lih@rutcor.rutgers.edu)
Date: Thu Jan 16 1992 - 04:40:04 CST


Thanks to all the system managers who replied.

The first post:

> We are having some problems with talk and ntalk. I am not sure what
> causes the problem but only one of our machine has it (the set up for
> all of them are the same). The following is the output from
> "ps aux | grep talk | sort":
>
> lih 19215 0.0 1.8 32 196 p1 S 17:59 0:00 grep talk
> root 18619 0.0 1.4 36 148 ? I 11:44 0:00 in.talkd
> root 18620 0.0 0.4 36 44 ? I 11:44 0:00 in.talkd
> root 18622 0.0 0.2 36 20 ? I 11:55 0:00 in.ntalkd
> root 18626 0.0 0.4 36 48 ? I 11:58 0:00 in.ntalkd
> [ deleted 18 in.ntalkd each separated by 2 minute ]
> root 18727 0.0 0.3 36 28 ? I 12:36 0:00 in.ntalkd
>
> There are always two in.talkd and they are always the oldest in time.
> I mean the first to run (11:44 in the output). The number of
> in.ntalkd is not always the same. I had 29 of them once and 100
> something the other time. But they are separated by two minutes.
> I don't recall if they were separated by two minutes the other times
> it happened. Does anyone have any idea about this? How to fix?
> Please E-mail me. My E-mail address is lih@rutcor.rutgers.edu .
> Thanks in advance.
>
> Keh-Wei

My second post described how I killed those processes. It is not important
now, so I will skip it.

A possible solution was provided by Sam Horrocks <sam@john-bigboote.ics.uci.edu>:

> Try the following patch. It's worked so far on our sequent where we were
> having exactly the same problem. The problem is that in.ntalkd will get
> stuck opening a tty that's since become unused and it will hang forever in
> the open. You'll need to apply this patch to the 4.3 or later sources that
> you can get via ftp (look through archie).
>
> Sam

--------------------------------------------------- Begin of the Patch ------

*** /tmp/RCSAa14123 Thu Jan 9 21:16:04 1992
--- announce.c Thu Nov 7 15:36:14 1991
***************
*** 17,22 ****
--- 17,23 ----
  
  #ifndef lint
  static char sccsid[] = "@(#)announce.c 5.6 (Berkeley) 6/18/88";
+ static char rcsid[] = "$Header: /usr/src/uci/usr/ucb/ntalk/talkd/RCS/announce.c,v 1.3 1991/11/07 23:35:01 sam Exp $";
  #endif /* not lint */
  
  #include <sys/types.h>
***************
*** 28,33 ****
--- 29,36 ----
  #include <sys/wait.h>
  #include <errno.h>
  #include <syslog.h>
+ #include <fcntl.h>
+ #include <sys/signal.h>
  
  #include <protocols/talkd.h>
  
***************
*** 47,70 ****
  {
          int pid, val, status;
  
          if (pid = fork()) {
                  /* we are the parent, so wait for the child */
                  if (pid == -1) /* the fork failed */
                          return (FAILED);
!                 do {
!                         val = wait(&status);
!                         if (val == -1) {
!                                 if (errno == EINTR)
!                                         continue;
!                                 /* shouldn't happen */
!                                 syslog(LOG_WARNING, "announce: wait: %m");
!                                 return (FAILED);
!                         }
!                 } while (val != pid);
!                 if (status&0377 > 0) /* we were killed by some signal */
!                         return (FAILED);
!                 /* Get the second byte, this is the exit/return code */
!                 return ((status >> 8) & 0377);
          }
          /* we are the child, go and do it */
          _exit(announce_proc(request, remote_machine));
--- 50,70 ----
  {
          int pid, val, status;
  
+         /* Catch SIGCHLD */
+         {
+                 int catch_child();
+                 static int already_catching;
+
+                 if (!already_catching) {
+                         (void) signal(SIGCHLD, catch_child);
+                         ++already_catching;
+                 }
+         }
          if (pid = fork()) {
                  /* we are the parent, so wait for the child */
                  if (pid == -1) /* the fork failed */
                          return (FAILED);
!                 return (SUCCESS);
          }
          /* we are the child, go and do it */
          _exit(announce_proc(request, remote_machine));
***************
*** 78,93 ****
          CTL_MSG *request;
          char *remote_machine;
  {
!         int pid, status;
          char full_tty[32];
          FILE *tf;
          struct stat stbuf;
  
          (void)sprintf(full_tty, "/dev/%s", request->r_tty);
          if (access(full_tty, 0) != 0)
                  return (FAILED);
!         if ((tf = fopen(full_tty, "w")) == NULL)
                  return (PERMISSION_DENIED);
          /*
           * On first tty open, the server will have
           * it's pgrp set, so disconnect us from the
--- 78,101 ----
          CTL_MSG *request;
          char *remote_machine;
  {
!         int pid, status, fd;
          char full_tty[32];
          FILE *tf;
          struct stat stbuf;
  
+         /* Set alarm to exit in case we block on the tty */
+         (void) signal(SIGALRM, SIG_DFL);
+         alarm(RING_WAIT > 5 ? (RING_WAIT - 5) : 1);
+
          (void)sprintf(full_tty, "/dev/%s", request->r_tty);
          if (access(full_tty, 0) != 0)
                  return (FAILED);
!         fd = open(full_tty, O_WRONLY|O_NDELAY, 0);
!         if (fd == -1)
                  return (PERMISSION_DENIED);
+         (void) fcntl(fd, F_SETFL, FNDELAY);
+         if ((tf = fdopen(fd, "w")) == NULL)
+                 return (FAILED);
          /*
           * On first tty open, the server will have
           * it's pgrp set, so disconnect us from the
***************
*** 174,177 ****
--- 182,193 ----
          fprintf(tf, big_buf);
          fflush(tf);
          ioctl(fileno(tf), TIOCNOTTY, (struct sgttyb *) 0);
+ }
+
+
+ /* Called when a SIGCHLD is raised. Waits for the child to exit.
+  */
+ catch_child()
+ {
+         wait3(0, WNOHANG, 0);
  }

----------------------------------------------------- End of the Patch ------

Sam also provided a way to re-create the problem:

> To re-create it do the following:
>
> Rlogin to the host as yourself. Run tty to find out the tty name.
>
> Find the process id of the rlogind that's running for this tty (it
> should be one or two less than $$)
>
> Rlogin to the host from another machine and become root.
>
> From root login, kill -9 the rlogind that's associated with the
> first login.
>
> Run "w" and you'll see that there's still an entry on the original
> tty for the login you just killed.
>
> Initiate a talk to yourself on the tty that had the rlogind that was
> killed. This talk will get stuck and from then on all new talk's
> will hang.
>
> Sam

I used the above to re-create the problem and got 2 in.talkd processes.
Then I used talk and ntalk to talk to myself from another machine to this
machine several times. The number of in.talkd processes stayed at the 2
created above, but the number of in.ntalkd processes kept growing. There
are 2 in.ntalkd at first. After a while one dies and two new ones appear,
and it keeps growing this way. I don't know when it stops growing (maybe
when I stop using ntalk).

To kill these talkd processes:

I can kill all the in.ntalkd processes at once with "kill PIDs" (no need
for -9). To kill the 2 in.talkd processes, I kill the one with the larger
PID first and wait for the other one to die by itself. Killing both at once
produces two new ones. Killing the one with the smaller PID first produces
two new ones and leaves the one with the larger PID running.

Hope this helps.

                                Best Regards,
                                        Keh-Wei

P.S. I forwarded the above patch to our computing services, and they are
     going to fix the talk and ntalk source files. I have not tested the
     fixed talk and ntalk yet.

Keh-Wei Lih INTERNET: lih@rutcor.rutgers.edu
RUTCOR, P.O. Box 5062, Rutgers University BITNET: LIH@ZODIAC.BITNET
New Brunswick, NJ 08903 UUCP: rutgers!rutcor.rutgers.edu!lih



