**** summary:

I received a couple of me-too's on this one, but no solutions to the
problem. Since my original posting, the product vendor has acknowledged
that there is a problem and is working on a fix.
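
On the basic question from my original posting (shouldn't the socket get
closed when the client is no longer reachable?): as far as I understand it,
not on its own. An idle TCP connection with nothing queued to send generates
no traffic, so the local end has no way to notice that the peer has gone
away. The usual remedy is for the application that owns the socket to turn
on SO_KEEPALIVE, after which the kernel probes the idle connection and
eventually reports an error (often only after a long idle period; two hours
is a common default). Below is a minimal sketch of that option in Python,
not anything taken from Budtool. The helper name enable_keepalive and the
loopback listener standing in for the backup client are my own
illustrations.

```python
import socket


def enable_keepalive(sock):
    """Turn on SO_KEEPALIVE and report whether the option is now set.

    With the option on, the kernel sends probes on an idle connection,
    so a crashed peer eventually surfaces as an error on the socket.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def demo():
    """Connect to a loopback listener and show the option before and after."""
    # Hypothetical stand-in for the backup client: a listener on localhost.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    conn = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    try:
        before = conn.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
        after = enable_keepalive(conn)
        return before, after
    finally:
        conn.close()
        peer.close()
        listener.close()


if __name__ == "__main__":
    print(demo())
```

Note that this has to be done by the process that owns the socket, so it
does not by itself help with a vendor binary that cannot be changed.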

**** thanks to:

Andrew Foote
Jacques Rall
Marc S. Gibian

**** answers:

> From:
> I've been away from the office so I don't know if you've sent out a summary yet.
> Anyway, so far as I know, the only recovery path for a hung socket is a reboot.
> Let me add that hung sockets are not all that uncommon when I've run unattended
> ufsdumps over the LAN. This is why I strongly advise against use of backup
> products that use the OS' underlying tools for the actual tape handling. Some
> argue that they want to be able to restore without first installing the backup
> tool on a crashed system. My position is that you spend so much more time on the
> dump side that the slight overhead during recovery is far outweighed by the
> added reliability during dump.
> Hope this helps,
> Marc S. Gibian
Marc S. Gibian
Telos Consulting Services

> From: Jacques Rall <>
> What about using pmadm or sacadm? (sorry, don't know any switches)
> ----------

> From: ACF
> Me too !!
> I, however, am running proxy backups under AIX between RS/6000s. Like you,
> the only method I've found to "reset" the socket is by killing all
> associated processes.
> PDC do need to work on this as it's pretty dirty.
> Pls let me know how you go,
> Rgds,
> Midrange Services.

**** original question:

   Sun SPARCstation 20 running Solaris 2.5 with the 2.5 recommended patches installed.

   Problem description:

   This machine is a dedicated backup server that runs the PDC Budtool product.
   This product uses remote-shelled dump/restore to back up the client
   machines. There appears to be a bug that gets "activated" when one of the
   backup clients either hangs or crashes while a dump is being run. The
   backup server keeps the socket connection open to the client that was being
   backed up. This socket will stay open until I manually kill the parent
   process on my backup server that initiated the remote dump.

   I'm working with the backup product vendor on a fix for this problem, but
   was hoping in the meantime to find a way to close this socket without
   killing the parent backup process. When I kill the parent process, none of
   the backups that still remain in the "backup schedule" will get run and the
   summary of the backup schedule will not get generated. I guess my basic
   question is: shouldn't a socket get closed when the destination machine is
   no longer accessible (e.g. no longer ping-able)?

   Attached is some info that will hopefully clarify my problem description.
   All of the commands have been run from the backup server (of course, since
   the client is inaccessible!):

   backupsvr: lsof | grep client
   goserver 2095 root 11u inet 0xf611fec0 0t5 TCP backupsvr:1020->

   backupsvr: netstat -a | grep client
   backupsvr.1020 61315 0 8760 0 ESTABLISHED

   backupsvr: ping client 1
   no answer from

   (NOTE: goserver is the "parent" process which controls the backup schedule
   and initiates the remote dump command.)

   backupsvr: /usr/ucb/ps auxw | grep "goserver -x"
   root 2095 0.0 4.3 3272 2676 ? S Jan 05 286:19
   /usr/budtool/bin/solaris_sparc/goserver -x0

   backupsvr: truss -aef -p 2095
   2095: psargs: /usr/budtool/bin/solaris_sparc/goserver -x0
   2095: getmsg(12, 0xEFFF87F8, 0xEFFF87EC, 0xEFFF8804) (sleeping...)

   Any info on how to try and close this socket without killing the "goserver"
   process would be appreciated. Thanks!

   Christopher M. Murphy
   Bristol Myers Squibb
   Scientific Information Systems
   Princeton NJ


