Summary: socket problem

From: Dotty Pon (dotty@tgivan.wimsey.com)
Date: Tue Oct 11 1994 - 12:27:27 CDT


Last week I asked for your help with the following problem:

---- Begin Message ----
Newsgroups: comp.unix.programmers,comp.unix.solaris,comp.unix.questions
Path: tgivan!dotty
From: dotty@tgivan.wimsey.com (Dotty Pon)
Subject: socket problem
Message-ID: <1994Oct6.174858.20501@tgivan.wimsey.com>
Organization: TGI Technologies Ltd.
Date: Thu, 6 Oct 1994 17:48:58 GMT

I have a client and a server (both running on Sparcs, SunOS to SunOS,
Solaris to Solaris, and Solaris to SunOS) and I have a problem
detecting when either the client or the server has gone away (what I mean
by 'gone away' is the machine running the client/server has been
L1-A'd or powered down).

Here's a more detailed description of my problem:

I have a client running on machine A and a server running on machine B.
They do all the socket/connect/bind/listen/accept stuff needed to establish
a connection. The two processes run for a while exchanging information
when all of a sudden someone L1-A's or powers off machine A while the
server is in a blocking read() waiting for some data. The read() blocks
forever, even after machine A has been rebooted.

Why doesn't the read() unblock and return 0? I have set the socket option
SO_KEEPALIVE and waited for 15 minutes, but nothing happens. I've tried
using select() and FD_ISSET. I've tried converting the blocking read()
to a non-blocking read, which always returns -1 with errno set to
EWOULDBLOCK. That doesn't detect the missing connection either.

This is an even bigger issue with socket connections between clients
running on a PC and servers running on UNIX.

Do you have any ideas/suggestions?

Pretty please email your responses. Our newsfeed is a very _slow_
UUCP link and email is much faster!

Thanks!
Dotty
------------------------------------------------------------------------
Dotty Pon, Software Engineer + TGI Technologies Ltd.
for Enroute Exchange + Voice: (604) 872-6676
Vancouver Freenet Sys. Admin. + Fax: (604) 872-6604/6601
dpon@freenet.vancouver.bc.ca + Email: dotty@tgivan.wimsey.bc.ca
------------------------------------------------------------------------

---- End Message ----

After I posted this, I found out that there is a kernel tunable,
tcp_keepalive_interval, that controls how long a connection must sit idle
before keepalive packets are sent (when SO_KEEPALIVE is set). Under
Solaris 2.x its default value is 2 hours, which is why waiting 15 minutes
showed nothing. You can change this value with ndd. There is no SunOS
equivalent.

I am now able to make my UNIX server time out after 4.5 minutes when a
UNIX client goes away, with tcp_keepalive_interval set to 90000 (the
value is in milliseconds, so that is 90 seconds). But I'm still seeing
the same problem when the UNIX server is connected to a PC client
(WinSock or 3COM TCP) and the PC is rebooted with Ctrl-Alt-Del.

Thanks for all your help!
Dotty



