SUMMARY: How to break CLOSE_WAIT

From: Bhavesh Shah <shah.bhavesh_at_gene.com>
Date: Tue Jan 31 2006 - 12:25:52 EST
Thanks to everyone for their detailed explanations, especially:
Crist Clark
Eric Voisard
Casper Dik
Gordon Johnston
Hutin Bertrand

Explanation:
Crist Clark
------------

CLOSE_WAIT means that the local end of the connection has received
a FIN from the other end, but the OS is waiting for the program at the
local end to actually close its connection.

The problem is your program running on the local machine is not closing
the socket. It is not a TCP tuning issue. A connection can (and quite
correctly) stay in CLOSE_WAIT forever while the program holds the
connection open.

Once the local program closes the socket, the OS can send the FIN to
the remote end, which transitions you to LAST_ACK while you wait for
the ACK of the FIN. Once that is received, the connection is finished
and drops from the connection table (if your end is in CLOSE_WAIT
you do _not_ end up in the TIME_WAIT state).
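
A minimal sketch of the behaviour described above, in Python over the
loopback interface. None of this code comes from the replies; the function
names serve_one and demo are made up for illustration. It assumes that the
local program notices the remote FIN as an empty read and then closes the
socket, which is what lets the OS leave CLOSE_WAIT.

```python
import socket


def serve_one(listener: socket.socket) -> int:
    """Accept one connection, read until the peer's FIN, then close.

    Returns the number of bytes received.
    """
    conn, _addr = listener.accept()
    total = 0
    try:
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                # An empty read means the peer has sent its FIN.
                # From here on, our end of the connection is in CLOSE_WAIT.
                break
            total += len(chunk)
    finally:
        # Closing is the program's job. Only now can the OS send our FIN
        # and move the connection from CLOSE_WAIT to LAST_ACK.
        conn.close()
    return total


def demo() -> int:
    """Run a client and the handler above in one process (illustration only)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    client.sendall(b"hello")
    client.close()                   # the client closes first and sends FIN
    try:
        return serve_one(listener)
    finally:
        listener.close()


if __name__ == "__main__":
    print(demo())
```

If the conn.close() call were left out and the program kept running with
the socket referenced, netstat would keep showing that connection in
CLOSE_WAIT.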

Eric Voisard
-------------

Afaik, there is no ndd parameter which affects the tcp CLOSE_WAIT duration.
There was "tcp_close_wait_interval" but it has been obsoleted and renamed to
"tcp_time_wait_interval" because in reality it affects the TIME_WAIT timeout
and not the CLOSE_WAIT. So, you can try to change it but I doubt it'll have
any effect since they're different things...

Otoh, from what I know, it's the responsibility of the application (i.e. not
of the OS) to close its socket once the remote computer closes its side of
the TCP communication.
RFC 793 says CLOSE_WAIT is the TCP/IP stack waiting for the local application
to release the socket. So, it hangs because it has received the information
that the remote host has initiated a disconnection and is closing its
socket, but the local application has not closed its own side.

So maybe the solution consists in finding a bug fix for your application...

Or, more dangerously (because those connections are still entitled to send
any data remaining in the queue), in killing the processes that hold sockets
in CLOSE_WAIT state...
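
To illustrate why killing such processes can lose data, here is a small
Python sketch (not from the original reply; half_close_demo is a made-up
name) showing that the end sitting in CLOSE_WAIT can still send. It assumes
a loopback TCP connection where the client half-closes with shutdown().

```python
import socket


def half_close_demo():
    """Return (what the server read after the FIN, what the client got back)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _addr = listener.accept()
    listener.close()

    # The client closes only its sending direction, which sends a FIN.
    client.shutdown(socket.SHUT_WR)

    # The server sees end-of-file; its end is now in CLOSE_WAIT.
    eof = server.recv(1024)

    # The server may still send queued data in the other direction.
    server.sendall(b"last words")
    server.close()  # now the server's FIN goes out

    # The client reads until it sees the server's FIN.
    reply = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        reply += chunk
    client.close()
    return eof, reply


if __name__ == "__main__":
    print(half_close_demo())
```

Had the server process been killed before sendall(), the data it still
intended to send would never have reached the client.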


Casper Dik
-----------

CLOSE_WAIT connections indicate an error in the software.

It's a connection which has been torn down but your side of things
still has a file descriptor open.


Gordon Johnston
-------------------
I believe CLOSE_WAIT on the server side of the connection means that the 
server has received a FIN from the client, will have acknowledged this 
back to the client and then informed the application that it can close 
the connection. It is then up to the application to relinquish the 
connection once it is satisfied that all the data has been read from the 
connection. Once it relinquishes the connection the server will send a 
final FIN back to the client and the connection will be fully closed.

If you are seeing a large number of connections persisting in
CLOSE_WAIT state, it's probably a problem with the app itself. Restarting
it will clear the connections temporarily, but obviously further
investigation will be required to find the cause of the problem.
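
As a starting point for that investigation, a rough Python sketch for
counting connections per TCP state from saved netstat output. This is not
from the original reply. It assumes the state name is the last
whitespace-separated field on each connection line, which may not hold for
every netstat variant or option set, and count_tcp_states is a made-up name.

```python
from collections import Counter

# State names as commonly printed by netstat; the exact spelling varies
# between operating systems, so extend this set as needed.
KNOWN_STATES = {
    "LISTEN", "SYN_SENT", "SYN_RCVD", "SYN_RECV", "ESTABLISHED",
    "FIN_WAIT_1", "FIN_WAIT_2", "CLOSE_WAIT", "CLOSING", "LAST_ACK",
    "TIME_WAIT", "CLOSED",
}


def count_tcp_states(netstat_text: str) -> Counter:
    """Count lines per TCP state, taking the state from the last field."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if fields and fields[-1] in KNOWN_STATES:
            states[fields[-1]] += 1
    return states


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = """
    192.168.1.10.53   192.168.1.20.40112  49640  0 49640  0 CLOSE_WAIT
    192.168.1.10.53   192.168.1.20.40113  49640  0 49640  0 CLOSE_WAIT
    192.168.1.10.22   192.168.1.30.51000  49640  0 49640  0 ESTABLISHED
    """
    print(count_tcp_states(sample)["CLOSE_WAIT"])
```

Feeding it the text of "netstat -an" at intervals would show whether the
CLOSE_WAIT count keeps growing, which points at a program that is not
closing its sockets.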


Hutin Bertrand
-----------------
Look at:
http://docs.sun.com/app/docs/doc/806-7009


My original post was:


>Hi Gurus,
>When i perform netstat -a, i saw the hundreds of connections  are in 
>CLOSE_WAIT state. This causes my named-xfer using these connections to 
>sleep, truss -p <process_pid>.
>Is there a timer to set, say after 120 seconds the CLOSE_WAIT 
>connections will break so my program can reconnect again?? For example 
>the "ndd" command??
>Any help is greatly appreciated.
>Best Regards
>shahb
>_______________________________________________
>sunmanagers mailing list
>sunmanagers@sunmanagers.org
>http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Tue Jan 31 12:27:30 2006
