In the original article I described a situation in which
PC telnet users would simply reboot their machines,
leaving orphaned sessions on the server. This would not have been
such a problem had the application not started looping
once a session had been in this state for a while.

My first fix was based on a misunderstanding of the
tcp_keepalive_interval parameter, which is settable with ndd. I had
thought that this was the interval after which a TCP connection would be
dropped if KEEPALIVE packets had gone consistently unacknowledged
for that period. In fact it is the period *between* KEEPALIVE
packets, and if one of these packets is unacknowledged then
the connection is timed out. The associated RFC specifies that
this interval must default to NO LESS THAN 2 hours (which is
the default in Solaris 2.3 and 2.4).

So, based on never having been able to snoop a KEEPALIVE packet to
a departed session (though I never waited longer than 20-30 minutes),
the initial fix was to echo "\0000\c" (an ASCII NUL by itself)
to all sessions once every 5 minutes via cron. Fortunately this
is unlikely to affect this application. Once it is discovered that
a session is not responding, that session is timed out after a while.

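For anyone wanting to try the same workaround, here is a minimal sketch of
what such a cron job could look like. It is not the script actually used:
the function name poke, the use of who(1) to find the terminals, the
printf call (in place of the echo form quoted above), and the install path
in the comment are all assumptions made for illustration. It would need to
run as root to be able to write to other users' terminals.

```shell
#!/bin/sh
# Sketch only: write a single NUL byte to every terminal listed by who(1).
# Hypothetical crontab entry (root), assuming the script is installed as
# /usr/local/sbin/poke_sessions:
#   0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/sbin/poke_sessions

poke() {
    # $1 is a device path; refuse anything that is not a character device
    # so that a stale who(1) entry can never create a regular file.
    [ -c "$1" ] || return 1
    # printf '\000' emits exactly one NUL byte and no newline.
    { printf '\000' > "$1"; } 2>/dev/null
}

# The second column of who(1) output is the terminal name, e.g. pts/3.
who | awk '{print $2}' | while read tty; do
    poke "/dev/$tty" || echo "could not write to $tty" >&2
done
```

The write itself does nothing useful to the terminal; the point is only to
force some traffic onto the connection so that a vanished client gets noticed.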
After reading the RFC mentioned above, I have reset
tcp_keepalive_interval to 300000 milliseconds, or 5 minutes. This
seems to be a common solution among the people who responded,
but not all of them have had success with it.

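As a rough sketch, the change can be made with ndd along the following
lines. The variable name interval_ms is just for illustration, and the
guard on /usr/sbin/ndd is only there so the fragment does nothing on a
machine without ndd. As far as I know a value set this way does not
survive a reboot, so it would also have to be added to a startup script;
which one is left as a site decision.

```shell
#!/bin/sh
# Sketch only: set the Solaris TCP keepalive interval to 5 minutes.
# The value is given in milliseconds: 5 min * 60 s * 1000 ms = 300000.
interval_ms=`expr 5 \* 60 \* 1000`

if [ -x /usr/sbin/ndd ]; then
    # Show the current value, change it (needs root), then show it again.
    /usr/sbin/ndd /dev/tcp tcp_keepalive_interval
    /usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval "$interval_ms"
    /usr/sbin/ndd /dev/tcp tcp_keepalive_interval
fi
```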
Hope that helps someone.