SUMMARY : Any cure yet for the "accept failed : protocol error" messages ?

From: chas (panda@peace.com.my)
Date: Tue Apr 07 1998 - 14:55:20 CDT


Apologies for this very late summary but I wished to try the
different solutions to identify the exact problem. Thank you
very much to all those who helped (their advice/replies are
below).

My conclusion :
Netscape Server 2.x shipped with Solaris 2.5 is not
dependable. (I also had problems with Enterprise 2.x on
other platforms.)
I have found that Apache handles multiple domains without
problem. NS Enterprise 3.0 also seemed to work fine.

(interesting to note that the previous Summary of this problem
stated that "According to Netscape, this should be fixed in
Solaris 2.6".)

Many thanks to the following :
(i've added my comments where appropriate)

--------------------------------------

From: Stephen Harris <sweh@mpn.com>

The netscape description is wrong, by the sounds of it.

Netscape works by having multiple processes all in accept() and hoping the
kernel will give out the incoming connection to one of the servers. However,
this doesn't work under Solaris and so a mutex lock needs to be applied so
that exactly one process does an accept() at any time.

You can see the code that does this in the Apache sources if you are really
interested.

This used to work perfectly under Commerce Server 1.1 so I don't know why
they broke it in Enterprise Server 2.0.

I reckon it's a netscape bug.

[then, later when i got it to work by removing one of the IPs and NS
instances] :

Ah, *this* I've seen before as well. After adding an extra IP address I've
sometimes had to reboot for *exactly* this reason. After rebooting everything
works fine AOK.

rgds
Stephen

comment : we did actually reboot several times. (it's an old
          english "cure-all" i think - turn it on and off several
          times. if that doesn't work, hit it with something hard.
          failing that, voodoo.)
          looks like you were right first off with the NS 2.0 bug :)
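
The serialised accept() that Stephen describes can be sketched roughly as
follows. This is only an illustration, not the Apache or Netscape code: it
is in Python rather than C, and the names serialized_accept and lock_file
are made up for the example. It assumes every worker process inherits the
same listening socket and the same open, writable lock file, and it uses an
fcntl() record lock (held per process) so that only one worker sits in
accept() at any time.

```python
import fcntl


def serialized_accept(listener, lock_file):
    """Accept one connection while holding an exclusive lock.

    listener  -- a listening socket shared by all worker processes
    lock_file -- an open, writable file shared by all workers
                 (hypothetical; any file the workers can lock will do)
    """
    # Blocks until no other process holds the lock.
    fcntl.lockf(lock_file, fcntl.LOCK_EX)
    try:
        # Only the lock holder is ever inside accept().
        return listener.accept()
    finally:
        # Release the lock so the next worker can take its turn.
        fcntl.lockf(lock_file, fcntl.LOCK_UN)
```

Each forked worker would call serialized_accept() in its request loop. The
idle workers then queue on the lock instead of all sitting in accept()
together.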

--------------------------------------

From: Casper Dik <casper@holland.Sun.COM>

It's more likely a symptom of performance loss than a cause.

What happens is that the client has closed the connection before
accept() is completed.

You might want to investigate nscd nameservice caching if you use
DNS (don't use nscd for hosts on webservers if you use DNS)

comment : i checked 'top' and webserver logs.
          performance was definitely not a problem.
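
If the message is simply accept() reporting that the client went away
first, as Casper describes, a server can treat it as harmless and try
again. The sketch below is an illustration only, under two assumptions:
that the "protocol error" text corresponds to errno EPROTO, and that
ECONNABORTED (the errno some systems use for the same condition) should be
handled the same way. The function name and the retry limit are made up
for the example.

```python
import errno

# Errors assumed to mean "the client aborted before accept() completed".
TRANSIENT_ACCEPT_ERRORS = {errno.EPROTO, errno.ECONNABORTED}


def accept_ignoring_aborts(listener, max_retries=100):
    """Return the next usable connection, skipping aborted ones."""
    for _ in range(max_retries):
        try:
            return listener.accept()
        except OSError as exc:
            if exc.errno not in TRANSIENT_ACCEPT_ERRORS:
                # Anything else (e.g. out of file descriptors) is a real fault.
                raise
            # The client gave up early; nothing to serve, so try again.
    raise OSError(errno.ECONNABORTED, "too many aborted connections in a row")
```

This only quietens the log. It does nothing about whatever made the
clients give up in the first place.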

--------------------------------------

From: lbriales@fedex.com (Lisa Riales)

We had this problem with 2.5 and Enterprise 2.0 when Enterprise 2.0 first
came out. We moved to Apache and it hasn't missed a beat since. I
thought it was a NS problem, not a Solaris problem, but I could be
mistaken.

Lisa Riales
Internet Engineering
FedEx

comment : thanks indeed. i'm forever met with frowns from management
          when i mention apache. it's my favourite webserver but they
          seem to think it will be more difficult to manage in my
          absence. they're wrong, i think.
          there's also that "corporate" thing. <sigh>

---------------------------------------

>I'm going to semi-summarise my own post here since we have
>cured the problem (he says, tempting fate) but not sure why
>it should be the case.
>
>2 days ago, a colleague added a second Netscape webserver
>instance AND a third IP alias to the network interface.
>I've just deleted the second webserver (which had no performance
>issues whatsoever) and removed the extra IP that it was bound to.
>
>Without wishing to put the kiss of death on our server (esp. on
>a friday evening), this seems to have cured it.
>
>So, looks like I'm stuck with a very expensive webserver,
>
>chas
>
>>To: sun-managers@ra.mcs.anl.gov
>>From: chas <panda@peace.com.my>
>>Subject: Any cure yet for the "accept failed : protocol error" messages ?
>>
>>For the past 2 days, my Solaris 2.5.1 running NS enterprise 2.0
>>webserver has been very unstable.
>>Sometimes taking a minute to return a webpage (across the LAN).
>>The error logs are full of the "warning accept failed: protocol
>>error" message. I saw summaries on this from June/July last year
>>... but no solution was offered then. Is there any way to cure this ?
>>(I have all the patches for 2.5.1 I believe)
>>
>>From the summaries :
>>[snip]
>> I've been seeing this with Netscape Enterprise 2.0 as well. I checked
>> with Netscape, and they tell me that this is a Solaris message that is
>> passed through to the server. It has something to do (forgive me, my
>> memory isn't what it used to be) with Solaris serializing its socket
>> assignments. It sounded like Solaris was slow in making the assignment,
>> and Netscape's server would print the message, but that the connection
>> would go through anyway. We haven't seen any performance loss that we
>> could attribute to this problem.
>>
>> According to Netscape, this should be fixed in Solaris 2.6
>>[/snip]
>>
>>One of the big differences in my case, is that there is a definite
>>performance loss though... and the two are definitely related.
>>(Both only started appearing after the logs were rotated for this
>>month though the rotation of logs can not be the cause !)
>>
>>cheers,
>>
>>chas


