SUMMARY: www server hangup problem

From: Kun Li (likun@asiainfo.com)
Date: Tue Dec 08 1998 - 00:19:56 CST


First, I apologize for posting this summary so late.
Thanks to everyone who replied to my question: Kevin Kob, Karl Vogel,
and especially K.C. Kong, technical support engineer at Netscape
Communications.

They all suggested raising the rlim_fd_cur parameter to 1024, or raising
maxusers. Unfortunately, my server already allows 1024 fds per process and
the www server had still used up about 1000 of them, so the limit is not
the key issue.

I suspect that a malfunction in the WWW daemon itself causes it to hit the
upper limit of fds. K.C. Kong, however, asked me to check the CGI processes
and other third-party plugins to make sure they were not the ones consuming
the fds. I then used the truss and lsof commands to trace the open fds and
found that the open files have nothing to do with the CGI processes. In the
end K.C. Kong recommended ES 3.6. I haven't tried it yet.

likun
#############################################
# mailto:likun@asiainfo.com Engineer Project Management #
# Ph: 86 10 68467058 x5619 #
# http://www.asiainfo.com #
# Asiainfo Computer Network Co., Ltd #
#######################################

Here is the mail exchanged between K.C. Kong and me; I hope it helps those
with a similar problem:

---------------------------------------------------
[my original mail]
One of our www servers hangs.

Server: Netscape-Enterprise/3.5.1 B98.027.0752.
OS: SunOS 5.5.1 Generic_103640-21
Platform: sun4u Enterprise 3000

According to the errors log file, the cause of the problem is "too many open
files".

Experienced Web managers, how can I increase the number of files that
Netscape server 3.5.1 can open?
Using the /usr/proc/bin/pfiles command, we can see that the server is using
about 1000 file handles when the error message appears, after which no new
access is possible. Is there a bug in Netscape-Enterprise/3.5.1 that
prevents the server from releasing used file handles?

---------------------------------
[first response from K.C.Kong]
Likun,

If you're using ES 3.5.1 on Solaris, you must make sure that enough file
descriptors are allowed per process. On Solaris this can be done by running
this at the shell before running the start script:
ulimit -n 1024
or by putting this line in the start script before anything else is
executed, so that it is used every time.
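
The same soft limit that "ulimit -n" sets can also be inspected and raised
from inside a process. A minimal sketch in Python, assuming a POSIX system
where the standard resource module is available; the helper name
raise_fd_limit is hypothetical, and 1024 simply mirrors the value above:

```python
import resource

def raise_fd_limit(target=1024):
    """Raise the soft per-process fd limit toward `target` (hypothetical helper).

    The soft limit is never lowered and never pushed above the hard limit.
    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit"; otherwise cap the request at the hard limit.
    wanted = target if hard == resource.RLIM_INFINITY else min(target, hard)
    # Only raise the soft limit; leave it alone if it is already high enough.
    if soft != resource.RLIM_INFINITY and soft < wanted:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

if __name__ == "__main__":
    soft, hard = raise_fd_limit(1024)
    print("soft fd limit:", soft, "hard fd limit:", hard)
```

If the hard limit itself is below the target, only root (or a system-wide
setting such as rlim_fd_max) can lift it; the sketch does not attempt that.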

Normally Enterprise Server uses file descriptors to keep cached files open
and for CGI processes, if there are any. If you have an excessive number of
file descriptors open, check your CGI processes or any other third-party
plugins. You may also try turning off web publishing and SSJS if they're not
in use.

Please also note that for customers without a support contract with
Netscape, the proper way to get support is through our free on-line support
web site at http://help.netscape.com . You'll find our knowledge base,
support newsgroups, etc. there.

Thanks and regards,

-----------------------------------
[my second mail]
I have enough file descriptors allowed per process, up to 1024. But in my
case, ES 3.5.1 had used up about 1000 fds according to the 'pfiles' output,
so it has reached the hard limit. What's the problem?
And do you mean that a malfunctioning CGI causes the fds to run out?
Before NE3.5.1 we were running NE3.0 on my server, and it was much better.
I have browsed all over the site you mentioned, such as the filelib and the
Knowledge Base, with little luck.
Only patch C for NE3.5.1 seems to address a similar problem, but it made no
difference after I installed it on my server. (I have installed all patches
for NE3.5.1.)

Any hints?

-----------------------------------------
[second response from K.C.Kong]
Kun,

If you have already allowed 1024 file descriptors per process, here are the
steps you could try:

. Wait for ES 3.6, which will introduce improvements to CGI execution. ES
3.6 is due to be released before the end of this month.

. Track down the problem by using "truss" to trace all "open" and "close"
system calls, like:
truss -f -t open,close -o truss.out ./start

For details, see man truss. This will show you which files ES has opened
but never closed; from there you may be able to tell whether it's ES that's
leaking fds (and which part is leaking) or whether it is your CGI's problem.
If you find that it seems to be ES's problem, let me know.
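
Pairing the opens with the closes in truss.out by hand is tedious, so a
small script can do it. A minimal sketch in Python, assuming each traced
line looks like "pid: open("path", flags) = fd" or "pid: close(fd) = 0",
optionally with a "/lwp" suffix on the pid; check this against your own
truss.out before trusting the result. The function name unclosed_opens is
hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed line shapes from `truss -f -t open,close` (verify on your system):
#   1234:   open("/website/htdocs/home.htm", O_RDONLY)   = 7
#   1234/3: close(7)                                      = 0
# Failed calls end in "Err#..." instead of "= N" and are skipped.
OPEN_RE = re.compile(r'^\s*(\d+)(?:/\d+)?:\s+open\("([^"]*)".*\)\s+=\s+(\d+)\s*$')
CLOSE_RE = re.compile(r'^\s*(\d+)(?:/\d+)?:\s+close\((\d+)\)\s+=\s+0\s*$')

def unclosed_opens(lines):
    """Return {(pid, fd): path} for each successful open with no later close."""
    live = {}
    for line in lines:
        m = OPEN_RE.match(line)
        if m:
            live[(m.group(1), int(m.group(3)))] = m.group(2)
            continue
        m = CLOSE_RE.match(line)
        if m:
            # Drop the fd if this pid had it open; ignore unknown fds.
            live.pop((m.group(1), int(m.group(2))), None)
    return live

if __name__ == "__main__":
    # Usage: python unclosed.py truss.out
    with open(sys.argv[1]) as trace:
        leaks = unclosed_opens(trace)
    # Print the paths that are most often left open, busiest first.
    for path, count in Counter(leaks.values()).most_common(20):
        print(count, path)
```

The sketch tracks fds per pid only, so descriptors inherited across fork()
or opened by calls other than open() are not accounted for.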

Regards,

------------------------------------
[my third mail]
<< File: http-opened-files.txt >> << File: cgi-opened-files.txt >>
Hi, K.C. Kong,

Thanks for your help.
Here are the files opened by the CGI programs and by the ns-httpd daemon.
They are very different.

I collected the ns-httpd daemon's open files with the lsof command when the
number of open fds reached nearly 1000.
I then used the command you recommended to trace the running ns-httpd
daemon, and grepped for all open() calls in the forked processes to get the
CGI-opened files.

The httpd-opened-files.txt file has two columns. The first column is the
name of the open file, and the second column is the number of times that
file was open at that moment.
They are:
      /website/htdocs/home.htm                 205  (the homepage of the site)
      /website/htdocs/temp1/news/981125.jpg    129
      /website                                  72
      /website/htdocs/adver/dxx/dxxbak.gif      48
      /website/htdocs/relax/film/wrkzd.gif      44
      /website/htdocs/relax/film/shenyind.gif   33
----snip----

The files above used up about half of all fds allowed.
The number of TCP connections was 57 at that moment.
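
A per-file tally like the one above can be produced from lsof output with a
few lines of script. A minimal sketch in Python, assuming the input comes
from lsof's field output ("lsof -p <pid> -F n"), where every name line
starts with the letter "n" and lines such as "p<pid>" and "f<fd>" carry the
other fields; the function names are hypothetical.

```python
from collections import Counter

def count_open_names(field_lines):
    """Tally how often each name occurs in `lsof -F n` style output.

    Only lines starting with 'n' (the name field) are counted; other
    field lines such as 'p<pid>' and 'f<fd>' are ignored.
    """
    counts = Counter()
    for line in field_lines:
        line = line.rstrip("\n")
        if line.startswith("n") and len(line) > 1:
            counts[line[1:]] += 1
    return counts

def format_report(counts, top=10):
    """Return 'name count' lines for the most frequently open names."""
    return ["%s %d" % (name, n) for name, n in counts.most_common(top)]

if __name__ == "__main__":
    import sys
    # Usage: lsof -p <pid> -F n | python tally.py
    for row in format_report(count_open_names(sys.stdin)):
        print(row)
```

A file that appears hundreds of times while only a few dozen connections
are active is a strong hint that descriptors are not being released.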

-----------------------------------------------------------
[third response from K.C.Kong]
Is upgrading to 3.6 an option for you? It provides many important fixes,
including for the file descriptor problem. It is available for download
from our web site now.

Regards,


