SUMMARY: swatch dying

From: Mark Bergman (bergman@phri.nyu.edu)
Date: Wed Nov 27 1996 - 11:15:31 CST


I originally asked:

=> I'm using swatch (yet another must-have tool for sysadmins) to monitor
=> certain log files. I run 4 invocations, looking at the
=>
=> /var/adm/log/log-auth
=> /var/adm/log/log-message
=> /var/adm/log/log-shiva
=> /var/adm/log/log-tcpwrap
=>
=> The log-message file gets the most traffic, and is the most important
=> to me. Unfortunately, the swatch process that "tails" the log-message
=> file silently dies, usually about every 24 hours, but NOT at any
=> definite interval. The other three swatch processes don't seem to die.
=> All four instances use the same .swatchrc pattern file.
=>
=> I suspect that swatch exits when it gets too many log entries that
=> match a pattern within a given period, but I don't have proof. Of
=> course, that's the sort of situation (disk full, network segment down,
=> NFS server not responding, etc.) that generates a lot of similar log
=> entries in short order.
=>
=> Has anyone seen this kind of problem, and do you have a solution?
=>
=> Is there any easy way of debugging swatch?

The first response, from obryhimk@gecmc.gecmc.ge.com (Kerry O'Bryhim)
was:

=>I've been running swatch for months now and you're right, it is "(yet
=>another must-have tool for sysadmins)". I use syslogd from over 70
=>systems into one file, so it gets huge fast. I have not had any
=>problems. I suspect that you may have a cron job to trim or move your
=>log files. If you zero out the log file while the tail is active, it
=>will hang your swatch session. I'm running swatch 2.2.

Exactly correct. I hadn't made the connection to the cron job I use to
archive log files because swatch didn't seem to die on a regular basis,
and it only stopped working on one of the four log files I was
monitoring. The trick was that the log file archiver only worked on
that one log file, only ran 2 nights a week, and made the decision to
trim the log file (and thus inadvertently kill swatch) based on disk
space and file size.

The solution was to re-start swatch immediately after running the log
file trimmer.
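
For anyone curious about why zeroing the file stalls the watcher, here is a
minimal Python sketch of the failure mode and one way a reader can recover
from it. It is not how swatch or tail is implemented; the function name
read_new_lines and the offset bookkeeping are assumptions made up for
illustration. The idea is that a reader remembers its byte offset in the log;
once the trimmer truncates the file, that offset lies past the new end of
file, so a naive reader sees nothing new. Comparing the file size against the
remembered offset lets the reader notice the truncation and start over.

```python
import os


def read_new_lines(path, offset):
    """Return (lines, new_offset) for text appended to path since offset.

    Hypothetical helper for illustration only. If the file is now shorter
    than the remembered offset, it is assumed to have been truncated by a
    log trimmer, and reading restarts from the beginning of the file.
    """
    size = os.path.getsize(path)
    if size < offset:
        # The file shrank underneath us: it was zeroed or trimmed.
        offset = 0

    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a partial trailing line is left for
    # the next call so it is not reported twice or cut in half.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

A caller would poll this in a loop, passing back the offset returned by the
previous call. Note the limitation: if the file is truncated and then grows
past the old offset before the next poll, the size check does not catch it,
which is one reason restarting the watcher right after the trimmer runs
remains the simpler fix.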

Thanks to:
        Jim Harmon <jim@telecnnct.com>
        Stuart.Little@dpcs-sw.co.uk
        Erin Copeland <erin@sam.math.ethz.ch>

----
Mark Bergman                       bergman@phri.nyu.edu
System and Network Administrator   212-578-0822
Public Health Research Institute   Rm. 1074, 455 1st Ave, NY NY, 10016


