SUMMARY: Recommended process/application monitoring & alert software

From: eSolutions, Techlist <eSolutionsT_at_misys.com>
Date: Thu Jul 25 2002 - 22:40:25 EDT
Hi All,

As ever people responded with a great number of suggestions and experiences.
Thanks go to the following for their suggestions and comments (in the order
replies were received):

Joe Fletcher
Gene Beaird
Gary Bacon
Reginald Beavers
Mike Hong
Gavin McDonald
Elizabeth Lee
Paul Richards
Adam Bisbe
Steve Bagdon

This summary contains their suggestions further down. I will post a further
summary some time in the future when we've done some testing. If you don't
want to read the responses and just want the products mentioned (with the
number of recommendations for each):

Big Brother (www.bb4.com)  4
	Also see www.deadcat.net for extensions to BB 
BMC Patrol (www.bmc.com)  2
SNIPS (http://www.netplex-tech.com/software/snips/) 1
Spectrum (http://www.aprisma.com/) 1
Remedy (http://www.syscomworld.com/solutions/servicedesk/remedy_ar_systems.htm - I think) 1
XACCT (usage) (www.xacct.com) 1
Mon (www.kernel.org/software/mon/) 1
Site Scope (http://www.freshwater.com/SiteScope.htm) 1

Thanks again to everyone for taking time to respond.

James


My original posting:

I've been asked to put forward some suggestions, for a client, on monitoring
a set of applications and where necessary alerting operators via pager/sms -
a broad brief. I've been searching the Internet for products and going
through this list's archive in search of suggestions and wisdom. Internet
searches have yielded advertisements for various products which is good but
doesn't give me any idea of general experiences, limitations or problems
real life users have had. The list archive mainly has summaries covering
resource monitoring like CPU, I/O, network etc but we're also looking for
'bigger-picture'/'high-level' reporting/alerting. Where there are
requirements similar to mine there are no summaries. So I'm turning to the
list at large for any input you have to pass on.

The (Sun part of the) setup that needs to be monitored has:
 - DB server running Informix & some custom applications 
 - Application server, which talks to the DB server, running mainly custom
applications and MQ Series (a middleware/guaranteed message delivery system)
which talks to an external server

The applications mentioned above and the custom applications have their own
logs i.e. syslogd isn't used so we can't monitor a centralised event log.
Obviously we can monitor the processes to see if they are up; however, we know
of events which can occur where the process remains up but an application
event prevents 'normal operation'. Assume that these events are logged in
the application specific way i.e. to a file somewhere. 
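
A minimal sketch of the log-watching part, assuming each application appends
plain-text lines to its own file. The patterns and the function name are
hypothetical; the real ones depend on what each application actually writes.
The idea is to remember a byte offset per file and scan only what has been
appended since the last pass:

```python
import re
from pathlib import Path

# Hypothetical patterns; replace with whatever marks a significant event
# in each application's own log format.
EVENT_PATTERNS = [
    re.compile(r"\bERROR\b"),
    re.compile(r"queue .* stalled", re.IGNORECASE),
]


def scan_new_lines(log_path, offset, patterns=EVENT_PATTERNS):
    """Return (matching_lines, new_offset) for data appended since `offset`."""
    path = Path(log_path)
    size = path.stat().st_size
    if size < offset:
        # File is shorter than last time: assume it was rotated or truncated.
        offset = 0
    with path.open("rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Consume complete lines only; a partial last line waits for the next pass.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    matches = [
        line for line in text.splitlines()
        if any(p.search(line) for p in patterns)
    ]
    return matches, offset + end
```

A scheduler (cron, for instance) would call this once per log file on each
pass and store the returned offset between runs.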

So we will be monitoring server resources, monitoring the application
processes and want to extract event information from the scattered logs. It
is the latter two that I'm seeking your input on as the first one is well
documented in summaries. Where possible we would like to combine as much of
this as possible in a single application to simplify the solution itself.

Where an event is detected and we have defined it as significant enough to
warrant alerting an operator we would like to do this via pager/SMS(/GSM) as
sendmail is disabled on the servers and no alternative will be implemented.
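
As a rough sketch of the "significant enough to alert" decision, assuming
events carry one of three severity labels and that paging is done by calling
an external command rather than by mail. The command name `send-page` is a
placeholder and not a real tool; substitute whatever paging or SMS gateway
client is actually installed:

```python
import subprocess

# Hypothetical severity scale; higher numbers are more serious.
SEVERITY = {"info": 0, "warning": 1, "critical": 2}


def should_page(event_severity, threshold="critical"):
    """True when the event is at or above the paging threshold."""
    return SEVERITY[event_severity] >= SEVERITY[threshold]


def build_page_command(recipient, message, pager_cmd="send-page"):
    """Build the argument list for the (placeholder) paging command."""
    # Truncate to 160 characters so the text fits in a single SMS.
    return [pager_cmd, recipient, message[:160]]


def alert(event_severity, recipient, message,
          run=subprocess.run, threshold="critical"):
    """Page the operator if the event is severe enough; return True if paged."""
    if not should_page(event_severity, threshold):
        return False
    # `run` is injectable so the dispatch logic can be tested without a modem.
    run(build_page_command(recipient, message), check=True)
    return True
```

Keeping the threshold check separate from the transport means the pager or
SMS mechanism can be swapped without touching the event definitions.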

This is almost certainly going to require a combination of products to
achieve the solution so I'm especially interested to hear from people in a
similar situation.

-----------------------------------------------------------------

The responses (in the order I received them):

1) I've used Big Brother to great effect (www.bb4.com). It's free(ish), simple
to configure and use, extensible, handles SMS alerts and suchlike.

Purely on a cost basis it's definitely worth considering when compared with
BMC/Tivoli et al.

-----------------------------------------------------------------
2) It depends on whether or not your client is wanting to buy something, 
or go open source.  There is a nice open source app called Big Brother, 
that we run in our datacenter.  It can monitor systems, message logs, 
processes, etc.  Using qpage and a modem, you can get it to dial out 
and notify a page list of problems.  

It takes a bit of setting up to work, and some tweaking to fine-tune, 
but it has served us well for free.  We use a dedicated system for our 
monitoring, but it is a U5, and is not stressed too much.  We monitor 
>35 Solaris systems and >10 NT systems with this product.

Check out their web site at:  http://bb4.com

-----------------------------------------------------------------
3) Depending on how much money you've got to spend I suggest you look at BMC
Patrol. It is typical system monitoring software: it monitors system
resources etc., and you can set thresholds so that it sends out
warnings and alarms when resources reach certain levels, e.g. disk space is 99%
full. It also has the capability to send out sms messages. This product is
highly configurable and should be ideal for what you're looking for. The
link is www.bmc.com , the product that you need to look at is Patrol.
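
To illustrate the threshold idea in general terms (this is not Patrol's own
interface, just a standalone sketch), the function names and the 90% warning
level below are assumptions; only the 99% figure comes from the example above:

```python
import shutil


def classify(percent_used, warn_at=90.0, alarm_at=99.0):
    """Map a usage percentage onto 'ok', 'warning' or 'alarm'."""
    if percent_used >= alarm_at:
        return "alarm"
    if percent_used >= warn_at:
        return "warning"
    return "ok"


def disk_percent_used(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check_disk(path="/"):
    """Return (state, percent_used) for the filesystem holding `path`."""
    pct = disk_percent_used(path)
    return classify(pct), pct
```

The same two-level classification can be reused for any metric that is
expressed as a percentage, such as swap or inode usage.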

-----------------------------------------------------------------
4) Although BigBrother (bb4.com) monitors system resources by default, it can
be easily configured to monitor applications as well. It's fully extendable,
plus you can turn off the system monitoring if you prefer. www.deadcat.net
offers extensions developed for databases and more. These provide good
examples for developing extensions yourself.

I don't know what the license fees are (I used BB at a government site) but
I'd guess that they're much less than for Tivoli or BMC. In my experience,
BB did the job just as well as either.

-----------------------------------------------------------------
5) If you haven't already, you can take a look at
http://www.netplex-tech.com/software/snips/. SNIPS is a freely distributed
system and network monitoring app. It works pretty well as it monitors
basically all system functions and supports SMS, paging, and email. SNIPS is
highly customizable, so if it doesn't do something you want, you can develop
it yourself. I believe it can also monitor specific logfiles... I haven't
actually tried it, but I see the config option in the conf file. The only
disadvantage of SNIPS is that since it is freeware, there is really no
support for it besides an email list and forum. 

-----------------------------------------------------------------
6) As far as application monitoring, I cannot offer much advice, but for
hardware monitoring, have you looked at Spectrum by Aprisma? It is a
network monitoring tool, and it can page on alerts.  Another product
which you may find useful is Remedy, a trouble-ticket application, which
can monitor systems for failure, (Apps too?)  and then alert (via pagers/
SMS/IP-clients/etc) the responsible individuals.  These may be overkill,
but then again, you didn't specify how large your networks are.

One last thought,  XACCT makes an interesting product for parsing log
files, and combining miscellaneous data, then converting to a custom
output format, which you specify.  Though again, this is an enterprise-
class solution.

-----------------------------------------------------------------
7) BMC Patrol will do all this & more.  I don't work for 'em -- just a
satisfied customer.

-----------------------------------------------------------------
8) BMC Patrol will do all this & more.  I don't work for 'em -- just a
satisfied customer.

-----------------------------------------------------------------
9) have you checked:
http://www.kernel.org/software/mon/
-----------------------------------------------------------------
10) We use SiteScope, by Freshwater. Does more than we've ever been able to
tap, and that's with 12,000 monitors running.
-----------------------------------------------------------------
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Thu Jul 25 22:44:04 2002
