SUMMARY: sys mgmt pkgs (VERY LONG)

From: Greg Coleman
Date: Fri Jan 06 1995 - 02:29:15 CST

        I am still in the process of evaluating these packages, but I
received a lot of "me too's" on this topic and even some second inquiries.
It sounds like a lot of you are currently deciding on this stuff as well.
So I just pasted in all the responses I have (excluding the me too's), and
here they are. As you'll see, a lot of you were very generous, and I plan to
respond individually to each; this has been A LOT of help. Anyone who
is interested in my final eval outcome, let me know and I'll include you
on the list. So far I have completed evals of 2 job schedulers, UNISON
Maestro and AutoSys, and I am starting on OCS Express. As far as sys mgmt, I
have spent some time with BMC Patrol and Compuware's EcoTools, but there is a
lot more testing to be done and other packages to look at. This isn't the most
exciting project I've been involved in, and most of these evals provide only
a 30 day license. That's OK for 1, but when you're trying to look at 6
or 7 and keep the daily fires doused, it's somewhat of a bitch.

Thanks to;

Ok, here we go...

*** My original post *****************************************************

> I am currently evaluating third party packages for a pretty
>large Sun network. There are two types of packages I am looking at,
>system management and job scheduling. By system management I mean the
>ability to monitor remote nodes' resources, i.e. disk, database, cpu,
>and also performance, daemons, etc. By job scheduling I mean exactly that,
>just a glorified cron with a reasonable GUI. The eventual
>goal is to bring the network closer to the operations personnel. From one
>console, "almost" any aspect of our net can be verified, reset, whatever.
>Most system mgmt packages seem to provide some type of shell programming
>interface so whatever is not provided out of the box some simple script
>or "agent" can be written to perform it. The obvious question: does anyone
>use, and can anyone comment on, any of the following, or suggest other
>packages which target these areas?
>System Management     Job Schedulers
>-----------------     --------------
>BMC Patrol            AutoSys
>Compuware Ecotools    OCS Express
>Openvision            Unison Maestro
>Also, anyone familiar with High-Availability packages, your responses
>are welcome too.

> ------------------------------------------------------------------------
> | Greg Coleman Independent systems wrangler |
> | New York City Email: |
> ------------------------------------------------------------------------

***** NEXT ***************************************************************

   I saw your message posted on info.sun-managers. You may want to look at
LSF (Load Sharing Facility) from Platform Computing Corporation.
   LSF is a layer of software on top of Unix Platforms that turns a network
of Unix machines into a single system. LSF includes a comprehensive batch
queueing facility that allows users to submit jobs to the system and get them
run automatically on the best available hosts.
   LSF also includes a cluster monitoring GUI that displays load and status
information of all hosts in the system.
   We are enhancing LSF to support sophisticated production batch scheduling
as well as an advanced cluster admin GUI that allows system admins to monitor
and manage any node in the system from a single point.
   If you need to know more about LSF, you can get more detailed description
of LSF by anonymous ftp:
   cd distrib/lsf/doc
   get README
   After you read README, you can then ftp whatever materials in this directory
interest you.

***** NEXT ***************************************************************

<..stuff deleted...>
Large Company.
  Overall, the company selected OpenVision products for the majority of their
solutions; here are some reasons why they selected the system management
stuff - the process I know most about (from what I have seen/heard about the
   Systems Monitoring/Event Management
      Patrol : PROS
                 - Prettiest pictures of the bunch - they use color bitmaps,
                   so they get really nice looking icons ;-)
                 - Good support for monitoring/reporting database stats.
                 - some nice-looking meters and gauges.
               CONS
                 - Uses a weird script language - can be hard to learn and use
                 - uses a polling model, and appears to suck scads of stats
                   across the network on every poll, regardless of what subset
                   of the stats you want - this technique did not bode well for
                   scalability on a large network; ie: more network traffic and
                   more noise.
                 - Relies on the presence of/use of certain system commands to
                   do its collection work. This means that you have to start
                   a process for each collection, and you are in trouble if the
                   platform you want to monitor doesn't have the command Patrol
                   wants to run. This doesn't bode well for performance during
                   the collection tasks, portability, or resource utilization
                   of the collectors; ie: the collectors spawning processes to
                   do collection may add an unacceptable load to the system you
                   are trying to monitor, and might distort results.
                - I think this system requires the console to be up in order
                  for the agents to work. If the console is down, you get
                  no stats, and more importantly, if you have defined fixit
                  scripts for the agents to run in response to some conditions,
                  THEY WON'T GET RUN EITHER.
      EcoTools:
                Don't know much about this one, but here is an interesting
                true story. The Large Company brought in 4 system management
                vendors for a parallel eval. Each vendor was given a machine,
                and access to a different company business unit to monitor.
                Each vendor was asked to demonstrate their ability to satisfy
                a laundry list of criteria within 4 days. Of the 4, EcoTools
                and OpenVision were the strongest contenders, but in a couple
                of required areas (integration with NetView 6000 and pager
                support), OpenVision actually demonstrated the capability,
                while EcoTools just said "Oh, sure, we can do that." Personally,
                I read a lot into the ability of the one vendor to actually
                turn my NetView icons different colors and have my pager
                beeping within the four days when weighed against the air-ware
                demonstrated by another vendor.
      Tivoli: PROS
                - Good integration.
              CONS
                - High initial cost (you have to buy the framework to get
                  any of the services).
                - High implementation cost/learning curve. The Large Company
                  bought the framework and the Courier package, and for these
                  reasons, coupled with lack of support for some platforms and
                  other feature deficiencies, it has remained shelf-ware
                  for two years while they rolled their own.
                - Proprietary databases, etc. This makes the framework and
                  the packages hard to extend to meet custom needs and very
                  difficult to integrate with other homegrown or third-party
                  software that must be supported.
                - Pretty pictures
                - Best Ads and Ad campaigns
                - Good integration.
                - Polling model; as noted above this does not scale well at all.
                - Proprietary EVERYTHING!! If you need something they can't
                  provide, I don't think you can plug it in at all.
                - Lack of support for platforms, and lack of more advanced
                - I think this system requires the console to be up in order
                  for the agents to work. If the console is down, you get
                  no stats, and more importantly, if you have defined fixit
                  scripts for the agents to run in response to some conditions,
                  THEY WON'T GET RUN EITHER.
      OpenVision Event Manager: PROS
                - Interrupt model - the agent sends to the console at
                  specified intervals, or when an event occurs, as opposed
                  to the console spraying requests for events over the network
                  every x minutes/seconds. This model also increases the
                  immediacy with which the console is notified; the event is
                  sent when it happens, not at the next poll.
                - Direct kernel access for collection of statistics. The agents
                  access the kernel data structures directly, so the collection
                  process is fast and very non-intrusive.
                - Selective reporting - only the stats requested for a
                  particular machine are sent to the console, minimizing
                  network traffic.
                - Open interface. The data returned, and most other data, are
                  stored in flat ascii files, and there are shell scripting
                  hooks in a number of places. This means that it is very
                  easy to plug the Event Manager into existing frameworks,
                  add monitoring to existing applications, receive info from
                  existing monitoring tools, generate data display pages from
                  the command line, etc.
                - Message monitoring - this is something a lot of the tools
                  don't do, or don't do well, and it is invaluable for
                  monitoring legacy/third party apps. Most old/third-party
                  apps log messages, so this can be used to monitor them
                  even though you can't modify the source to send messages
                  to a monitoring application.
                - user generated alarms - If you have access to the source,
                  you can send event messages to the console directly from a
                  shell script, or a 'C' program.
              CONS
                - Interface is kind of unexciting, though I think this is
                  slated for an update in '95.
                - Loose integration with other OV tools (though this is
                  addressed relatively easily with a few shell scripts).
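
The interrupt model and selective reporting described above can be sketched
roughly as follows. This is only an illustration in Python, not any vendor's
actual interface: the Agent class, the threshold table, and the send callback
(a stand-in for the network transport) are all hypothetical names. The agent
samples locally, ignores stats the console did not ask for, and pushes an
event only when a requested stat crosses its threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    host: str
    stat: str
    value: float


class Agent:
    """Hypothetical agent: samples locally, reports only threshold crossings."""

    def __init__(self, host: str, thresholds: Dict[str, float],
                 send: Callable[[Event], None]) -> None:
        self.host = host
        self.thresholds = thresholds   # only the stats the console asked for
        self.send = send               # stand-in for the network transport
        self.alarmed: Dict[str, bool] = {stat: False for stat in thresholds}

    def sample(self, readings: Dict[str, float]) -> None:
        """Process one local sample; push an event only on a new crossing."""
        for stat, limit in self.thresholds.items():
            if stat not in readings:
                continue
            over = readings[stat] > limit
            if over and not self.alarmed[stat]:
                self.send(Event(self.host, stat, readings[stat]))
            self.alarmed[stat] = over


# The "console" here is just a list that collects whatever the agent sends.
received: List[Event] = []
agent = Agent("db1", {"disk_pct": 90.0}, received.append)
for disk_pct in (50.0, 95.0, 96.0, 70.0, 91.0):
    # cpu_pct is sampled locally but never sent: the console did not ask for it.
    agent.sample({"disk_pct": disk_pct, "cpu_pct": 99.0})
```

Five samples produce two messages on the wire (the crossings at 95 and 91),
where a polling console would have fetched every stat on every poll.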

The Large Company added up the pros and cons and chose OV Event Manager
mostly because of the ease of integration and scalability.
They chose the OV Scheduler on features, I think.
They chose OV High Availability (that is the name of the product "High
Availability") because it was the most stable, full featured product of its
kind around (and had support for all the SUN OS's they run 4.x to Solaris 2.3,
as well as support for complex ODS setups, and the same ease of integration
features as the Event Manager).

***** NEXT ***************************************************************

In response to your request - in particular - Tivoli
Brief summary of Tivoli follows -
        - In principle it's a great package, i.e. centralised management of all
          your distributed systems.
        - In practice it has some major problems.
        - It's very slow (under 1.7 some functions are painfully slow.
          Version 2 is supposed to be faster but from what I have seen of
          it so far, there is not much of an improvement)
        - You really need to be using Tivoli in a well structured environment
          otherwise base Tivoli will not work particularly well.
        - It offers great flexibility, eg if it doesn't do what you want it to
          then with some scripting/coding/development you can make it fit the
          bill. However, this is also its downfall because there are very few
          sites that can install and use Tivoli straight out of the box and
          therefore a considerable amount of development will probably be
          required to make it fit your site's specific requirements.
        - The idea of being able to delegate certain tasks/duties to "lesser
          mortals", again great in theory, but lousy in practice. This is
          because when you start handing out the Tivoli desktop to other
          administrators to perform their specific functions, the whole
          application runs even slower than usual.
        - A plus is the integration of other applications into the Tivoli
          desktop such as backup, security, scheduling. There is a pretty
          good scheduler under version 2, alternatively AutoSys offer their
          own scheduler which can be hooked in. I understand that the AutoSys
          scheduler is supposed to be good, I haven't seen it, so no comment -
          sounds worth taking a look at.

***** NEXT ***************************************************************

I have been using Openvision's perform and trend product for several years now,
starting when it was sysmon. I highly recommend the product. It has helped to
identify system problems or to clear the system by showing that the problem was
an application problem.

***** NEXT ***************************************************************

After using IBM's HACMP for a while, I think OpenVision HA pales sadly
in comparison.
I have a pair of Sparc 1000's running OpenVision HA, and am unimpressed
with it. It's mostly shell scripts which start up and monitor
"services", like Online Disk Suite, mounting filesystems, Sybase, etc.
When they think something has failed, they notify the backup system to
take over.
Outstanding issues that OpenVision has not yet solved:
- if one of your ODS sub-mirrors fails (like a drive dies), OpenVision
  will put a message on the console, or on the display of anyone who
  happens to be running hamon, but they don't give you the ability to
  run a shell script to do things like send an email or alphanumeric
  beeper message.
- if one of your ODS sub-mirrors fails, and before you get it repaired
  your system has to be rebooted or "fails over" (switches to the backup
  server), OpenVision HA will fail to start. It incorrectly determines
  during startup, that if ANY of the sub-mirrors has failed, the whole
  system must be useless. You could have four copies of a disk, but if
  you lose ONE, they won't start up.
- they run shell scripts for everything, meaning you are limited in how
  many disks they can monitor per service. We had to break our 20 disk
  drives into two separate services, because the command line being used
  to call one of their scripts ran over the 10240 character line length
  limit of the bourne shell.
- failovers take a LONG time, about 20 minutes to initialize 100 ODS
  submirrors (it runs one metainit command at a time).
- openvision 'agents', the scripts which do the "service" monitoring, are
  a bit cpu intensive. An otherwise idle system will always run at a load
  average of 0.50 or higher.
- Openvision depends on one primary server, one backup. You cannot use the
  backup for much else, as its IP address will change if the failover occurs.
  So, basically, you have a very expensive system sitting idle all the time.
- just like the "standby" versus "online" UPS debate, the time the backup is
  most likely to fail is the moment you need it most - during a failover.
Given another opportunity, I'd investigate Data General's Clarion HA
product. I understand it lets you USE both systems for applications,
and if either one fails, the other can take over the entire workload.
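
The command-line length problem mentioned above (20 disks overflowing one
script invocation) is usually worked around by splitting the argument list
into batches that each fit under the limit, which is the same idea xargs
uses. A minimal sketch in Python follows. The 10240 figure is the one quoted
above; the command name "monitor_disks" and the device names are made up for
illustration, and the small limit in the example is only there to force a
split.

```python
from typing import Iterable, List

LINE_LIMIT = 10240  # character limit quoted in the post above


def batch_args(command: str, args: Iterable[str],
               limit: int = LINE_LIMIT) -> List[List[str]]:
    """Split args so each 'command arg arg ...' line fits within limit."""
    batches: List[List[str]] = []
    current: List[str] = []
    length = len(command)
    for arg in args:
        cost = 1 + len(arg)            # one separating space plus the argument
        if len(command) + cost > limit:
            raise ValueError(f"argument too long for limit: {arg!r}")
        if current and length + cost > limit:
            # Adding this argument would overflow: close the current batch.
            batches.append(current)
            current, length = [], len(command)
        current.append(arg)
        length += cost
    if current:
        batches.append(current)
    return batches


# Hypothetical device names; a small limit is used to force several batches.
disks = [f"/dev/dsk/c1t{n}d0s2" for n in range(20)]
batches = batch_args("monitor_disks", disks, limit=200)
```

Each batch can then be passed to a separate invocation, so the number of
disks per service is no longer tied to the shell's line length.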

***** NEXT ***************************************************************

I am also looking for a system monitoring tool for a medium (~100) Sun
network and have looked at CA-Unicenter, Tivoli & Ecotools.
CA-Unicenter (and they even admit it ...) provides a very rudimentary
set of tools for monitoring - the issues they emphasize are security
& management (users/printers etc.).
I am right in the middle of investigating the Tivoli vs. Ecotools (in
fact I am meeting with Ecotools rep. in 2 hours) and will let you know
what we decide.

***** NEXT ***************************************************************

for high avail. products, we eval'ed openvision's HA. It looks like
a decent s/w solution if you throw away disk mirroring via ODS
and use RAID instead. ODS can get a bit wily to manage.
For host monitoring only, we have used SNM on a 300+ network - it
did pretty good.
Sorry, no scheduler.

***** NEXT ***************************************************************

        We evaluated remote admin packages recently. One thing we found out
early on was that some packages are better for different applications, so
I'll give an overview of our requirements:
We are installing a new call management system to handle customer support calls,
& to track problems right through to R&D bug fixing. As we have support/R&D
centres scattered worldwide this will have servers worldwide, running replicated
Sybase databases on Sun/Sparcs. These will be centrally administered, with
no local admin other than changing backup tapes. We needed a tool to manage
The key requirements were:
        Sybase Support - most problems we have had have been with Sybase
                          & we're quite new to it.
        Network/OS/Application Support
        Multiple Consoles
        Minimal system/network load
        Ease of Use
        Automated recovery actions
To provide the database support we wanted & to be able to be integrated into
our existing OpenView network management application we soon narrowed the choice
down to EcoTools & Patrol.
EcoTools gave a more fixed framework with a less straightforward user interface,
however it was very reliable & consistent. Patrol was more flexible with a
very good user interface, but initially gave problems.
Both systems are quite new & are evolving rapidly. The main differences were:
        Has own scripting language
            Avoids system demands of shell scripts
        Better sybase support
        Multiple alarm levels (warning or alert)
        Better integration of recovery actions
        Easier to use
        More advanced security
            Different rights for different users
        More reliable
Neither system placed an unreasonable load on the monitored system or on the
network. Both systems could have done what we required, the issue was which
would be best. We finally decided on Patrol, but only once it had been
demonstrated that the problems could be ironed out so it could operate
effectively in our production configuration. To date it has worked fine,
& saved us from numerous problems.
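
For anyone unfamiliar with the "multiple alarm levels" and "automated recovery
actions" items above, here is a minimal sketch of the idea in Python. It is
not the EcoTools or Patrol interface: the function names, the thresholds, and
the "truncate log" action are all hypothetical, and it assumes that higher
readings are worse.

```python
from typing import Callable, List, Optional

OK, WARNING, ALERT = "ok", "warning", "alert"


def classify(value: float, warn_at: float, alert_at: float) -> str:
    """Map a reading to an alarm level; assumes higher values are worse."""
    if value >= alert_at:
        return ALERT
    if value >= warn_at:
        return WARNING
    return OK


def handle(value: float, warn_at: float, alert_at: float,
           recovery: Optional[Callable[[], None]] = None) -> str:
    """Classify a reading and run the recovery action only at alert level."""
    level = classify(value, warn_at, alert_at)
    if level == ALERT and recovery is not None:
        recovery()
    return level


# Record which recovery actions fired instead of really running anything.
actions: List[str] = []
levels = [handle(v, 80.0, 95.0, lambda: actions.append("truncate log"))
          for v in (40.0, 85.0, 97.0)]
```

The warning level gives an operator time to look before anything automatic
happens; only the alert level triggers the recovery action.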

***** NEXT ***************************************************************

we are an Italian company working in the system & network management area.
We developed a product called MOON (Managing Objects On Network) that does
system management for different Unix platforms like SUN/SunOS-SOLARIS, IBM/AIX,
I'll be more than glad to send you an evaluation copy if you need it.
Our software runs on top of HP OpenView or IBM NetView and it extends
the available features from network management to the system management
(Users, Disks & FS, Printers, User Applications).

                | MOON (Managing Objects on Network) |
MOON is the ONE solution for a centralised, simple, homogeneous and safe System
Management applied to multivendor distributed Unix workstations and servers.
Based on network management industry-standard platforms (HP OpenView or IBM
NetView/6000), MOON provides a distributed object environment, making it
possible to build up a unique Enterprise Management centre.
The MOON MANAGER is the product base module. It contains all the functions which
make it an efficient system for the monitoring of distributed resources. It is
completely based on an object distributed architecture and presents the same
interface, not depending on the managed operating system. The MOON Agents,
which automatically report all the configuration information relative to the
managed nodes, are needed as peripheral modules for a correct use of the
The MOON MANAGER offers a flexible interface for the definition of
not-strictly-topological competence domains, as an aid to the system management.
This interface allows systems to be grouped into domains on the basis of the
System Administrator's needs.
MOON MANAGER keypoints
1 system management in a multivendor environment
2 icon based interface, independent from the managed operating system
3 integration with the network management industry-standard platforms
   HP-OpenView/IBM-Netview 6000
4 configuration auto discovery for all the information system nodes
5 real-time central status-monitoring for the critical resources
6 centralised configuration of the applications to be managed
7 dynamic configuration of the management system
8 management domains definition
9 encryption of messages over the network
10 unified security management for all the administrative operations
11 distributed event-driven alarms management
12 reporting and statistics on the managed resources status
13 help on line
14 backup manager handling
MOON MANAGER hw/sw requirements
Platform          HP9000                       IBM
Model             Series 700,800               All Power & Power PC series
Operating System  HPUX 9.x/OpenView Rel. 3.x   AIX 3.2.5/NetView Rel. 2.x
MOON APPLICATIONS FAMILY is the solution for the principal needs in Unix systems
management. Today it is possible to choose among four available applications:
Users, File Systems, Printers and User Applications.
Users
The data regarding the users on the managed systems is automatically
collected on the management station. Knowing the real characteristics
associated to the users of all the managed nodes, the system administrator can
perform all the users management operations from the MOON User console in a more
consistent way.
File Systems
The addition of a new file system or swap space, the checking of
file systems usage for each user or the normal export and management operations,
are available simply by selecting the desired icon and activating the operation
from a pull-down menu.
Printers
Creation of local or remote printing queues, printer server
functionality, management of the printer subsystem (spooler) and of single
queues or jobs, are efficiently supported on each system, using the same
User Applications
Any user application, already existing or to be written, can
be easily integrated. Applications can be installed, executed and terminated
from the control system, which verifies their correct working in real time.
MOON APPLICATIONS keypoints
1 centralised operative control
2 systems configuration changes by means of the graphical interface
3 integration of already existing management tools
4 users, printers and disks management
5 user applications management
6 independent from the operating system
7 specific management tasks delegated to the staff
8 log of the performed operations
Platform  HP9000               IBM                          SUN
Model     Series 700,800       All Power & Power PC series  All SPARC series
O.S.      HPUX 9.0 and higher  AIX 3.2.4 and higher         SunOS 4.1.1/Solaris 2.3
The MOON AGENTS are active on each managed node, producing and storing the data
necessary to carry out the system management operations.
They locally implement the event-reporting system, configured by the
Administrator on the Manager, for the purpose of highly reducing the traffic on
the network. During the management operations they get requests from the
Manager and execute them on the local operating system.
Five MOON Agents are distributed today, each devoted to a different function:
Printer & Spooler;
Disks, File Systems & Swap;
User Applications;
The Agents are activated as Unix Daemons and, unlike other management
systems, they optimise the resources usage (CPU, memory, disk space and network).
MOON AGENTS keypoints
1 sampling and automatic storage of the management data
2 generation of context-sensitive answers/data
3 manager's requests mapping in operations depending on the operating system
4 events notification to the Manager
5 authentication of the management station
6 automatic restart in case of anomalies
7 support for the distributed management of user applications
8 resources usage optimised on the host system
MOON AGENTS hw/sw requirements
Platform  HP9000               IBM                               SUN
Model     Series 700,800       All the Power and PowerPC series  All the SPARC series
O.S.      HPUX 9.0 and higher  AIX 3.2.4 and higher              SunOS 4.1.1/Solaris 2.3
If you need more specific information, please reply to:
                      | |
                      | ONE (Open Network Enterprise) |
                      | |
                      | Corporate Headquarters |
                      | ----------------------- |
                      | via Matteotti 43/c |
                      | Agrate Brianza (MI) - 20041 |
                      | ITALY |
                      | |
                      | Phone/Fax +39 39 654173 |
                      | Email |
                      | |
***** NEXT ***************************************************************

My history has been with SunNet Manager and Remedy Health Profiler, but
these only provide part of the view. RHP is great for network analysis,
but not designed for management. May be an interesting part of a complete
package though.

***** NEXT ***************************************************************

I have also evaluated TIVOLI and found that it was very thorough and
could handle almost everything we needed.
Unfortunately the price was the only drawback for us.

***** NEXT ***************************************************************

Paradigm Systems has a scheduler called OnSchedule. I work for them, so I
won't say that it's any good, but take a look.

+++++++++++++++++++++++++++++ END +++++++++++++++++++++++++++++++++++++++++


