SUMMARY: large batch jobs

From: Sajjad Ahmed (saja@nhm.ac.uk)
Date: Thu Nov 07 1996 - 07:00:49 CST


Hello sun-managers,

The original question is followed by the many responses received. Apologies
to anybody I may have inadvertently missed. Many thanks to Rich Kulawiec
for the list of batch management tools.

Sajjad.
----------------------------------------------------------------------------

QUESTION:
-----------
>Is it possible to run nightly (via crontab) incremental backups using
>ufsdump when there are batch jobs running? I am under the impression that
>any users/processes on a mounted filesystem (i.e. multiuser mode) would
>cause the backup to fail. The batch jobs in question are varied, may
>access differing filesystems, and take days. Any hints, tips, and/or
>sample scripts would be greatly appreciated! Also, has any one got a
>script for batch queue/management? I am running an SS10 under Sol 2.5.1.

TIA

Sajjad Ahmed.
Sys.Adm.
-----------------------------------------------------------------------------------------------

From: Marina.Daniels@ccd.tas.gov.au (Marina Daniels)

Running jobs are not going to cause the ufsdump to fail unless you are using
the verify option (which I don't bother with).
The only problem is that you don't know what state your system was in during
the backup, i.e. if you restore from that backup, will you restore the system
as it was halfway through some important processing? That's why it's a good
idea to run the backup when not much is happening on the system.

Marina
----------------------------------------------------------------------------

From: Jason Keltz <cs911089@red.ariel.cs.yorku.ca>

As long as you don't use the verify flag (v) with ufsdump, the dump won't
fail. However, if a file changes after it has been backed up, you will
only be able to restore the old copy. The manual page claims that backups
on systems in multi-user mode are a no-no. However, we've been doing this
for quite some time and haven't had a problem. Give it a try... back up
your system, and then try restoring a few files here and there. I'm sure
you won't have a problem.
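A hypothetical spot-check along those lines (the tape device and the sample
file are placeholders, and the ufsrestore commands are echoed rather than
run against a real tape):

```shell
#!/bin/sh
# Sketch: verify last night's dump by listing its table of contents and
# extracting one file into a scratch directory.  Device and file names
# are illustrative only.
TAPE=/dev/rmt/0n
SCRATCH=/var/tmp/restore-check

mkdir -p "$SCRATCH" && cd "$SCRATCH" || exit 1
echo "ufsrestore tf $TAPE"                # list what the dump contains
echo "ufsrestore xf $TAPE ./etc/passwd"   # pull back one file as a test
```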

Jason.
--------------------------------------------------------------------------
From: "Charlie Mengler" <charliem@mwh.com>

The short answer is "YES".
Files that are changing because of the batch jobs may or may not be
usable after a ufsrestore. It is application dependent. You may
get a copy of a file from before, after or during updating by the batch jobs.
If this ambiguity does not cause your applications heartburn, then it
is OK. If all or most files need to be in a known state for data integrity,
then ufsdumps need to be run when the batch jobs are not running.

----------------------------------------------------------------------------
From: "Robert Tommaselli" <Robert.Tommaselli@ska.com>

I've never had a problem doing a ufsdump on a machine in multi-user mode.
The pitfall of doing a dump while a batch job is operating on the file
system is that you may be dumping the files in an inconsistent state,
which matters if you need to restore to a specific state, e.g. "I need
to restore data from the night before batch job x ran."

If you really want to get fancy about your backups, there are several
solutions. One is to mirror your data, take the mirrors offline, and
back up the quiescent offline mirror. Secondly, you can use the fuser
command to see if there are any open files on a file system before doing
the incremental; if there are, you can kill the processes or wait until
there are no open files on the file system.
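A minimal sketch of the fuser approach (filesystem names, the dump level,
and the tape device are all placeholders, and the dump command is printed
rather than executed):

```shell
#!/bin/sh
# Sketch of the fuser check suggested above.  Filesystem names and the
# tape device are illustrative, not a recommendation.

# True (exit 0) if any process has files open on filesystem $1.
# On Solaris, fuser -c prints PIDs on stdout and the name on stderr.
fs_busy() {
    pids=`fuser -c "$1" 2>/dev/null`
    test -n "$pids"
}

# Build the ufsdump command line for filesystem $1 at level $2.
dump_cmd() {
    echo "ufsdump ${2}uf /dev/rmt/0n $1"
}

for fs in /export/home /var/spool; do
    if fs_busy "$fs"; then
        echo "skipping $fs: files still open"  # or kill/wait, as above
    else
        dump_cmd "$fs" 5    # prints the command; in real use, run it
    fi
done
```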

Good Luck

Robert.
----------------------------------------------------------------------------
From: fpardo@tisny.com (Frank Pardo)

There is an excellent book

 Unix System Administration Handbook (2nd edition)
 Evi Nemeth et al.
 Prentice Hall
 ISBN 0-13-151051-7

that has this to say about backups:

 An inconsistency in a level zero dump can make it impossible
 to restore the filesystem; therefore, the filesystem should
 be absolutely stationary while a level zero dump is being
 done. It is not as important to limit filesystem activity
 during higher-level dumps because mistakes on these tapes
 will usually affect only files that were modified during the
 dump. (page 191)

Good luck
--fp Frank Pardo <fpardo@tisny.com>
----------------------------------------------------------------------------
From: Tim Carlson <tim@santafe.edu>

On Wed, 30 Oct 1996 saja@nhm.ac.uk wrote:

> Is it possible to run nightly (via crontab) incremental backups using
> ufsdump when there are batch jobs running?

No problem; I do it all the time. The only thing that would fail to get
dumped is any file that gets changed after dump has dumped the directory
structure.

Every night I dump out 30 file systems that are active.

Tim
----------------------------------------------------------------------------
From: "Ken Picard" <Ken.Picard@ska.com>

I had a similar problem once (backing up mountains of data over many hours
and even days while the system was still live). I did the following:
buy one extra disk (as large as your single largest partition), install
DiskSuite, and "metafy" (my own term) all partitions to be backed up. Use the
free disk to attach as a mirror to the next partition to back up (use a low
priority resync). When the mirror is resynced, take it offline and
back it up. Then do the next filesystem.
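Under DiskSuite that sequence might look roughly like the following (the
metadevice names d10/d12 and the tape device are hypothetical, and each
step is echoed here rather than executed):

```shell
#!/bin/sh
# Sketch of the split-mirror backup described above.  DiskSuite
# metadevice names (d10 = mirror, d12 = the spare submirror) are
# purely illustrative; in real use, run the commands directly.

split_mirror_backup() {
    mirror=$1; sub=$2
    echo "metattach $mirror $sub"     # attach spare disk as a submirror
    echo "metastat $mirror"           # poll until the resync completes
    echo "metaoffline $mirror $sub"   # take the synced copy offline
    echo "ufsdump 0uf /dev/rmt/0n /dev/md/rdsk/$sub"  # dump quiet copy
    echo "metaonline $mirror $sub"    # reattach; resync picks up changes
}

split_mirror_backup d10 d12
```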
HTH,
Ken
----------------------------------------------------------------------------

From: Rich Kulawiec <rsk@itw.com>

Yes, you can do exactly what you're planning on doing. Contrary to the
hype promoted by some of the companies which make add-on backup software,
dump/ufsdump is pretty darn robust when it comes to handling "live"
(non-quiescent) filesystems. [And I oughta know; I was one of a number
of people at Purdue who worked on this problem back in the 80's. Significant
portions of our code have found their way into many versions of dump.]
This doesn't mean that it's a sure thing -- there is a small, non-zero
chance that you won't be able to completely restore a filesystem after
a crash. But in over ten years of running full and incremental dumps on
live filesystems, I've yet to encounter this circumstance *except* when
I've deliberately introduced it as part of stress-testing dump.

The best way to handle this would probably be to do nightly incremental
dumps at progressively higher levels -- level 1 on Monday, level 2 on
Tuesday, etc. Presuming that the batch jobs you refer to are reading
and writing various files all over your machines, it's a reasonably good
bet that a file which might be missed on Monday (because it was created
right after backups finished) will be caught on Tuesday. A file which
is slowly growing (like a logfile for a long-running job) will wind up
on both backups because its modification timestamp will have changed.

Combine this with a once-a-week level 0 and you're just about as close
to completely covered as you would be if you spent big bucks on any of
the add-on products. And you avoid their expense and idiosyncrasies.
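The rotating-level scheme can be sketched roughly like this (a hypothetical
script driven from crontab; the tape device, filesystem, and log path are
placeholders, and the dump command is printed rather than executed):

```shell
#!/bin/sh
# Sketch: level 0 on Sunday, then progressively higher incremental
# levels Monday through Saturday, as described above.

# Map a day abbreviation (from `date +%a`) to a dump level.
dump_level() {
    case "$1" in
        Sun) echo 0 ;;
        Mon) echo 1 ;;
        Tue) echo 2 ;;
        Wed) echo 3 ;;
        Thu) echo 4 ;;
        Fri) echo 5 ;;
        *)   echo 6 ;;   # Sat (and a safe fallback)
    esac
}

day=`date +%a`
level=`dump_level $day`
echo "ufsdump ${level}uf /dev/rmt/0n /export/home"  # in real use: run it

# A crontab entry to drive it nightly at 02:00 might look like:
# 0 2 * * * /usr/local/sbin/nightly-dump > /var/adm/nightly-dump.log 2>&1
```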

Batch job management? I'm including several previous Sun-managers
summaries on the topic below in a shar archive. It looks to me like
you have a choice of several packages, depending on just what you need
in the way of features.

Cheers,
Rich

#!/bin/sh
# This is a shell archive (produced by GNU shar 4.0).
# To extract the files from this archive, save it to some FILE, remove
# everything before the `!/bin/sh' line above, then type `sh FILE'.
#
# Existing files will *not* be overwritten unless `-c' is specified.
#
# This shar contains:
# length mode name
# ------ ---------- ------------------------------------------
# 9331 -rw------- batch1
# 14191 -rw------- batch2
# 3067 -rw------- batch3
#
touch -am 1231235999 $$.touch >/dev/null 2>&1
if test ! -f 1231235999 && test -f $$.touch; then
  shar_touch=touch
else
  shar_touch=:
  echo 'WARNING: not restoring timestamps'
fi
rm -f 1231235999 $$.touch
#
# ============= batch1 ==============
if test -f 'batch1' && test X"$1" != X"-c"; then
  echo 'x - skipping batch1 (File already exists)'
else
  echo 'x - extracting batch1 (text)'
  sed 's/^X//' << 'SHAR_EOF' > 'batch1' &&
XFrom netnews.upenn.edu!daemon Tue Oct 29 12:43:10 EST 1991
Article: 1695 of list.sun-managers
Path: netnews.upenn.edu!daemon
XFrom: Glenn Carver <glenn@atmos-modelling.chemistry.cambridge.ac.uk>
Newsgroups: list.sun-managers
Subject: SUMMARY: batch control
Message-ID: <54458@netnews.upenn.edu>
Date: 27 Oct 91 19:44:15 GMT
Sender: daemon@netnews.upenn.edu
Lines: 196
Status: RO
X
X
X
Thanks to all of you who took time to reply to my query about batch job control
on workstations. Sorry about the delay in summarizing.
X
I had a large number of responses which directed me to several freely available
software packages. Unfortunately I've not had the time to examine the
capabilities of each package in detail so this summary will only give brief
details. But, I hope this helps those who are facing the same problems I am.
X
As I half expected, batch control has been a problem for system managers for
some time and a great deal of effort has been spent on developing a usable
system. However, the level of sophistication varies, so you have to decide
on your requirements before investing time in installing any of these packages.
X
Here's a summary of what I found out (my original message at the end):
X
1. SunOS batch, at, cron.
-------------------------
Can be configured for multiple queues per machine. You can specify the number
of jobs per queue, nice value and retry time for jobs. See man page for
queuedefs for more details. Very limited capabilities.
X
2. Using the print spooler.
---------------------------
Several people pointed out that you can use the print spooler mechanism to
set up and manage distributed batch queues by running scripts instead of
printing. No one sent me details of a working mechanism and I haven't tried
it yet. It might be very useful in combination with some of the
non-distributed software.
X
3. dsh.
-------
Alan Stebbens <aks%anywhere@edu.ucsb.hub> pointed me in the direction of
'dsh'. 'dsh' implements a distributed shell which finds the least loaded
machine and runs the command on it. dsh is available by anonymous ftp
from hub.ucsb.edu in pub/shells/dsh.tar.Z
X
4. Batch.
---------
Ken Lalonde <ken@edu.toronto.cs> has written a batch control package. It is
a collection of programs and scripts that allows you to set up various
queues on a machine with characteristics such as the priority of jobs,
job resource limits and so on. It runs a daemon which monitors the load
on the machine and can halt jobs when the load reaches a settable level.
Batch is not networked. Several people recommended this package. It's
available by anonymous ftp from ftp.cs.toronto.edu in pub/batch.tar.Z.
X
5. QBATCH
---------
Thanks to Milt Ratcliff <milt@pe-nelson.com> for mailing me about QBATCH.
QBATCH was developed by Alan Saunders on Sun workstations. It is
not networked but does provide a comprehensive set of job control
options, more than Batch (4.) but does not halt jobs if load reaches some
predetermined level. QBATCH is available from several anonymous ftp sites. I
got it from lth.se in netnews/alt.sources/volume91/jul but it's also available
from cs.dal.ca in pub/bio as qbatch.tar.Z.
X
6. Condor.
----------
Many replies mentioned the Condor package. Condor was written at the
University of Wisconsin and is quite sophisticated and well documented.
It is fully distributed, machines enter and leave a 'pool' which condor uses
to run jobs. Jobs are checkpointed and can be moved from one machine which
leaves the pool and continued on a machine that enters. The snags appear to
be that a replacement version of the libc.a library is required to enable the
checkpointing (programs must be statically linked) and I/O is not implemented
well for FORTRAN. For more info contact condor-request@cs.wisc.edu. Condor
is available from many ftp sites as Condor_4.0.0.tar.Z. Use 'archie' to find
one (USA: quiche.cs.mcgill.ca; EUROPE nic.funet.fi; log in as user archie).
X
7. NQS.
-------
The Network Queueing System was developed on contract from NASA. There is a
version (I assume to be the original) on permac.space.swri.edu in
public/convexug/nqs.tar.Z (and other anonymous ftp sites). NQS is also marketed
by several companies and improved over the original: COSMIC, 382 East Broad St.,
Athens GA 30602 supporting SIG, Sun, VAX & Stardent, Sterling Software (415
area code, sorry no other details). Cray also have a version and sell a version
called RQS for remote queueing on Cray machines. COSMIC are also rumoured to
be developing NQS II. For those with money to spend, this may be the one.
X
At a first glance NQS seems to give similar sort of capabilities as Condor but
this is quite a big package and I haven't had time to go through it all. I
did hear from someone who had successfully installed the permac version
on a multiarchitecture environment (including Suns, although it required a
bit of work).
X
8. MDQS
-------
MDQS was developed at the U.S. Army Ballistic Research Lab. and is available
from ftp.brl.mil in arch/mdqs.tar.Z. MDQS stands for Multi-Device Queueing
System and appears to have been originally developed to handle a large number
of network printer devices (multiple devices per queue, multiple queues per
device) but also includes facilities for batching jobs on machines. This
appears to be a powerful package with a lot of documentation to it.
X
X
9. DNQS
-------
Tom Green <green@edu.fsu.scri.ds17> mailed me about DNQS. This is available
from ftp.fsu.edu in the directory pub/DNQS. This package supports a
multi-architecture environment in a distributed way but doesn't include
some of the fancier features of the above packages. However, it was
developed for a workstation environment rather than a few high-speed
processors (such as NQS). Documentation is good (not always the case!)
and it looks fairly easy to set up (although I haven't done it yet). It
won't halt jobs when the machine load is too high; it relies on nice
priority to do that. Known to run on Sun, VAX, DecStation, SGI & IBM.
X
------------------------------------------------------------------------
Glenn Carver Email: carver@atm.ch.cam.ac.uk
Atmospheric Chemistry Modelling Group Phone: (44-223) 336521
Chemistry Department Fax : (44-223) 336362
Cambridge University
UK
------------------------------------------------------------------------
X
X
Thanks to all who replied:
X
Mike Raffety <miker@com.sbcoc>
Alan Stebbens <aks%anywhere@edu.ucsb.hub>
Ken Lalonde <ken@edu.toronto.cs>
huittsco@com.pwfl (Scott Huitt 407-796-2969)
erueg@de.gwdg.uni-math.cfgauss (Eckhard Rueggeberg)
Steve Seaney <seaney@edu.wisc.me.robios>
Seth Robertson <seth@edu.columbia.ctr>
sitongia@edu.ucar.hao (Leonard Sitongia)
Loki Jorgenson <loki@ca.mcgill.physics.nazgul>
feldt@edu.uoknor.nhn.phyast (Andy Feldt)
green@edu.fsu.scri.ds17 (Tom Green)
urszula@edu.berkeley.garnet ( Urszula Frydman )
milt@com.pe-nelson
brianc@edu.ucsf.jekyll
peb@com.ueci (Paul Begley)
dan@com.BBN
henry@ca.concordia.davinci
Larry Thorne <larryt@edu.MsState.ERC>
David Fetrow <fetrow@edu.washington.biostat.orac>
"Steven G. Parker" <sgp@edu.uoknor.nhn.phyast>
Jon Diekema <diekema@org.mi.jdbbs>
Ed Arnold <era@edu.ucar.scd.niwot>
pete@uk.ac.ox.physchem (Pete Biggs)
gwolsk%seidc@com.mips (Guntram Wolski)
kevins@com.Sun.Aus (Kevin Sheehan {Consulting Poster Child})
Mike Raffety <miker@com.sbcoc>
X
X
and here's my original message:
X
To: sun-managers@edu.nwu.eecs
Subject: Batch control
Status: RO
X
X
Several users have recently begun to run large programs on machines in our
network. By large, I mean that these programs run for several days and
memory usage is such that we cannot run them on our 8Mb IPCs when OW is
running. I have instructed these users to run at reduced priority and
only on the machines that have enough memory to cope.
X
They've so far been using the 'batch' command to do this. The problem is that
these users are new to UNIX and often start the programs on the
wrong machine. Also, when they have realised they've done something wrong,
they can't figure out why 'batch' doesn't give a queue entry and I have to
tell them to use 'ps' ...etc etc.
X
I'm hoping that someone out there can point me in the direction of some
freely available software (no money!) on how best to present
a batch environment to users continually running large programs in the background.
X
What I'd like to have is:
X
1. A command interface the same across all machines on the network. e.g.
X % batch myjob machine1
X would start up the script myjob on machine1, rather than as at present,
X where batch only works for the local machine.
X
2. User can query state of the batch queues; what's running, what's queued
X and on what machines, again without having to log on to each machine in
X turn ('atq' only tells you what's waiting to run).
X
3. I need to be able to specify which machines can be used for batch jobs.
X
4. I need to be able to control priority, start and stop jobs.
X
5. Ability for the user to kill jobs currently running easily. 'atrm' only
X works for jobs queued.
X
X
This may be asking a lot, but these are all problems that I will have to
overcome. I am expecting several more users to begin using the network for
large background jobs. I'm sure someone has had this problem before and I'd
be grateful for any advice/software.
X
I will summarise.
X
X
SHAR_EOF
  $shar_touch -am 1030135096 'batch1' &&
  chmod 0600 'batch1' ||
  echo 'restore of batch1 failed'
  shar_count="`wc -c < 'batch1'`"
  test 9331 -eq "$shar_count" ||
    echo "batch1: original size 9331, current size $shar_count"
fi
# ============= batch2 ==============
if test -f 'batch2' && test X"$1" != X"-c"; then
  echo 'x - skipping batch2 (File already exists)'
else
  echo 'x - extracting batch2 (text)'
  sed 's/^X//' << 'SHAR_EOF' > 'batch2' &&
XFrom netnews.upenn.edu!daemon Tue Jul 6 19:43:01 EDT 1993
Article: 10168 of list.sun-managers
Path: netnews.upenn.edu!daemon
XFrom: Craig Kruck <kruckc@hitachi.hitachi.com>
Newsgroups: list.sun-managers
Subject: Summary Batch Scheduling Software
Message-ID: <133569@netnews.upenn.edu>
Date: 27 Jun 93 21:45:13 GMT
Sender: daemon@netnews.upenn.edu
Lines: 273
Status: RO
X
Many thanks for all the input regarding batch scheduling applications/software.
While my primary interest was in batch scheduling, many of the responses
provided information on load balancing which will be very helpful in the
future.
My original question was:
X
> I'm trying to locate vendors who produce batch scheduling software for
> UNIX platforms. We are in the early stages of a down sizing project and
> any information would be greatly appreciated.
>
> The product will have to support triggers and predecessors along with
> the normal requirements.
X
We currently have Computer Associate's Unicenter product in house. After
speaking with a few of the vendors mentioned below we have decided that the
Unicenter product is the most robust and powerful product on the market. Along
with the Workload Management features, it also supports Console, File, Problem,
Report, and Tape Management as well as some very enhanced Security (well beyond
what I require).
X
Again, Thank you for the information:
X
"Peter W. Osel" <pwo@ztivax.zfe.siemens.de>
charest@CANR.Hydro.Qc.CA (Claude Charest)
nolfb@jcdbs.2000.disa.mil (Bill Nolf)
stern@sunne.East.Sun.COM (Hal Stern - NE Area Systems Engineer)
"Marty Leisner" <leisner@eso.mc.xerox.com>
kevin@uniq.com.au (Kevin Sheehan {Consulting Poster Child})
David Fetrow <fetrow@biostat.washington.edu>
crm (Charlie's Login)
amir@matis.ingr.com (Amir J. Katz)
Hebets <radian!markh@natinst.com>
mwp.michael <MWP.MICHAEL@MELPN1.CV.COM>
X
_______________________________________________________________________________
XFrom: "Peter W. Osel" <pwo@ztivax.zfe.siemens.de>
X There is a product ``Load Balancer'' by Freedman Sharp And
Associates Inc.: I did not have a closer look at it, but here we go
anyway:
X
X ----LOAD BALANCER VERSION 3.3----
X
Load Balancer is a UNIX batch queueing and load sharing system. It
ensures that jobs (ie: applications) submitted from any host in a
network end up running on the best available host. It takes into
account many performance factors, as well as other real-world factors
such as security, licensing, and interactive user detection. Users can
run, kill, and adjust the status of their jobs from any host, and can
also get status information on jobs, queues, hosts, applications, and
users from any host. No application or kernel modifications are
necessary, since Load Balancer deals exclusively with whole
applications, not application fragments.
X
Load Balancer is useful to sites looking to maximize the performance
that they can extract from their workstations and servers. By running
each job on the best-available machine, Load Balancer gives each user
the power to run more jobs in a given amount of time, increasing
productivity and reducing time-to-market for products being designed.
X
Example uses of Load Balancer are: allowing an engineer to run many
simulation runs concurrently on as many hosts as are available,
ensuring that results are achieved as soon as possible; allowing many
people to submit large jobs to a limited set of host machines, using
Load Balancer's batch queueing system to ensure that the hosts do not
become overloaded with too many jobs running at once; allowing a s/w
developer to compile code on the best available hosts, reducing compile
time drastically; giving users the ability to pull up a shell on a
lightly loaded machine for general purpose work; and so on.
X
Load Balancer v3.3 is extremely full featured, giving the system
administrator maximum flexibility to set policies about who can run
what, where, when. Load Balancer v3.3 is available on Sun, HP, SGI,
IBM, and DEC UNIX computers. A microsoft windows front-end is also
available, giving PC users a point-and-click method to start UNIX
jobs.
X
For more information about Load Balancer, please send mail to
dan@fsa.ca, or call Dan Freedman at the phone number listed above.
------------------------------------------------------------------------------
XFrom: charest@CANR.Hydro.Qc.CA (Claude Charest)
X Here is a previous summary about batch scheduling that I had kept. It was
written by Glenn Carver from Cambridge University. I hope that it will help
you and that this data is not too old...
X
________
XFrom: nolfb@jcdbs.2000.disa.mil (Bill Nolf)
I believe SAIC and HP have a batch scheduler and load balancer product,
however I don't remember the name. It should be in one of the trade rags.
______________________________________________________________________________
XFrom: stern@sunne.East.Sun.COM (Hal Stern - NE Area Systems Engineer)
if you want all of the dependencies (like triggers) then
check out OpenVision's Distributed Task Scheduler (formerly
the fusion systems group product). they're in pleasanton, CA
______________________________________________________________________________
XFrom: "Marty Leisner" <leisner@eso.mc.xerox.com>
What's the matter with batch(1) and at(1) and cron(1)?
marty
______________________________________________________________________________
XFrom: kevin@uniq.com.au (Kevin Sheehan {Consulting Poster Child})
You can use archie to look for NQS, but we also got some info
on a product call Load Balancer. lb@fsa.ca is the Email alias
I have for it. (Load Balancer@Freedman Sharp & A.)
X
Haven't used it yet, but it looks interesting.
______________________________________________________________________________
XFrom: David Fetrow <fetrow@biostat.washington.edu>
X Lowend solution include:
X
X Good old "at" and "cron"
X
X NAQ (there are free and commercial versions)
X
X..I'm a lowend kind of guy so don't know about anything better.
______________________________________________________________________________
>From: crm (Charlie's Login)
I heard that Computer Associates of Islandia NY have a product for Sun like
this - you may want to check it out.
______________________________________________________________________________
XFrom: amir@matis.ingr.com (Amir J. Katz)
There is a commercial product called CONTROL-M that runs on various UNIX
platforms. It is a full-featured batch processing scheduler which is also
available on IBM mainframes, AS/400 and VAX/VMS.
X
This product is a part of an architecture called IOA (Integrated Operations
Architecture) which is developed and distributed by 4th DIMENSION SOFTWARE.
X
For more information, please contact Mr. Itai Ben-Dor at:
X
X 4th DIMENSION SOFTWARE Ltd.
X P.O.Box 43227
X Tel Aviv 61430
X ISRAEL
X Tel. +972-3-491211
X Fax. +972-3-491002
X
or Mr. Joseph Hollander at:
X
X 4th DIMENSION SOFTWARE Ltd.
X One Park Plaza, 11th Floor,
X Irvine, CA 92714
X Tel. (714) 757-4300
X Fax. (714) 756-3900
X
Disclaimer: I am associated with this product.
------------------------------------------------------------------------------
XFrom: Hebets <radian!markh@natinst.com>
I'm not sure what you mean by "triggers and predecessors",
maybe I just don't recognize the non-Unix jargon.
X
1) All Unixes will include the "crontab" facility for scheduling
recurring batch processes. All newer Unixes (SVR3? SVR4?) will include
the capability for each user to maintain their own crontab schedule.
Crontab will allow you to fire off commands every few minutes, every few
hours, at 4:23 a.m. on every Tuesday, etc., etc. You can expect the
crontab implementation to be very solid on most Unixes, because the OS
uses it to schedule some routine maintenance.
X
2) All newer Unixes will include the "batch" and "at" facilities
for submitting jobs to a queue now (batch) or at some later
time (at). I haven't been real impressed with the quality of
implementation for batch or at on most of our machines, unfortunately.
X
3) I've been looking for some products to spread batch queues
over many machines on a network. I've turned up four vendors,
two of whom have actually answered E-mail and sound like they
have real products.
X
The two I've exchanged mail with:
X Platform Computing, Utopia Load Sharing Facility --- zhou@platform.com
X Freedman Sharp Associates, Load Balancer --- dan@fsa.ca
X
The two I haven't been able to contact:
X Sterling, the Network Queueing System (NQS)
X VXM Technologies, Inc. (I forget the name of the product.)
------------------------------------------------------------------------------
XFrom: mwp.michael <MWP.MICHAEL@MELPN1.CV.COM>
CV have a batch product for Unix. Is a port of our PRIMOS software and I
know little about it (like, when it becomes official, whether it supports
triggers etc, platforms supported), but if you contact your local CV
office and ask about the BATCH/open product they should be able to help.
------------------------------------------------------------------------------

From: "Jurgen M." <sysjxm@devetir.qld.gov.au>
On Wed, 30 Oct 1996 saja@nhm.ac.uk wrote:

> Hello sun-managers,
>
> Is it possible to run nightly (via crontab) incremental backups using
> ufsdump when there are batch jobs running? I am under the impression that

Yes. However, backing up a non-quiescent filesystem is not a good idea, as
its integrity/reliability suffers, since some files will change their state
in multi-user mode, unless you can guarantee that the filesystem will not
change whilst being dumped (mounted read-only etc).

> any users/processes on a mounted filesystem (i.e. multiuser mode) would
> cause the backup to fail. The batch jobs in question are varied, may access

The backup should run fine on any mounted UFS filesystem in any state. It's
how dynamic the filesystems are in that state that raises reliability issues.
If you're happy with the state of a filesystem when it is dumped and the
probability of successfully restoring from it, then that's fine.

> differing filesystems, and take days. Any hints, tips, and/or sample
> scripts would be greatly appreciated! Also, has any one got a script for
> batch queue/management? I am running an SS10 under Sol 2.5.1.

Can the batch jobs be run more quickly? Perhaps you don't have the window of
time necessary for a dump in the single-user/sys admin state (S/1). If you did
have to dump in a multi-user state, you could use state 4 as an alternative,
where you could start up/shut down particular applications as necessary,
possibly making the system more suitable for dumping.

Just some ideas. Good luck.

-Jurgen

---------------------------------------------------------------------------

From: dlp@medphys.ucl.ac.uk (Dave Plummer)

Many people do use ufsdump on live systems and get away with it.

We do nightly incrementals and monthly level zeros and do restore from time
to time, both individual files and whole filesystems. So far we have never
failed to recover data over some five years and six servers; however, there
is always a first time. We recognise that this is not recommended practice.

We typically have long numerical jobs, which occasionally write to files,
running on our systems (mostly 4.1.1 but some 5.5). We back the servers up
with ufsdump running to a single remote tape.

If you want a sophisticated batch control system, you might want to look at
Condor, though I do not know how up to date it is now.

Dave

---------------------------------------------------------------------------



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:11:15 CDT