SUMMARY: intermittent dump failure under cron

From: John Chrisoulakis (john_chr@antdiv.gov.au)
Date: Sat Aug 05 1995 - 22:50:07 CDT


Although I sent out a "problem solved" message only an hour or so after I
launched the question, I've had a number of helpful replies, including a
useful insight into the obscure DUMP failure message. So here is the
official SUMMARY.

Last week I launched the following:
----------------------------
A worrying puzzle...

We run a nightly backup to 5 Gbyte 8 mm exabyte using a series of dump
commands in a shell script run by cron. Output from the script is
redirected to a log file. BUT, on rare occasions each dump fails with the
following type of message:

  DUMP: Date of this level 0 dump: Thu Aug 3 23:00:24 1995
  DUMP: Date of last level 0 dump: the epoch
  DUMP: Dumping /dev/rsd0g (/usr) to /dev/nrst8
  DUMP: mapping (Pass I) [regular files]
  DUMP: mapping (Pass II) [directories]
  DUMP: estimated 511002 blocks (249.51MB) on 0.06 tape(s).
  DUMP: fopen on /dev/tty fails
  DUMP: The ENTIRE dump is aborted.

.....etc
----------------------------

Soon afterwards I discovered the reason for this failure - it turned out to
be something obvious:

First, a correction: when I said "cron" I really meant "at" (I don't know
what I was thinking of at the time!). We have an operator who puts the tape
in and sets an "at" job for that night. It looks like a mistake was made and
the "at" job was submitted twice. The two dumps, running together and both
trying to access the one tape drive, confused one another and consequently
failed in this manner.
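
One way to guard against a doubly submitted job is to have the backup script
take a lock before it touches the tape drive. The following is only a
minimal sketch, not what we actually run: the lock path and the function
names are made up for illustration. It relies on mkdir being atomic, so only
one of two competing jobs can create the directory.

```shell
#!/bin/sh
# Hypothetical lock location; any directory path on a local filesystem would do.
LOCKDIR=${LOCKDIR:-/tmp/nightly-dump.lock}

# Try to take the lock. mkdir either creates the directory or fails,
# so exactly one of several concurrent callers succeeds.
acquire_lock() {
    if mkdir "$LOCKDIR" 2>/dev/null; then
        # Record our process id to help an operator track down a stale lock.
        echo $$ > "$LOCKDIR/pid"
        return 0
    fi
    echo "backup already running (lock $LOCKDIR exists); not starting" >&2
    return 1
}

# Remove the lock once the backup has finished.
release_lock() {
    rm -rf "$LOCKDIR"
}
```

The backup script would call "acquire_lock || exit 1" before its first dump
and "release_lock" after its last one, so a second copy of the job exits
with a clear message in the log instead of fighting over the drive.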

An explanation of the obscure "DUMP: fopen on /dev/tty fails" message was
given by Glenn Satchell (glenn@uniq.com.au), who said:

===========================================
The fopen error is a symptom, not the cause. The reason being that
there was some error and then dump tries to open /dev/tty to print the
error message, and because there is no controlling tty it doesn't print
the message. Dump then aborts due to the original error rather than the
fopen failure.

Since there's no error printed it's a little hard to diagnose.

You could try putting a small delay between the dumping of the
filesystems, just in case the tape drive isn't quite ready. A sleep 30
should be plenty.

Also maybe run an fsck on the filesystem and see if there are any
filesystem problems.
=========================================
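
Glenn's suggestion of a pause between filesystems could be sketched roughly
as below. This is an illustration under assumptions, not our production
script: the dump command line (level 0 to /dev/nrst8), the variable names
and the dump_all function are made up here, and it assumes dump returns a
nonzero exit status when it aborts. Since dump cannot print the real error
without a controlling tty, logging the exit status per filesystem at least
shows which dump went wrong.

```shell
#!/bin/sh
# Assumed values for illustration; adjust to the local dump command and drive.
DUMP_CMD=${DUMP_CMD:-"dump 0uf /dev/nrst8"}
PAUSE=${PAUSE:-30}

# Dump each filesystem named on the command line in turn, sleeping
# between dumps and stopping at the first failure.
dump_all() {
    first=yes
    for fs in "$@"; do
        # Give the tape drive time to settle before every dump but the first.
        if [ "$first" = no ]; then
            sleep "$PAUSE"
        fi
        first=no
        echo "=== dumping $fs at $(date)"
        if $DUMP_CMD "$fs"; then
            echo "=== $fs done"
        else
            status=$?
            echo "=== dump of $fs FAILED with exit status $status" >&2
            return 1
        fi
    done
    return 0
}
```

A script would then call something like "dump_all / /usr /home" with both
stdout and stderr redirected to the log file.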

Other suggestions included increasing the number of processes available on
the system, checking for tape media problems, and checking the
write-protect tab.

Thanks to those Sun Managers who replied so swiftly:

manjeet@cadence.com
dsteiner@ispa.uni-osnabrueck.de
COURTIER@ulysse.cea.fr
ross.stocks@nt.com
glenn@uniq.com.au

_____________________________________________________________________
John Chrisoulakis                       Computing Services
Phone:    002 323 495                   Australian Antarctic Division
Fax:      002 323 288                   Channel Highway
Internet: john_chr@antdiv.gov.au        Kingston, Tasmania 7050
Phone (international): +61 02 323 495
Fax (international):   +61 02 323 288


