SUMMARY: Legato Networker problem

From: Rasana P. Atreya (Rasana.Atreya@library.ucsf.edu)
Date: Wed May 01 1996 - 17:15:33 CDT


Hi!
I apologize for the delay in posting.

It was not Networker after all, but my drive, which had gone bad. We are
waiting for a replacement.

My original post and responses follow.

Thanks to everyone who took the time out to help me!

Rasana
---------------------------------------------------------------------------
We do our backups using Legato's Networker 4.1.3 Turbo/30. The server is a
SPARC 10 with SunOS 4.1.3.
 
Lately, I have been getting requests to change tapes very frequently. Sometimes
up to 5 tapes a day for the same pool.

When I go to mount, it shows that the tape is full. But when I go to
media/volumes, it tells me that the tape holds much less than its maximum
capacity of 5 gig. The exact numbers vary widely. The jukebox has had to be
"unconfused" about 4-5 times. We use the 8mm cartridges.

What gives?

Any help appreciated.

Thanks,
Rasana

---------------------------------------------------------------------------
From: Leif Hedstrom <leif@netscape.com>

It sounds like Networker is detecting write errors, and it will then
mark the tape as full. Usually you would see error/warnings on the
console (and perhaps in log-files) about failed I/O requests.

So if this is the case, why is it happening? Well, it could be bad
tapes, or perhaps more likely, a bad tape device. Are you cleaning it
regularly? Is it old? It might be worth taking the device to someone
who can clean it properly and also check the condition of the magnetic
head.

Cheers!

-- Leif

---------------------------------------------------------------------------
From: davem@cp.tybrin.com (Dave McFerren)

You may want to clean the tape heads. Dirty heads are very often the main
cause of trouble with the data the drive writes.

Hope this helps....

---------------------------------------------------------------------------
From: Chad Smith <Chad.Smith@genetics.utah.edu>

I've seen this problem too, usually with tapes that have been used once or
twice. My bet is that Networker keeps track of any write errors on the
tape itself, and as soon as it encounters one, marks the tape "full" and
bails on the backup to it, requesting another tape. If you try putting a
new tape in, it should use the correct capacity. I've seen this same
problem hang a tape stacker, so that behavior is possibly explained as
well. I'm not sure how to fix it, but since I upgraded to Solaris 2.4,
the problem has rarely come back.

Chad Smith

PS hope this helps

---------------------------------------------------------------------------
From: Peter.Bestel@uniq.com.au (Peter Bestel)

Sounds like SCSI errors on the bus that the jukebox is on. Have there
been any SCSI resets recorded in the messages file? When Networker is
saving data and a SCSI bus problem occurs, it marks the tape being used
as full - just as if there has been a real physical tape error.

Check the /var/adm/messages file for any errors. What version of the
OS are you running? There are several SCSI patches for various versions
of Solaris 1 and 2, in particular for SC2000s. Other problems could
include SCSI bus termination problems causing a reset under some load.

Cheers,

        Peter B.
---------------------------------------------------------------------------
From: Scott Turvey <scottt@nacm.com>

The drives may be shot, or they are badly in need of cleaning.
You may also have got a bad batch of tapes.
---------------------------------------------------------------------------
From: simong@aifs.org.au (Simon Gibbons)

Hi Rasana,

We run a similar set-up with Networker, and I've experienced the same
problems. Networker stops writing to a tape and marks it as "full" as
soon as it encounters any writing problems, e.g. a bad spot on the tape
or a malfunction on the drive. The first time was because I hadn't
cleaned the tape drives. The next was due to SCSI errors on the
Networker server. If you find that it's the latter, drop me a line and
I can tell you the steps we took to solve it.

Regards,
        Simon

---------------------------------------------------------------------------
From: david%org-bone@org-bone.ucsf.EDU (David Burzota)

   Networker will mark a tape as full when it gets an error while writing to
   it. If a lot of tapes are filling up too soon, it is because some tape
   error is occurring frequently. The file /nsr/logs/messages should
   show the reason the tape was marked full. My first guess would be
   that the tape drive needs cleaning. If there are SCSI errors in
   /var/adm/messages or /nsr/logs/messages at the same time that the
   tape was marked full, then NetWorker is getting write failures when
   the SCSI errors occur. In any event, the error messages in the two
   log files above should point in the right direction.

   ( I hope this helps - I worked for Legato a few years back and I may
     be a little out of date... )
---------------------------------------------------------------------------
From: Eric.Olemans@esat.kuleuven.ac.be

There's probably something wrong with the tape drive, causing frequent
write errors.
I've had similar problems with a DLT autoloader. It started asking for
a cleaning tape every day. The DLT drive had to be replaced.
Greetings,

                                                        ERIC OLEMANS.

------------------------------------------------------------------------------
From: Geert.Devos@ping.be (Geert Devos (org. : Graphidec-Belgium))

Hi Rasana,

Don't have any experience with 4.1.3, only 4.0.2, so this may not be
relevant at all.

We have a customer using 4.0.2 on a SPARC 5 running 2.4.

His nsrd went berserk last month, after running fine for a couple of
months. It couldn't find its media database anymore, lost the mount point of
an 8 Gig RAID system, wrote a maximum of 1.7 Gig to a 4 mm DAT, etc. We had to
stop and restart the damned daemon, after which it recognized everything just
fine and wrote its normal volume (around 2.8 Gig per tape).

Look up the man pages on nsrd and nsr_shutdown.

What is causing this? Haven't got the faintest. And from the looks of what
Sun answered me, they haven't either. Maybe you're better at analyzing this
kind of "mad" daemon while it's still running.

Have "fun" with it.

Geert
--------------------------------------------------------------------------------
From: James M Mosley <jmmosley@uncc.edu>

Rasana,
        Have you checked your logs to see if you are getting tape I/O errors?
At my previous job, I used Legato with Exabyte 10E jukeboxes. The jukeboxes
had Exabyte 8500C tape drives, which are notorious for going bad. One of the
symptoms I saw when a drive started to go bad was that I would start to get I/O
errors. When Legato encounters an I/O error, it marks the tape as full and tries
another tape in the pool. This can also be a symptom of a tape (or tapes) going
bad. When these sorts of things happen, the jukebox driver can get VERY
confused and often has to be cleared manually.

Mike

----------------------------------------------------------------------------
From: Michael Maciolek <mikem@centerline.com>

It's either time to clean your tape heads (past time, really), or if
that doesn't work, it may be time to get your tape drive serviced. The
big problem with 8mm drives is that they're very susceptible to wear.
Frequent cleaning helps - I usually clean after every 10 tapes (1 jukebox
full) and haven't seen this problem in over a year.

FYI, networker keeps track of tape errors while it's writing; if it sees
a large number of errors, it assumes the *tape* is bad (not the drive)
and moves on to the next tape. (I personally think it would make sense
to send a warning message to the backup administrator) Since the tapes
aren't really bad (in your case), you're really seeing an indication of
how dirty (or worn) the tape heads are.

---------------------------------------------------------------------------
From: Alan Yasutovich <yasu@bgs.com>

        I am not a Legato expert, but those packages (much like tape lengths
        and density with dump) have their own internal tracking methods.

        So they lie to themselves with numbers.

        Look for this. i.e. an artificially full media pool.

        Alan

---------------------------------------------------------------------------
From: jbapti01@puffer.com (Jean Baptiste)

Ah HA,

Finally, something up my alley.

I had the same requests after I started to recycle tapes from a different
backup scheme (i.e. dump). Once I changed to brand-new tapes, everything
worked fine.

Hope this helps.

---------------------------------------------------------------------------
From: ivy@durham.med.unc.edu (Heather Ivane Wallace)

This may or may not be completely out in left field....

I had to change multiple tapes 1) while our tape head was dirty and 2) when the
drive finally gave out.

From my very meager understanding, Networker apparently "gives up" on a tape
after so many tape write errors. This could lead to it requesting multiple
tapes.

Hope it helps. (and I hope your tape drive is good....we returned ours!)

Heather Ivy Wallace
---------------------------------------------------------------------------
From: Steve Kirkpatrick <Steve.Kirkpatrick@andataco.com>

Sounds like your tape drive needs to be cleaned? I know that Networker
will mark a tape as "full" when it reaches a certain number of write errors
on the tape.

Steve.

---------------------------------------------------------------------------
From: pradu@jupiter.Legato.COM (Paul Radu)
Subject: Re: Legato Networker Woes - incident # 194167

Greetings,

In nearly every case, this behaviour is caused by the tape drive reporting
an unrecoverable write error. The capacity that is displayed by Networker
is only an estimate determined by the device type attribute when the device
is entered under Networker. This attribute has absolutely no effect on the
actual capacity of the media. Here is an explanation of why this behaviour
occurs:

Networker uses the standard UNIX device file interface to write data to the
backup device (tape drive). This interface is very simple: the application
(Networker in this case) creates a buffer, a pointer to the buffer and a
variable that contains the size of the buffer. This is all passed to the
UNIX kernel, and the kernel returns the number of bytes actually written. All
goes well until the kernel returns a number smaller than the size of the
buffer; this indicates that the device (tape drive) will not accept any more
data. This behaviour normally occurs when the end of the tape is reached;
however, it also occurs when the tape drive is unable to successfully write on
the tape, due to an extensive defect on the tape or defective hardware (the
tape drive itself). There is no way for Networker to determine if the behaviour
is caused by an error or because the actual end of tape has been reached.
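
The write interface described above can be sketched in a few lines. This is
only an illustration in Python, not Networker's actual code: the function
name write_until_short is made up for the example, and any ordinary file
descriptor stands in for the tape device.

```python
import os


def write_until_short(fd, blocks):
    """Write each block to fd, stopping at the first short write or OS error.

    Returns (bytes_written, reason). reason is None when every block was
    accepted in full.
    """
    total = 0
    for buf in blocks:
        try:
            written = os.write(fd, buf)
        except OSError as err:
            # An I/O error from the driver ends the volume, just as
            # end-of-tape would.
            return total, "error: " + str(err.strerror)
        total += written
        if written != len(buf):
            # Short write: the device will not accept more data. The caller
            # cannot tell real end-of-tape from a media or drive fault.
            return total, "short write"
    return total, None
```

In both failure cases the caller only learns how much data went out before
the device stopped, which is why a bad drive and a genuinely full tape look
the same from the application's side.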

I would recommend the following course of action:

- clean the tape drive(s) with a cleaning cartridge.

- if you have already tried this, then try a new tape.

- if there is no change in the behaviour, the most likely cause is a defective
  tape drive.

Regards,
Paul Radu
Legato Technical Support
---------------------------------------------------------------------------
From: simong@aifs.org.au (Simon Gibbons)

Hi Rasana,

Here's a typical example of the entries in /var/adm/messages on hostX
when I encounter SCSI errors. The first few messages are normal
NetWorker messages, then the SCSI error is encountered, then NetWorker
tells me that the tape is full, but only used 248 MB of 2000 MB
capacity. If this doesn't sound like your problem, I've attached
Legato's response at the end.

Drop me a line if there's anything else I can offer.

Best of luck,
        Simon

Mar 27 01:15:02 hostX syslog: NetWorker savegroup: (info) starting Onsite (with 6 clients)
Mar 27 01:15:10 hostX syslog: NetWorker media: (waiting) backup to pool 'Onsite' waiting for 2 writable backup tapes
Mar 27 01:15:11 hostX syslog: NetWorker media: (info) suggest mounting 9422.full.b for backup to pool 'Onsite'
Mar 27 01:20:49 hostX unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0):
Mar 27 01:20:49 hostX unix: Target 1.0 botched tagged queueing msg (0x80,
Mar 27 01:20:53 hostX unix: 0x20)
Mar 27 01:20:53 hostX unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0):
Mar 27 01:20:53 hostX unix: failed reselection
Mar 27 01:20:53 hostX unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0 (sd1):

Mar 27 01:20:53 hostX unix: SCSI transport failed: reason 'reset':
Mar 27 01:20:53 hostX unix: retrying command
Mar 27 01:20:54 hostX unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0 (sd3):

Mar 27 01:20:54 hostX unix: SCSI transport failed: reason 'reset':
Mar 27 01:20:54 hostX unix: retrying command
Mar 27 01:20:53 hostX syslog: NetWorker media: (warning) /dev/rmt/0bn writing: I/O error, at file 25 record 1500
Mar 27 01:20:53 hostX syslog: NetWorker media: (notice) 8mm tape 9612.incr.b used 248 MB of 2000 MB capacity
Mar 27 01:20:53 hostX syslog: NetWorker media: (notice) 8mm tape 9612.incr.b on /dev/rmt/0bn is full
Mar 27 01:22:05 hostX syslog: NetWorker media: (waiting) backup to pool 'Onsite' waiting for 2 writable backup tapes
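
A log like the one above can be scanned mechanically. The sketch below is a
rough Python illustration that assumes the message wording shown in this
excerpt; the function name summarize and the 50% threshold are assumptions
made for the example. Since the capacity NetWorker prints is only an
estimate, a low ratio is a hint to look closer, not proof of a fault.

```python
import re

# Patterns follow the message formats in the excerpt above; other NetWorker
# or OS versions may word these lines differently (assumption).
USED_RE = re.compile(
    r"NetWorker media: \(notice\) \S+ tape (\S+) used (\d+) MB of (\d+) MB capacity"
)
IO_ERROR_RE = re.compile(r"NetWorker media: \(warning\) \S+ writing: I/O error")
SCSI_RE = re.compile(r"SCSI transport failed")


def summarize(lines, threshold=0.5):
    """Count error lines and list volumes closed well below estimated capacity.

    threshold is a hypothetical cut-off: a volume is listed when it holds
    less than this fraction of the capacity NetWorker estimated for it.
    """
    report = {"scsi_failures": 0, "io_errors": 0, "short_volumes": []}
    for line in lines:
        if SCSI_RE.search(line):
            report["scsi_failures"] += 1
        if IO_ERROR_RE.search(line):
            report["io_errors"] += 1
        match = USED_RE.search(line)
        if match:
            volume = match.group(1)
            used = int(match.group(2))
            capacity = int(match.group(3))
            if used < threshold * capacity:
                report["short_volumes"].append((volume, used, capacity))
    return report
```

Passing the lines of /var/adm/messages to summarize() gives a quick picture
of whether SCSI failures and I/O errors show up alongside tapes that were
closed early, as they do in the excerpt.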

----- Begin Included Message -----

From: soma@jupiter.Legato.COM (Soma Shekar)
To: simong@AIFS.ORG.AU

Here is a tech bulletin on this issue.....

PURPOSE

This Technical Bulletin addresses tape capacity issues. Specifically,
you may feel that NetWorker does not write enough data to tape to fill
it according to the tape's capacity. For example, a tape with an
advertised capacity of 4000 Mbytes may be marked full by NetWorker
after only 3000 Mbytes of data have been written to it.

EXPLANATION

There are several reasons for what appears to be NetWorker filling
tapes prematurely:

Each of these reasons is explained in more detail in this bulletin.
The NetWorker algorithm for writing to tape is shown below:
res = write(fd, buf, cnt)
if (res != cnt) {
        report tape full;
}

NetWorker does not know the capacity of a tape. It keeps writing until
it encounters a write error or end of tape (EOT). A tape marked FULL
may show a capacity of 40% or 300%. For the purpose of providing a
percentage, NetWorker estimates the capacity of a tape. However, the
actual amount of data written is determined solely by the first write
error or EOT.

Select the highest density device driver appropriate for your device.
This should ensure that when a tape is labeled, NetWorker will write to
it at the highest density supported by your device. High-density
drivers should be used for compressing drives, medium-density for
normal drives, and low-density for older, lower-density formats. For
example, for an 8505 Exabyte drive with compression you would select
the high-density device, for an 8500 Exabyte without compression the
medium-density device, and for an 8200 select the low-density device.
In some cases, especially for compressing drives, you may also need to
select jumper or switch settings on your device to achieve
compression. See Technical Bulletin 146 for tips on correctly
configuring device drivers for the proper density.

If you configure the device driver correctly, tapes written on a
lower-density device should automatically be re-labeled with the
selected device driver's density when written from the beginning of the
tape. On some systems with poor implementations of the high-density
device, the density could be determined by the format an old tape was
written in. If this is the case, use new tapes in the drive instead of
re-labeling old, lower-density tapes.

Performance can affect tape quality. Some drives will pad the tape if
data is not supplied fast enough. The total effective tape capacity is
reduced in an effort to keep the drive streaming. Noticeable
reductions in tape capacity due to streaming are rarely encountered
with NetWorker.

Write Errors

Most tape drives attempt to read after a write operation to verify that
the tape was written correctly, and retry if it was not. A write error
reflected to the write system call indicates either end of tape or a
read error. Therefore, any tape error will result in NetWorker marking
the tape as being full. NetWorker does not attempt to write data on a
tape once a write error occurs.

To NetWorker, a failure to write data is very serious and NetWorker
takes the prudent course: it marks the tape as full (the data before
the error may still be valid) and does not attempt any more write
operations to the suspect tape. In other words, NetWorker views the
end of the tape the same as a write error: in either case, the tape
should be marked full.

To prevent these kinds of tape write errors, you should clean your tape
drive regularly and use only data-quality tapes.

Other hardware issues can cause write errors to occur. If cleaning the
drive does not seem to help, make sure that the device driver is
properly configured, any necessary switch settings on the tape drive
itself are set to the manufacturer's specifications, all cabling is
secure, and other potential SCSI problems have been addressed.

Filemarks

NetWorker periodically writes filemarks to facilitate rapid recovery of
data. These filemarks consume varying amounts of tape depending upon
the type of tape drive - on some drives, filemarks can consume several
Mbytes. The number of filemarks NetWorker writes to tape is a function
of how many save sets are on the tape: many small save sets will
require more filemarks than a few larger ones.

The amount of tape consumed by filemarks is not currently tracked by
the NetWorker statistics. Therefore, if you see x number of bytes
written to tape, and k is the number of filemarks, x + k * (size of
filemark) is the actual number of bytes written to tape.
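
The formula above can be worked through with a small Python sketch. The
figures are invented purely for illustration (the bulletin gives no specific
filemark size or count), and actual_bytes_on_tape is just a name chosen here.

```python
def actual_bytes_on_tape(reported, filemarks, filemark_size):
    """Return x + k * (size of filemark); all values in the same unit."""
    return reported + filemarks * filemark_size


# Hypothetical numbers: 3000 MB reported, 100 filemarks of 2 MB each.
total_mb = actual_bytes_on_tape(3000, 100, 2)
print(total_mb)  # 3200
```

With many small save sets the filemark count grows, so the gap between the
reported figure and what the tape actually holds grows with it.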

Tape Capacities

Tape capacities are not constant from tape to tape - two apparently
identical tapes from the same vendor may vary significantly in
capacities. This can cause problems if you copy one very full tape to
another, especially if the destination tape holds less data than the
source tape.

Compression

It is not possible to predict the effect on tape capacities from
compressing tape drives. A compressing drive may provide twice the
capacity of a non-compressing drive - it could be far less or far more,
depending on the kind of data being backed up.

Therefore, if a non-compressing drive writes 2 Gbytes of data to a
specific tape, the compressing drive could write 10 Gbytes, 2 Gbytes, 5
Gbytes, or some other unpredictable amount of data.

Tape Length

Be sure to verify tape lengths. A 120-meter DAT tape will hold more
data than a 90-meter DAT tape, and without examining the printed
information on the tape cassette carefully, the two tapes may appear
identical.

To request an electronic list of all Legato Technical Bulletins, e-mail
your request to request@Legato.COM with a subject line of send
bulletins index. You can also download them from ftp.legato.com
(Internet address 137.69.1.1) or CompuServe (go Legato). For a hard
copy subscription, see Technical Bulletin #025 and FAX your request to
(415) 812-6034.

PRINTING HISTORY
First published 12/20/94
Updated 2/23/95
J.K.

TECHNICAL BULLETIN
176: NetWorker and Tape Capacity Discrepancies (UNIX and NetWare)

---------------------------------------------------------------------------
From: brol@wc.eso.mc.xerox.com (Paul G. Brol)

Rasana,

If you haven't received any responses, my experience has been that
there is a problem with the tape drive. Either clean the heads or
place a service call. I had this problem and they replaced the tape
drive. I have hardware compression drives and went from being able to
put a wide range of data (10MB to 8GB) on each tape to consistently
being able to put 9-10 GB on each.

Paul

---------------------------------------------------------------------------
From: amy.hollander@amp.com (Amy Hollander)

clean your tape drive or replace it.

---------------------------------------------------------------------------
From: msiple@mfg.wb.xerox.com (Matt Siple)

Rasana,

This is pretty common in our shop too. Usually what happens is that the tape
drive writes to part of the tape and then dies partway through the backup.
Networker sees that it can't write to the tape and therefore marks it "Full". The
root cause of this is usually dirty heads on the tape drive. Sometimes you can
clean it by running the tape cleaner through several times. However, more often
than not I have to have the unit replaced by Sun. (I'm sure they just clean it
and send it back to someone else.) This seems to happen more on tape drives that
are not used very often than on drives I use every day. I have been told that
this is because the tape head oxidizes if not used. Anyway, hope this helps.

Matthew Siple
Systems Engineer
EDS/XEROX

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ Rasana Atreya Voice: (415) 476-3623 ~
~ Programmer/Analyst Fax: (415) 476-4653 ~
~ Library & Ctr for Knowledge Mgnt, Univ. of California at San Francisco ~
~ 530 Parnassus Ave, Box 0840, San Francisco, CA 94143-0840 ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


