SUMMARY: SCSI errors during dump/tar

From: Ross Helfand (rhelfand@census.gov)
Date: Tue Apr 07 1998 - 15:28:46 CDT


Sorry for the late summary. Thanks very much to the following people:

Art Freeman <Art.Freeman@ing-barings.com>
Aaron Lafferty <lafferty@oar.net>
David Thorburn-Gundlach <david@bae.uga.edu>
NEERAJ VERMA <nverma@hotmail.com>
Gerald Litteer <gll@moran.inel.gov>

Original Question:

> Managers,
>
> We recently started having problems backing up certain file systems on one
> of our servers. We seem to be able to back up /, /usr, /var, etc. -
> basically, regular devices.
>
> There is a Storage Array connected to this server, and the problem file
> systems are under Volume Manager control. We've tried backing up
> these file systems across the network to other tape devices, with the same
> results. A section of the ufsdump errors follows:
>
> DUMP: Writing 63 Kilobyte records
> DUMP: /dev/rmt/0hn: I/O error
> DUMP: NEEDS ATTENTION: Cannot open volume. Do you want to retry the
> open?: ("yes" or "no")
> DUMP: The ENTIRE dump is aborted.
>
>
> When the dump fails, the following errors appear in our message log:
>
> Mar 17 14:24:45 fldspc5 unix: WARNING:
> /io-unit@f,e2200000/sbi@0,0/dma@0,81000/esp@0,80000 (esp2):
> Mar 17 14:24:45 fldspc5 unix: polled command timeout: current esp state:
> Mar 17 14:24:45 fldspc5 unix: esp: State=CLEARING Last State=CLEARING
> Mar 17 14:24:45 fldspc5 unix: esp: Latched
> stat=0x97<IPND,XZERO,MSG,CD,IO> intr=0x8<FCMP> fifo 0x80
> Mar 17 14:24:45 fldspc5 unix: esp: last msg out: <unknown msg>; last msg
> in: COMMAND COMPLETE
> Mar 17 14:24:45 fldspc5 unix: esp: DMA csr=0x40040010<INTEN>
> Mar 17 14:24:45 fldspc5 unix: esp: addr=fc0003aa dmacnt=0 last=fc0003a8
> last_cnt=2800
> Mar 17 14:24:46 fldspc5 unix: esp: Cmd dump for Target 2 Lun 0:
> Mar 17 14:24:46 fldspc5 unix: esp: cdblen=6, cdb=[ 0xa 0x0 0x0 0x28 0x0
> 0x0 ]; Status=0x0
> Mar 17 14:24:46 fldspc5 unix: esp: pkt_state=0x1b<STS,XFER,SEL,ARB>
> pkt_flags=0x0 pkt_statistics=0x3
> Mar 17 14:24:46 fldspc5 unix: esp: cmd_flags=0x10c2a cmd_timeout=0
> Mar 17 14:24:46 fldspc5 unix: WARNING:
> /io-unit@f,e2200000/sbi@0,0/dma@0,81000/esp@0,80000 (esp2):
> Mar 17 14:24:46 fldspc5 unix: Connected command timeout for Target 2.0
> Mar 17 14:24:46 fldspc5 unix: WARNING:
> /io-unit@f,e2200000/sbi@0,0/dma@0,81000/esp@0,80000/st@2,0 (st16):
> Mar 17 14:24:46 fldspc5 unix: SCSI transport failed: reason 'timeout':
> giving u
> Mar 17 14:24:46 fldspc5 unix: p
> Mar 17 14:24:46 fldspc5 unix: WARNING:
> /io-unit@f,e2200000/sbi@0,0/dma@0,81000/esp@0,80000/st@1,0 (st15):
> Mar 17 14:24:46 fldspc5 unix: SCSI transport failed: reason 'reset':
> giving up
>
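
For anyone decoding the "Cmd dump" lines above: in a 6-byte CDB sent to
a tape target, opcode 0x0a is WRITE(6), bit 0 of byte 1 is the FIXED
flag, and bytes 2-4 carry the transfer length. The Python sketch below
is an illustration only and was not part of the original thread;
decode_tape_write6 is a hypothetical helper name, and reading last_cnt
as a hexadecimal value is an assumption.

```python
def decode_tape_write6(cdb):
    """Decode a 6-byte WRITE(6) CDB for a sequential-access (tape) device."""
    if len(cdb) != 6 or cdb[0] != 0x0A:
        raise ValueError("not a 6-byte WRITE(6) CDB")
    fixed = bool(cdb[1] & 0x01)  # FIXED bit: length counts blocks, not bytes
    length = (cdb[2] << 16) | (cdb[3] << 8) | cdb[4]  # 24-bit big-endian
    return {"opcode": "WRITE(6)", "fixed": fixed, "transfer_length": length}


# CDB bytes copied from the log excerpt: cdb=[ 0xa 0x0 0x0 0x28 0x0 0x0 ]
CDB_FROM_LOG = [0x0A, 0x00, 0x00, 0x28, 0x00, 0x00]
decoded = decode_tape_write6(CDB_FROM_LOG)

# Assumption: the driver prints last_cnt in hexadecimal.
last_cnt = int("2800", 16)

if __name__ == "__main__":
    print(decoded)
    print("last_cnt as hex:", last_cnt)
```

Under those assumptions the timed-out command was a 10240-byte
variable-block write, which agrees with last_cnt=2800 if that field is
hex.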

Gerald probably summed it up best:
'Sounds' like hardware to me...

Check the following:
        1) Tape drive
               a) try on another machine
               b) try on another SCSI controller (NOT another id)
        2) Cable - REPLACE
        3) Controller

After more testing, it doesn't seem to matter whether the file systems
are "local" or on the storage array. We switched tape changers: the
"problem" changer worked fine on another system, while the new changer
reported the same errors on this server. We then changed cables (a
couple of times), but the problem persists (although less often). My
guess is a bad SCSI controller on the server.
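
One way to check the controller theory against the message log is to
group the WARNING lines by the esp path they share. The following is a
minimal Python sketch, not something from the original thread. It
assumes each WARNING and its device path sit on one line (as they
likely did in the messages file before the mail wrapped them), and
controller_of and warnings_by_controller are hypothetical names.

```python
import re
from collections import Counter

# Matches "WARNING: <device path> (<instance>):" on a single line.
WARNING_RE = re.compile(r"WARNING:\s+(/\S+)\s+\((\w+)\):")


def controller_of(device_path):
    """Return the device path up to and including its esp@... component."""
    parts = device_path.split("/")
    for i, part in enumerate(parts):
        if part.startswith("esp@"):
            return "/".join(parts[: i + 1])
    return device_path  # no esp component found; leave the path unchanged


def warnings_by_controller(log_text):
    """Count WARNING lines per controller path."""
    counts = Counter()
    for path, _instance in WARNING_RE.findall(log_text):
        counts[controller_of(path)] += 1
    return counts


# The four WARNING lines from the excerpt, re-joined onto single lines.
ESP = "/io-unit@f,e2200000/sbi@0,0/dma@0,81000/esp@0,80000"
SAMPLE = (
    f"Mar 17 14:24:45 fldspc5 unix: WARNING: {ESP} (esp2):\n"
    f"Mar 17 14:24:46 fldspc5 unix: WARNING: {ESP} (esp2):\n"
    f"Mar 17 14:24:46 fldspc5 unix: WARNING: {ESP}/st@2,0 (st16):\n"
    f"Mar 17 14:24:46 fldspc5 unix: WARNING: {ESP}/st@1,0 (st15):\n"
)

if __name__ == "__main__":
    for controller, count in warnings_by_controller(SAMPLE).items():
        print(count, controller)
```

In the excerpt, all four warnings (including those for both st15 and
st16) fall under the same esp@0,80000 path, which is consistent with a
fault on the shared bus or controller rather than in a single drive.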

I appreciate all the excellent help!

Ross Helfand
Census Bureau



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:12:36 CDT