SUMMARY: Disk Read Error

From: Rick Wightman (wightman@jupiter.sun.csd.unb.ca)
Date: Sat Mar 04 1995 - 04:13:23 CST


ORIGINAL POSTING:

Gentle Folk:

SS10/30, Sun OS 4.1.3

I have a group of undergraduates whose account with us occupies an entire
disk. We thought they were backing their own stuff up, and they (of course)
thought we were. None of this would have mattered until today, when...

sd3d: Error for command 'read'
sd3d: Error Level: Retryable
sd3d: Block 1008, absolute block 205682
sd3d: Sense Key: Media Error
sd3d: Vendor 'SEAGATE' Error Code 0x10.
 .
 .
 .
sd3d: Error Level Fatal

Given that no-one besides this group of students can write/read this disk, the
data *should* still be there. And now the question:

Can anyone suggest a way to limp the disk along enough to copy the data to tape?

I can't use dd or tar; neither gets past the error. I assume that dump will die
an equally ugly death if I try it. I'm in over my head, and so I humbly ask for
your support to save this graduating class project (yikes!).

Sincerely,

SUMMARY:

The general consensus was to use the format utility to try to repair the bad
block, and I went with it. Unfortunately, while I was attempting to bring the
machine up with only the boot drive, the tape drive (to back up to) and the bad
disk, things got worse. On boot, the bad disk was reported as having an unknown
size. If I cold booted, things were better, but when I got into format, format
didn't think that the disk was formatted. On top of that, the number of
partition tables listed varied from one attempt to the next.

So I left the drive out overnight to cool and booted cold in the morning with
the minimal SCSI configuration and with the pizza box open. The bad drive has
always played a set of descending notes on power-up. This time, after it
finished, I heard some loud clicks from the bad drive and, a minute later, the
descending notes again. And of course, format was confused.

Bottom line: it looks like the controller is bad, in addition to the bad block.

Fortunately for all involved, the data on the bad disk was GIS data that was
imported to our system to be used for straight mapping purposes. The data can be
re-obtained with a little effort.

Thanks to all who aided this effort. Of special note is the extensive reply from
Steve Ehrhardt, who sounds like he knows this road well. Since I thought it was
a super effort, his response is attached.

"Ashwin P. Rao" <ashwin@cadence.com>
Manjeet_Singh <manjeet@cadence.com>
gaskell@chester.digicon-egr.co.uk (Phil Gaskell)
Glenn Carver <glenn@atmos-modelling.chemistry.cambridge.ac.uk>
thielen@irus.rri.uwo.ca (Susan Thielen)
Ross.Stocks.INSDRS01@nt.com
mshon@sunrock.East.Sun.COM
yamaguch@cqt.com (Bob Yamaguchi)
Mr T Crummey (DIJ) <tom@sees.bangor.ac.uk>
whi@celsius.oz.au (Wayne Hinton)

From: stevee@sbei.com (Steve Ehrhardt)
Date: Wed, 1 Mar 95 12:12:24 PST
To: wightman@unb.ca
Subject: Re: Disk Read Error

Rick,

There are a couple of hardware-related tricks that you can try that may
help.

1) Allow the disk to cool down completely (power down for a while).
   Then try accessing the disk *immediately* after power up. (You'll
   want to disable any fsck from the fstab to prevent warming it up
   before you can get to it.)

   If you can read the disk successfully before it warms up, then try
   running it with the system's chassis open and a desk fan aimed
   at it. You might be able to keep it running long enough to get your
   data off.

   You might also want to check to make sure that the system's cooling
   fan(s) are still working. Hot disks don't perform very well, or
   at least not for very long...

2) Change the physical orientation of the disk from horizontal to vertical.
   Disk problems are frequently related, at least in large part, to
   mechanical wear. Changing the orientation changes the forces exerted
   on the bearings, etc., and can make disks work again, sometimes for
   extended periods.

   You can achieve the orientation change by either moving the entire chassis
   or just the disk. If you're going to move the chassis, make sure that you
   don't block any of the cooling openings.

If approach (1) seems to work for a while, try combining it with approach
(2). I've had a good deal of success with both of these tricks, and they
just might work for you.

I'd suggest using the format program's read-only analysis for testing,
since you won't have to mount the disk (and therefore fsck it) before testing.
If it survives a full pass, you're in good shape.
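The idea behind a read-only pass, and behind limping the disk along long
enough to copy its contents, can be sketched in Python. This is a
hypothetical illustration, not the format utility and not something from
the replies: it reads the source one block at a time, retries each failed
read a few times (the driver first reported the error as retryable), and
zero-fills any block that still can't be read so that every later block
keeps its original offset, much like dd's conv=noerror,sync. The 512-byte
block size, the retry count and the name salvage_copy are all assumptions.

```python
from typing import BinaryIO, List

BLOCK_SIZE = 512  # assumed sector size; adjust for the actual disk


def salvage_copy(src: BinaryIO, dst: BinaryIO, total_blocks: int,
                 block_size: int = BLOCK_SIZE, retries: int = 3) -> List[int]:
    """Copy total_blocks blocks from src to dst, one block at a time.

    Hypothetical helper: each block is read up to `retries` times. A block
    that still raises OSError is written as zeros, so later blocks keep
    their original offsets (like dd conv=noerror,sync). Returns the
    numbers of the blocks that could not be read.
    """
    bad_blocks: List[int] = []
    for block in range(total_blocks):
        data = None
        for _ in range(max(1, retries)):
            try:
                src.seek(block * block_size)
                data = src.read(block_size)
                break  # read succeeded; stop retrying this block
            except OSError:
                continue  # error on this block: try the same block again
        if data is None:
            bad_blocks.append(block)
            data = b""
        # Pad failed or short reads to a full block to keep dst aligned.
        dst.write(data.ljust(block_size, b"\0"))
    return bad_blocks
```

On the real machine you would open the raw disk device (its path is not
given here) with open(path, "rb", buffering=0), pass the disk's block
count, and keep the returned list as a record of which blocks were lost.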

This is fairly general, since you didn't supply enough data in your original
message to make much of a guess as to what's gone wrong. Feel free to
contact me for more detailed information.

Steve Ehrhardt stevee@sbei.com
SBE Inc. (510)355-7773
San Ramon,CA
"The opinions expressed are those of the author. His employer would disavow
        any knowledge of them, presuming they knew that he had any."

Rick Wightman, RPFNB              Research Associate
wightman@unb.ca                   Garnet Strong Laboratory
(506) 453-4910                    UNB Forestry and Environmental Management
                                  PO Box 44555
                                  Fredericton, New Brunswick
                                  CANADA E3B 6C2

   Say what you mean,
   Mean what you say,
   Do what you say. -- Barbara Coloroso



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:10:17 CDT