SUMMARY SCSI Disk Errors - sense key: not ready

From: Wiest, Damian <dmwiest_at_rc2corp.com>
Date: Tue Feb 07 2006 - 11:00:34 EST
Thanks to Jason Grove, Chris Ruhnke and Brad Morrison for their input on
this topic.

Possible reasons for the disk problems included high humidity, power supply
problems, dying motor, poor cabling and a problem with the SCSI interface to
the server.  Unfortunately, I didn't receive any information on the meaning
of the vendor's error code.  If anyone has access to this, I'd appreciate
the information.

I suspect the drives' motors were slowly dying; I replaced both disks one
week ago and have not experienced any SCSI errors since, which would seem
to rule out problems with power, cabling, and the server's interface.
Also, the climate in our server room is controlled, with humidity typically
below 20%.

I've included my initial question along with some of the responses I
received below.

-Damian

> Those drives (IBM), if I remember correctly, had some problems with 
> humidity. Make sure the environment they are in does not have high 
> humidity. Sun was replacing them a while ago. Run an iostat -En and 
> see how many hard and media errors there are. If it is over 10, I 
> think you need to replace the drive.
> 
> jason
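Jason's iostat -En check can be scripted so drives over an error threshold
stand out. A minimal sketch, assuming the usual Solaris "iostat -En" output
layout; the sample below is illustrative, not real output from the affected
D1000 (on a live system you would pipe iostat -En itself into the awk
filter):

```shell
# Flag drives whose Hard Errors or Media Error counts exceed a threshold.
# The captured sample stands in for `iostat -En` so the sketch is
# self-contained; the device names and counts here are made up.
THRESHOLD=10

iostat_sample() {
cat <<'EOF'
c1t2d0           Soft Errors: 0 Hard Errors: 14 Transport Errors: 2
Vendor: IBM      Product: DDYST1835SUN18G  Revision: S94A Serial No: 00361EE587
Media Error: 12 Device Not Ready: 2 No Device: 0 Recoverable: 0
c1t3d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: FUJITSU  Product: MAJ3182M SUN18G  Revision: 0804 Serial No: 01234567
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
EOF
}

flagged=$(iostat_sample | awk -v limit="$THRESHOLD" '
/Soft Errors:/  { dev = $1; hard[dev] = $7 }   # field 7 is the Hard Errors count
/^Media Error:/ { media[dev] = $3 }            # field 3 is the Media Error count
END {
    for (d in hard)
        if (hard[d] + 0 > limit || media[d] + 0 > limit)
            printf "%s: hard=%s media=%s -- candidate for replacement\n",
                   d, hard[d], media[d]
}')
echo "$flagged"
```

On the real system, replace the sample function with "iostat -En" and review
the flagged devices before swapping anything.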


> I have an E450 which has exhibited similar problems on Fuji and Sun
> SCSI hard drives. 
> 
> "device not ready" means exactly that -- the device has spun down for
> some reason. 
> In my case I have been able to "unplug" the disk from the backplane,
> wait one minute, and plug it back in, and the drive will spin back up. 
> You will then have to re-enable it with SVM and it should sync up with
> its mirror -- "# metareplace -e <metadevice> <slice>". 
> If the drive is truly bad, it won't spin back up. 
> It could also be an early indication of a failing drive; but you won't
> know for sure until it dies completely. 
> Or your power supply may be marginal, and under the "stress" of heavy
> activity the power level may fall below the level needed by this drive. 
>
>  
> --CHRis
>
> Chris H. Ruhnke
> Technical Services Professional
> IBM Global Services
> Dallas, TX


> My opinion is that it's a bad cable or a physical problem with the
> interface on the machine. It seems like a very, very remote possibility
> that both drives have the same problem. Yes, they're old, but what are
> the odds of two having the same problem, i.e., transport failures at
> high bandwidth usage? Don't let the block identifier fool you: "drive
> not ready" means that the operation was interrupted because the "drive
> ready" signal went to zero during the operation. Although this can be
> caused by a bad drive, it doesn't seem likely that both drives would
> come/go on/offline. 
>
> Hmmm. It is possible (not too likely, IMHO) that you have a power
> problem. It's unlikely b/c a drive that fails in this way would have to
> spin up again after having lost power, i.e., you'd be seeing many more
> "drive not ready" messages during the spin-up. 
>
> OTOH, given that you have replacement drives handy, you could prove
> this by swapping them out and causing the high traffic. In fact, since
> they're mirrored with SVM, you could perform one drive replacement to
> the mirror and determine whether the same errors happen with the
> replacement drive. I'm guessing that it will. :-) 
>
> Be sure to summarize this one. SCSI errors (and their associated
> resolutions) can always use more exposure. ;-)
>
> Brad Morrison | The Capital Group Companies 
> Location: SNO | x43199 | (210) 474-3199 | Cell: (281) 704-5375 
> E-mail: Brad_Morrison@capgroup.com 
> [ Mailing: 3500 Wiseman Blvd  San Antonio, TX 78251-4320 USA ] 


-----Original Message-----
From: Wiest, Damian [mailto:dmwiest@rc2corp.com] 
Sent: Tuesday, January 31, 2006 8:25 AM
To: 'sunmanagers@sunmanagers.org'
Subject: SCSI Disk Errors - sense key: not ready


Greetings all,

I have a couple of IBM SCSI drives that are requiring maintenance on a
weekly basis.  I have six 18GB drives installed in the first half of a D1000
array which is attached to a dual-channel Symbios SCSI card in an old E-250.
Four of the disks are from IBM (product number DDYST1835SUN18G, revision
S94A) and the other two are from Fujitsu (product number MAJ3182M SUN18G,
revision 0804).  I have configured the disks as three two-way mirrors under
SVM; one of the mirrors with IBM drives is logging errors.  Here's a sample
entry from /var/adm/messages:

Jan 28 06:30:01 lcidev01 unix: WARNING: /pci@1f,4000/scsi@5,1/sd@2,0 (sd47):
Jan 28 06:30:01 lcidev01        Error for Command: write(10)    Error Level: Fatal
Jan 28 06:30:01 lcidev01 unix:  Requested Block: 6137304    Error Block: 6137304
Jan 28 06:30:01 lcidev01 unix:  Vendor: IBM    Serial Number: 00361EE587
Jan 28 06:30:01 lcidev01 unix:  Sense Key: Not Ready
Jan 28 06:30:01 lcidev01 unix:  ASC: 0x4 (<vendor unique code 0x4>), ASCQ: 0x1, FRU: 0x0
Jan 28 06:30:01 lcidev01 unix: WARNING: md: d112: write error on /dev/dsk/c1t2d0s7
Jan 28 06:30:11 lcidev01 unix: WARNING: md: d112: /dev/dsk/c1t2d0s7 needs maintenance
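Messages like these can be tallied per sd instance to see whether the "Not
Ready" events cluster on one drive or are spread across the bus. A sketch;
the log fragment below stands in for /var/adm/messages (with made-up
timestamps for a second device) so the example is self-contained:

```shell
# Count "Sense Key: Not Ready" events per sd instance. The WARNING line
# names the device; the Sense Key line that follows is credited to it.

messages_sample() {
cat <<'EOF'
Jan 28 06:30:01 lcidev01 unix: WARNING: /pci@1f,4000/scsi@5,1/sd@2,0 (sd47):
Jan 28 06:30:01 lcidev01 unix:  Sense Key: Not Ready
Jan 29 02:10:44 lcidev01 unix: WARNING: /pci@1f,4000/scsi@5,1/sd@2,0 (sd47):
Jan 29 02:10:44 lcidev01 unix:  Sense Key: Not Ready
Jan 29 02:11:02 lcidev01 unix: WARNING: /pci@1f,4000/scsi@5,1/sd@3,0 (sd48):
Jan 29 02:11:02 lcidev01 unix:  Sense Key: Not Ready
EOF
}

counts=$(messages_sample | awk '
/WARNING:/ && /\(sd[0-9]+\):$/ { dev = $NF }   # e.g. "(sd47):"
/Sense Key: Not Ready/         { n[dev]++ }
END { for (d in n) print d, n[d] }')
echo "$counts"
```

On the live system you would run the same awk program directly against
/var/adm/messages instead of the sample function.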

The disks typically begin exhibiting this behavior during periods of high
activity.  I do have a couple of replacements lying around, but I'd like
some advice as to whether this problem is related to the drives, or if it's
indicative of a bigger problem before simply swapping them out.

TIA!

-Damian
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Tue Feb 7 11:16:22 2006
