SUMMARY: metareplace -e (scsi vs disk errors)

From: Jordi Vidal <jordivi_at_wtransnet.net>
Date: Wed Jan 21 2004 - 13:37:46 EST
Thanks to:

Mike Salehi
Harrington, David B
Gary Chambers
Dan Lorenzini

	I ran a format/analyze/read over the failed disk. It fails, and the
errors now go to the messages file. I detached the failed submirror
(d62) and asked my boss for a new disk.

metadetach -f d60 d62
metadetach -f d62

----------- Surface analysis && /var/adm/messages errors ----------
# format
[...]
      7. c3t10d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@8,600000/pci@1/scsi@5/sd@a,0
Specify disk (enter its number): 7
selecting c3t10d0
[disk formatted]

format> analyze
analyze> read
Ready to analyze (won't harm SunOS). This takes a long time, 
but is interruptable with CTRL-C. Continue? yes

        pass 0
Medium error during read: block 2153264 (0x20db30) (211/14/192)
ASC: 0x11   ASCQ: 0x0

Medium error during read: block 2153264 (0x20db30) (211/14/192)
ASC: 0x11   ASCQ: 0x0

C^C^C^C^C^C^C^C^C^C
Medium error during read: block 2153264 (0x20db30) (211/14/192)
ASC: 0x11   ASCQ: 0x0

quit
quit
#
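
As a cross-check on the transcript above, the (211/14/192) that format
prints after the block number appears to be the cylinder/head/sector
address of that block. The sketch below redoes the arithmetic from the
geometry in the disk listing (hd 24, sec 424). The function name is
hypothetical, and it assumes the plain linear block-to-CHS mapping with
a zero-based sector number, which is what matches the output here.

```python
# Geometry taken from the format listing for c3t10d0: "hd 24 sec 424".
HEADS = 24
SECTORS_PER_TRACK = 424


def block_to_chs(block, heads=HEADS, sectors=SECTORS_PER_TRACK):
    """Convert a logical block number to (cylinder, head, sector).

    Assumes a simple linear mapping with zero-based sector numbers.
    """
    # Blocks per cylinder = heads * sectors per track.
    cylinder, rest = divmod(block, heads * sectors)
    # What is left over splits into a head and a sector on that track.
    head, sector = divmod(rest, sectors)
    return cylinder, head, sector


# 0x20db30 is the failing block reported by format (decimal 2153264).
print(block_to_chs(0x20DB30))  # (211, 14, 192)
```

Every retry hits the same block at the same physical address, which
points at one bad spot on the media.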


/var/adm/messages ->
Jan 21 19:05:26 xxx  Error for Command: read(10)                Error Level: Retryable
Jan 21 19:05:26 xxx scsi: [ID 107833 kern.notice]    Requested Block: 2153264                   Error Block: 2153264
Jan 21 19:05:26 xxx scsi: [ID 107833 kern.notice]    Vendor: SEAGATE                            Serial Number: 0302B0MFC8  
Jan 21 19:05:26 xxx scsi: [ID 107833 kern.notice]    Sense Key: Media Error
Jan 21 19:05:26 xxx scsi: [ID 107833 kern.notice]    ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xe4
Jan 21 19:05:30 xxx scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/pci@1/scsi@5/sd@a,0 (sd25):/pci@8,600000/pci@1/scsi@5/sd@a,0 (sd25):
[.... many of these ...]
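
A rough sketch of how the two kinds of evidence in this thread could be
told apart when scanning /var/adm/messages: "Sense Key: Media Error"
and "unrecovered read error" lines come from the drive itself, while
"SCSI transport failed" lines on their own are more ambiguous (cable,
terminator, controller or disk). The patterns and the function name are
assumptions based only on the messages quoted in this post, not a
complete list of what the sd driver can log.

```python
import re
from collections import Counter

# Patterns are assumptions drawn from the log lines quoted in this thread.
MEDIA = re.compile(r"Sense Key: Media Error|unrecovered read error", re.I)
TRANSPORT = re.compile(r"SCSI transport failed", re.I)


def classify(lines):
    """Count log lines that look like media errors vs transport errors."""
    counts = Counter()
    for line in lines:
        if MEDIA.search(line):
            counts["media"] += 1
        elif TRANSPORT.search(line):
            counts["transport"] += 1
    return counts


# Sample lines copied from the messages shown in this post.
sample = [
    "Jan 20 20:20:44 xxx SCSI transport failed: reason 'reset': retrying command",
    "Jan 21 19:05:26 xxx scsi: [ID 107833 kern.notice]    Sense Key: Media Error",
    "Jan 21 19:05:26 xxx scsi: [ID 107833 kern.notice]    ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xe4",
]

counts = classify(sample)
print(counts["media"], counts["transport"])  # 2 1
```

To run it against a real log, pass an open file instead of the sample
list, for example classify(open("/var/adm/messages")).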
 



---------- Original post  ----------
Hi

SunOS xxx 5.9 Generic_112233-04 sun4u sparc SUNW,Sun-Fire-480R:

Yesterday, one disk of a Solaris 9 SVM (SDS in previous releases) mirror
failed:

Jan 20 20:20:44 xxx scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/pci@1/scsi@5/sd@a,0 (sd25):
Jan 20 20:20:44 xxx SCSI transport failed: reason 'reset': retrying command
Jan 20 20:31:13 xxx scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/pci@1/scsi@5/sd@a,0 (sd25):
Jan 20 20:31:13 xxx Unhandled Sense Key 'Vendor Unique'
Jan 20 20:46:17 xxx md_stripe: [ID 641072 kern.warning] WARNING: md: d62: write error on /dev/dsk/c3t10d0s7
Jan 20 20:46:18 xxx md_mirror: [ID 104909 kern.warning] WARNING: md: d62: /dev/dsk/c3t10d0s7 needs maintenance

I mounted the failed disk on /mnt, touched a file, and unmounted it. It seemed OK.

I invoked "metareplace -e d60 c3t10d0s7" to enable the submirror and
resync it, to see whether it would fail again. After 5-10 minutes it did:

Jan 21 15:52:50 xxx md_stripe: [ID 641072 kern.warning] WARNING: md: d62: write error on /dev/dsk/c3t10d0s7
Jan 21 15:52:55 xxx md_mirror: [ID 104909 kern.warning] WARNING: md: d62: /dev/dsk/c3t10d0s7 needs maintenance

No other errors appear in /var/adm/messages (bad blocks or the like). On
other occasions when a disk failed, on another server, the messages file
showed bad-block errors and "metareplace -e" worked for a while (some
days) before the mirror failed again. (I don't have spare disks, and in
the meantime I prefer a bad mirror to no mirror.)

How can I check whether it is a disk problem or a SCSI bus problem?


Jordi

http://www.wtransnet.com
Dpto. Tecnico
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Wed Jan 21 13:37:42 2004
