SUMMARY:SS1000E:18GB RAID

From: Vikas Arora Extn-6001 (vikas@del05ld.sgs-thomson.it)
Date: Fri Oct 06 1995 - 01:34:31 CDT


Hi!

I think I owe this one to the list. Although I got very few responses,
I did get an answer to my problem. My original posting was:

Hi Admins,

I have an SS1000E with an 18GB RAID 5 configured, running Solaris 2.4. About
two days ago I started getting the following error on my console:

Unix: Warning:/io-unit@f,e0200000/sbi@0,0/SUNW,soc@2,0/SUNW,phea0000000,
       722e27/ssde3,1(ssd16):

              error for command 'read(10)' error level retryable

              Requested block 1837056 error block 1839012

              sense key: aborted command vendor CONNER

After this error the machine suddenly hung and I tried a cold boot.
Since the file systems had not been shut down properly, it complained
while coming up, so I then ran "fsck". Although the fsck went quite well
on the "/", "/usr" and "/opt" file systems, it started complaining about
the 18 GB RAID 5 file system (/s2va). It started giving
warnings like:

DUP 1234567
DUP 234567
Excessive DUP BLKS I=4554478 ?
continue ?

On answering "yes" here, it listed some more DUP blocks and at the
end terminated with the following error:

"Cannot fix, first entry in directory contains"
Fsck: Warning: The following command (process 271) was terminated by
signal 11 and dumped core.

fsck -y -F ufs /dev/vx/rdsk/s2va

Does anyone know what to do to have my file system restored?

Any help would be appreciated!

Thanks and Best Regards

___________________________________________________________________________

Frankly, I had not expected that many people would have faced a similar
problem when I posted this question, but to my surprise at least three
people had faced a similar problem with Sun RAID boxes. Following are the
answers I got:

<jkelle@mis.snap.org> (John)

He sent a reply which I guess would have worked had I not tried
the other option (the next one) first. Here is his answer:

The following was done on a Sun SPARCstorage array 200 hanging off a SS1000.
The raid config was raid 5, 1 stripe across 17 spindles, 1 hot spare, total
storage approx. 47 GB usable (18 2.9 GB HD's).

When I did this a while back, I started openwindows, and then in a shell tool
I started vxva by entering "vxva &". This brought up the volume manager
software for monitoring and controlling the plexes (or metadevices if you'd
prefer disksuite terminology). As I understand the process, the device type
/dev/vx/... is a logical device that represents your raid array configuration,
including the number of physical spindles, the number of columns the stripe
is to cover and so on...if that logical device gets hosed, then I was most
successful in recovering that through the Volume Manager software. When in
the software, choose volume operations and choose start (I believe this is
correct...bear with me, it's been over 6 mos.) - then go to the "world" view
of the controller, which should show you all of the physical and logical
devices represented in your array. With the mouse, click once on the plex
(the logical representation of your raid volume) of the troubled volume so
that it is highlighted. Then go to the menu and choose file system operations,
fsck. If the logical device hasn't been totally corrupted, I believe that
this should complete the fsck operation.

One note: I did experience a time when the above process was not sufficient to
bring the plex back. The final solution that time was to touch a file in the
root directory, and I believe that it should be called reconfigure. This does
a total reconfigure of all links and misc. configuration parameters within the
Sun. This worked for me, but it also was a bit of a headache due to the fact
that it also reconfigured a lot of other things, too.
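
A minimal sketch of the "touch a file called reconfigure" step described above, assuming a Bourne-compatible shell. The function name request_reconfigure and its optional root argument are hypothetical, added only so the step can be tried against a scratch directory first; the reply itself only says that the file lives in the root directory.

```shell
# Hypothetical helper (not from the original reply): create the
# "reconfigure" flag file in the given root directory.
# With no argument it targets "/", i.e. it creates /reconfigure.
request_reconfigure() {
    root="${1:-/}"
    # Strip one trailing slash so "/" becomes "" and the path is /reconfigure.
    touch "${root%/}/reconfigure"
}
```

Called with no argument as root, this creates /reconfigure; the reconfiguration itself only happens on the next boot, and as noted above it touches a lot more than the array.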

_____________________________________________________________________________

From <marg@columbia.edu>: This one really did the trick for me and
helped me restore my data and file system without much loss in my case.

This happened to us just last night. We were able to get our filesystem
back by using the fsdb command to map out the offending inodes.

        fsdb -F ufs -z <inum> /dev/vx/rdsk/diskgroup/volume

Then you should be able to fsck the volume without dumping core.
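
A rough sketch of how the <inum> values might be collected, assuming the offending inodes are the ones fsck reports in its "I=<number>" fields (as in the "Excessive DUP BLKS I=4554478" line above) and that the fsck output was saved to a file. The function names extract_inums and print_fsdb_cmds and the file name fsck.log are hypothetical. The second function only prints the fsdb commands so each one can be reviewed before it is run; as far as I understand, -z clears the inode, so whatever that inode held is lost.

```shell
# Print the unique inode numbers that appear as "I=<number>" in a saved
# fsck transcript read from standard input.
extract_inums() {
    sed -n 's/.*I=\([0-9][0-9]*\).*/\1/p' | sort -un
}

# Print (do not run) one fsdb command per inode number read from
# standard input. $1 is the raw volume device.
print_fsdb_cmds() {
    dev="$1"
    while read -r inum; do
        echo "fsdb -F ufs -z $inum $dev"
    done
}
```

For example, `extract_inums < fsck.log | print_fsdb_cmds /dev/vx/rdsk/diskgroup/volume` lists the commands to review; run them by hand only once you are sure those inodes can be sacrificed.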

______________________________________________________________________________

These were the replies I got. Thanks also to anyone whose reply
may still be on its way.

With Best Regards


