SUMMARY: DiskSuite on SunFire280R

From: Urie, Todd <TUrie_at_trueposition.com>
Date: Wed Jun 05 2002 - 07:57:34 EDT
I got quite a few responses and some very good suggestions / comments.  I
had read through the manuals and was basically looking for some feedback to
make sure that I understood what I had read.  I'm glad that I posted the
questions because a few people posted things that I had not considered.  I
learned something, which is what makes this list so valuable.

Thanks to the following people for the replies:

Eric vande Meerakker
Hichael Morton
Henrik Schmiediche (Henrik even included a text version of his setup /
recovery procedure, thanks very much)
Thomas M. Payerle
Nelson T. Caparroso
Andrew Stuever
Simon-Bernard Drolet
Bevan Broun
thetrick@wizard.net
Eric Shafto
Peter Evans.

There were quite a few good comments, so I'm going to post each of the
replies below.  The general consensus appears to be that 2 or more metadb's
on each disk is sufficient.  In the opinion of most, location does not matter
much: a failing disk is unlikely to be limited to a particular slice and will
more likely fail completely, so spreading the metadb's across slices doesn't
really help.

Recovery is certainly possible, and several different options are included
in the responses below.

Thanks,
Todd Urie

Responses:
****************************************************************************
> Since DiskSuite requires that there be at least 1/2 the number of db's + 1
> in order to boot a system, it is impossible to set a practical metadb
> configuration such that, in the event of a complete failure of one of the
> disks, the system can be rebooted.  Therefore, by configuring mirroring on
> a 280R (or any other 2 disk system), I only gain the ability to keep my
> system up in the event of a complete failure of 1 of the disks.  Not
> suggesting that 'only keeping the system up' is bad, just trying to
> explicitly understand the limitations and risks that are not addressed.

True, however, when you reboot, you can tell the system to disregard the
metadb replicas on the failed disk and continue booting.  (Off the top of my
head: boot -s, do a metadb -d and reboot.  Maybe better test that again
sometime. :-)

It is advisable to put three replicas on each disk though, so you'll still 
have three replicas (the minimum needed for DiskSuite to boot) in case one 
disk fails.
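
For reference, creating the replicas that way might look roughly as follows;
the slice and controller/target numbers are only examples and will differ on
your system, and the -f is only needed the first time, when no state database
exists yet:

   metadb -a -f -c 3 c1t0d0s7
   metadb -a -c 3 c1t1d0s7

Running metadb with no arguments afterwards should then list six replicas,
three per disk.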


I never saw a problem that put one or more entire slices out of commission.
I cannot think of a good situation either (except a corruption on the
partition info block, in which case it is unlikely to affect only and exactly
the parameters of the slices containing the replicas without affecting the
other slices), as slices are in the "mind" of the OS only.  The disk itself
is just a bunch of blocks...


Regards,

Eric.

***********************************************************************

Todd,

docs.sun.com should have the disksuite user
manual/guide.  

Sun disksuite class suggested/instructed 2 on one disk
and 1 on the other as a minimum.  More databases are
possible.  I would create a minimum of 2 on each hard
drive.

The possibility of having 2 slices or hard drives die in
this situation highlights the need for good backup
procedures.

My customers use disksuite to mirror the boot drives
(smaller sites do it by hand).  In 2 years we have
lost 3 boot drives on my 100+ servers--we have never
lost both boot drives.

I would suggest you read the disksuite manuals and
follow the guidelines that Sun gives--they deal with
1000s of networks and are the experts.  Disksuite is
an old program and these types of procedures are well
defined.  


Hope this helps,

HM


**********************************************************************
     Hello,
I have a SunFire 280R with internal disks mirrored using DiskSuite 4.2.1. I
have two meta DB's on each disk (slice 7). In case of one disk failure the
system will continue to function fine. The only problem is that if you
reboot, the system will refuse to start up --- unless you delete the metaDB's
on the failed disk. So in my case all I need to do is:

 metadb -d /dev/dsk/c1t?d0s7

after the failed startup. Then the system will boot up again. Also, if I
notice the failed drive, I can delete the metaDB's on the failed disk before
reboot/restart and the system will come up smoothly. I know this works
because I have tested it thoroughly.
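
A sketch of how that check and cleanup might look, with c1t1d0 standing in
for the failed disk (device names are examples only); metadb -i prints a
legend for the status flags, and replicas on a dead disk typically show error
flags such as W (write errors):

 metadb -i
 metadb -d /dev/dsk/c1t1d0s7

Running metadb again afterwards should show only the replicas on the
surviving disk.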

I have attached *my notes* to help me rebuild/restart *my* system in case of
failure. They may help you, they may be incorrect; don't do anything you do not
not understand.

Sincerely,

    - Henrik


***********************************************************************
The best config is to put 2 or more metadb replicas on EACH of the two disks.
I put 3 on each disk.  I doubt you will gain much by splitting these replicas
across different slices; usually when a disk fails it FAILS.  Basically, I
think the main advantage you get is if a sector containing one of the metadbs
goes bad, and even then I am not sure how necessary it is for them to be in
different slices.  Also, even for hot-swap drives, I've been told that they
should only be swapped after the system believes they have died, so such a
semi-failed disk would not be easily replaceable (you would have to manually
off-line all working mirror metadevices, then remove the metadb replicas; of
course, as it is a half-dead disk, you might decide to forgo that :).  But I
think it is rare for a disk to only partially fail, and if one does it is
only briefly (power cycling may be enough to completely kill it).
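
A hedged sketch of what that manual off-lining might look like, assuming
hypothetical mirrors d0/d1/d2 whose submirrors d20/d21/d22 live on the
failing disk c1t1d0:

   metadetach -f d0 d20
   metadetach -f d1 d21
   metadetach -f d2 d22
   metadb -d c1t1d0s7

The -f forces the detach even if the submirror has errors; once the disk has
been physically replaced and repartitioned, metattach would be used to
re-attach the submirrors and let them resync.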

BTW, if root is on one of these disks, then rebooting is likely not to be
trivial/automatic anyway.  Solaris understands mirrors; the openboot prompt
does not.  Openboot boots from a single disk, not a mirror, and if the
default boot device fails, I believe you must manually tell it to boot from
the other device.
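
At the OpenBoot prompt that usually amounts to something like the following,
assuming the standard devaliases point at the two internal drives (the alias
names vary from machine to machine):

   ok boot disk1

or, to have the PROM try the second disk when the first is unbootable:

   ok setenv boot-device disk0 disk1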

As for dealing with the N/2+1 on reboot issue, it actually is not that bad.
All you have to do is delete the replicas on the failed disk, and this can
be done while the disk is failed.  Ideally, you would do this between the
loss of the disk and the reboot.  If that cannot be done, you must first
reboot into single user and delete the replicas, then boot into multiuser.
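
Roughly, with c1t1d0 as the failed disk and the replicas in slice 7 (again,
example devices only):

   ok boot -s
   # metadb -i
   # metadb -d -f c1t1d0s7
   # reboot

metadb -i shows which replicas are flagged bad; the -f may not be needed if
enough good replicas remain.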

Tom Payerle 	
Dept of Physics				payerle@physics.umd.edu
University of Maryland	

**********************************************************************
You've perfectly presented your logic, sir.  In our case, we use SDS as a
mirroring tool so that our servers remain up and running in the event of a
disk failure.  Since the 280R has pluggable disks, you need not schedule
downtime to fix/replace a failed disk.  On systems that do not have pluggable
disks, this is very good insurance that the server can still function until
an opportune time for a maintenance window.

We use 4 state DB replicas on slice 6 (1 cylinder) on each disk - slice 7
(2 cylinders) is for the VxVM minirootDG.  We have 280R's that sport dual 73G
drives which we "treat" as an array and use soft partitions to create
"filesystems" beyond the 7 slice limit of the old disksuite.


NELSON 

*********************************************************************
You have hit the major problem with Disksuite and only two disks.  This is
one of the reasons I always like to spread the db across three or more disks.
If it is a system that can only have two disks (Netra 1125, etc.), then you
have to go through a bunch of hoops booting from your alternate boot disk if
you lose either hard drive.

Basically, before you reboot, you have to clear the metadb from the bad disk.
Then you can boot, and replace the disk.

-- 
----------------------------------------------------
Andrew Stueve    

**********************************************************************

Hi,

Can I suggest two things?

First, on each disk, you should create two metadbs, one in slice 4 (let's say)
and one in slice 7 (well, that's what I do).

So your metadb output will look like this:

root@test# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t0d0s4
     a    p  luo        16              8192            /dev/dsk/c1t1d0s4
     a    p  luo        16              8192            /dev/dsk/c1t0d0s7
     a    p  luo        16              8192            /dev/dsk/c1t1d0s7
root@test#

NOTE HERE: I'm using metadbs with a size of 8192 blocks!  (Getting ready for
Solaris 9!)

Then in /etc/system:

set md:mirrored_root_flag=1

This tells SDS/SVM to forget about quorum (50% + 1 metadbs)

This way, in a two-disk config, the system will stay up if one disk goes
away and will also reboot with only one disk.

Remember to set "boot-device" in the boot prom.
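
Put together, and assuming the same c1t0d0/c1t1d0 layout as in the metadb
output above (the devaliases in the last line depend on what your PROM
defines), the setup might look roughly like this:

   metadb -a -f -l 8192 c1t0d0s4 c1t1d0s4
   metadb -a -l 8192 c1t0d0s7 c1t1d0s7
   eeprom boot-device="disk0 disk1"

With md:mirrored_root_flag=1 in /etc/system, a reboot with only one disk
(and therefore only half of the four replicas) should still come up.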

Simon.

***********************************************************************

So if the system lost a disk and needed booting, it would be a matter of
"boot -b" with the correct device, then fsck / and remount it read/write.
Then edit /etc/vfstab to remove the references to the metadevices.  I.e.,
you're back to raw partitions on the good disk.
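
For the root filesystem, the vfstab change would look something like the
following (example device names; if root itself is a metadevice, the rootdev
line that metaroot added to /etc/system presumably needs removing as well):

   before: /dev/md/dsk/d0      /dev/md/rdsk/d0      /  ufs  1  no  -
   after:  /dev/dsk/c1t0d0s0   /dev/rdsk/c1t0d0s0   /  ufs  1  no  -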

BB

**********************************************************************

It works with 3 copies in one slice on each disk.

I saw only one failure of a mirror in the past 14 months.  Recovery isn't
real smooth, but it works.  The procedure should be on docs.sun.com

***********************************************************************

There is a procedure for booting when you have insufficient metadb replicas.
It's just unpleasantly involved. I am pretty sure that 3 per disk will allow
you to boot with one disk gone. I did that just a few weeks ago.

**********************************************************************

	this works for me, since you can create metadb's with 5 copies per
	minimini slice, one on each disk, giving you 10 metadb's.

	typically i have a noddy 32mb partition for this. 

	every time you reboot, it resyncs the sub mirror,
	takes about 15 minutes of "sluggish" performance after booting.
	-.-

	P
	----*
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers