SUMMARY: Online Disk Suite, mirroring boot disk

From: Kitty Ferguson (ferguson@hao.ucar.edu)
Date: Fri Apr 23 1999 - 13:44:59 CDT


Thanks much for the opinions, experiences and scripts from:

Mark Uris Craig Ruff Ed Arnold David Evans Brooke King
Jim Freeman David Babcock Mark Almeida Robert Rose Paul Teasdel
Sam Vilain Nate Itkin Philip Plane Matthew Stier Richard Smith
Kuldip Ottal Ray Delaney Ray Trzaska Richard Goerwitz

A number of folks replied that they could boot from the secondary disk, but had
done so only in testing, or when the primary disk was corrupted but the system
had not yet failed. There were some difficulties reported when trying to break
the mirror following an actual failure of the primary disk or having to relabel
a mirrored disk.

There are good reasons for using dd through cron jobs or performing a ufsdump
on a regular basis, since corruption of the primary disk could be mirrored to
the secondary as well.

This summary addresses two issues: reasons for using dd or ufsdump, and
successes, suggestions and cautions when recovering from an ODS mirrored disk.

Not included in my original post:
OS: Solaris 2.5.1 (SunOS 5.5.1)
ODS ver 4.0

My original post is included following the summary.

===============================================================================
SUMMARY: Online Disk Suite, mirroring boot disk
===============================================================================

dd, ufsdump
-----------

- dd out of cron (daily, weekly or monthly): [ODS] does work and works well but
in cases with identical disks I prefer dd out of cron. As the [secondary] disk
is not active you don't have the overhead of mirroring. Just remember to check
the machine boots off the alternate as part of testing. This is set in NVRAM.

- Successful boot of a 1000E from a mirrored boot disk but not when one disk had
failed. Problems experienced with DiskSuite mirroring when a disk had died; it
could not detach the failed half of the mirror, it said it wanted to resync
before it could detach, useless!

- dd: This is good. Simple is good. Downtime when boot disk dies and whoever is
handy has to type 'boot disk1' = 10 minutes. Downtime when the boot disk dies
and no-one nearby can remember how to boot from the mirror = lots. On the old
ODS on SunOS 4 I once spent over a day trying to get a critical system back up
after a software snafu got ODS confused. The old ODS was much more complicated
with mirrored boot disks, but it's put me off mirroring system disks.

- In our shop we have done ODS mirroring and "poor man's" mirroring whereby we
set up a cron job to copy the O/S to an available alternate disk. I prefer the
poor man's mirror in our shop. What the ODS mirroring doesn't save you from are
the "fat fingered" mistakes where files and directories get deleted/destroyed.
Those problems have been more frequent in our shop than hardware failures. If
ODS suffers a hardware failure on one drive, you avoid a reboot (0 minutes); if
it suffers a fat finger, you restore from a backup tape (30 minutes?) [And in
our case, if you can get the tape library indexes mounted without a primary boot
disk, a catch-22.]

- I second the SUN rep's suggestion to dd the root disk. I've always either dd'd
the root disk over or ufsdumped it. Set up an alternate bootprom, and use
devalias to set an alias to boot backup. This works great. I have never had any
luck with the mirrored partitions, especially if the primary goes down. I
generally run the root disk backup 3 times a week but you could do it more. The
last machine that we had mirrored took us 3 hours to break a mirror and get it
booted.

- The problem mirroring the boot disk is that corruption was mirrored to the
second disk, so it couldn't boot either. The label was there though, so we were
able to boot off cdrom, restore the needed files to the mirror disk, etc. and
boot off of the mirror. Perhaps on an Ultra2, in the same scenario, you would be
able to recover more easily if a dd copy were not corrupted. You would perhaps
not gain performance from a mirrored root volume, etc., so the dd copy would be
a better way to go.
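
The "dd out of cron" approach described in the replies above might look roughly
like the sketch below. It is only an illustration under stated assumptions: the
device names c0t0d0 and c0t1d0, the use of slice 2 as the whole-disk slice, the
block size, and the script path in the sample crontab line are not from the
thread. By default it only prints the dd command rather than running it.

```shell
#!/bin/sh
# Sketch of a "poor man's mirror": raw copy of the boot disk to an
# identical spare disk. All names below are assumptions, not from the thread.
SRC=${SRC:-c0t0d0}       # assumed primary boot disk
DST=${DST:-c0t1d0}       # assumed identical spare disk
DRY_RUN=${DRY_RUN:-1}    # 1 = only print the command, 0 = really copy

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy the whole disk through slice 2; refuse to copy a disk onto itself.
mirror_disk() {
    if [ "$1" = "$2" ]; then
        echo "source and target are the same disk: $1" >&2
        return 1
    fi
    run dd "if=/dev/rdsk/${1}s2" "of=/dev/rdsk/${2}s2" bs=1024k
}

mirror_disk "$SRC" "$DST"

# Sample root crontab entry (hypothetical script path), three times a week:
# 0 2 * * 1,3,5 DRY_RUN=0 /usr/local/sbin/ddmirror.sh
```

Since the source is mounted while dd runs, the copy is not guaranteed to be
consistent; running fsck on the copy and test-booting from it, as one reply
above suggests, are sensible follow-ups that this sketch leaves out.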

ODS mirror/booting from mirrored disk
-------------------------------------

- Mirroring via ODS is physical mirroring between 2 or 3 physical devices. Yes
it is indeed possible to boot off of one of the mirrors as the Sun Docs report -
we've done this. The identical copy is a bit-by-bit mapping of the original and
can be used if it goes out. Do not use ODS for swap since it is a waste of disk
accesses.

- Yes, this really works. You need to be aware that with a mirrored root disk,
you need to edit the /etc/system file to update the location of the new root
physical disk.

- booting from the alternate: Yes. Although when I did it, it was a planned boot
from the alternate. I manually broke the mirrors so I could do a patch
installation and have a fall back in case things didn't work.

- Can you boot off of the mirror root? The answer is yes; I just did it the day
before yesterday on a U/E 5000. We had a problem with an application, which in
turn led to corruption of the root filesystem, and we lost a couple of files.
When we booted, the primary root disk didn't want to boot, it reported a missing
label. I went to OBP and did a nvalias mirror <device> and boot mirror, and it
booted.

- Yes, but the failures were induced by my simply removing the original drive.
It worked great. I have never had to react to an actual failure, however, it
ought to continue working (no reboot necessary) after a disk failure if the
event that caused the failure does not also cause a system failure. Of course,
you may have to boot from the alternate root when it comes time to replace the
original because most originals are not hot swappable.

- Yes, you can boot from the alternate disk, in most cases. But keep in mind
that the failure can occur for various reasons. The easiest test is to power off
the primary disk. That's no problem. You can just boot off the secondary. (Here,
easiest is to explore setting nvram aliases so you don't ever have to type that
string in an emergency!) It will work. But a controller can fail or, also often,
a disk can develop SCSI difficulties which propagate to the other disk.
You will have to relabel, anyway. Best to make sure you use separate controllers
both for the mirrors and also for the metadb's.

- Yes, we use it here. One thing to watch out for - ensure that each boot disk
has *3* copies of the metadb replicas; ie, supply metadb with the -c 3 option
when you create the metadb's. Otherwise you boot up in a very insane state in
the event of a failure; you have to jump through some extra hoops (they're
explained in detail in the manual) to recover from a disk failure.

The only problem we had was on the IDE systems (Ultra 5's and 10's). The system
can't talk to a secondary disk after a primary failure. Solution: SCSI secondary
disk. You may also want to set an alias for the second disk, and add it to the
boot-device list in the PROM. Then, if the primary fails you can even boot
successfully from the second without having to remember the string:

        At the OK prompt
         
        1. set up aliases for the first and second disk using
           devalias and nvstore
                e.g.: pri-disk and sec-disk
           HINT : the command show-disks makes this simpler

        2. run "setenv boot-device pri-disk sec-disk"
               This will cause the server to boot automatically off
               the second device if the first fails.

- A one-time item you will need to do is to run installboot on the secondary
disk; else you won't be able to boot off the disk. (Note: In a crunch, you can
boot off the CDROM and then run installboot, but you're better off doing it
beforehand.)
[Note: We do have a bootblock on the secondary.]
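
Putting the replica and installboot advice above together, a root mirror setup
might be sketched as follows. This is a dry run that only prints commands, and
it rests on assumptions that are not from the thread: a small slice 7 on each
disk reserved for the state database replicas, root on slice 0, the metadevice
names d10/d11/d12, and the DiskSuite tools being in PATH. Check the ODS manual
for your release before running any of it.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, runs nothing.
# Disk names, slice layout and metadevice names are assumptions.
PRI=${PRI:-c0t0d0}    # assumed primary boot disk
SEC=${SEC:-c0t1d0}    # assumed secondary boot disk

# Steps before the reboot: three replicas per disk (-c 3), a one-way
# root mirror on the primary, and metaroot to update the system files.
mirror_setup_cmds() {
    cat <<EOF
metadb -a -f -c 3 ${PRI}s7 ${SEC}s7
metainit -f d11 1 1 ${PRI}s0
metainit d12 1 1 ${SEC}s0
metainit d10 -m d11
metaroot d10
EOF
}

# Steps after the reboot: attach the second half of the mirror and put a
# boot block on the secondary so the PROM can boot from it.
post_reboot_cmds() {
    cat <<EOF
metattach d10 d12
installboot /usr/platform/\`uname -i\`/lib/fs/ufs/bootblk /dev/rdsk/${SEC}s0
EOF
}

echo "# before reboot:"
mirror_setup_cmds
echo "# after reboot:"
post_reboot_cmds
```

Printing the commands first makes it easy to review the device names against
the real disk layout before anything touches the boot disk.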

##############################
Original posting:

> We have as our main File Server an Ultra 2, with two identical internal disks
> (t0, t1). We have recovered from a major boot disk failure and currently have
> the primary boot disk (t0) duplicated on the secondary internal (t1) via
> ufsdumps.
>
> We have a plan to complete the setup of mirroring on the secondary internal
> using Online Disk Suite (ODS). Before actually setting this up I would like to
> verify from any other Admins if the mirroring through ODS is actual physical
> mirroring; if the mirrored disk can in truth be used as a boot disk as stated
> in the ODS doc. We'd just hate to find difficulty with this when we need it
> the most!
>
> In summary from the doc:
>
> - record the alternate root device path for booting off the
> alternate device should the primary device fail:
>
> # ls -l /dev/rdsk/c0t0d0s0
> lrwxrwxrwx 1 root other 55 Feb 27 09:37 /dev/rdsk/c0t0d0s0 ->
> ../../devices/sbus@1f,0/SUNW,fas@e,8800000/sd@0,0:a,raw
>
> - to boot from the alternate device:
> > boot sbus@1f,0/SUNW,fas@e,8800000/sd@0,0:a
>
> Has anyone successfully booted from the alternate mirrored root device after
> failure of the primary?
>
> Also, considering what a Sun rep had to say about mirroring the boot disk,
> simple is perhaps best - in which case I am reviewing dd as run daily from a
> cron job.
>
> Opinions and alternative ideas are welcome. I will summarize.
>
> TIA,
>
> Kitty
>
> --Kitty Ferguson System Administrator - CSMT
> ferguson@hao.ucar.edu NCAR - High Altitude Observatory
> tel: (303)497-1556 P.O. Box 3000
> fax: (303)497-1589 Boulder, CO 80307-3000
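
The device-path step quoted from the doc can also be scripted, and the result
fed into the PROM alias idea from the summary. The sketch below is a hedged
illustration, not the documented procedure: the function names are made up for
this example, the sketch keeps the leading slash that follows /devices on the
assumption that the PROM wants a full path, "disk" is assumed to be the
built-in alias for the primary, and the eeprom commands are only printed. Note
that setting nvramrc this way replaces whatever nvramrc already contains.

```shell
#!/bin/sh
# Sketch: derive the PROM device path from a /dev/rdsk link target, then
# print (not run) eeprom commands that store an alias for it.
# Function names are hypothetical; sec-disk is the alias name from the summary.

# Strip everything up to and including "/devices" and the trailing ",raw".
obp_path() {
    echo "$1" | sed -e 's|^.*/devices||' -e 's|,raw$||'
}

# Print eeprom commands that define the alias and the boot order.
# Caution: "nvramrc=..." overwrites any existing nvramrc contents.
prom_cmds() {
    cat <<EOF
eeprom "nvramrc=devalias sec-disk $1"
eeprom "use-nvramrc?=true"
eeprom "boot-device=disk sec-disk"
EOF
}

# Link target copied from the ls -l output in the quoted post.
target='../../devices/sbus@1f,0/SUNW,fas@e,8800000/sd@0,0:a,raw'
echo "boot `obp_path "$target"`"

# On a live system, feed in the secondary's link instead, for example:
# link=`ls -l /dev/rdsk/c0t1d0s0 | awk '{print $NF}'`
# prom_cmds "`obp_path "$link"`"
```

Storing the alias ahead of time follows the advice in the summary: nobody has
to remember or type the full device string during an outage.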


