Summary - Disaster Recovery

From: Koonz, Jay (jKoonz@USCO.com)
Date: Mon Nov 06 2000 - 13:54:42 CST


Thanks to everyone for all the responses. There are a bunch of interesting
options but I'm not sure which one I'll try yet. I couldn't even begin to
summarize the replies into a couple of lines, so here they all are.

============================================================================
Original Question
=============
Coming from the mainframe world, I had the capability to create a
stand-alone backup tape that contains a bootable program followed by a
backup of my system. I could boot from the tape, and it would restore my
system. We could use this when we went on disaster tests.
We're running Solstice Backup on our Solaris 7 boxes and I don't see an
equivalent function.
The Sun procedure seems to be (1) install Solaris on the box at the disaster
site (2) install the Solstice Backup software on the box at the disaster
site (3) restore the client. I can't believe this is the easiest/best way.
Also, this requires I restore my backup server also.
Question...Is there any way to create a backup tape that I can boot from and
restore my system at a Disaster Site ??
============================================================================
Reply - Nelson T Caparrosso
----------------------------------
I would think Legato Networker would have a similar thing although I am not
sure.
============================================================================
Reply - Alex Shepard
--------------------
There's a chapter in "Unix Backup and Recovery" by W. Curtis Preston
(published by O'Reilly) on Solaris bare-metal recovery that will probably
meet your needs. It doesn't include a bootable tape, but it's simpler (and
more effective) than installing the OS and then restoring on top of a live,
active filesystem.
============================================================================
Reply - Joe Fletcher
-------------------------
Theoretically ufsdump/ufsrestore is the tool you are looking for.
Sun is a bit behind in this respect, cf. DECpaq's btcreate on Tru64, HP's
make_recovery and IBM's whatever it is.
============================================================================
Reply - Reggie Stuart
---------------------------
I heard a myth about bootable Solaris tapes about five years ago. PLEASE
post a summary if you find out otherwise, as your procedures for disaster
recovery for a Sun are my standard procedures.
If your installation is that important, you need to look at
clustering/redundancy. Because backups are mostly done over the network,
most servers don't have a local tape drive to boot from anyway. In the
past, I have set up backup servers with plenty of extra disk space so as to
be able to restore a server's worth of data, and then reconfigure
nfs/nis+/etc. to point at the backup server until proper maintenance can be
scheduled.
============================================================================
Reply - Thomas Wardman
----------------------
I've never seen any Sun capable of booting off of a tape. You could do a
couple of things instead. You could make a bootable Solaris CD that contains
the backup client, and restore the system from there.
Or, you could create a Jumpstart server at the disaster site and, if
something bad happened, simply boot the system using an automated install.
The automated install could include adding the backup client. Then you
could restore from there.
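
The Jumpstart route above might be sketched roughly as below. This is a dry
run only (RUN=echo just prints the commands); the install-image path, server
name, client name, and MAC address are all hypothetical, not from the reply.

```shell
# Dry-run sketch of registering a lost client on a hypothetical Jumpstart
# server; set RUN= (empty) on a real server to actually execute.
RUN=echo
# add_install_client lives in the Tools directory of the install image
# (this path is an assumption for a Solaris 7 image).
$RUN cd /export/install/Solaris_2.7/Tools
$RUN ./add_install_client -e 8:0:20:aa:bb:cc -s jumpsrv:/export/install \
    lostclient sun4u
# then on the client, at the ok prompt:  boot net - install
```

The finish script of the Jumpstart profile is where the backup-client
package install would go.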
============================================================================
Reply - Darren Dunham
---------------------
>Coming from the mainframe world, I had the capability to create a
>stand-alone backup tape that contains a bootable program followed by a
>backup of my system. I could boot from the tape, and it would restore my
>system. We could use this when we went on disaster tests.
>We're running Solstice Backup on our Solaris 7 boxes and I don't see
>an equivalent function.

There isn't one.

>The Sun procedure seems to be (1) install Solaris on the box at the
>disaster site (2) install the Solstice Backup software on the box at the
>disaster site (3) restore the client. I can't believe this is the
>easiest/best way. Also, this requires I restore my backup server also.

You are correct.

>Question...Is there any way to create a backup tape that I can boot from
>and restore my system at a Disaster Site ??

Not at this time. While it should be possible, no one has engineered a
bootable tape for Sun Solaris.

You may be able to construct something 'useful' on your own, but it is
not integrated into any commercial product that I'm aware of.

There are several sysadmin scripts that try to do this:
1) save a copy of the scripts to tape
2) save the disk configuration
3) save a ufsdump.

From that, you could boot the machine from a CD-ROM, grab the scripts
from the tape, then have the scripts read the configuration, redo the
disks, and restore from the ufsdump.

At that point, you should have a bootable system that has Solstice
Backup or another product on it ready to restore other filesystems.
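
The three-file tape layout described above could be sketched as follows. The
device names and staging paths are assumptions, and RUN=echo keeps it a dry
run rather than a tested procedure.

```shell
# Hypothetical three-file recovery tape: scripts, disk config, level 0 dump.
# All paths/devices are examples; set RUN= (empty) on a real Solaris host.
RUN=echo
TAPE=/dev/rmt/0n        # no-rewind device keeps the three tape files separate
$RUN mt -f /dev/rmt/0 rewind
$RUN sh -c "prtvtoc /dev/rdsk/c0t0d0s2 > /var/dr/c0t0d0.vtoc"
$RUN tar cf $TAPE /var/dr/scripts       # tape file 1: the recovery scripts
$RUN tar cf $TAPE /var/dr/c0t0d0.vtoc   # tape file 2: the disk configuration
$RUN ufsdump 0f $TAPE /                 # tape file 3: the dump itself
```

At the disaster site, the same no-rewind device lets the scripts read the
three tape files back in order with mt fsf between them.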
============================================================================
Reply - blymn
-------------
>Question...Is there any way to create a backup tape that I can boot from
>and restore my system at a Disaster Site ??

Yes - don't use Solstice Backup. Create a recovery tape using plain
ufsdump, with all your partitions backed up on it. The procedure then
devolves to booting from CD, mounting the formatted/newfs'ed hard disk,
restoring from tape (no OS install required), installing boot blocks, and
rebooting. I dislike a lot of backup software solutions because they
make things hard just when you don't need them to be - partially
rebuilding a server just to get access to your data is something you
should not need to deal with in a disaster situation. This leaves
aside the issue of trying to get a license to allow you to do the
restore (though some products will allow you to restore without a
license).
============================================================================
Reply - Bret Hester
-------------------
This is quite interesting - please summarise soon.

I am using Veritas NetBackup and the procedure is just as you have
described for Solstice Backup. Instead of tape booting, the procedure we
are in the middle of implementing here for our Suns uses a Jumpstart
server (on the lost client we just do "boot net"), which sets up the
correct partition slicing and installs the NetBackup client software.
Then we still have to do a client restore.
============================================================================
Reply - Ric Anderson
--------------------
Nope, that's it - just as DUMB as ADSM's idea of a bare metal recovery
in the IBM world.

What I do is make a ufsdump backup of the system device (/ in my case,
but some people put /var, /opt, /, /usr and who knows what else in
separate partitions, so you may need to back up more than just /) on
a single tape, e.g.
        for fs in / /usr /var /opt; do
        ufsdump 0f /dev/rmt/0cn $fs
        done
This creates a multifile ufsdump tape, from which you can restore / via
        mt -f /dev/rmt/0cn rew
        ufsrestore -rf /dev/rmt/0cn
or /var via
        mt -f /dev/rmt/0cn rew
        ufsrestore -rfs /dev/rmt/0cn 3
etc. Also make a prtvtoc of the system device (and all other disks) so
you have partitioning info for use at the disaster site (or to recover
after replacing a smoked disk).
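
Capturing the partition maps Ric mentions might look like this. The disk
list and output directory are assumptions (and RUN=echo makes it a dry run);
slice 2 conventionally covers the whole disk on Solaris.

```shell
# Save a vtoc file per disk, for use with format/fmthard at recovery time.
# Hypothetical disk list and destination; set RUN= (empty) to execute.
RUN=echo
for disk in c0t0d0 c0t1d0; do
    $RUN sh -c "prtvtoc /dev/rdsk/${disk}s2 > /var/dr/${disk}.vtoc"
done
```

Keeping a printed copy of these files with the tapes means the partitioning
info survives even if the machine holding them doesn't.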

With this tape and a Solaris install CD (no install required though) in
hand for your version of the OS, you
        1. stick the CD in
        2. type boot cdrom -sw at the ok prompt to get a single-user shell
        3. run format, select the system disk, and the partition
           submenu. Use the prtvtoc output to help you partition
           the new disk like the old one, then do "label", followed by
           "yes" to write out the new partition table, and quit format.
        4. newfs the new partition(s).
        5. For discussion, presume c0t0d0 is the system disk, s0 is /.
           * mount /dev/dsk/c0t0d0s0 /mnt
           * cd /mnt
           * stick tape in drive, and do
           * ufsrestore -rf /dev/rmt/0cn
           * rm restoresymtable # ufsrestore scratch file
           * cd /
           * umount /mnt
           * installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \
               /dev/rdsk/c0t0d0s0
Now you should have a working "/". If /var (or /opt, or /usr) are
on their own partitions, then you need to repeat portions of the
above (skip boot block, but do rewind before each restore, and
remember to use the "s" option).

Once you've done a couple dozen of these, it's pure autopilot :-)
============================================================================
Reply - Geoff Lane
------------------
As I understand it only fixed block length tapedrives can be used to boot
from under SunOS. There are few of these available these days.
We are looking at other schemes. The following are some (site specific)
notes that I wrote some time ago. We haven't yet decided what to do.
----------------------------------------------------------------------
Using ufsdump/ufsrestore
------------------------
(Although this describes Solaris based procedures, I would expect that very
similar steps and programs could be used on any Unix-like operating system.)
It's possible to use ufsdump to create a disaster recovery dump by
ufsdump'ing to a remote machine (in the case of vxfs you can use vxdump.) On
most Solaris systems this will end up requiring about 300Mbytes of remote
filestore for the full dump of /, /etc, /usr, /var (assuming var is not full
of email or logs.)
In theory you need to do this in single user mode, but ufsdump doesn't
enforce that restriction.
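
One common idiom for the remote dump is a ufsdump-over-rsh pipe. This is a
sketch, not Geoff's actual command: the host name, destination path, and the
rsh trust (.rhosts) it relies on are all assumptions.

```shell
# Dump / over the network to a file on a dump host. Dry run (RUN=echo);
# clear RUN on a real Solaris box, as root, with .rhosts trust in place.
RUN=echo
$RUN sh -c 'ufsdump 0f - / | rsh dumphost dd of=/dumps/myhost.root.0 bs=64k'
```

ufsdump writing to "-" sends the dump to stdout, which is what makes the
pipe to a remote dd possible.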
Assuming a total loss of a root disk, to recover the system you need to...
        1. boot from CD-ROM into maintenance/single-user mode
        2. recreate the disk partitions if necessary (which you had better
           have records of; in the case of Solaris the explorer output
           contains the information)
        3. newfs the disk partitions just created.
        4. ufsrestore from the dumps.
        5. reboot the machine from disk.
So, for the 20-odd Solaris machines we would need about
        20 x 300M + 2G = 8G
of remote disk storage for the level 0 dumps (the 2G is for Irwell, which
has a lot of application software installed on the root disk :-(). You will
need about 20% more for the incrementals if needed (but if the total dump is
only about 300Mbytes then there's no real point in taking incrementals; it
just lengthens the time required to perform the restore.)
Ideally, one should have one "initial" dump taken when the system was first
installed and a "current" dump taken once a week or so. This allows for the
possibility that you may wish to determine exactly what has changed since a
known good system was running (if for example you suspect you have been
hacked.) This would double the total storage space required to 16G.
Step 3 is complicated if you have VxFS, as you need to get the support s/w
online somehow - no idea how. However, one rarely does any more than
"encapsulate" the root drives, in which case simple ufsdumps are fine
(though the final steps to total recovery are more complicated; /etc/vfstab
will need editing and the root filesystems will need re-encapsulating.)
Step 4 is the interesting one; traditionally you do this with a locally
attached tape drive. We would like to use a remote fileserver which would
require network services to be running after booting from CDROM.
In the case of Solaris this is done by...
        1. Boot the system from CDROM into single user mode
                ok> boot cdrom -s
        2. Initialise networking. You need some information for this
           that is best collected while the system is running:
                The name of the network interface hardware (hme0 etc.)
                The IP number for the machine
                The netmask and broadcast address
                The default route
           The values used here are examples; they will differ per machine.
                # ifconfig hme0
           This should return information on hme0 if it is configured; if
           not, then try
                # ifconfig hme0 plumb
           which should get the interface operational.

           Now, configure the network interface
                # ifconfig hme0 192.11.111.1 netmask 255.255.255.0 \
                        broadcast 192.11.111.255 up
           Check the interface state
                # ifconfig hme0
hme0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
        inet 192.11.111.1 netmask ffffff00 broadcast 192.11.111.255
        3. Set a default route
                # route -n add default 192.11.111.250

You should now have a working network interface. You could go on to set up
DNS etc but it's probably not worth the bother. The OS at this point can use
telnet, ftp etc with explicit IP number addresses and you can pull the dumps
from a network attached fileserver.
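
Once the interface is up, pulling a dump back could look like the following
sketch. The dump host, /dumps path, and rsh trust are hypothetical, as are
the device names; RUN=echo keeps it a dry run.

```shell
# Restore / from a remote dump onto the freshly newfs'ed root slice,
# mounted on /a while booted from the install CD. Dry run (RUN=echo).
RUN=echo
$RUN mount /dev/dsk/c0t0d0s0 /a
$RUN sh -c 'cd /a && rsh dumphost dd if=/dumps/myhost.root.0 bs=64k | ufsrestore rf -'
# don't forget the boot block before rebooting from disk:
$RUN installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0
```

ufsrestore reading from "-" is the mirror image of the ufsdump-to-stdout
trick used when the dump was taken.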

Alternatively we can write the dumps to CD-ROM. The problem with doing this
on active systems is that the dumps will quickly become out of date. It
does, however, eliminate the necessity to initialise networking.
Another possibility is to get a cheap SCSI disk array and just plug it into
the system to be recovered -- there's no problem doing this as the system is
already down. That way the dump data is local to the machine and no fancy
network footwork is needed. Normally, the disk array would be plugged into
the dump server and kept up to date. The downside of this method is that
you would be disturbing the SCSI cables, never a wise move. Of course, if
money were no object a Fibre channel SAN would do the same job without any
manual plugging needed.
Step 5 should be simple but there are some possible problems. The selected
new boot drive may not be bootable. This can be fixed by the ??? command.
Because the recovery process has not used the standard OS install procedures
the EEPROM will not have been updated to take account of any change in the
boot drive location (if it differs at all.) This requires that the boot
disk alias in the EEPROM be updated.
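
Updating the boot device can be done from a running Solaris shell with
eeprom(1M), or at the ok prompt with nvalias/setenv. The device names below
are examples only, not taken from the notes; RUN=echo makes it a dry run.

```shell
# Point the EEPROM at the new boot disk. Dry run (RUN=echo);
# on a real machine, clear RUN and run as root.
RUN=echo
$RUN eeprom "boot-device=disk1 net"
# or from the OBP ok prompt (firmware commands, not shell):
#   nvalias disk1 /pci@1f,0/pci@1/scsi@1/disk@1,0
#   setenv boot-device disk1
```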
Most of this is theoretical. We'll have to try various procedures out and
see what works.
============================================================================
Reply - Stuart Whitby
---------------------
HP and DEC have similar functions (maybe just installs the core OS
image from tape - I haven't had to use it and I'm not sure), and
you can use a Jumpstart server to get things back up to date
quickly, minus your data.

The only way I know of to do this in Solaris is to use the bare-
metal recovery stuff that we make. I haven't had to use that
either, and on my first (and only) look at the functionality, it
appeared to be pretty limited in its hardware scope. I don't know
if there's any equivalent software from any other companies, or if
the functionality of our own software has improved since I looked -
around the beginning of the year.
============================================================================
Reply - Gary Litwin
-------------------
I always just made a ufsdump tape for each OS filesystem (including wherever
your Solstice Backup lives).
In the event of a disaster: boot from the CD-ROM, partition and replace the
bad disk, and ufsrestore the filesystems it originally contained. Now you
are up and configured as of the date you took the last ufsdump.
Then you can restore the same filesystems via Solstice, replacing all
the changed files, and you are back in business.
This process has saved me several times.
It is a really good idea to have a backup tape of /nsr on your backups
server as well, in case you lose that disk in an emergency. (I used to
ufsdump it to a hot spare disk once a week...)
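
Gary's weekly /nsr copy to a hot-spare disk might be sketched like this; the
spare mount point and the crontab schedule are assumptions, and RUN=echo
keeps it a dry run.

```shell
# Clone /nsr (the Solstice/NetWorker index area) onto a spare disk mounted
# on /spare, via a ufsdump | ufsrestore pipe. Dry run (RUN=echo); on a real
# box this might run from root's crontab, e.g. weekly:
#   0 2 * * 0 /usr/local/bin/nsr_spare.sh
RUN=echo
$RUN sh -c 'ufsdump 0f - /nsr | (cd /spare && ufsrestore rf -)'
```

With the indexes on a spare disk, losing the backup server's boot disk
doesn't cost you the ability to run restores.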
============================================================================
Reply - Seth Rothenburg
-----------------------
The official procedure is just like the "fix root password" procedure: boot
from CD into single-user mode.
Then, format/newfs/mount the needed file systems (e.g., on /a) and restore
them. However, booting from CD-ROM is slow.
In our recent disaster tests, we have been fortunate to arrive at the
disaster site and find the system up and running off some disk c0t0d0s0,
with 6 disk drives in the system, so we can start in on the restore
without the CD. We actually wrote a script, and soon we hope to change our
backup to put the files needed to start the restore on their own
partition. Here are two examples:

rothen@testdg[/home/mangala]:> more restore_prod restore_dg | cat
::::::::::::::
restore_prod
::::::::::::::
#!/bin/sh

#set up paths for commands

TEE=/bin/tee
UFSRESTORE=/usr/lib/fs/ufs/ufsrestore
DATE=/usr/bin/date
REWIND=/dev/rmt/0
NOREWIND=/dev/rmt/0n
MT=/usr/bin/mt
RESTORELOG=/tmp/restore_log
$DATE > $RESTORELOG

echo Please monitor console for all error messages...

# Rewind tape
$MT -f /dev/rmt/0 rewind 2>&1

# Start restore.

######### RESTORE root #############################################
newfs /dev/rdsk/c0t2d0s0
mount /dev/dsk/c0t2d0s0 /a
cd /a
$UFSRESTORE rvf $NOREWIND 2>&1|$TEE -a $RESTORELOG
installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t2d0s0
########## END RESTORE root ########################################

######### RESTORE /usr #############################################
newfs /dev/rdsk/c0t3d0s5
mount /dev/dsk/c0t3d0s5 /a/usr
cd /a/usr
$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG
########## END RESTORE /usr ########################################

######### RESTORE /var #############################################
newfs /dev/rdsk/c0t3d0s0
mount /dev/dsk/c0t3d0s0 /a/var
cd /a/var
$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG
########## END RESTORE /var ########################################

########## RESTORE /opt ########################################
newfs /dev/rdsk/c0t2d0s6
mount /dev/dsk/c0t2d0s6 /a/opt
cd /a/opt
$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG
########## END RESTORE /opt ########################################

########## RESTORE /opt/gnu ########################################
newfs /dev/rdsk/c0t2d0s7
mount /dev/dsk/c0t2d0s7 /a/opt/gnu
cd /a/opt/gnu
$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG
########## END RESTORE /opt/gnu ########################################

########## RESTORE /home2 ########################################
newfs /dev/rdsk/c0t2d0s5
mount /dev/dsk/c0t2d0s5 /a/home2
cd /a/home2
$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG
########## END RESTORE /home ########################################

########## RESTORE /loglu ########################################
newfs /dev/rdsk/c0t3d0s1
mount /dev/dsk/c0t3d0s1 /a/loglu
cd /a/loglu
$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG
########## END RESTORE /loglu ########################################
        echo " " |$TEE -a $RESTORELOG
        echo "restore of System Disks Completed." |$TEE -a $RESTORELOG
        echo " " |$TEE -a $RESTORELOG
$MT -f /dev/rmt/0 rewind 2>&1|$TEE -a $RESTORELOG
$DATE >>$RESTORELOG
::::::::::::::
restore_dg - for restoring the data partition
::::::::::::::
#!/bin/sh

#set up paths for commands

TEE=/bin/tee
UFSRESTORE=/usr/lib/fs/ufs/ufsrestore
DATE=/usr/bin/date
REWIND=/dev/rmt/0
NOREWIND=/dev/rmt/0n
MT=/usr/bin/mt
RESTORELOG=/tmp/restore_log
$DATE > $RESTORELOG

echo Please monitor console for all error messages...

# Rewind tape
$MT -f /dev/rmt/0 rewind 2>&1

newfs -m 1 /dev/md/ssa1/rdsk/d56
newfs -m 1 /dev/md/ssa1/rdsk/d45
newfs -m 1 /dev/md/ssa1/rdsk/d89

# Start restore.

######### RESTORE /dg #############################################
mount /dev/md/ssa1/dsk/d56 /dg
cd /dg
$UFSRESTORE rvf $NOREWIND 2>&1|$TEE -a $RESTORELOG
########## END RESTORE /dg ########################################

######### RESTORE /dg/dghome/log ##################################
mkdir /dg/dghome/log
mount /dev/md/ssa1/dsk/d45 /dg/dghome/log
cd /dg/dghome/log
$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG
########## END RESTORE /dg/dghome/log #############################

######### RESTORE /dg/dghome/queue ################################
mkdir /dg/dghome/queue
mount /dev/md/ssa1/dsk/d89 /dg/dghome/queue
cd /dg/dghome/queue
$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG
########## END RESTORE /dg/dghome/queue ###########################

        echo " " |$TEE -a $RESTORELOG
        echo "restore of System Disks Completed." |$TEE -a $RESTORELOG
        echo " " |$TEE -a $RESTORELOG
$MT -f /dev/rmt/0 rewind 2>&1|$TEE -a $RESTORELOG
$DATE >>$RESTORELOG

rothen@testdg[/home/mangala]:>
============================================================================

BEFORE POSTING please READ the FAQ located at
ftp://ftp.cs.toronto.edu/pub/jdd/sun-managers/faq
and the list POLICY statement located at
ftp://ftp.cs.toronto.edu/pub/jdd/sun-managers/policy
To submit questions/summaries to this list send your email message to:
sun-managers@sunmanagers.ececs.uc.edu
To unsubscribe from this list please send an email message to:
majordomo@sunmanagers.ececs.uc.edu
and in the BODY type:
unsubscribe sun-managers
Or
unsubscribe sun-managers original@subscription.address
To view an archive of this list please visit:
http://www.latech.edu/sunman.html



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:14:21 CDT