SUMMARY AND FOLLOW-UP QUESTION: Problems with metadb's out of order

From: Marc Sheldon <Marc.Sheldon_at_Quaack.com>
Date: Mon Mar 10 2003 - 19:38:41 EST
Hi,

I received numerous responses. Unfortunately, none of them seems to fix
the problem.

The general view was that the system should work without a problem if
the new system is identical to the old one (which it is). Any remaining
problems should have been solvable by copying md.cf to md.tab, since
md.cf retains the last running configuration. This is in fact what I
did for the quoted md.tab line, unfortunately without success.

One suggestion focused on the output of my metadb, which is:

        flags           first blk       block count
     a m  pc luo        16              8192            /dev/dsk/c0t9d0s7
     a    pc luo        16              8192            /dev/dsk/c0t10d0s7
     a    pc luo        16              8192            /dev/dsk/c0t11d0s7
     a    pc luo        16              8192            /dev/dsk/c0t12d0s7
     a    pc luo        16              8192            /dev/dsk/c0t13d0s7
     a    pc luo        16              8192            /dev/dsk/c0t14d0s7
     a    pc luo        16              1034            /dev/dsk/

(Yes, the last metadb did NOT show a disk!) Here might be the crux of
the problem. When the system was originally created, it had a metadb on
the internal disk. What got me was the difference in block count between
the metadbs on the external disks (8192 blocks) and the metadb on the
internal disk (1034 blocks, also constrained by the size of its slice).
It seems someone was being smart: they hooked up the external disks to
an old Ultra Enterprise 2 we have running Solaris 9, ran through the
necessary steps to create the metadbs, ran metainit to hook up the
disks, and did not succeed. I checked, and the default metadb size in
Solaris Volume Manager (the DiskSuite successor that ships with
Solaris 9) is 8192 blocks, versus 1034 blocks for DiskSuite as
installed on Solaris 8.
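
The size mismatch can also be checked mechanically. Below is a minimal
Python sketch, assuming the metadb listing has been saved with each
replica on a single line (as metadb prints it, before mail wrapping).
The helper names parse_metadb and odd_replicas are hypothetical, and
treating the most common replica size as "normal" is an assumption of
the sketch, not a DiskSuite rule.

```python
from collections import Counter

# The metadb listing from this message, one replica per line.
LISTING = """\
        flags           first blk       block count
     a m  pc luo        16              8192            /dev/dsk/c0t9d0s7
     a    pc luo        16              8192            /dev/dsk/c0t10d0s7
     a    pc luo        16              8192            /dev/dsk/c0t11d0s7
     a    pc luo        16              8192            /dev/dsk/c0t12d0s7
     a    pc luo        16              8192            /dev/dsk/c0t13d0s7
     a    pc luo        16              8192            /dev/dsk/c0t14d0s7
     a    pc luo        16              1034            /dev/dsk/
"""

def parse_metadb(listing):
    """Return (first_blk, block_count, device) for each replica line."""
    replicas = []
    for line in listing.splitlines():
        fields = line.split()
        # Flags are letters; the first two numeric fields are the first
        # block and the block count. The header line has no numbers.
        nums = [i for i, f in enumerate(fields) if f.isdigit()]
        if len(nums) < 2:
            continue
        first_blk, count = int(fields[nums[0]]), int(fields[nums[1]])
        rest = fields[nums[1] + 1:]
        device = rest[0] if rest else ""
        replicas.append((first_blk, count, device))
    return replicas

def odd_replicas(replicas):
    """Replicas whose size differs from the most common size, or whose
    device path is missing or ends in '/'."""
    usual = Counter(count for _, count, _ in replicas).most_common(1)[0][0]
    return [r for r in replicas
            if r[1] != usual or not r[2] or r[2].endswith("/")]

for replica in odd_replicas(parse_metadb(LISTING)):
    print(replica)  # prints (16, 1034, '/dev/dsk/')
```

Only the last replica is flagged, on both counts: its size and its
incomplete device path.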

If this is true, and since the metadbs sit on the same slices as the
data, I guess it is possible that the newly created metadbs overwrote
some data. That still does not explain why the system will not even
create the RAID device: metadb claims that the databases are present and
workable, but metainit does not like them (as described in the original
post).
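
To put a rough number on the possible damage, the arithmetic below
compares where a replica starting at block 16 ends under the two default
sizes. It assumes 512-byte blocks and, more importantly, that each RAID5
component's data originally began immediately after the old 1034-block
replica. That layout is an assumption of this sketch, not something
established in this thread.

```python
BLOCK_SIZE = 512      # bytes per disk block on Solaris
REPLICA_START = 16    # "first blk" column in the metadb listing

def replica_end(block_count):
    """First block past a replica that starts at REPLICA_START."""
    return REPLICA_START + block_count

OLD_END = replica_end(1034)   # DiskSuite default on Solaris 8 -> 1050
NEW_END = replica_end(8192)   # Solaris 9 default              -> 8208

# ASSUMPTION: component data originally started at OLD_END, so blocks
# OLD_END .. NEW_END - 1 on each disk would be covered by a new replica.
overlap_blocks = NEW_END - OLD_END           # blocks per disk
overlap_bytes = overlap_blocks * BLOCK_SIZE  # bytes per disk
DISKS = 6                                    # external disks re-replicated

print(overlap_blocks, overlap_bytes, overlap_blocks * DISKS)
# prints: 7158 3664896 42948
```

Under that assumption, roughly 3.5 MB at the start of each external
component would have been overwritten.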

To make sure I did not miss anything, I wrote a script that went through
every possible permutation of the disks in the metainit command. Again
no luck: the system always responded with "... devices were not RAIDed
previously ...".
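
A permutation search like the one described above could be sketched in
Python as follows. It only builds the candidate metainit command lines,
using the same options as the md.tab line quoted in the original post
below; actually running each command and checking the result is left
out. The component names are copied from that md.tab line and may need
adjusting for the new host, where the metadb listing shows the external
disks as c0t9d0s7 and so on. candidate_commands is a hypothetical name.

```python
from itertools import permutations
from math import factorial

# Components from the quoted md.tab line, in the order listed there.
COMPONENTS = ["c1t9d0s7", "c1t10d0s7", "c1t11d0s7", "c1t12d0s7",
              "c1t13d0s7", "c0t0d0s6", "c1t14d0s7"]

def candidate_commands(components, name="d0", interlace="32b"):
    """Yield one metainit command line per ordering of the components,
    with the same -r, -k and -i options as the quoted md.tab entry."""
    for order in permutations(components):
        yield f"metainit {name} -r {' '.join(order)} -k -i {interlace}"

commands = list(candidate_commands(COMPONENTS))
assert len(commands) == factorial(len(COMPONENTS))
print(len(commands))  # prints 5040
```

With seven components there are 7! = 5040 orderings, so a scripted
search is practical where a manual one is not. The first candidate
generated is the original md.tab order.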

The data on the disks is unfortunately irreplaceable, and we would do a
lot to get it back. Does anyone have any further ideas?

	Cheers,
		Marc

PS: Many thanks already to:

	Luca Pizzinato [mail@pizzinato.it]
	Eric van de Meerakker [eric@rhodix.nl]
	Kumar [ccaqsk@hestia.herts.ac.uk]

The original post was:
[ ... ]
Hi,

We run an E220R (Operating Environment: SunOS XXXX 5.8 Generic_108528-18
sun4u sparc SUNW,Ultra-60) with six external SCSI disks in a RAID5 setup
using DiskSuite. This system was intended as a stopgap (aren't they
always?), and therefore all disks are on the same controller and the
metadbs are on the same slices as the data. For numerous other reasons
too embarrassing to go into, there is no current backup, and the disks
hold massive amounts of data that is not replicated elsewhere.

We had a lightning strike that effectively fried the power distribution
board and one of the power supplies (don't ask ...) but luckily did not
damage either the external or the internal disk(s). Once we moved the
internal disk and reattached the external disks to another E220R (the
only difference being two processors instead of one), we could get the
system back up, but not the RAID5 setup.

Even more embarrassing than the other issues is that we seem to have no
current copy of md.tab, and metainit -k gives this response:

    metainit: XXXX: /etc/lvm/md.tab line 4: d0: devices were not RAIDed previously or are specified in the wrong order

We know which disks had metadbs on them, but not the order they belong
in. md.tab contains:

    d0 -r c1t9d0s7 c1t10d0s7 c1t11d0s7 c1t12d0s7 c1t13d0s7 c0t0d0s6 c1t14d0s7 -k -i 32b
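
The md.tab entry packs several settings into one line. In the metainit
syntax it uses, -r builds a RAID5 metadevice from the listed slices, -k
tells metainit not to initialize the device because it already holds
data, and -i 32b sets a 32-block interlace. The slice order is part of
the definition, which is why not knowing it matters. The Python sketch
below splits the line into those parts; Raid5Entry and parse_raid5 are
hypothetical names, and only the options that appear in this particular
line are handled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Raid5Entry:
    name: str                 # metadevice name, e.g. "d0"
    components: list          # slices, in the order given on the line
    no_init: bool             # -k: do not initialize, data already exists
    interlace: Optional[str]  # -i value, e.g. "32b" (32 blocks)

def parse_raid5(line):
    """Split an md.tab RAID5 entry ("dN -r slices... [-k] [-i size])."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[1] != "-r":
        raise ValueError("not an md.tab RAID5 (-r) entry")
    entry = Raid5Entry(tokens[0], [], False, None)
    rest = iter(tokens[2:])
    for tok in rest:
        if tok == "-k":
            entry.no_init = True
        elif tok == "-i":
            entry.interlace = next(rest)  # value follows the option
        elif tok.startswith("-"):
            # Only the options used in the quoted line are handled here.
            raise ValueError(f"option {tok} not handled by this sketch")
        else:
            entry.components.append(tok)
    return entry

MD_TAB_LINE = ("d0 -r c1t9d0s7 c1t10d0s7 c1t11d0s7 c1t12d0s7 "
               "c1t13d0s7 c0t0d0s6 c1t14d0s7 -k -i 32b")
entry = parse_raid5(MD_TAB_LINE)
print(entry.name, len(entry.components), entry.no_init, entry.interlace)
# prints: d0 7 True 32b
```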

It is critical that we retain the data on these disks and get access to
it quickly.

I checked the archives but could not find any comparable issues or
solutions (probably no one was this stupid before).

Any ideas what we can do? I will summarize.

	Cheers,
		Marc
[ ... ]
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Mon Mar 10 19:46:37 2003