SUMMARY: rm6 lost LUNs

From: Patricio Mora <pmora_at_cgob.junta-andalucia.es>
Date: Thu Oct 11 2001 - 04:41:36 EDT
Thanks to my anonymous Sun Engineer, Hans Engren, Sajeev George, Mike DeMarco, Tony Walsh, Mike Salehi.

--------------------------------------------------------------
Solution:
--------------------------------------------------------------

The entries in /kernel/drv/sd.conf that add support for more than one LUN were missing (the A1000's SCSI ID is 0):

# BEGIN RAID Manager additional lun entries
# DO NOT EDIT from BEGIN above to END below...
name="sd" class="scsi"
        target=0 lun=1;
name="sd" class="scsi"
        target=0 lun=2;
name="sd" class="scsi"
        target=0 lun=3;
name="sd" class="scsi"
        target=0 lun=4;
name="sd" class="scsi"
        target=0 lun=5;
name="sd" class="scsi"
        target=0 lun=6;
name="sd" class="scsi"
        target=0 lun=7;
# END RAID Manager additional lun entries

The RAID Manager installation process is supposed to add these entries, but it had not.
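For anyone who wants to check their own sd.conf for the same gap, here is a minimal Python sketch. It is not part of RAID Manager; the function name missing_lun_stanzas and the sample text are made up for illustration, and it assumes the entries follow the name="sd" ... target=N lun=M; layout shown above. It only reports which stanzas are absent; it does not modify the file.

```python
import re


def missing_lun_stanzas(sd_conf_text, target=0, luns=range(1, 8)):
    """Return sd.conf stanzas for the LUNs of `target` that the text lacks.

    Hypothetical helper: it inspects the text only and never edits sd.conf.
    """
    # Strip comments so commented-out entries are not counted as present.
    body = re.sub(r"#.*", "", sd_conf_text)
    present = set()
    # Every entry ends with ';' and may span several lines.
    for entry in body.split(";"):
        if not re.search(r'name\s*=\s*"sd"', entry):
            continue
        t = re.search(r"\btarget\s*=\s*(\d+)", entry)
        l = re.search(r"\blun\s*=\s*(\d+)", entry)
        if t and l:
            present.add((int(t.group(1)), int(l.group(1))))
    return [
        'name="sd" class="scsi"\n        target=%d lun=%d;' % (target, lun)
        for lun in luns
        if (target, lun) not in present
    ]


# Illustrative input only: a file that defines LUN 0 and LUN 1 for target 0.
sample = '''
name="sd" class="scsi"
        target=0 lun=0;
name="sd" class="scsi"
        target=0 lun=1;
'''
for stanza in missing_lun_stanzas(sample):
    print(stanza)
```

On a real system you would pass in the contents of /kernel/drv/sd.conf and the SCSI ID of your own array instead of the sample text.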

We added mirrors to the A1000 volumes and took a backup (fortunately, neither was needed), then modified sd.conf and did a reconfiguration boot (boot -r), after first deleting the device links to ...rdnexus@5/rdriver... in /dev/(r)dsk. (A normal boot followed by drvconfig and disks had left everything unchanged.) After that, format showed a new disk labeled SUN18G instead of the expected 51G LUN. We deleted LUN 1 with raidutil and recreated it with rm6. Everything is fine now.


--------------------------------------------------------------
Your ideas:
--------------------------------------------------------------

Were these disks brand new, or had they previously been in use in another A1000/A3x00 controller? If they had, I assume you plugged all four disks in within a short period of time.

The minimum delay between hot-add disk insertions is 60 seconds. If you add them faster than that, the controller may get confused and make a mess of its configuration, which seems to be what happened here.

I hope you have functional backups. If not, it's probably a good idea to take one RIGHT NOW, place it on another system, redo the entire LUN configuration from scratch, and restore the data. Whatever you do, DO NOT REBOOT, or chances are that your data will be lost.

--------------------------------------------------------------

When you reboot the machine, be aware that all LUNs on this unit might have disappeared; in fact, they probably will have. This means you should note the size of every file system located on this unit, as you may need to recreate them. Also, keep a record of how the LUNs looked before you added these disks. As it's just an A1000, the number of disks is not excessive, so you will have an easy way out of this, provided of course that you have decent backups.
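One way to keep such a note is to save the output of df -k and turn it into a list of records. The following is only a sketch: parse_df_k is a hypothetical helper, the sample output (device paths, sizes, mount points) is invented for illustration, and it assumes the usual six-column df -k layout with no spaces in mount points.

```python
def parse_df_k(df_output):
    """Parse `df -k` text into a list of dicts, one per file system.

    Hypothetical helper. Long device names that df wraps onto their own
    line are handled by re-grouping all tokens in sixes.
    """
    lines = df_output.strip().splitlines()
    # Skip the header line, then flatten the rest into whitespace tokens.
    tokens = " ".join(lines[1:]).split()
    records = []
    for i in range(0, len(tokens) - 5, 6):
        fs, kbytes, used, avail, capacity, mount = tokens[i:i + 6]
        records.append({
            "filesystem": fs,
            "kbytes": int(kbytes),
            "used": int(used),
            "avail": int(avail),
            "capacity": capacity,
            "mounted_on": mount,
        })
    return records


# Invented sample output, for illustration only.
sample = """Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0     963869  412233  493804    46%    /
/dev/vx/dsk/datadg/vol01
                    35000000 21000000 14000000   60%    /data
"""
for rec in parse_df_k(sample):
    print(rec["mounted_on"], rec["kbytes"])
```

Store the resulting records somewhere other than the array itself, alongside whatever notes you keep about the LUN layout.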

My recommendation would be to remove every drive in this unit and low-level format them in another NON-RAIDED unit (i.e. a single D1000, or just as a single drive in a machine). After that, I'd put them back into the A1000 and rebuild the LUNs from scratch. That way, you will have cleaned up all the private regions and configuration on this A1000, which was broken by the introduction of four previously used drives with a private region on them. This is probably what Sun would recommend in the end anyway, as you haven't lost your data.

A clean installation would solve your problem, and if you have good backups or, in your case, functional mirrors (ones that would still work if you unplugged this A1000), it wouldn't be much work. It will take a couple of reboots at first, though, to clean out the device tree and get it to recognize the new drives.

The problem you have run into is unfortunately a common mistake with these units, because they don't work the way one might first think. Note that I'm not saying these units are bad. They are excellent. They kick some serious butt when it comes to performance. They are just sensitive beasts, and you really need to be up to date with the documentation, and preferably know it by heart. Of course, you make this mistake only once, and after that you actually do read it. I did. ;-)

Again, I wish you the best of luck with this. You seem very fortunate not to have rebooted this machine yet, and to still have the data valid and accessible. That makes things so much easier, as you won't be in desperate need of having data restored from the confused unit.

--------------------------------------------------------------

You might need to restart the machine with the -r option. Before doing so, run the genscsiconf and hot_add executables, which are in the /usr/lib/osa/bin directory. This should solve your problem. Also verify that patch 106552-03 is installed.
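The patch check can be done by scanning the output of showrev -p. Below is a small Python sketch of that idea; patch_revision and patch_at_least are hypothetical helpers, the sample lines (including the second patch ID and the package names) are placeholders, and treating any revision of 03 or later as acceptable is my assumption rather than something stated in the reply.

```python
import re


def patch_revision(showrev_output, patch_id):
    """Return the highest installed revision of patch_id, or None.

    Hypothetical helper: expects text in the `showrev -p` style, where
    each installed patch is listed on a line starting with 'Patch:'.
    """
    revs = [
        int(rev)
        for pid, rev in re.findall(r"Patch:\s*(\d{6})-(\d{2})", showrev_output)
        if pid == patch_id
    ]
    return max(revs) if revs else None


def patch_at_least(showrev_output, patch_id, min_rev):
    """True if patch_id is installed at revision min_rev or later."""
    rev = patch_revision(showrev_output, patch_id)
    return rev is not None and rev >= min_rev


# Placeholder sample, for illustration only (package names are made up).
sample = (
    "Patch: 106552-03 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWxxx\n"
    "Patch: 123456-07 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWyyy\n"
)
print(patch_at_least(sample, "106552", 3))
```

On the host itself you would feed in the real output of showrev -p rather than the sample string.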

--------------------------------------------------------------

I have had this problem too. It's not just that the LUNs cannot be seen; if you look at the controllers, there is no information on them either. Sun was no help (suggesting patches and such that did not correct the problem). Most of the time, the only way to get it to show you the LUNs again is a reboot.

--------------------------------------------------------------

The process is not actually finished until the LUN is 'optimal'. With RM6.1.1 the formatting process takes quite some time to complete, and the bigger the drives, the longer it takes to respond. RM6.1.1 also does not refresh the GUI information at all during the format process. If you were using RM6.22 (to which I strongly advise you to migrate), you would see much cleaner responses to your format and reach an optimal state much more quickly. If you use the command line programs, raidutil is not really the command you need; use drivutil instead, which displays the state of the drives and LUNs more accurately.

--------------------------------------------------------------

While you still can, take a backup and upgrade rm6 to 6.22 or later. Also install the latest recommended patches.

--------------------------------------------------------------


--------------------------------------------------------------
Original message:
--------------------------------------------------------------

Initial configuration:
    Solaris 2.6
    A1000
    raid manager 6.1.1
    4 disks -> 1 RAID5 lun
Facts:
    4 new disks 'hot' added.
    Created one RAID5 lun with the 4 new disks. The process seemed to end correctly with the message 'formatting...'
Problems:
    After refreshing (or restarting) the rm6 window, neither of the luns is visible.
    The old one is still mounted and in use (VM) without problems or messages.
    Can't find raidutil man page.

    I have just submitted a query to Sun, but in the meantime I'm a little nervous...
Received on Thu Oct 11 09:41:36 2001
