SUMMARY: Lost access to Solaris 10 11/06 after renaming /usr/lib/libxml2.so.2

From: Doug Yatcilla <yatcilla_at_purdue.edu>
Date: Mon Jun 25 2007 - 11:01:46 EDT
There were 11 replies (most of them on a Friday afternoon; thanks!)

Even though I had lost the ability to log in to the global zone as
root or any other user, a few people suggested renaming the file
from an NFS client with root access to /usr/lib (but I was not sharing
that filesystem) or getting root access via another program (such as
modifying a script run by root's crontab.)  But, the lack of libxml2.so
seemed to have also shut down cron (even though "svcs cron" reported it
was still online and neither /var/svc/log/system-cron:default.log nor
/var/adm/messages had errors.)

A few scolded me for renaming a system library and expecting things to
keep working.  I had no idea that libxml2.so was so vital for keeping
the system going or that the system would fail so ungracefully (no
errors or warnings in log files.)  But, it is of course a point well
taken (I'll restrict troubleshooting software to a throw-away system in
the future.)  By the way, the application ran fine once I renamed the
Solaris supplied /usr/lib/libxml2.so file.  So, the next step is to
figure out how to properly compile the application to use
/opt/csw/lib/libxml2.so and stop touching /usr/lib/libxml2.so, but that
is another story.
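
As a starting point for that next step, here is a minimal sketch (not
from the original thread) for checking which copy of libxml2 a binary
resolves at run time.  It only parses ldd output read from standard
input.  The helper name lib_path_from_ldd and the application path
./myapp are made up for illustration, and the script assumes a POSIX
shell (ksh or /usr/xpg4/bin/sh on Solaris 10, not the old /bin/sh).
Linking the application with a runpath, for example
-L/opt/csw/lib -R/opt/csw/lib, is the usual way to make it prefer the
/opt/csw copy, but the right flags depend on the application's build
system, so treat that as an assumption to verify.

```shell
# Print the path that ldd reports for one library name.
# Reads ldd output on stdin; $1 is the name in ldd's first column.
# Returns non-zero if the library is not listed or was not found.
lib_path_from_ldd() {
    while read -r name arrow path rest; do
        if [ "$name" = "$1" ] && [ "$arrow" = "=>" ]; then
            case $path in
                "("*|"") return 1 ;;   # e.g. "(file not found)"
            esac
            printf '%s\n' "$path"
            return 0
        fi
    done
    return 1
}

# Usage (./myapp is a hypothetical binary name):
#   ldd ./myapp | lib_path_from_ldd libxml2.so.2
```

If this prints /usr/lib/libxml2.so.2 rather than a path under
/opt/csw/lib, the binary is still picking up the Solaris-supplied copy.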

Most of the suggestions told me to reboot from Solaris CD/DVD media
(but there is no built-in CD/DVD drive in a Sun X4500; I could have
tried plugging in an external USB CD/DVD drive or a USB flash drive, I
guess.)  Since this server runs the x86 edition of Solaris, it boots to
the GRUB menu and gives the option of booting a "failsafe kernel", which
I presume is the same thing as booting from the Solaris CD/DVD.

I did not need to use the failsafe kernel, though, because I could
just let the system boot up normally and I was able to log in without
problems.  Although some XML related services didn't start
(system/fmd & system/pools; zones failed to start, etc.) I was able to
rename the library, reboot, and everything was back to normal.  I
considered myself lucky.

I opened a support case with Sun about this problem and got a
surprisingly quick response.  The party line was to reboot from the CD
media.  They would not speculate as to why a missing libxml2.so caused
the problems, or suggest any other way to fix it.

Even though I did not use the failsafe kernel, I dread having to use
it.  Since I use Solaris Volume Manager to mirror the root filesystem,
the failsafe kernel complains that it cannot mount it on /a.  So, what
good is this for me?  If I cannot mount the root filesystem directly,
I would need to do all of this:

1.  Know ahead of time which disks and slices contain each part of the
    mirror containing the root partition.

2.  Choose one root filesystem slice and manually mount it on /a

3.  Change /a/etc/vfstab so that the root filesystem is mounted from 
    the slice mounted on /a instead of the metadevice

4.  Do whatever maintenance I need to do on the root filesystem.

5.  Reboot the system back to the normal kernel

6.  If the original problem is fixed, then I need to set up the root
    filesystem mirror again (using metaroot, etc.)

7.  Need to reboot again to get the root mirror back in place.
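
Step 3 is the easiest one to get wrong by hand, so here is a rough
sketch of it (my illustration, not something from the replies).  It
rewrites only the root entry of a vfstab read from standard input so
that it points at a plain slice.  The function name rewrite_root_entry
and the slice name c0t0d0s0 are hypothetical, and a POSIX shell is
assumed.  Note that metaroot normally edits /etc/system as well as
/etc/vfstab, so that file probably needs checking too; this sketch
does not touch it.

```shell
# Rewrite the "/" entry of a vfstab (stdin) to use a plain slice.
# $1 is the slice name, e.g. c0t0d0s0 (hypothetical).  All other
# lines, including comments and blank lines, pass through unchanged.
rewrite_root_entry() {
    slice=$1
    while IFS= read -r line || [ -n "$line" ]; do
        # Split a copy of the line into fields; keep $line intact.
        read -r dev raw mnt rest <<EOF
$line
EOF
        case $dev in
            \#*) printf '%s\n' "$line"; continue ;;
        esac
        if [ "$mnt" = "/" ]; then
            printf '/dev/dsk/%s\t/dev/rdsk/%s\t%s\t%s\n' \
                "$slice" "$slice" "$mnt" "$rest"
        else
            printf '%s\n' "$line"
        fi
    done
}

# Usage, writing to a new file so the original can be compared first:
#   rewrite_root_entry c0t0d0s0 < /a/etc/vfstab > /a/etc/vfstab.new
```

Writing to a separate file and diffing it against the original before
moving it into place keeps a mistake here from making things worse.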

What a runaround!  It would be worth trying the procedure described in
Sun document 75210 "Solaris[TM] Volume Manager Software and Solstice
DiskSuite[TM] Software Mounting Metadevices" from May 2005, which
describes how to mount metadevices after booting from a Solaris 9 CD.

Thanks again to the generous folks who volunteered advice:

Gaziz Nugmanov <gaziz.nugmanov*AT*gmail#DOT#com>
Mike Salehi <mike.salehi%AT%inbox$DOT$com>
"James W. Abendschan" <jwa(AT)jammed!DOT!com>
Edward Scown <eascown3#AT#yahoo(DOT)com>
Brad Morrison <brad.morrison%AT%gmail*DOT*com>
Ric Anderson <ric@AT@Opus1*DOT*COM>
Rainer.Heilke!AT!atcoitek*DOT*com
Alan Pae <alanpae$AT$ilkda*DOT*com>
Paul Richards <p.richards$AT$ukonline*DOT*co(DOT)uk>
"Cole, William" <William.Cole%AT%gedas&DOT&com>
"Lineberger, Aaron" <alineberger#AT#ncdoc(DOT)navy&DOT&mil>

----- Original Message -----

Date: Fri, 22 Jun 2007 14:51:38 -0400
To: sunmanagers@sunmanagers.org
Subject: Lost access to Solaris 10 11/06 after renaming /usr/lib/libxml2.so.2

While troubleshooting an application software problem, I renamed the
/usr/lib/libxml2.so.2 in the global zone on a Sun X4500 running
Solaris 10 11/06.

This act, done in desperation, turned out to have a bad side effect.
The problem is that I cannot figure out how to rename it back:

$ sudo mv libxml2.so.2-dist libxml2.so.2
ld.so.1: sudo: fatal: libxml2.so.2: open failed: No such file or directory
ld.so.1: sudo: fatal: relocation error: file /usr/lib/libproject.so.1:
symbol pool_get_binding: referenced symbol not found
Killed

$ su
Password:
ld.so.1: su: fatal: libxml2.so.2: open failed: No such file or directory
ld.so.1: su: fatal: relocation error: file /usr/lib/libproject.so.1:
symbol pool_get_binding: referenced symbol not found
Killed

I tried logging in from the system console and that failed too,
presumably for the same reason.

I speculate that /usr/lib/libproject.so has a (now unresolved)
dependency on libxml2, and that libproject is needed to log in.

I could have avoided this by having a root shell open somewhere, but I
did not do that.

I cannot log into the system as any user (including root.)  I am
surprised to see that the Solaris zones running on the same system are
not affected by this.  That is, I can log into them without problems
(even though /usr/lib is read-only mounted to the zones)

Is there any hope of getting root access to the system without a
reboot?

Will the system even work if I reboot?

I experimented on a Solaris 10 11/06 system running on a PC.  I
renamed the same libxml2.so.2 file and power cycled the system.
It started back up again (a few XML related services failed to start)
But, I was able to log in without problem as root and rename the library.

I thought maybe the /sbin/su.static would let me log in without needing
anything in /usr/lib.  But, the /sbin/su.static is not setuid-root!
So, it does not work.  On the test system, I made it setuid-root and I
was able to log in as root even with the missing libxml2.so.  But, I
cannot do that on the production system since I cannot run any command
as root!  What is the point of the /sbin/su.static?
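
For anyone who wants to know ahead of time whether a fallback like
this would be usable, a small sketch of the check follows (an
illustration only, assuming a POSIX shell).  The helper names
has_setuid_bit and report_setuid are made up, and the check looks only
at the setuid bit, not at who owns the file.

```shell
# Succeed only if the path is a regular file with the setuid bit set.
has_setuid_bit() {
    [ -f "$1" ] && [ -u "$1" ]
}

# Print a one-line status for a path.
report_setuid() {
    if [ ! -e "$1" ]; then
        echo "$1: not present"
    elif has_setuid_bit "$1"; then
        echo "$1: setuid bit set"
    else
        echo "$1: setuid bit NOT set"
    fi
}

# Compare the static su with the regular one.
for f in /sbin/su.static /usr/bin/su; do
    report_setuid "$f"
done
```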

Thanks for any advice.  I will summarize any responses.

Thanks,
Doug
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

----- End forwarded message -----
Received on Mon Jun 25 11:02:05 2007
