[Summary] SunRay server failure

From: Chris Hoogendyk <choogend_at_library.umass.edu>
Date: Mon Mar 08 2004 - 17:29:53 EST
Original message at end.

Bottom line. The combination:

  SunRay Server Software 1.3
  Solaris 8 Release 10/00
  Solaris 8 Patch 109077 (patches included in recommended cluster)
  Solaris 8 Patch 111302

Is not compatible. Won't work. Will fail. SunRay Server Software 1.3
apparently requires Solaris 8 Release 4/01 or later, and preferably
Release 7/01 or later (although I was running 10/00). Patch 109077
updates dhcpd and its configuration, on which SunRay Server Software
depends. This patch precipitated the failure. Furthermore, it has a
bunch of dependencies, and the instructions recommend that you NOT try
to uninstall it. So I seemed to be basically stuck as far as simple
solutions were concerned.
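
For anyone wanting to check whether a machine is in the same spot, here
is a minimal sketch of a helper that pulls the installed revision of a
patch out of "showrev -p" output. The function name patch_revs is made
up for illustration, and it assumes the usual
"Patch: NNNNNN-NN Obsoletes: ..." line format.

```shell
# patch_revs ID: read "showrev -p" output on stdin and print each
# installed revision of patch ID (for example 109077-12).
# Assumes lines of the form "Patch: 109077-12 Obsoletes: ...".
patch_revs() {
    sed -n "s/^Patch: \($1-[0-9][0-9]*\) .*/\1/p"
}
```

Typical use, together with /etc/release (whose first line names the
installed Solaris release):

  cat /etc/release
  showrev -p | patch_revs 109077
  showrev -p | patch_revs 111302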

I tried doing an upgrade install of Solaris 8 Release 2/02. Freakishly, 
I had a disk drive failure during the install. So, find an unused disk 
drive, partition it to match, go to backup tapes for recovery, and punt
on the upgrade for now. Unfortunately, when I went back to the January 
full backup, I found it had the same failure. On checking the patches, I 
found it had an earlier version of 109077. On checking my records for 
patching, rebooting, and backups, I found I had not rebooted since 
before that patch cluster. If I had, the system would have failed back 
then. So, go back even further on my full backup tapes and recover 
again. This worked, but then I had a couple of months of fixes and 
changes on that server that I had to repeat. Fortunately, it wasn't too 
much.

Anyway, I'm back up and running, and next time there is a break on 
campus and I can schedule some official down time, I'll try the upgrade 
to Solaris 8 Release 2/02 and SunRay Server Software 2.0 (that 
combination works).


--------------- Bloody Details for those who care ---------------


After having gone through this "bare metal" recovery, I now have some 
changes I will make in my backup procedures. More on that after these 
details.

Since this server has no tape drive, I do my backups to a tape drive on
another server, which added a little to my difficulties. I had to go
through:


  reboot/shutdown/init and then "stop-a" to get to ok prompt

  insert CD 1 of 2 of Solaris 8 software

  ifconfig hme0 129.117.162.215 netmask 255.255.255.0 \
    broadcast 129.117.162.254 up

  ping 129.117.162.133


Since I'm booted from CD, I don't have my user accounts and profiles, so
I have to get the machine at 133 to let me in as root. I already have
rshd open on that machine, covered by tcp_wrappers so that it allows in
only my servers that have no tape drives. Now I had to add a /.rhosts
file for root. It had to have DNS names for reverse lookup. When I
started by trying just the IP address, it didn't work.


  129.117.162.215 +
  sunrayserver +
  sunrayserver.mydomain.edu +
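

A tighter variant of the file above, offered as an untested sketch: in
.rhosts the second field is a remote user name, and "+" there matches
any user on that host. Naming root explicitly should still let the
CD-booted root in while refusing other accounts on the client. The host
names are the same placeholders as above.

```
129.117.162.215 root
sunrayserver root
sunrayserver.mydomain.edu root
```

Either way, it is worth removing /.rhosts again once the recovery is
done.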


Then I'm all set to do my recovery from the other server's tape drive.


  newfs /dev/rdsk/c0t0d0s0

  mount /dev/dsk/c0t0d0s0 /a

  cd /a

  ufsrestore rvf 129.117.162.133:/dev/rmt/0n

  ls

  cd ..

  umount /a


Repeat the above for each partition required, in the order they are on the tape. Then do a:


  installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \
    /dev/rdsk/c0t0d0s0


The documentation said to do "pboot", but I found that in that directory 
the only file was "bootblk". When I rebooted, it worked. I did a "uname 
-i" to see what it returned (SUNW,Ultra-4) and looked down through the 
directories.
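

The per-partition sequence above can be sketched as a small dry-run
helper that only prints the commands, so the list can be checked against
the tape order before anything is typed. The name restore_plan and its
arguments are hypothetical, and the slices in the usage line are just
examples.

```shell
# restore_plan TAPE SLICE...: print (do not run) the restore commands
# for each slice, in the order given. TAPE is the remote no-rewind
# device; the slices must be listed in the order they were dumped.
restore_plan() {
    tape=$1
    shift
    for slice in "$@"; do
        echo "newfs /dev/rdsk/$slice"
        echo "mount /dev/dsk/$slice /a"
        echo "cd /a"
        echo "ufsrestore rvf $tape"
        echo "cd /"
        echo "umount /a"
    done
}
```

For example:

  restore_plan 129.117.162.133:/dev/rmt/0n c0t0d0s0 c0t0d0s5

Note that ufsrestore leaves a restoresymtable file at the top of each
restored filesystem. It is only needed between incremental restores and
can be removed afterwards.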



--------------- Changes to my procedures ---------------


I use a script to generate an informational file that I call a label and 
then write it out as the first item on the tape when I do backups. Thus, 
when I pick up a tape, I can pull off that first file with an 
interactive ufsrestore and see what I put on the tape and what the 
machine it came from was like.

My label looks like:


<Label>
Amen-ra-02Dec2003-t1
Tue Dec  2 09:36:17 EST 2003
Library Information Systems & Technology Services
W.E.B. Du Bois Library
University of Massachusetts
(413) 545-0074

------------

Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0    15344171  539601 14651129     4%    /
/proc                      0       0       0     0%    /proc
fd                         0       0       0     0%    /dev/fd
mnttab                     0       0       0     0%    /etc/mnttab
/dev/dsk/c0t2d0s3    15346527 1063952 14129110     8%    /var
swap                 4511248      16 4511232     1%    /var/run
swap                 4614696  103464 4511232     3%    /tmp
/dev/dsk/c0t0d0s5    1018191  230495  726605    25%    /opt
/dev/dsk/c0t3d0s7    11214644 4813822 6288676    44%    /usr/local
/dev/dsk/c0t3d0s6    4131866 1719578 2370970    43%    /export/home
/dev/dsk/c0t0d0s1    1018191  247358  709742    26%    /usr/openwin
/proc                      0       0       0     0%    /var/opt/SUNWbb/root/proc
localhost:(cifsBrowse)browser      10      10       0   100%    /CIFS
/tmp/SUNWut/sessions 4614696  103464 4511232     3%    /var/opt/SUNWbb/root/tmp/SUNWut/sessions
/tmp/SUNWut/units    4614696  103464 4511232     3%    /var/opt/SUNWbb/root/tmp/SUNWut/units
</label>


I thought that would be more or less totally adequate. However, it was 
not as easy as it should have been to get the information I needed. I am 
changing my script to make this label more informative by including 
prtvtoc for each of the drives included in the backup, and a "cat" of 
/etc/vfstab and the backup script, as well as the "df -k" that I have 
been putting there. That will give me all the information I need to 
replace and repartition a failed drive as well as recovering from hacks 
or software failures when I have intact partitions to recover to.
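

A rough sketch of what the extended label script might look like. The
function names and the argument handling are made up for illustration.
It assumes the standard vfstab layout, with the device to mount in the
first column, and that slice 2 covers the whole disk. It also lists
every disk named in vfstab, which may be more than the disks actually
included in a given backup.

```shell
# disks_from_vfstab: read vfstab-format text on stdin and print each
# disk (c0t0d0 style, slice suffix removed) once.
disks_from_vfstab() {
    awk '{ print $1 }' |
        sed -n 's|^/dev/dsk/\(.*\)s[0-9][0-9]*$|\1|p' |
        sort -u
}

# make_label [VFSTAB] [BACKUP_SCRIPT]: write the label text to stdout.
make_label() {
    vfstab=${1:-/etc/vfstab}
    script=$2
    date
    echo "------------"
    df -k
    echo "------------"
    cat "$vfstab"
    for disk in `disks_from_vfstab < "$vfstab"`; do
        echo "------------"
        prtvtoc "/dev/rdsk/${disk}s2"
    done
    if [ -n "$script" ]; then
        echo "------------"
        cat "$script"
    fi
}
```

For example, with a hypothetical path for the backup script:

  make_label /etc/vfstab /usr/local/sbin/backup.sh > /tmp/label

On a replacement disk, a saved prtvtoc listing can be fed to
"fmthard -s" to recreate the same slices.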





---------------

Chris Hoogendyk

-
    O__  ---- Network Specialist & Unix Systems Administrator
   c/ /'_ --- Library Information Systems & Technology Services
  (*) \(*) -- W.E.B. Du Bois Library
~~~~~~~~~~ - University of Massachusetts, Amherst

<choogend@library.umass.edu>

---------------



-------- Original Message --------
Subject: SunRay server failure
Date: Mon, 01 Mar 2004 22:39:38 -0500
From: Chris Hoogendyk <choogend@library.umass.edu>
To: Sun Managers <sunmanagers@sunmanagers.org>

E450, Solaris 8, SunRay Server Software 1.3, 20 SunRay1's in Restricted
Access Mode.

Last Friday I did the latest Recommended and Security patches. Last done
a little over a month ago. This morning I rebooted the server around 7am.

Mid afternoon today, my SunRays started failing. First two were hung
waiting for DHCP. I tested by recycling my SunRay (logging out) before
going down to look. It started a new session just fine.

I used a Fluke to test the connection for the failed SunRays and could
not get a DHCP lease. I went straight to the switch port with the Fluke
to eliminate any intervening wiring questions. No DHCP. I thought
perhaps it was a problem with the switch VLANs (Cisco).

This evening I got a call that all the SunRays were failing.

I connected to the server. Looking at the messages file from the SunRay
web admin interface, I found the following error sequence repeated over
and over for one SunRay or another:

Mar  1 17:17:22 amen-ra-01 utauthd: [ID 639584 user.info] Worker2
NOTICE: whichServer pseudo.080020c0c454:
Mar  1 17:17:22 amen-ra-01 utauthd: [ID 641787 user.info] Worker2
NOTICE: CLAIMED by StartSession.m3 NAME: pseudo.080020c0c454 PARAMETERS:
{_=1, rawId=080020c0c454, terminalIPA=192.168.128.61, startRes=1152x900,
state=disconnected, initState=0,
fw=1.3_12.c_111891-05,REV=2002.05.10.11.53,Boot:1.3;
1999.11.29-09:58:55-GMT, pn=34583, rawType=pseudo, sn=080020c0c454,
tokenSeq=1, event=insert, id=080020c0c454, cause=insert, hw=SunRayP1,
type=pseudo, namespace=IEEE802}
Mar  1 17:17:22 amen-ra-01 utauthd: [ID 388005 user.info] Worker2
NOTICE: CONNECT IEEE802.080020c0c454, pseudo.080020c0c454, all
connections allowed
Mar  1 17:17:22 amen-ra-01 utauthd: [ID 475121 user.info] Worker2
NOTICE: SESSION_OK pseudo.080020c0c454
Mar  1 17:52:49 amen-ra-01 utauthd: [ID 794400 user.info]
SessionManager0 NOTICE: EMPTY: ACTIVE session
Mar  1 17:52:49 amen-ra-01 utauthd: [ID 716730 user.info] Terminator
NOTICE: DISCONNECT IEEE802.080020c0c454, pseudo.080020c0c454 session
terminated
Mar  1 17:52:49 amen-ra-01 utauthd: [ID 190098 user.info] Terminator
NOTICE: DESTROY pseudo.080020c0c454 lifetime=2127277
Mar  1 17:52:49 amen-ra-01 utauthd: [ID 927710 user.info]
SessionManager0 NOTICE: TERMINATE: inactive session


followed by this:

Mar  1 18:55:25 amen-ra-01 utauthd: [ID 699394 user.info] Worker3
NOTICE: SESSION_OK pseudo.080020c0c454
Mar  1 19:26:34 [192.168.128.176.2.2]  0x0.0x42e1b9 8:0:20:f9:68:97
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:40 [192.168.128.174.2.2]  0x0.0x42ee63 8:0:20:c0:c5:ea
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:40 [192.168.128.61.2.2]  0x0.0x42ee60 8:0:20:c0:c4:54
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:40 [192.168.128.160.2.2]  0x0.0x42ee2e 8:0:20:b9:66:d7
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:40 [192.168.128.56.2.2]  0x0.0x42ee42 8:0:20:c1:c:44
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:41 [192.168.128.62.2.2]  0x0.0x42ee68 8:0:20:e7:b5:8c
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:42 [192.168.128.171.2.2]  0x0.0x42ef11 8:0:20:c0:bd:f2
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:43 [192.168.128.175.2.2]  0x0.0x42efda 8:0:20:f2:47:7a
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:45 [192.168.128.177.2.2]  0x0.0x42ee4b 8:0:20:f5:76:76
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:45 [192.168.128.177.2.2]  0x0.0x42ee4b 8:0:20:f5:76:76
Kernel: 0x665830-0x6658b7: 0x4a1c8 backtrace_me+0x24(...)
Mar  1 19:26:45 [192.168.128.177.2.2]  0x0.0x42ee4b 8:0:20:f5:76:76
Kernel: 0x6658b8-0x66592f: 0x493fc panic+0x4c(...)
Mar  1 19:26:45 [192.168.128.177.2.2]  0x0.0x42ee4b 8:0:20:f5:76:76
Kernel: 0x665930-0x6659ef: 0x55334 AutoRenewDHCP+0x18c(...)
Mar  1 19:26:45 [192.168.128.177.2.2]  0x0.0x42ee4b 8:0:20:f5:76:76
Kernel: Top: 0x44cc4 proc_spawn_pid+0x3cc(...)
Mar  1 19:26:46 [192.168.128.179.2.2]  0x0.0x42f0de 8:0:20:f0:fd:60
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:46 [192.168.128.68.2.2]  0x0.0x42ee7d 8:0:20:f5:73:4
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:47 [192.168.128.33.2.2]  0x0.0x42ee82 8:0:20:f9:69:aa
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:50 [192.168.128.162.2.2]  0x0.0x42f238 8:0:20:b6:1:69
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:26:57 [192.168.128.69.2.2]  0x0.0x42f253 0:3:ba:d:99:f4
Kernel: panic:  AutoRenewDHCP:  IPA lease expired -- must restart
Mar  1 19:30:16 amen-ra-01 utauthd: [ID 607465 user.info] Worker3
UNEXPECTED: Terminal.readMesages: java.net.SocketException: Connection
reset by peer
Mar  1 19:30:16 amen-ra-01 utauthd: [ID 181342 user.info] Worker3
NOTICE: DISCONNECT IEEE802.080020f96897, pseudo.080020f96897 destroy
Mar  1 19:30:16 amen-ra-01 utauthd: [ID 791169 user.info] Worker3
UNEXPECTED: during send to: java.net.SocketOutputStream@1bca4f
error=java.io.IOException: Broken pipe
Mar  1 19:30:16 amen-ra-01 utauthd: [ID 151315 user.info] Worker3
NOTICE: DESTROY pseudo.080020f96897 lifetime=43693338
Mar  1 19:30:24 amen-ra-01 utauthd: [ID 607465 user.info] Worker3
UNEXPECTED: Terminal.readMesages: java.net.SocketException: Connection
reset by peer
Mar  1 19:30:24 amen-ra-01 utauthd: [ID 667050 user.info] Worker3
NOTICE: DISCONNECT IEEE802.080020c0c454, pseudo.080020c0c454 destroy
Mar  1 19:30:24 amen-ra-01 utauthd: [ID 118975 user.info] Worker3
UNEXPECTED: during send to: java.net.SocketOutputStream@11bee50
error=java.io.IOException: Broken pipe
Mar  1 19:30:24 amen-ra-01 utauthd: [ID 669981 user.info] Worker3
NOTICE: DESTROY pseudo.080020c0c454 lifetime=2099658
Mar  1 19:30:28 amen-ra-01 utauthd: [ID 607465 user.info] Worker3
UNEXPECTED: Terminal.readMesages: java.net.SocketException: Connection
reset by peer

Rebooting the server accomplished nothing.

From the web admin interface for the SunRay Server Software, restarting
the service gave the following in the messages file:

Mar  1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] Restarting
SunRay services
Mar  1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] stopping
authentication manager
Mar  1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] starting
session manager
Mar  1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] starting
device manager
Mar  1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] starting
printer service
Mar  1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] starting
serial service
Mar  1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] # Using local
policy
Mar  1 22:11:07 amen-ra-01 UTPOLICY: [ID 702911 user.info] starting
authentication manager
Mar  1 22:11:07 amen-ra-01 utauthd: [ID 396523 user.info] main NOTICE:
SmartCardConfigData: LDAP contains no smartcard configuration files
Mar  1 22:11:07 amen-ra-01 utauthd: [ID 253120 user.info] main NOTICE:
SmartCardConfigData: read 17 smartcard configuration files from
directory file: /etc/opt/SUNWut/smartcard/probe_order.conf
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 353254 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/Payflex-All.cfg: 237
tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 762100 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/MondexMM2.cfg: 89 tokens
processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 192469 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/JavaBadge.cfg: 144 tokens
processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 582636 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/OpenPlatform.cfg: 144
tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 462772 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/CyberflexAccess.cfg: 104
tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 240283 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/ActivCardGold.cfg: 100
tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 783214 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/GEMPLUS-MPCOS.cfg: 145
tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 522253 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/GEMPLUS-MPCOS-3DES.cfg:
124 tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 658487 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/GEMPLUS-GPK4000.cfg: 138
tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 293412 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/PKCS15.cfg: 106 tokens
processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 313178 user.info] main NOTICE:
SmartCardConfigData:
/etc/opt/SUNWut/smartcard/SpanishUniversity-TIBC.cfg: 98 tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 486291 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/GD-SMARTCAFE.cfg: 74
tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 858185 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/GD-STARCOS.cfg: 74 tokens
processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 884807 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/BullTB.cfg: 114 tokens
processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 163784 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/MondexUNU.cfg: 67 tokens
processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 524863 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/Cryptoflex.cfg: 144
tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 651628 user.info] main NOTICE:
SmartCardConfigData: /etc/opt/SUNWut/smartcard/UnknownCard.cfg: 63
tokens processed
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 723974 user.info] main NOTICE:
Loaded module /opt/SUNWut/lib/modules/StartSession.m0
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 612231 user.info] main NOTICE:
Loaded module /opt/SUNWut/lib/modules/Authxlation.m1
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 709793 user.info] main NOTICE:
Loaded module /opt/SUNWut/lib/modules/ServerSelect.m2
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 723977 user.info] main NOTICE:
Loaded module /opt/SUNWut/lib/modules/StartSession.m3
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 723978 user.info] main NOTICE:
Loaded module /opt/SUNWut/lib/modules/StartSession.m4
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 745985 user.info] main NOTICE: 5
authentication modules loaded.
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 826448 user.info] deviceManager0
NOTICE: DeviceManager.getDeviceManager: Initiate callback to utdevMgrd
at localhost:7011
Mar  1 22:11:08 amen-ra-01 utauthd: [ID 914482 user.info] deviceManager0
NOTICE: DeviceManager.initiateCallback localhost:7010 established
communication
Mar  1 22:11:29 amen-ra-01 policy[1484]: [ID 702911 user.info] TIMEOUT!!!
Mar  1 22:11:33 amen-ra-01 admincgi[5534]: [ID 702911 user.info]
Mar  1 22:11:33 amen-ra-01 admincgi[5534]: [ID 702911 user.info]
amen-ra-01: Restarting servers... messages will be logged to
/var/opt/SUNWut/log/messages.
Mar  1 22:11:33 amen-ra-01 admincgi[5534]: [ID 702911 user.info]
amen-ra-01: ERROR: Service reset failed.  Host unreachable.
Mar  1 22:11:33 amen-ra-01 admincgi[5534]: [ID 702911 user.info]


I'm really at a loss and I have some critical service points down.

Any help would be greatly appreciated.

My server having panicked, myself having panicked, I'm now going to crash.
I'll look at this again and any replies at 7am EST.

TIA




_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Mon Mar 8 17:38:56 2004
