I asked (heavily trimmed):

> We'll have a Power Outage, too long [for the UPS].
> One Server *has* to use Stuff from the other, so unless I manage to let
> it boot a certain Time *after* the Power is back, it'll hang. The UPSes
> don't offer such a Functionality.
> -- Can I (and if so, how) use the PROM Monitor to introduce a Delay
> between Powerup and automatic Boot?
> -- If yes, can the Delay be varied?
> In Case it matters: A Classic Clone and a SPARCstation 10, both running
> 4.1.3_U1B. I don't have the PROM Revisions at Hand ...

Short Version:

1) Yes, both can be done. In certain Situations, it HAS to be done.
   Instructions below.
2) Best Approach: Don't have your Servers depend on each other. Note
   that the usual NFS-mount-from-each-other Deadlock can be broken, see
   Instructions below.
3) Next best Possibility: Have Boot delayed by introducing "sleep"s
   into /etc/rc* (SunOS).
4) If done at the appropriate Location, you can even loop until the
   primary Server's up; ping comes to Mind.
5) Another neat Trick: Have the primary Server be diag-switch?=false,
   selftest-#megs=1, while the secondary is diag-switch?=true.
6) Buy better UPSes next Time.

Long Version:

0) General

First off, several People asked why I didn't specify what "Stuff" the
Secondary needed. At the Time of Writing, that Server was booting off
the Net, NFS mounting everything, NTPing, what have you (except for
having local Disks for Mail Spools, anon FTP and WWW Pages). Back when
that Server was "only" serving Mail, that seemed like a sensible
Decision (no big Deal if a Mailserver chokes on Booting and stays dead
over a Holiday, little Disk Space).
Before the Power was taken down, I managed to get / and /usr onto a
local Disk so that the Machine can boot single-user without needing its
Brother. But since my CD Drive is currently someplace else, I wasn't
sure whether I could do that in Time.

1) Boot Delays in the PROM

I had Reports from People who were forced to do such a Thing because
some SCSI Device took longer to initialize than the CPU needed for the
POST, so that the CPU tried to set up the SCSI Bus before the Device was
ready. I did *not* try it myself because I succeeded in tweaking
/etc/rc* to Taste.
I have attached the two Howtos I received; special Thanks to Peter
Marelas and Willi Triantafyllou!

2) How to decouple Servers

The top Reason why my Servers depend on each other (more precisely: why
I have several Servers in the first Place) is that I don't want any Kind
of User Activity on the File- and NIS Server. Its CPU isn't too
powerful, FSes mounted from a r/o Export are harder to subvert, etc.
Consequently, I need a second Server (using the first Server's NIS Data,
if not more) for any Kind of "centralized but User-controlled" Service -
like Mail.
In a less secured Setting, the usual Dependency between Servers is that
they want to NFS mount from each other. This causes a Deadlock (which,
in Turn, is usually broken with intr/bg/soft Options) because in
/etc/rc.local, the Servers want to mount *before* they export anything.
According to Niall O Broin, this Order can be turned around (export
first, then mount) without Harm, removing the Deadlock altogether!

3) sleep'ing Boot Scripts

Once I had copied / and /usr onto a local Disk, built a new Kernel (the
old one booted from Net but swapped onto local Disk), and run
installboot, the Server was able to boot single-user without ever
accessing the Net, which means that it was under the Control of the
/etc/rc* Scripts (SunOSism ;-) by then. Tweaking those Scripts to do a
sleep seems to be the most common Way to delay the Boot Procedure.

4) Checking for Server Presence from Boot Scripts

An even better Way to delay the Bootup of a secondary Server is to check
the Primary's Status actively; however, this might not work in every
Setting.
Since my Servers are on the same Segment, I was able to use
/usr/etc/ping at a Point where the Network Interface is only 3/4 up
(Netmask and Broadcast still to be set according to NIS Data).
I'm not too happy about ping, as the Primary will answer it long before
it is *really* up, thus forcing me to do a long "blind final" sleep.
However, I couldn't find another Program right away that would accept a
Timeout Setting, work on FSes still mounted r/o, and return a meaningful
$status; maybe I'll hack up a "tftpping" or some such.
Please find the modified Part of my /etc/rc.boot in the Attachment.
The Reason why I did the sleep at a Point where the FSes are still
mounted readonly is that this will delay Dirtying of the Disks as well;
I feared that the Power would come and go several Times, possibly
corrupting the Disks more and more. (But I do fsck. Can a fsck corrupt
Disks beyond Repair if started and interrupted?)

5) Other Ways to delay Booting in the PROM

Another Suggestion was that I set diag-switch?=true on the secondary and
diag-switch?=false, selftest-#megs=1 on the primary Server. That's a
good Idea, but my Secondary is a SPARCstation 10 which *screams* through
POST while the Fileserver is a Classic+ which might lose that Race
nonetheless.
In General, I have diag-switch?=true on all Hosts (taking Hosts down
just to test the RAM would probably be frowned upon, and it discourages
those Users who think they can powercycle SPARCs like a PC in Spite of
Threats of physical Correction ;-).

6) UPS Considerations

While my UPSes don't, I've had three Responses indicating that APC UPSes
allow Setting of a Delay between Mains Powerup and UPSed Powerup. Too
bad that we've got a whole Department whose Job is to prevent you from
buying anything other than cheapest-Offer-not-outright-unusable. :-{
OK, let's be fair: My two Servers won't stay on separate UPSes, either
... (I "borrowed" the second for the Outage.)

7) Epilogue

Well, what happened to my Servers in the End?
I'm pleased to say that they obviously booted fine! (Power had been
turned off and back on at least three Times.) I don't have a Log of the
Events (hard to do with Disks mounted readonly), but the normal Logs
show normal Activity after each Reboot.

8) Thanks to:

Hal Stern             stern@sunrise.East.Sun.COM
Kevin Sheehan         Kevin.Sheehan@uniq.com.au
Glenn Satchell        Glenn.Satchell@uniq.com.au
David L. Markowitz    dav@jasmine.litronic.com
Gene Rackow           rackow@mcs.anl.gov
Niall O Broin         nobroin@esoc.esa.de
Torsten Metzner       tom@plato.uni-paderborn.de
Pell Emanuelsson      pell@lysator.liu.se
Brad Young            bbyoung@amoco.com
Tom Schmidt           tschmidt@micron.com
Geert Devos           Geert.Devos@ping.be
Ken Ferguson          ssa@pvdsw.amat.com
Chris Phillips        chris@platform.com
Cheryl L. Southard    cld@astro.caltech.edu
Peter Marelas         Peter.Marelas@fulcrum.com.au
Willi Triantafyllou   wtrianta@htcomp.de
Simon J. Gerraty      sjg@netboss.dn.itg.telecom.com.au

9) Special Greetings:

Chip Christian, whatever his Email is, and his Son. Hope you had a nice
Birthday Party! :-)))
Christopher L. Barnard for checking the NVRAM FAQ at my Request.

Thanks again,
J. Bern
--
  /\  /""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""\
 /  \/ bern@uni-trier.de    (Size Limit!)   | P.O. Box 1203 | Ham:  \/\
/ J. \ bern@ti.uni-trier.de (SUNAttachm.OK) | D-54202 Trier | DD0KZ /  \
\Bern/ No Finger etc.; Use Mail (Subj. "##" for Autoreply List) and \  /
 \  /\ WWW.                                                         /\/
  \/  \____________________________________________________________/

[ The following Data is a / some SUN Attachment(s). If you do NOT have ]
[ SUN OpenWindows running, you probably have to decode it by Hand ...  ]
----------
X-Sun-Data-Type: default
X-Sun-Data-Description: default
X-Sun-Data-Name: Peter.Marelas
X-Sun-Content-Lines: 50

From: Peter.Marelas@fulcrum.com.au (Peter Marelas)

Yes you can add a delay.. Use the code below..

cat > /etc/nvramrc.fth << EOF
probe-all                \ install devices
install-console          \ install console device
banner                   \ output banner

: abort-on-key
   key?
                         \ true on tos if key has been pressed.
   if
      abort" Start delay aborted"   \ Abort with message.
   then
;

: timed-startup ( time --- )
   0 swap cr cr cr
   ." Press Any key to abort Timer" cr
   do
      ." Start in :- " i 3 u.r ." Seconds" h# 0d emit
      1000 ms
      abort-on-key
   -1 +loop
;

base @ decimal 40 timed-startup base !
EOF

Then execute the following..

eeprom fcode-debug?=true
eeprom use-nvramrc?=true
eeprom nvramrc="`cat /etc/nvramrc.fth`"

Each time /etc/nvramrc.fth is altered, the last command must be
executed again so that the code is written to nvram. Alter the value
"1000", which is in milliseconds, to your liking..

--
The Fulcrum Consulting Group                 Peter Marelas - Consultant
12/10-16 Queen St, Melbourne VIC 3000,Australia     Ph: +61-3-9621-2100
PGP Key -> finger maral@fusion.sprint.com.au        Fx: +61-3-9621-2724
----------
X-Sun-Data-Type: default
X-Sun-Data-Description: default
X-Sun-Data-Name: Willi.Triantafyllou
X-Sun-Content-Lines: 34

From: wtrianta@htcomp.de (Willi Triantafyllou)

Hi Jochen!

Sun gave me this (for SSA). I did not test, so no guarantees.
Hope this helps.

>ok nvedit
0: : wait_for_ssa
1: 80 0 do
2: i . space
3: 500 ms
4: loop
5: ;
6: wait_for_ssa
control C
>ok nvstore
>ok setenv use-nvramrc? true
>ok reset

defer boot 4 minutes.

tr

+                                                                       +
  Willi Triantafyllou  Hellmer & Triantafyllou  Tel.: +49-711-931893-0
                       Computer-Systeme GmbH    Fax : +49-711-931893-17
                       Dornierstrasse 30        AppleLink : GER.XDH0003
                       73730 Esslingen GERMANY  Internet : wtrianta@htcomp.de
+                                                                       +
----------
X-Sun-Data-Type: default
X-Sun-Data-Description: default
X-Sun-Data-Name: rc.boot.changedpart
X-Sun-Content-Lines: 33

# --- Lots of Stuff above this Line ---
# --- Situation: / and /usr mounted readonly, ifconfig -ad auto-revarp up
# --- done, "appropriate" fsck's done, now interpret fsck's Exit Value
case $error in
0|2)
        #
        # Everything looks good.
        #
        # Wait for Fileserver to come up (EXPERIMENT SECTION)
        #
        /usr/etc/ping Primary 1 >/dev/null 2>&1
        nerr=$?
        if [ $nerr -eq 0 ]; then
                echo "### Server is up ###"
        else
                echo "### Server is down ###"
                until /usr/etc/ping Primary 1 >/dev/null 2>&1 ; do
                        sleep 10
                done
                echo "### Server is back ###"
        fi
        echo "### Waiting 10 min ### Hit ^C to continue immediately ###"
        intr sleep 600
        echo "### Continue Boot. ###"
        #
        # Finish the single user setup which will remount the
        # file systems read-write and do other work which can
        # be done only on a writable root file system.
        #
        sh /etc/rc.single
# --- Error Handling below this Line ---
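
The "until" loop in the rc.boot fragment above keeps waiting for as long
as the primary does not answer. A minimal sketch of a bounded variant
follows; it is not part of the original attachment. The helper name
wait_for and all retry counts are made up for illustration, and the
$(( )) arithmetic assumes a POSIX shell (an older Bourne shell would
probably need expr there).

```sh
#!/bin/sh
# wait_for TRIES INTERVAL COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND up to TRIES times, sleeping INTERVAL seconds between
# failed attempts. Returns 0 as soon as COMMAND succeeds, 1 otherwise.
wait_for() {
    tries=$1
    interval=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        # No point in sleeping after the final failed attempt.
        if [ "$tries" -gt 0 ]; then
            sleep "$interval"
        fi
    done
    return 1
}

# Demonstration with stand-in probes, so it runs without a network:
# "true" plays a primary that answers, "false" one that never does.
if wait_for 3 0 true; then
    echo "### Server is up ###"
fi
if wait_for 3 0 false; then
    echo "### Server is back ###"
else
    echo "### Giving up on Server, continuing Boot ###"
fi
```

In the fragment above, the probe would be the same call the original
uses, for example "wait_for 180 10 /usr/etc/ping Primary 1", which gives
up after roughly half an hour (180 and 10 are arbitrary values). Whether
continuing the boot without the primary is acceptable depends on what
the secondary needs from it.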