Summary: Rebooting many systems

From: phil hoff (pah2824@kramden.nyu.edu)
Date: Wed Dec 09 1992 - 18:59:41 CST


Here are the many responses I have received to my original article,
which is quoted below. As you can see, there are several different ways
to reboot many machines.
Subject: Re: rebooting many systems

|> I have over 100 workstations that NFS mount /home and /apps.
|> From time to time we have to reboot all of them. However I have run into
|> problems with the following script:
|>
|> #!/bin/sh
|> set -x
|> for host in `cat host.list.new`
|> do
|> ping $host 40
|> if [ "$?" -ne 0 ]
|> then
|> echo "Can't ping $host">>not_rebooted
|> else
|> rsh $host reboot&
|> fi
|> done
|>
|> If I don't background the reboot, it seems to wait for the machine to come
|> back up before it moves on to the next machine. However, when I run this
|> script, because it never gets a return from the rsh, it seems to build up a
|> lot of zombie processes, and after it hits 50 machines I get the error
|> "Can't fork process"
|> I think this means it has too many processes, so I then have to reboot the
|> server and run the script on the last 50 machines.
|> Here is a sample of my ps -aux output:
|> 697 p0 Z 0:00 <defunct>
|> 699 p0 IW 0:00 rsh nyc85 reboot
|> 701 p0 Z 0:00 <defunct>
|> 703 p0 IW 0:00 rsh nyc49 reboot
|> 705 p0 Z 0:00 <defunct>
|> 707 p0 IW 0:00 rsh nyc53 reboot
|> 709 p0 Z 0:00 <defunct>
|> 711 p0 IW 0:00 rsh nyc232 reboot
|> 713 p0 Z 0:00 <defunct>
|> 715 p0 IW 0:00 rsh nyc60 reboot
|> 717 p0 Z 0:00 <defunct>
|> 719 p0 IW 0:00 rsh nyc190 reboot
|> 721 p0 Z 0:00 <defunct>
|> 723 p0 IW 0:00 rsh nyc175 reboot
|> 725 p0 Z 0:00 <defunct>
|> 727 p0 IW 0:00 rsh nyc30 reboot
|>
|> Surely someone out there can reboot over 100 machines cleanly;
|> any help would be appreciated!!
|>
----------------------------------

From Barry Margolin

rsh -n $host "reboot </dev/null >&/dev/null &" &

You have to close the standard I/O streams on the remote end to cause it to
close the connection. You're getting zombies because the rsh process is
waiting for EOF from the remote system, which never comes because it's shut
down (actually, I don't understand why the connections aren't closed
cleanly when the reboot command sends a SIGQUIT or SIGKILL to every
process).

To avoid timing problems, replace "reboot" with "(sleep 1; reboot)". This
way you can be pretty sure that the connection will be closed before the
reboot happens.
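Putting Barry's advice together with the original loop gives something like the following. This is a minimal sketch, not tested against SunOS: the function name reboot_all and the overridable RSH, PING and BATCH variables are hypothetical additions that make a dry run possible, ping is assumed to take the SunOS-style timeout argument used in the original script, and the remote command is wrapped in sh -c on the assumption that root's login shell on a client might be csh (where "2>&1" is not valid). The periodic wait is also an addition, meant to keep the number of outstanding rsh processes well below the point where the server reports "Can't fork process".

```shell
#!/bin/sh
# Sketch: the original reboot loop with Barry Margolin's fix applied.
# RSH, PING and BATCH are hypothetical overridable settings (not from the
# posts); PING is called SunOS-style as "ping host timeout".
RSH=${RSH:-rsh}
PING=${PING:-ping}
BATCH=${BATCH:-20}   # hypothetical: pause after every BATCH rsh commands

reboot_all() {
    n=0
    for host in `cat "$1"`
    do
        if $PING "$host" 40 >/dev/null 2>&1
        then
            # -n makes rsh read its stdin from /dev/null. On the client, the
            # reboot runs in the background with all standard streams
            # redirected, so the remote shell exits at once and rsh gets
            # EOF; "sleep 1" lets the connection close before the reboot.
            # sh -c keeps the redirections valid if root's login shell is csh.
            $RSH -n "$host" "sh -c '(sleep 1; reboot) </dev/null >/dev/null 2>&1 &'" &
            n=$((n + 1))
            if [ $((n % BATCH)) -eq 0 ]
            then
                wait    # let this batch of rsh processes finish before forking more
            fi
        else
            echo "Can't ping $host" >>not_rebooted
        fi
    done
    wait    # wait for the last batch of rsh processes to finish
}
```

Call it as reboot_all host.list.new. Overriding RSH with a shell function that only logs its arguments turns the run into a harmless dry run.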

-- 
From Rob Montjoy - Rob.Montjoy@UC.Edu

Try using the -n option to rsh. This may or may not help your problem, but it cannot hurt.

BTW, are you using the automounter? If not, you may want to use it. It will eliminate a lot of problems. Take a look at the Berkeley automounter (amd).

------------

From Dan Balza

I had the same problem. To solve it, I rcp a short reboot script to the Sun and then rsh the script. Below is the important part of the script:

# Build the reboot script

cat > /tmp/blort << _EOF_
#!/bin/sh
(sleep 30; /etc/reboot) >/dev/null 2>&1 &
exit 0
_EOF_

for host in `cat $REBOOT_LIST`
do
    /usr/etc/ping $host > /dev/null 2>&1
    if [ $? -ne 0 ]
    then
        echo "************************* $host did not respond" \
             "to ping..."
        echo ""
        echo "PLEASE MANUALLY CHECK ON WORKSTATION $host"
        echo "Manually reboot $host so as not to run this procedure" \
             "again"
        echo ""
        echo -n "Enter RETURN to continue this script: "
        read reply
    else
        echo "Rebooting $host....."
        rcp /tmp/blort $host:/tmp/blort
        rsh $host sh /tmp/blort
    fi
done

rm /tmp/blort

--------

From Elaine Lai (elaine@iss.nus.sg), Institute Of Systems Science

Use this:    rsh $host -n "shutdown -r +3 >& /dev/null < /dev/null &"

instead of:  rsh $host reboot&

--------------

From Peter Wirdemo

We have a similar situation: we want to reboot all our clients once a week or so. Each "File-Server" has about 20-50 clients. We put a little script in a good place on the "File-Server" and a line in "cron" ...

Regards
Peter Wirdemo

##
# DEBUGGING:
# To run this script without executing any commands, set the
# DEBUG environment variable to "echo DEBUG--". This will
# echo the commands instead of executing them.
#
# sh:  DEBUG='echo DEBUG-- ' ; export DEBUG
# csh: setenv DEBUG 'echo DEBUG-- '
#
PATH=/bin:/usr/bin:/usr/ucb:/etc:/usr/etc:/usr/etc/install
#
HOST=`hostname`
#
if [ `whoami` != "root" ] ; then
    echo "You have to be root to execute $0."
    exit 1;
fi

for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 \
         20 21 22 23 24 25 26 27 28 29 30
do
    echo '----------------' ${HOST}${i} '---------------------'
    ${DEBUG} ping ${HOST}${i} 5
    if [ $? = 0 ] ; then
        ${DEBUG} rsh -n ${HOST}${i} ${*} \>\& /dev/null \&
    fi
done
exit 0;

-----------

From Andrew C. Burnette (acb@ncsu.edu)

rsh hostname 'shutdown -r +1 </dev/null >/dev/null 2>&1 &'

Using shutdown instead of reboot makes sure rsh has time to exit voluntarily; (sleep 20; reboot) should work as well.

We have 700 machines, and the 'best' way we have found is to place an entry in the crontab for each machine that references a script on the 'server' (actually, many servers). These scripts (there are several, run at different times) can be changed or modified however you like to make the machines reboot, install new software, or do whatever system maintenance needs to be done. Our machines all reboot on Sunday or Monday, in the early hours of the morning.

Good luck,

P.S. Our boot scripts copy in a new crontab (unless a specifically modified one exists) at each reboot. Same with passwd files and so forth. Sysadmin becomes really easy then.
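Andrew's pull model sidesteps the fork problem entirely, because the server never has to start one rsh per client. A minimal sketch of the client side, under stated assumptions: the wrapper name run_maint, the NFS directory /apps/admin (borrowed from the /apps mount in the original question), the script name weekly and the cron schedule are all hypothetical. The client's host name is passed to the server script so that a single script can treat machines differently.

```shell
#!/bin/sh
# Hypothetical client-side wrapper. A root crontab entry on each workstation,
# e.g. "0 3 * * 0 /usr/local/etc/run_maint" (03:00 every Sunday), would run
# a copy of this file that ends with a call to run_maint.
# MAINT_DIR and the script name "weekly" are assumptions for illustration.
MAINT_DIR=${MAINT_DIR:-/apps/admin}

run_maint() {
    script="$MAINT_DIR/weekly"
    if [ -r "$script" ]; then
        # Run the server's current script with sh, passing this client's
        # name so one script can branch per host (install software, copy
        # files, and finally reboot, as the server admin decides).
        sh "$script" "`uname -n`"
    else
        # NFS mount missing or script removed: skip this run.
        echo "run_maint: $script not readable" >&2
        return 1
    fi
}
```

Because the server script is read fresh on every run, changing what all the clients do next week means editing one file on the server.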

From Shaun McCullagh (RA)

Why not:

for host in `cat host.list.new`
do
    ping $host 40
    if [ "$?" -ne 0 ]
    then
        echo "Can't ping $host">>not_rebooted
    else
        rsh $host -n "echo /usr/etc/reboot | at now + 2 minutes" &
    fi
done

-------

From Claus

Try ...

It works for me with shutdown -r +1 instead of ...

------

From Andrew Hay

The idea behind my suggestion is to do the backgrounding on the remote machine, instead of the server.

--
While I have not tried them all, I am sure at least one, if not all, of these will work. Thank you all for the great responses.

Phil Hoff



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:06:54 CDT