reboot summary

From: Lyle Miller (lmiller@aspensys.com)
Date: Fri Jun 23 1995 - 07:53:27 CDT


Sorry for this not going through the first time. I don't know yet why it
didn't, but here's what I received from all the good folks out there
regarding reboot schedules:

Here's the basic list of responses regarding reboots per several summary requests:

>From bminer@lnd.state.az.us Wed Jun 21 11:39:45 1995
Date: Mon, 19 Jun 95 07:43:19 MST
From: Bob Miner OPS <bminer@lnd.state.az.us>
To: root@aspensys.com
Cc: bminer@lnd.state.az.us
Subject: Re: reboots

This is a question that I have had since we installed our first Sun in 1987.
We have found that a WEEKLY reboot of our entire network of Suns (now at 38
machines of various flavors, all running SunOS4.1.3) keeps our network fairly
clear of hiccups. I have noted an increased frequency of "halts and hangs"
when, for some reason, we fail to reboot on our regular Friday morning
schedule. All experimentation with this frequency has been subjective, and
may be slightly paranoid, but I prefer to err on the side of conservatism in
this case. I hope that this reply is helpful.

******************************************************************************

>From sunman@criterion.com Wed Jun 21 11:39:55 1995
Date: Mon, 19 Jun 95 08:08:06 CDT
From: Aditya Talwar <sunman@criterion.com>
To: root@aspensys.com
Subject: Re: reboots

I have worked at large sites. Generally, reboots should be done
once a week. The weekly reboots help since everything is reset
at the beginning of the week. The time of the reboot should be well
communicated so that everyone knows about it and shuts down
important processes via cron jobs. Also, reboots help if you have
installed new products/scripts. Reboots help in catching any problem
early on.
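
As a rough illustration of announcing a fixed weekly reboot window, the sketch below computes the next scheduled slot so it can be put in a notice to users. The function name `next_weekly_reboot` and the 06:00 hour are assumptions made for the example; the replies above mention only a weekly schedule and, in one case, Friday mornings.

```python
from datetime import datetime, timedelta

FRIDAY = 4  # datetime.weekday(): Monday is 0, so Friday is 4


def next_weekly_reboot(now, weekday=FRIDAY, hour=6):
    """Return the next reboot slot strictly after `now`.

    The weekday and hour defaults are illustrative assumptions,
    not values taken from any of the replies.
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0)
    # If this week's slot has already passed, move to next week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    when = next_weekly_reboot(datetime.now())
    print("Next scheduled reboot:", when.strftime("%a %b %d %Y %H:%M"))
```

The printed time could be pasted into a message of the day or a mail to users, so that long-running jobs can be stopped beforehand.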

******************************************************************************

>From jhall@sqi.com Wed Jun 21 11:40:03 1995
Date: Mon, 19 Jun 1995 10:49:52 -0700
From: John Hall <jhall@sqi.com>
To: root@aspensys.com
Subject: Re: reboots

We don't have a reboot schedule, the power company does it for us (about
every three months). :-(

Some of our servers (on UPS) have been running for more than a year without
trouble. One danger sign is hung processes and hung sockets (lsof).

When your Sparc Classic hangs, can you use L1-A to get to a boot prompt?

If so, then your problem is more likely to be software related (soft hang).
If you cannot L1-A it, then your problem is most likely hardware or power
related (hard hang) and I would suggest replacing the whole unit. You
should also have it on a "clean" UPS. I have seen several of the "consumer"
UPS products actually produce worse power than the utilities. Another
source of hard hangs is grounding problems. Make sure all the peripherals
connected to your machine are plugged into the same power source as the
Sparc and have good grounds. Many of the "cheap" powerstrips either do
not have a ground connection internally, or they are broken on one or more
plugs.

Good luck.

*******************************************************************************

>From Birger.Wathne@vest.sdata.no Wed Jun 21 11:40:23 1995
Date: Tue, 20 Jun 95 08:56:37 +0200
From: "Birger A. Wathne" <Birger.Wathne@vest.sdata.no>
To: root@aspensys.com
Subject: Re: reboots

An Oracle server in Sweden (Sun SS10) just passed an uptime of 600 days,
if my memory is correct. So rebooting each day shouldn't be necessary.
Our admin will reboot the user's CPU servers every 3-4 months.

*******************************************************************************

>From mike@trdlnk.com Wed Jun 21 11:40:38 1995
Date: Tue, 20 Jun 95 12:33 CDT
From: Michael Sullivan <mike@trdlnk.com>
To: root@aspensys.com
Subject: Re: reboots

The best thing is to get a crash dump. See the instructions on using
savecore. If the machine is just freezing, rather than panicking on its
own, you may be able to force a panic to dump memory for savecore after
the lock-up by hitting STOP-A, and then issuing the sync command to the
monitor. Perhaps with a crash dump Sun will be better able to find the
problem.
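
A crash dump is only useful if there is somewhere to save it, so it can be worth checking free space before relying on savecore. The sketch below is a minimal check under stated assumptions: the function names, the 10% safety margin, and the 64 MB memory figure are all hypothetical, and a full copy of memory is used as a conservative upper bound on the dump size.

```python
import shutil


def has_room_for_dump(free_bytes, memory_bytes, margin=1.1):
    """True if free space covers a full memory image plus a safety margin.

    The 10% margin is an arbitrary assumption for this example.
    """
    return free_bytes >= memory_bytes * margin


def check_crash_dir(path, memory_bytes):
    """Compare free space on the filesystem holding `path` with memory size."""
    free = shutil.disk_usage(path).free
    return has_room_for_dump(free, memory_bytes)


if __name__ == "__main__":
    # Hypothetical values: the current directory stands in for the crash
    # directory, and 64 MB stands in for the machine's physical memory.
    memory = 64 * 1024 * 1024
    print("Room for a dump:", check_crash_dir(".", memory))
```

The directory and memory size are passed in as parameters because both vary from site to site.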

>Regarding PANIC!, the book explaining how to find out the cause of
>crashes and hangs now on the market from SunSoft Press:

Yes, this book sounds interesting -- we are ordering a copy too.

>But, in the meantime, can anyone discuss their reboot schedules (if any)
>with me? How often should a scheduled reboot take place?

Once or twice a year, when you install a new SunOS version, is about right.
We have lots of Suns under heavy use that go 6 months without a reboot.

> What are the
>indicators? What _good_ things does a reboot do for a Sun box,
>specifically?

Nothing, unless you are encountering an OS bug that is causing a
resource leak, or corrupting kernel memory structures, etc.
In this case, you should get the bug fixed, not depend on reboots.
Many (most?) OS bugs that cause crashes and lock-ups are not dependent on
the time since the last reboot and could happen just as easily 5 minutes
after a reboot as at any other time.

> Why do reboots help with overall system performance--or do
>they?

They don't (in a properly functioning OS).

***************************************************************************

>From Kimberley.Brown@UK.Sun.COM Wed Jun 21 11:53:30 1995
Date: Mon, 19 Jun 1995 09:34:42 +0100
From: Sr OS Product Support Eng - SunUK <Kimberley.Brown@UK.Sun.COM>
To: lmiller@aspensys.com
Subject: Reboot schedules

Hi Lyle,

Your email was forwarded to me, only cos you mentioned Panic! in it.

You should not be having to reboot on a regular basis.

Systems which REQUIRE rebooting, usually require it due to resource
allocation problems which the admin can't free (or figure out how
to free). Sometimes these allocation problems are actually due to
kernel bugs where the resource is being "leaked" -- used but never
freed after use.

On both Solaris 1 and Solaris 2 (BSD SunOS 4.X and SVR4 SunOS 5.X)
there are recommended lists of patches. The recommended lists
often include a jumbo kernel patch which addresses known bugs in
the kernel, security fixes, and other fixes to bugs which cause
trouble. You should ask your Sun contact for the recommended list
or check your SunSolve CD or nearest SunSolve database server.

From there, you may want to monitor resources... such as netstat -m
for mbufs. However, your best bet is to force a panic when the
system hangs. Enable savecore (make sure you have room to store
a copy of memory), L1-A the system when it hangs (disconnect keyboard
if that doesn't work), type "Sync" at the > or ok prompt which will
trigger a panic zero. Send the resulting file to your SunService
engineer for analysis. (((Do NOT send it to me please!)))
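
The resource monitoring suggested above can be sketched as a simple trend check on a counter sampled at intervals, for example the number of mbufs in use copied by hand from netstat -m. This is a crude heuristic under stated assumptions, not a diagnostic tool: the function name `looks_like_leak`, the minimum sample count, and the sample values are all hypothetical, and no netstat output is parsed here.

```python
def looks_like_leak(samples, min_samples=5):
    """Flag a counter that never decreases and ends higher than it started.

    A healthy resource count normally rises and falls with load; one that
    only ever climbs is a hint that something is used but never freed.
    """
    if len(samples) < min_samples:
        return False  # too little data to say anything
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] > samples[0]


if __name__ == "__main__":
    # Hypothetical counts recorded once a day; not real measurements.
    steady_climb = [100, 120, 150, 180, 240]
    normal_load = [100, 140, 90, 130, 95]
    print("steady climb suspicious:", looks_like_leak(steady_climb))
    print("normal load suspicious:", looks_like_leak(normal_load))
```

A positive result only says the counter deserves a closer look; a crash dump is still what gets sent for analysis.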

Good luck!

Kimberley

PS - The book is very good. Of course, I'm biased! :-)

           ```
          ( o o)
     ---oOO--()--OOo------------------------
    | |
    | Lyle E. Miller |
    | Senior UNIX Systems Analyst |
    | |
    | ASPEN SYSTEMS CORPORATION |
    | 1600 Research Boulevard |
    | Rockville, Maryland 20850 |
    | |
    | voice: 301 251 5375 |
    | fax: 301 251 5767 |
    | |
    | e-mail: lmiller@aspensys.com |
    | |
     ---------------------------------------


