SUMMARY: Solaris 2.4 Prestoserve problem

From: Nick Murray (nmurray@csd.abdn.ac.uk)
Date: Sun Jun 04 1995 - 11:04:11 CDT


Sorry for the delay in this summary, but (1) I wanted a resolution to the
problem before posting it, and (2) it appears the problem may affect a
number of sysadmins who are not yet aware of it.

 In parallel with this posting, I logged a fault with Sun UK. Their
recommendation was that Prestoserve 2.4.1 (even patched with 101714-03)
was unstable on some platforms with Solaris 2.4 and should be disabled.
"Version 2.4.2 is out real soon now". I was content to wait for the new
driver (which is likely to take about 6 months to get to us), until I
received a message from Greg Earle. He pointed out bug report 1208206:

 Bug Id: 1208206
 Category: presto
 Subcategory: software
 Release summary: 2.4.2
 Synopsis: Combination of Prestoserve and ODS highly unreliable on filesystem recovery.
.
.
.
 Work around:
Customer cannot afford to lose data. Prestoserve has been deinstalled on all their systems.
This resulted in severe degradation of system performance.
 History:
         Submitter: dan@aus Date: 05/30/95
         Dispatch Operator: bugtraq Date: 05/31/95

(those interested should read the whole bug report - I can forward you a
copy if you can't get it off SunSolve)

Interestingly, it states the problem occurs with any combination of ODS
3.0/4.0 and Prestoserve 2.4.1/2.4.2, so waiting for the new driver won't
fix the problem.

Armed with this information I spent yesterday experimenting with this
problem. I found that the hangs on reboot do indeed only occur if a
metadevice has been accelerated with Prestoserve and there are pending
writes in the NVRAM.
 After all these tests I am confident about using Prestoserve on
non-metadevices again. I have now installed ODS 4.0 (I can't resist GUIs) and
have transferred filesystems that get heavy write loads to normal UFS
partitions.
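
 One way to see which filesystems fall on each side of that split is to scan
/etc/vfstab for ODS metadevice paths. The following is a minimal Python
sketch, not something used in the tests above. It assumes metadevices appear
under /dev/md/dsk/; the function name classify_vfstab and the sample entries
are hypothetical, and it does not talk to Prestoserve itself.

```python
# Sketch: split the UFS entries of vfstab-style text into ODS metadevices
# and plain disk slices. Assumption: metadevices live under /dev/md/dsk/.

METADEVICE_PREFIX = "/dev/md/dsk/"


def classify_vfstab(text):
    """Return (metadevice_mounts, plain_mounts) for the UFS entries in text."""
    meta, plain = [], []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # vfstab fields: device, fsck device, mount point, FS type, ...
        if len(fields) < 4 or fields[3] != "ufs":
            continue
        device, mount_point = fields[0], fields[2]
        if device.startswith(METADEVICE_PREFIX):
            meta.append(mount_point)
        else:
            plain.append(mount_point)
    return meta, plain


# Hypothetical sample entries, for illustration only.
SAMPLE_VFSTAB = """\
#device to mount   device to fsck      mount point   FS type  pass  boot  opts
/dev/dsk/c0t3d0s0  /dev/rdsk/c0t3d0s0  /             ufs      1     no    -
/dev/md/dsk/d10    /dev/md/rdsk/d10    /export/home  ufs      2     yes   -
/dev/dsk/c1t0d0s6  /dev/rdsk/c1t0d0s6  /var/mail     ufs      2     yes   -
swap               -                   /tmp          tmpfs    -     yes   -
"""

if __name__ == "__main__":
    meta, plain = classify_vfstab(SAMPLE_VFSTAB)
    print("On metadevices (leave unaccelerated):", meta)
    print("On plain slices (candidates for Prestoserve):", plain)
```

 On a real system the text would come from reading /etc/vfstab rather than
from the sample string.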

Thanks to:- earle@isolar.tujunga.ca.us (Greg Earle)
            zaitcev@lab.ipmce.su (Pete A. Zaitcev)

----- Begin Original Message -----

This afternoon the power was accidentally disconnected from our SPARCserver
1000. The problem is it hung upon reboot. Running the boot command manually
with the verbose flag showed it hung immediately after printing the message:

'Prestoserve: writing dirty buffers'

 The LEDs on the system boards continued to 'walk' normally, but <BREAK>
didn't work on the console, and power cycling the console, which usually
brings you back to the boot monitor, did nothing either. The only way I
could get the thing to actually boot was to zero the NVRAM cache from
the boot monitor. This of course lost all the data in the cache and
meant lots of random corruption of the filesystems!

 After further testing, I could find nothing wrong with the Prestoserve
NVRAM: it worked fine on rebooting, the diagnostics showed no problems,
and if the NVRAM cache was flushed upon normal shutdown the reboot was OK.

 The last time this kind of improper reboot happened was when the system
was running Solaris 2.3, and then it wrote the dirty buffers in a couple of
seconds and continued to boot.

 Has anyone any ideas about this? At the moment I have Prestoserve disabled;
it was enough of a chore fixing the corruption once.

 System configuration:

SPARCserver 1000, 3 CPUs, 4MB NVRAM, 192 MB memory
Solaris 2.4, Prestoserve 2.4.1 patched with 101714-03

Patches:

101714-03 101909-05 101945-23 101983-01 102020-02 102070-01 102294-01
101753-01 101910-07 101950-01 101985-01 102024-01 102105-01 102330-03
101829-01 101911-04 101959-02 101987-02 102030-02 102108-01 102357-01
101878-01 101920-01 101959-03 102001-03 102035-01 102112-01 102479-01
101878-03 101920-02 101961-07 102002-01 102037-01 102119-01 102922-01
101879-01 101921-04 101969-04 102002-03 102039-01 102134-01
101880-03 101922-04 101973-10 102003-01 102042-01 102196-01
101902-01 101923-03 101975-01 102004-01 102044-01 102216-01
101905-01 101925-01 101977-03 102007-01 102048-01 102226-07
101907-02 101933-01 101979-03 102011-02 102057-12 102277-02
101907-04 101945-10 101981-01 102016-01 102062-03 102286-01

Thanks in advance.

Nick Murray
Computer Officer
Department of Computing Science
University of Aberdeen
Scotland

----- End Included Message -----


