SUMMARY: SDS (ODS) 4.0 logged file system crash

From: Steve Phelps (steve@epic.co.uk)
Date: Tue Feb 11 1997 - 06:22:23 CST


Many thanks to John Stoffel <jfs@fluent.com> and Greg Price <greg@defcen.gov.au>.

The solution appears to be patch 103153 (UFS patch) revision 11 or later.
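
For anyone checking several servers, the "revision 11 or later" test can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the list of installed patches is assumed to have been collected by hand (for example by reading the output of showrev -p on each host).

```python
import re


def parse_patch(patch):
    """Split a patch string such as '103153-11' into (base_id, revision)."""
    m = re.fullmatch(r"(\d{6})-(\d{2})", patch.strip())
    if m is None:
        raise ValueError(f"not a patch ID: {patch!r}")
    return m.group(1), int(m.group(2))


def meets_minimum(installed, base_id, min_rev):
    """Return True if some installed patch has base_id at min_rev or later."""
    for entry in installed:
        pid, rev = parse_patch(entry)
        if pid == base_id and rev >= min_rev:
            return True
    return False


# Hypothetical host: the ODS patch is present but the UFS patch is too old.
installed = ["102580-13", "103153-08"]
print(meets_minimum(installed, "103153", 11))  # False
```

A host that reports False here would still need the later UFS patch revision.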

From greg@defcen.gov.au:

>
>Yes, we've seen this. I will get you the exact details in a couple of hours,
>but you need a late version of the UFS patch. We were seeing this problem on
>a large RAID 5 filesystem with UFS logging. We put the current version of the
>UFS patch on and the error moved to "freeing free inode". They had a bug in
>the patch, so they released another beta copy of the patch that fixed the
>problem.
>
>We're not sure about the original situation, but we narrowed the "freeing
>free inode" problem to people deleting large files (1.5GB in our case) on a
>UFS logging filesystem. Will be back soon with details.
>

The original question:

>
> We are currently using ODS (Disk Suite) 4.0 to create logged file systems.
> Every couple of months one of the servers running a logged file system will
> crash with a panic similar to the following:
>
> panic: free: freeing free block,dev = 0x154000b, block = 24440, fs =
> /home/jupiter/raid2
>
> The server will then crash repeatedly with this message until:
>
> the relevant meta-device is cleared using metaclear
> then unmounted
> then has zeros written to the logging device using dd
> and then reinitialised using metainit
>
> It will work fine if we metaclear the device and remount it as a normal UFS
> file system; there is no corruption in the file system itself.
>
> This has happened on several servers running Solaris 2.5 and 2.5.1.
>
> We have patch 102580-13 installed on all servers running ODS.
>
> A typical logging device is 64MB and a typical master device is 30GB. We
> have had the problem when the logging device has been mirrored and when it
> has been stand-alone. Each server has at least eight meta db replicas
> spread across controllers and disks.
>
> The file systems involved differ from server to server, yet the same
> problem happens: different types of SCSI controller, different disks,
> different RAID controllers, etc.
>
> The servers are all 100Base-T NFS servers to a wide variety of NFS clients
> (Suns, Macintoshes, PCs; NFSv2 and NFSv3).
>
> Anyone seen anything like this before, or have any ideas on what the problem
> could be?
>
> TIA
>
> Steve Phelps.
>
>
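
As a side note on reading the panic line quoted above, the dev value can be split into driver major and minor numbers to see which device was involved. The sketch below assumes the 32-bit Solaris dev_t layout of 14 major bits above 18 minor bits; that layout and the function name are assumptions for illustration, not something stated in the original messages.

```python
# Assumption: 32-bit Solaris dev_t, 14 major bits above 18 minor bits.
MINOR_BITS = 18


def split_dev(dev):
    """Return (major, minor) for a packed 32-bit device number."""
    major = dev >> MINOR_BITS
    minor = dev & ((1 << MINOR_BITS) - 1)
    return major, minor


# dev value taken from the panic message quoted above
print(split_dev(0x154000B))  # (85, 11)
```

Which driver owns major number 85 has to be confirmed on the host itself (for example in /etc/name_to_major). If it is the DiskSuite metadisk driver, minor 11 would point at metadevice d11.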


