(original query at end)
Well, it looks like we've found the problem: the disk in question is
a Maxtor 669MB disk (something I neglected to mention in the original
posting, unfortunately). This is a known problem with the older Maxtor
669MB drives. Someone here mentioned it to a friend who had seen a similar
problem, and that put me on the right track.
Suggestions from the net were a partition table overlap, a hardware
error on the disk (it passed every test I could find to throw at it), or
applying the UFS and NFS jumbo patches. None of these helped, although it
was probably worth applying the patches anyway.
A call to the hardware maintenance mob, and a new drive is on the way.
Thanks to all those who responded. To those who responded with similar
problems: check whether it's an old Maxtor disk. I've included the article
from the Symptoms and Resolutions database in SunSolve (it didn't show up
in my original search through the database, as I wasn't looking for the
right keywords). Note that the article talks mostly about Sybase, with
only one mention that other programs can cause the same problem.
So it looks like this is the problem.
] Collection: Symptoms and Resolutions
] Document: 2945
] SRDB ID : 2945
] SYNOPSIS : Sybase db errors caused by bad PROM on 669mb drives
] DETAIL DESCRIPTION : The PROM code on the Maxtor XT-8760S (669mb) Disk
] Drive can cause data corruption under certain software
] This problem was detected using Sybase DBMS. Sybase
] validates returned pages and issues an error 605 if
] the page is corrupt. This causes the user session to
] exit. The problem has occurred with other DBMS
] Permanent data corruption can occur.
] SOLUTION SUMMARY : The drive needs to be replaced. Place a call to
] Complete information is available in Field Information
] Notice #I0123-1.
] SYMPTOMS : "error 605" when running Sybase
] KEYWORDS : sybase, prom, error, 605, maxtor, data, corruption
] BUG REPORT ID : 1049469
] PRODUCT : disk_admin
] SUNOS RELEASE : 4.1, 4.1.1
] HARDWARE RELEASE : Sun4, Sun4c
] ISO-9001 STATUS : Uncontrolled
} We have an SS2 acting as an NFS server for /usr/local/bin for around 10
} other sparcstations. Very recently, its /usr/local filesystem has been
} getting corrupted in a very strange manner. It appears to be moving the
} files around on the file system, e.g. 'gcc' suddenly became a copy of
} 'ps2ascii', and another binary would switch places with a different one.
} Running fsck produced _lots_ of DUP warnings and fixed most of the
} damage, but several hours later it starts happening again. The
} other thing that is _really_ weird is that it seems to be happening
} only to files whose inode numbers are in a certain range (something like
} 101870 to 101900). This may just be a coincidence, but it's
} certainly strange. The machine also crashed last night with a
} 'filesystem inconsistency' on the /usr/local filesystem (the one with
} the above problems.) This is the first crash we've had in months.
} It's running a more-or-less vanilla 4.1.2 kernel. Is this a known problem,
} and is there a patch? (I searched the sun-managers WAIS source but found
} nothing appropriate.)
} Many, many thanks in advance
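
For anyone wanting to check the inode-range observation from the original
query on their own system, here is a minimal sketch in Python. It is not
what was used at the time; the function name files_in_inode_range is
made up for illustration, and the range 101870 to 101900 is simply the one
quoted above. It walks one filesystem and lists the files whose inode
numbers fall inside the given range.

```python
import os


def files_in_inode_range(root, low, high):
    """Return sorted (inode, path) pairs under root with low <= inode <= high.

    Hypothetical helper for illustration only. It stays on the filesystem
    that holds root, because inode numbers are only unique per device.
    """
    root_dev = os.lstat(root).st_dev
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that are mount points for other devices
        # or that cannot be examined at all.
        keep = []
        for d in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                    keep.append(d)
            except OSError:
                pass
        dirnames[:] = keep

        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if low <= st.st_ino <= high:
                hits.append((st.st_ino, path))
    return sorted(hits)
```

Calling files_in_inode_range("/usr/local", 101870, 101900) and comparing
the result with the files that fsck complains about would show whether the
damage really is confined to that range or whether that was a coincidence.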