SUMMARY (partial): incomplete read or write - what do they mean?

From: ray@isor.vuw.ac.nz
Date: Wed Dec 15 1993 - 22:37:42 CST


I wrote:
>
> I am worried about the following messages that have been cropping up in
> my messages file ever since this system was powered down and rebooted
> around a planned power outage. I couldn't find anything in SunSolve (or
> AnswerBook).
>
> System: SS2 64MB with Prestoserve
> Sun 1.3GB drive (root, 96MB swap, /usr, etc.)
> sd0 is a Sun0104 used as a secondary swap disk (bringing total
> swap space to nearly 200MB)
> ----
> :
> Dec 11 17:27:48 aqua vmunix: sd0: <SUN0104 cyl 974 alt 2 hd 6 sec 35>
> :
> Dec 11 17:41:46 aqua vmunix: sd0: incomplete read- retrying
> Dec 11 17:41:47 aqua last message repeated 29 times
> Dec 11 17:41:47 aqua vmunix: sd0: incomplete read- giving up
> :
> ... repeated at random intervals of 0 seconds to 9 hours (the system was
> not particularly busy in the weekend)
> :
> Dec 13 10:30:06 aqua vmunix: sd0: incomplete write- retrying
> Dec 13 10:30:08 aqua last message repeated 29 times
> Dec 13 10:30:08 aqua vmunix: sd0: incomplete write- giving up
> :
> Dec 13 10:39:31 aqua vmunix: sd0: incomplete read- retrying
> Dec 13 10:43:51 aqua vmunix: sd0: incomplete read- giving up
> Dec 13 10:43:51 aqua vmunix: panic: error in swapping in u-area
> :
> ----
>
> After the reboot, the problem persists, perhaps associated with heavy
> disk activity, or perhaps only when the secondary swap partition is
> being heavily used. Note that there are always 30 retries, and apart
> from the messages, the system appears to behave normally (except for the
> panic, which I assume is associated). The system has been stable for the
> previous 8-9 months.
>
> Any ideas what they mean, and any suggestions for repair?
>
> Ray Brownrigg ray@isor.vuw.ac.nz

and received four responses. One pointed to the stiction problem (but this
disk appears to be working normally); the other three suggested a
reformat. So I tried that, with the following outcome:

1) format/analyze/read (or /refresh, /test or /write) all complete OK,
BUT at the same time the console is spewing out the messages above. I.e.
format finds nothing wrong, but the I/O subsystem does! This suggests to
me that the condition behind
"vmunix: sd0: incomplete write- giving up"
is not reported back to format as an error return.

2) I then tried format/format, which completed the format pass in about
4-5 seconds (very quickly, I thought, but I could hear the disk chattering
away). As soon as it started verifying, the messages started on the
console again (but format still reported no errors).

3) One final piece of evidence: cat or dd reports a write error, e.g.:
# cat > /dev/sd0a
1234567890
cat: write error: I/O error
#
and newfs reports an error (after doing a lot of work on the disk with
nothing appearing on the console, perhaps because it is working with the
raw partition?):
# newfs /dev/rsd0c
/dev/rsd0c: 204540 sectors in 974 cylinders of 6 tracks, 35 sectors
        104.7MB in 61 cyl groups (16 c/g, 1.72MB/g, 768 i/g)
super-block backups (for fsck -b #) at:
 32, 3440, 6848, 10256, 13664, 17072, 20480, 23888, 26912,
 30320, 33728, 37136, 40544, 43952, 47360, 50768, 53792, 57200,
 60608, 64016, 67424, 70832, 74240, 77648, 80672, 84080, 87488,
 90896, 94304, 97712, 101120, 104528, 107552, 110960, 114368, 117776,
 121184, 124592, 128000, 131408, 134432, 137840, 141248, 144656, 148064,
 151472, 154880, 158288, 161312, 164720, 168128, 171536, 174944, 178352,
 181760, 185168, 188192, 191600, 195008, 198416, 201824,
cg 0: bad magic number
cg 0: bad magic number
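
The cat test above can be made a little more systematic by writing a known
pattern with dd and reading it back for comparison. This is only a sketch
under stated assumptions: probe_write is a hypothetical helper, not a SunOS
utility; it assumes a /dev/urandom device for the test data (SunOS 4.x may
not have one, and any file of test data would do instead); it is written for
a POSIX shell; and it overwrites the start of whatever target it is given.

```sh
# probe_write TARGET BLOCKS
# Hypothetical helper: writes BLOCKS 512-byte blocks of random data to TARGET,
# reads the same blocks back, and compares them with what was written.
# WARNING: overwrites the start of TARGET. On a partition that begins at
# cylinder 0 this may also overwrite the disk label.
probe_write() {
    target=$1
    blocks=$2
    pattern=/tmp/probe_pattern.$$

    # Assumption: /dev/urandom exists; any file of test data would serve.
    if ! dd if=/dev/urandom of="$pattern" bs=512 count="$blocks" 2>/dev/null; then
        echo "cannot create test pattern"
        rm -f "$pattern"
        return 2
    fi

    # Write the pattern; a failure here corresponds to cat's "write error".
    if ! dd if="$pattern" of="$target" bs=512 count="$blocks" 2>/dev/null; then
        echo "write failed"
        rm -f "$pattern"
        return 1
    fi

    # Read the same blocks back and compare them byte for byte.
    if ! dd if="$target" bs=512 count="$blocks" 2>/dev/null | cmp -s - "$pattern"; then
        echo "read-back mismatch or read error"
        rm -f "$pattern"
        return 1
    fi

    echo "ok"
    rm -f "$pattern"
    return 0
}
```

A dry run against an ordinary file, such as probe_write /tmp/scratch.img 16,
should print "ok". On the suspect disk it would be pointed at a scratch raw
partition; a partition starting at cylinder 0 may hold the disk label in its
first sector, which this would overwrite.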

It seems to me all I can do now is replace the drive. I guess I should
check the internal physical connections first. Any other suggestions?
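
One way to see whether checking the connections makes any difference is to
tally the sd0 bursts in the messages file before and after. The sketch below
is hypothetical (count_retries is not an existing tool) and assumes the log
format quoted earlier: it prints one number per "giving up" line, counting
syslog's "last message repeated N times" as N more copies of the preceding
"retrying" message, so the 17:41 and 10:30 bursts each come out as
1 + 29 = 30 retries.

```sh
# count_retries FILE
# Hypothetical helper: for each "sd0: incomplete read/write- giving up" line,
# print how many "retrying" messages preceded it in that burst.
count_retries() {
    awk '
        # A retry message: count it and remember it was the last message seen.
        /sd0: incomplete (read|write)- retrying/ { n++; last = "retry"; next }

        # syslog collapses duplicates; add N only if it repeats a retry line.
        /last message repeated [0-9]+ times/ {
            if (last == "retry") {
                for (i = 1; i <= NF; i++) if ($i == "repeated") n += $(i + 1)
            }
            next
        }

        # End of a burst: report the total and reset.
        /sd0: incomplete (read|write)- giving up/ { print n; n = 0; last = ""; next }

        # Any other line breaks the link to a preceding retry message.
        { last = "" }
    ' "$1"
}
```

For example, count_retries /var/adm/messages (assuming the usual SunOS
location of the messages file) prints one line per burst. It matches sd0
only, and it can count only what syslog actually recorded.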

Thanks to:
sozoa@atmel.com (Steve Ozoa)
Mike Cross <cross@engfs.med.ge.com>
miguel@dt.fee.unicamp.br (Miguel A. Rozsas)
Dan Stromberg - OAC-DCS <strombrg@hydra.acs.uci.edu>

Ray Brownrigg ray@isor.vuw.ac.nz



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:08:33 CDT