SUMMARY: dump aborts unexpectedly

From: Ayrton Sargusingh (asargusi@sbrsim.ed.dreo.dnd.ca)
Date: Thu Apr 06 1995 - 06:20:08 CDT


REQUEST:
-------

> Does anyone know what the following means? I attempted a dump on a
> filesystem and got the following messages while the dump was in
> progress. I'm not asking how to resolve it. I think a good fsck will
> do it. But I do want to know what caused this error and what is
> meant by 'bread from /dev/rsd0d [block -959786260]: count=8192, got=-1'.
> Thanks.
>
> I'm running Solaris 1.1.2 (no patches) with ODS 1.0a patched into
> the kernel.
>
> DUMP: dumping (Pass III) [directories]
> DUMP: dumping (Pass IV) [regular files]
> DUMP: bread: lseek fails
> DUMP: (This should not happen)bread from /dev/rsd0d [block -959786260]: count=8192, got=-1
> DUMP: bread: lseek fails
> DUMP: (This should not happen)bread from /dev/rsd0d [block -892832704]: count=8192, got=-1
> DUMP: bread: lseek fails
> DUMP: (This should not happen)bread from /dev/rsd0d [block -857415974]: count=8192, got=-1
> DUMP: bread: lseek fails
> DUMP: (This should not happen)bread from /dev/rsd0d [block -422389156]: count=8192, got=-1
> DUMP: bread: lseek fails
> DUMP: (This should not happen)bread from /dev/rsd0d [block -892838712]: count=8192, got=-1
>
> Since I cannot bring the system (server) down at this point, I can
> only attempt the following command. Here are the results:
>
> # fsck -p /dev/sd0d
> /dev/sd0d: PARTIALLY TRUNCATED INODE I=11355 (SALVAGED)
> /dev/sd0d: 778334821 BAD I=11355
> /dev/sd0d: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
>
> The filesystem in question is /var. I cannot umount it.
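
As a rough illustration of the DUMP messages quoted above (not something
any respondent supplied): a negative block number is what a 32-bit value
with its top bit set looks like when printed as a signed integer, and a
seek to a negative offset is rejected by the operating system, which would
be consistent with "lseek fails" and "got=-1". The sketch below is Python,
not dump's actual code. BLOCK_SIZE = 512 is an assumed device block size,
and the helper names are made up for the example.

```python
import errno
import os
import struct
import tempfile

# Block numbers copied from the DUMP output above.
DUMP_BLOCKS = [-959786260, -892832704, -857415974, -422389156, -892838712]

BLOCK_SIZE = 512  # assumption: bytes per device block


def as_unsigned32(n):
    """Reinterpret a signed 32-bit value as unsigned."""
    return n & 0xFFFFFFFF


def as_bytes(n):
    """Return the four big-endian bytes behind a 32-bit value."""
    return struct.pack(">I", as_unsigned32(n))


def seek_fails(offset):
    """Return True if lseek() to `offset` is rejected with EINVAL."""
    with tempfile.TemporaryFile() as f:
        try:
            os.lseek(f.fileno(), offset, os.SEEK_SET)
        except OSError as e:
            return e.errno == errno.EINVAL
    return False


if __name__ == "__main__":
    for blk in DUMP_BLOCKS:
        print(blk, hex(as_unsigned32(blk)), as_bytes(blk),
              "seek fails:", seek_fails(blk * BLOCK_SIZE))
```

Values this large are unlikely to be valid block addresses on the disk,
which fits the idea that the block pointers in the inode hold garbage.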

RESPONSES:
---------

Thanks to the following people who responded:

Steve Elliott <se@comp.lancs.ac.uk>
johnh@gerbil.umds.ac.uk (John Hearns - System Manager)
gdonl@gv.ssi1.com (Don Lewis)
Dave Mitchell <D.Mitchell@dcs.shef.ac.uk>
sgs@hoccson.ho.att.com (bl0312300-steve.scott(HOE403)777)
"Christopher L. Barnard" <cbarnard@CS.UChicago.EDU>
gregr@cibc.com (Greg Roberts)

Steve Elliott and John thought there were serious disk problems, not
just file system problems. They suggested doing an fsck and perhaps
a format/repair depending on the circumstances.

Don, Greg and Dave thought it was only a filesystem problem and that a
simple reboot (and fsck) would fix it. They also thought fsck'ing an
active file system was a bad idea. However, if that is OK for / and
/usr, why not /var? It too is a system filesystem that I simply
can't umount: dozens of user processes are using /var. If I have to
kill off those processes just to umount /var, I might as well kick all
users off and reboot the machine.

Dave went on to add possible reasons for this problem:

* disk problems (check /var/adm/messages for scsi-related errors)
* Kernel bugs
* something related to ODS ????
* config problems, eg overlapping partitions
* finger trouble, eg writing a tar file to /dev/rsd0d (believe me,
  I've seen it!)

It could be item #3 (ODS), but not any of the others. I've checked
them out.
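
For anyone wanting to repeat the "overlapping partitions" check from the
list above, here is a minimal Python sketch. It assumes you have already
copied each partition's starting sector and sector count out of the disk
label by hand. The function name and the sample table are hypothetical,
and partition "c" is skipped on the assumption that it conventionally
covers the whole disk.

```python
def overlapping_pairs(partitions, ignore=("c",)):
    """Return pairs of partition names whose sector ranges overlap.

    partitions maps a name to (start_sector, sector_count).
    Names listed in `ignore` and zero-length partitions are skipped.
    """
    spans = sorted(
        (start, start + count, name)
        for name, (start, count) in partitions.items()
        if name not in ignore and count > 0
    )
    pairs = []
    for i, (_, end1, name1) in enumerate(spans):
        for start2, _, name2 in spans[i + 1:]:
            if start2 >= end1:
                break  # sorted by start, so no later span can overlap
            pairs.append((name1, name2))
    return pairs


if __name__ == "__main__":
    # Hypothetical table, not taken from the system discussed here.
    table = {
        "a": (0, 32000),
        "b": (32000, 64000),
        "c": (0, 400000),
        "d": (90000, 100000),
        "g": (190000, 210000),
    }
    print(overlapping_pairs(table))
```

An empty result means no two of the listed partitions share sectors.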

Steve Scott gave me some worrisome news:

        I have had this error many times; expect to replace your
        disk. That's the only thing I've been able to do to
        eliminate this error for good.

I hope you're wrong, Steve.

Christopher, my dumps are aborting after such a message. It is very
much a concern.

I just rebooted my system, and voila! I'm back! I had to reboot a
few times and even got a panic the third time. But after fsck'ing
all the file systems, it appears that things are stable now.
Perhaps ODS is screwing up the filesystems since it has only been
a week since my last reboot. Thanks to all the respondents.

-------
Ayrton Sargusingh, SBR System Manager, RSD | phone: 613-998-2932
Defence Research Establishment Ottawa (DREO) | fax: 613-998-4560
3701 Carling Ave., Ottawa ON, Canada K1A 0Z4 | email: asargusi@sofkin.ca



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:10:21 CDT