(LONG) response to files filled with nulls

From: brossard@litsun.epfl.ch
Date: Sun Apr 01 1990 - 05:25:28 CDT

        First of all, the quality of responses more than makes up for
having to go through 10-15 e-mails a day from sun-managers. Thanks to all
those who answered.
    Upon copying (with cp?, and tar) files/directories, sometimes file
contents get replaced with nulls. The files stay the same size, there
are NO errors reported, but when we try to use the files, we realize that
they are void of contents. fsck reports no problems.
--->>>> The original file ALSO (sometimes ? all the time ?) sees its
--->>>> contents vanish. But no directories lost yet, just files.
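
Since the affected files keep their size and no error is reported, the
damage only shows up when the contents are inspected. A minimal sketch of
a scan for files that consist entirely of nulls follows. The function
names are hypothetical and this is not part of any Sun tool. It assumes
a file counts as affected only if it is non-empty and every byte is NUL.

```python
import os

def is_null_filled(path, chunk_size=65536):
    """Return True if the file is non-empty and every byte is NUL."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return True          # reached EOF without seeing real data
            if chunk.count(b"\0") != len(chunk):
                return False         # found at least one non-NUL byte

def find_null_filled(root):
    """Walk a directory tree and list regular files that contain only NULs."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            if is_null_filled(full):
                hits.append(full)
    return sorted(hits)
```

Running find_null_filled() on a freshly copied tree, right after the cp or
tar, would list candidates for a second look.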
    We are NOW running all rw partitions using hard mounts, although I
have been told that this should have no impact. Also we do not use
    Our configuration (in case that helps anybody at SUN working on this
problem), all running 4.0.3:
        SUN 3/260 with 2x688MBytes SMD drives,
        2 x SUN 3/60 with disks, (one mounts /usr via nfs)
        8 x SUN 3/60 diskless.
    We had no problems while our 3/60's with disk were running 3.5.
To quote Kevin R. Underriner:
> A call to Sun produced the reply that this was a known bug that has been
> assigned highest priority.
    The patch tape "NFS file corruption patch tape version 2.0" addresses
a number of known problems and seems to reduce the number of occurrences
of this problem.
    The most pertinent info came from Mark Morrissey:
From: Mark Morrissey <markm@bit.uucp.fr>
I just saw your note regarding null-filled files. You are indeed
correct. The problem _is_ an NFS problem. We have been working
with Sun for over a year on a solution. You should request a copy
of NFS Patch Tape 2.0 from your local office. This should help
alleviate the problem, but will not cure it :-(. We are working
with senior Sun engineers and the Customer Service Escalation team
to resolve the problem.
Try the patch tape and then see if things are bearable. When we get
a good fix and receive official notice that it is available, we will
post to sun-managers.
good luck!
Mark Morrissey email: ...harvard!ogicse!bit!markm
Unix Systems Administrator -or- bit!markm@cse.ogi.edu
Bipolar Integrated Technology -or- (503) 629-5490
    I'm not including all the replies I got, but thanks for all of them.
Note that included below is a copy of the
--->>> description of the patch
mentioned above; Jay Lessert included it with his e-mail.
Lines starting with AB> are my comments.
Bengt Skyllkvist - Sun Sweden <bengts@sweden.sun.com>
From: Hal Stern - Consultant <halstern@sun.com>
this is a known problem with NFS and the VM system in SunOS 4.0.
there are a number of open bugid's on it.
in the case you describe, it's possible for a write to fill in a
disk block with a "hole" (which becomes nulls) if the partition
holding the block is full.
the *best* solution is to upgrade to sunos 4.1 as soon as you can.
--hal stern
  sun microsystems
  northeast region consulting group
AB> I have had problems with full partitions, but it has also occurred
AB> on partitions that weren't.
AB> As for upgrading to 4.1-beta (which we have), if the problem hasn't
AB> a solution yet for 4.0, chances are that it hasn't been fixed for 4.1
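
For readers unfamiliar with the "hole" Hal Stern mentions: a hole is a
range of a file that was never written, and reads from it return nulls.
The sketch below only illustrates that ordinary behaviour on a local
filesystem. It does not reproduce the SunOS 4.0 bug, and the helper name
hole_demo is made up for the example.

```python
import os
import tempfile

def hole_demo(hole_size, payload):
    """Create a file with an unwritten gap, then return its full contents."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.seek(hole_size)      # skip ahead without writing anything
            f.write(payload)       # first real data lands after the gap
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

data = hole_demo(4096, b"data")
# The unwritten range reads back as nulls, and the length includes the gap.
print(len(data), data[:4096] == b"\0" * 4096)
```

The file length covers the gap, which matches the report that null-filled
files keep their expected size.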
From: kevin%idacrd@Princeton.EDU (Kevin Underriner)
Last year, we were experiencing the same problem. It was very intermittent.
A call to Sun produced the reply that this was a known bug that has been
assigned highest priority.
Around a month ago we received a patch tape for some NFS binaries. Sun said
this may clear things up, but they weren't sure. We haven't seen the problem
since then, however we were seeing it so intermittently, this doesn't mean
Try another Sun rep. See if you can get the patch for the NFS binaries.
Kevin R. Underriner (609) 924-4600
IDA/CCRP kevin%idacrd@princeton.edu
Princeton, NJ USA
From: Jay Lessert <jayl@bit.uucp.fr>
Sounds like bugid 1026933, "NFS confused file", no fix, no patch, "#1 priority
bug inside Sun". (I actually have reason to believe that last, BTW).
Actually, there is a patch, which we are running, which had zero effect.
I'm told that at least one other site sees a 2-3X reduction in number of
read errors when running the patch. This patch (actually a set of patches
which addresses a number of other problems as well) is called: "NFS file
corruption patch tape version 2.0". I've included a description of the
patchset at the end of this message.
AB> Note that this is a different problem from the one we've heard about in the past:
AB> contents of files are intermixed. We are seeing no read errors and
AB> we've had no confused files -> just "empty" ones.
It can happen on *any* Sun client, but is apparently extremely dependent on
data/activity/phase-of-moon/etc. We've seen it happen at least once on every
piece of Sun hardware running 4.0.X we have, except 3/50's. (4/280, 4/60, 3/260
When it happens here, it acts like a "mode"; that is, the affected client
will start seeing NFS read corruption on reads from all servers (randomly,
though, not consistently).
What we see on the client is that:
   An NFS read will be partially or totally replaced with what appears to
   be random data from the client's NFS buffer cache (that is, the corrupt
   data is either part of another recent NFS read/ write or is a bunch of
   nulls, or both).
   The file appears (to ls, vi, emacs, etc.) to be the same length as the
   "true" file on the server.
   The corruption appears to be a "mode"; that is, once it happens, it
   continues to happen randomly/sporadically for NFS reads on that client
   from *any* server (if there is more than one server). It can only be
   cleared (temporarily) by a reboot.
   If one forces the client's NFS read cache for a given file to be
   flushed (say, by rewriting the file *on the server*), the next NFS read
   will often be correct.
   It can happen on any Sun3/Sun4/Sun4c client running any 4.0.X OS.
   The problem appears to be exacerbated by client applications that read
   and write large numbers of small files very quickly.
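
Because the corrupted file has the same length as the true file, a size
check cannot detect the problem; the contents have to be compared. A
minimal sketch follows, assuming a reference copy (or a digest computed
on the server itself) is available. The function names are hypothetical.

```python
import hashlib

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_contents(client_path, reference_path):
    """Compare two files by content, not by size."""
    return file_digest(client_path) == file_digest(reference_path)
```

Note that reading both paths through the same affected client would not
prove much, since both reads go through that client's cache. The
reference digest is best computed on the server.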
Your perception that SunOS 3.x is "ok" and the problem started with 4.x is
completely accurate. 3.x NFS is/was a totally solid implementation. 4.0
NFS is junk (no exaggeration)!
There is hope, however. BIT is a (the only?) test site for work on this
problem, as Sun is unable to reproduce it internally. We have been
working directly with Sun kernel engineers for the last month and are
currently running an experimental test kernel on one of our 3/260's which
seems to eliminate the corruption completely. This test kernel has other
problems (it was just an experiment), but I believe that Sun *does*
finally have a solid handle on how to implement a clean fix. No
guarantees, of course, but I'm guessing they'll ship a "production
quality" patch sometime in the next two months.
Jay Lessert {ogicse,sun,decwrl}!bit!jayl
Bipolar Integrated Technology, Inc.
503-629-5490 (fax)503-690-1498
Description of NFS Patch #2
 README: U.S. Answer Center 12/11/89
         NFS file corruption patch tape version 2.0
 Problem description:
This patch tape consists of a collection of bugfixes related
to NFS file corruption.
 Fix description:
 Install instructions: Note: sun4c is not available.
AB> ... installation instruction removed from insert ...
NOTE: If there are diskless clients, you will also want to
      put these .o files in the appropriate /usr/share/sys/`arch`/OBJ
      directory and rebuild the kernels that the diskless clients
 Bug Id: 1026933
 Release summary: 4.0, 4.0.1, 4.0.3
 Fixed in releases: 4.1 [NOT TRUE! Jay Lessert]
To: jayl@bit.uucp
Subject: nfs patch descriptions
Status: R
Here is the only thing that the engineer had in the way of descriptions.
Is this what you want?
This is an explanation of the various bugs on the patch tape.
        There are two bugs under this bugid:
                1. stale filehandle
                2. rexd did soft mounts
        The first one has to do with the fact that an NFS filehandle
        contains a "generation" number of a file. There is a chance
        that a client can get a filehandle for a file, the file can
        be re-created on the server, and then the client tries to
        access it. The recreated version of the file has a new
        "generation" associated with it, so the client gets a stale
        filehandle error.
        The second bug was that rexd did its mounts "soft" instead of
        hard, which meant that if a write failed and the application
        was not expecting this, then things appeared "corrupted" during
        a pmake. This probably resulted in files not found.
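
The stale filehandle case described above can be sketched with a toy
model. Everything here (ToyServer, the tuple used as a filehandle, the
single inode slot) is a simplification invented for illustration, not
the actual SunOS data structures.

```python
class StaleFileHandle(Exception):
    """Raised when a handle's generation no longer matches the server's."""

class ToyServer:
    """Hypothetical server keeping a generation counter per inode slot."""

    def __init__(self):
        self.generation = {}   # inode number -> current generation
        self.data = {}         # inode number -> file contents

    def create(self, contents, inode=1):
        # Re-creating a file in the same inode slot bumps the generation.
        self.generation[inode] = self.generation.get(inode, 0) + 1
        self.data[inode] = contents
        return (inode, self.generation[inode])   # the toy "filehandle"

    def read(self, handle):
        inode, gen = handle
        if self.generation.get(inode) != gen:
            raise StaleFileHandle(handle)
        return self.data[inode]

server = ToyServer()
old_handle = server.create(b"version 1")   # client obtains a handle
new_handle = server.create(b"version 2")   # file is re-created on the server
try:
    server.read(old_handle)
except StaleFileHandle:
    print("stale filehandle")              # the client's old handle is rejected
```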
        bug: readahead was being done on non-cached files.
        symptom: inconsistency between clients for file locked files.
        result would probably be that the data the clients wrote would
        be similar to if the file had not been locked: unexpected
        final data.
        loaded/slow server or busy network and "corrupted files"
        what happens: when the server doesn't respond fast enough (for
        whatever reason, the net could be clogged or the server itself
        could be very busy) the client retransmits its requests. Some
        requests are non-idempotent: they can not be applied more than
        once without messing something up. The following are non-
        idempotent requests and why:
                remove, rmdir, rename:
                        if the first request succeeds but the client
                        times out waiting for the reply and retransmits
                        the request, then the second request will fail
                        (because the first one succeeded and file is no
                        longer there) and the client will think the
                        request failed even though it didn't. This is
                        fairly common since the successful request is
                        more likely to take longer and timeout than the
                        two things can happen here:
                        1. we expect the operation to fail if the file
                            already exists (if we are doing an O_EXCL
                            "exclusive" create). If the client doesn't
                            get its response and tries again, the first
                            request may have succeeded and the second
                            try will report failure back even though the
                            file was successfully created.
                        2. for non-exclusive creates: file truncation
                            can occur if a retransmitted create is
                            serviced after the first write to the file
                            because create "zero's" the file if it
                        has the same semantics as "exclusive" creates.
                write, setattr:
                        if a successful request has completed but took
                        a few retransmissions, then the client continues
                        to write (setattr) to the file and it's possible
                        that one of the retransmissions can show up and
                        un-do a write (setattr).
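
        The retransmission cases above can be sketched with a toy server.
        The class and function names are hypothetical and there is no real
        RPC layer here; a "lost reply" is modelled simply by ignoring the
        first return value, and a "late retransmission" by replaying an
        earlier call.

```python
class ToyNfsServer:
    """Hypothetical server that applies every request it receives."""

    def __init__(self):
        self.files = {}

    def remove(self, name):
        if name not in self.files:
            return "ENOENT"
        del self.files[name]
        return "OK"

    def write(self, name, offset, data):
        buf = bytearray(self.files.get(name, b""))
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))
        buf[offset:end] = data
        self.files[name] = bytes(buf)
        return "OK"

def remove_with_lost_reply(server, name):
    """First remove succeeds but its reply is lost; the retry reports failure."""
    server.remove(name)            # succeeds on the server, reply never arrives
    return server.remove(name)     # retransmission: this is what the client sees

def late_duplicate_write(server, name):
    """A delayed copy of an old write arrives after a newer write."""
    server.write(name, 0, b"old!")   # original request
    server.write(name, 0, b"new!")   # client has moved on and overwritten it
    server.write(name, 0, b"old!")   # late retransmission of the first write
    return server.files[name]
```

        In this model the client is told the remove failed although the
        file is gone, and the late duplicate write leaves b"old!" in the
        file, undoing the newer write.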
        final symptoms:
                I think there are two thresholds of server load that
                trigger two sets of failure:
                        1. the failure of remove, rmdir, exclusive
                            create, mkdir, rename: this can happen
                            fairly easily just because the server takes
                            a little too much time to respond.
                this first set of failures will look like failed rm's,
                create's, etc., and what it looks like to the user
                depends on how the application deals with the failure,
                and if it further screws things up that the request
                didn't actually fail.
                If the files being created are read-only, then create
                failures sometimes get "permission denied" messages.
                If many files are being removed at once (such as "rm *"),
                then if one of the files appears to fail, rm reports a
                message "internal synchronization failure". The file
                should still end up removed, though.
                        2. write, setattr, non-excl create truncation:
                            this requires a very busy server or a very
                            congested network because the retries get
                            out of order.
                this second set of failures will look like corruption
                in the file that is being written to. The type of
                corruption could be null's, probably not in regular
                amounts (like a whole blocksize), or maybe even just
                wrong data.
                unfortunately, none of my testing was on a congested
                network, so I don't know what that would look like, but
                the second set of symptoms takes a server that is paging
                as fast as it can due to low memory or a load that is
                several times higher than the first set of symptoms.
        This bug is caused by a wrong page being paged in. It can
        happen to pages backed by swap, so it causes corruption only
        to executables that modify their text. From the bugfixer:
                The symptom would be a program starting up, dirtying
                its stack or data space, then using the stack/data and
                finding different values there than what it had
                originally written. Depending upon the program, this
                could cause it to barf and die, but from (say) chasing
                a bad pointer, not from executing garbage instructions.
                (Although a function returning through a corrupted stack
                frame could branch into data and look like it was
                executing garbage instructions.)
                The only program to solidly evidence this problem was
                Sundiag. As the bug report describes, the system has
                to be very busy (lots of disk activity AND lots of
                process creation/destruction) for this to happen. If
                your problem is occurring on a lightly-loaded system,
                then it can't be blamed on this bug.
        This bug is caused by a dirty bit on a page not being set
        so the page does not get written out. It only occurs on the
        3/260, 4/2xx hardware types. It appears as a write that
        doesn't get done even after the data should be flushed, so
        depending on what the application is writing (how much at a
        time) it could appear as though something is missing in the
        file or the final data is wrong.
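
        The effect of a missing dirty bit can be sketched as follows. This
        is a toy model with invented names (Page, store, flush), not the
        SunOS VM code. It assumes a flush writes a page back only when the
        page is marked dirty.

```python
class Page:
    """Toy in-memory page with a dirty flag."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.dirty = False

def store(page, offset, payload, set_dirty=True):
    """Modify the page; set_dirty=False models the bug described above."""
    page.data[offset:offset + len(payload)] = payload
    if set_dirty:
        page.dirty = True

def flush(page, backing):
    """Write the page to the backing store only if it is marked dirty."""
    if not page.dirty:
        return False               # page looks clean, nothing is written
    backing[:] = page.data
    page.dirty = False
    return True

backing = bytearray(b"aaaa")
page = Page(backing)
store(page, 0, b"zz", set_dirty=False)   # modification without the dirty bit
flush(page, backing)
print(bytes(backing))                    # the write never reached the backing store
```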
        This bug is caused by errors that are occurring but not being
        set in the RPC reply. What happens is the read actually failed
        but the client doesn't know so it passes a "successful" read up
        to the application which results in nulls in the file (and the
        number of nulls should be exactly what the read size was).
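
        A toy model of this failure follows. The names and the 8192-byte
        read size are assumptions for illustration, and the real RPC reply
        format is not modelled. The point is only that a failed read
        reported as "OK" hands the application a buffer of nulls of
        exactly the read size.

```python
READ_SIZE = 8192   # assumed read size for the example

def server_read(disk_ok, buggy):
    """Return a (status, data) reply; the buggy path drops the error status."""
    if disk_ok:
        return ("OK", b"x" * READ_SIZE)
    if buggy:
        return ("OK", b"")         # read failed, but the error was never set
    return ("EIO", b"")            # correct behaviour: report the failure

def client_read(reply, size=READ_SIZE):
    """Copy the reply data into a zero-filled buffer and hand it upward."""
    status, data = reply
    if status != "OK":
        raise OSError(status)
    buf = bytearray(size)          # starts out as all nulls
    buf[:len(data)] = data
    return bytes(buf)

result = client_read(server_read(disk_ok=False, buggy=True))
print(result == b"\0" * READ_SIZE)   # the application sees a block of nulls
```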
        This is the infamous "confused file" bug that Rutgers reported.
        The symptoms are a file is read by a client and a page of it (8k)
        will be a page from another (unrelated) file. Fixes were made
        having to do with allocating pages at interrupt level which
        reduced the frequency of this bug, but did not eliminate it!
        That's all the patch tape has.
        *** WARNING *** this bug is not fixed by the patch tape.
From: bien@aerospace.aero.org
This sounds like one of many NFS problems that Sun is working on. Here's
my log of my interaction with Sun on the subject. As you can
see we haven't gotten very far since last July. On the bright
side, they stopped trying to duplicate the problem at Sun
about 6 weeks ago and are working with a site in San Jose who
is experiencing the problem. Last I spoke to Rohit (about a week
ago), he thought they were zeroing in on the problem but still
hadn't started working on a solution.
Note that there are a couple fixes that help (but don't totally
fix) the nulls in file part of the problem.
I'll send another letter with the README files from all
the patches we've rec'd to date.
AB> not included here due to its length
Good Luck!
SO 334534, changed to 353139, changed to 366054, changed to 412979
AB> Contents abbreviated, for length sake
Problem: Files occasionally get munged on NFS client. Usually,
          shows up as munged/lost mail on wisteria. Has happened
          on fungus.
SO 334534 Pravin (pravin@sun.com) 415-336-1028
?/?/89: Tried running without automounter on wisteria to see
         if that would help -- it didn't.
7/23/89: Rec'd new /sys/OBJ/sun[34]/nfs_vnodeops.o
          See ./nfs_vnodeops.o.Sun3_patch and ./nfs_vnodeops.o.Sun4_patch
8/8: Munged file problem reappeared, called Pravin
AB> ....
12/12: talked to Al Lopez -- there is a new patch tape available
        that fixes most of the problems (like nulls in files) but
        doesn't totally fix the problem of one file showing up
        as another file (but does reduce this problem). He's
        Fed Ex'ing the tape to me today. They are working on
        a fix for the last problem. The patch replaces about
        9 kernel modules.
12/12: Talked to Huan -- he closed the old SO so I had to open
        a new one -- SO 412979
12/13: Patches for Sun 3's and 4's rec'd and installed on all
        the 3's and 4's except sparc. See ./nfs_munged_files_fix/PATCH3/README
        for details on the patch.
12/14: Reassigned to Al Lopez 415-336-5262 alby@sun.com
12/20: Many systems crashed with biod panics -- see
        nfs_munged_files_examples/messages.Dec20 for example
        /usr/adm/messages entries (wisteria, zapodid, lotus).
        Alt crashed with a "panic: setrun" and the Sun 4c's crashed
        with ldata faults in biod.
AB> ...
1/12/90: Rec'd a new nfs_server.o for Sun 3, Sun 3x and Sun 4.
         See nfs_munged_files_fix/PATCH4/README. This patch
         should fix the problem of null's in files.
         Patch was incomplete -- he's Fed Ex'ing a replacement
         patch to me today.
1/12/90: Talked to Al Lopez -- they think the problem of the
         contents of one file showing up in another is a page
         allocation problem. A special task force of 5
         engineers was created yesterday to work on this
         problem -- it has been given the highest priority
         possible. He'll keep me posted.
1/15/90: Rec'd new nfs_server.o for Sun 3, Sun 3x and Sun 4.
         Installed on all the Sun servers except lotus.
         See nfs_munged_files_fix/PATCH4/README.
2/5/90: Sent mail -- still getting nulls in files.
2/6/90: Rec'd response -- he knows -- still working on it.
2/12/90: Finally reached Al Lopez -- Sun has created a new group --
         the "Corporate Technical Escalation Group" which works
         with the engineering group to get things resolved.
         The AnswerLine people are not going to do this anymore.
         I should talk to Rohit Aguar(sp?) at 415-336-1421 to
         find out the status. Called and left a message for
         him to call me.
2/20/90: Talked to Rohit -- they've *finally* given up trying to reproduce
         the problem up there and are having a couple sites run a diagnostic
         patch to help them figure out what is going on. He'll send me
         the patch via e-mail tonight or tomorrow morning.
2/21/90: Called Rohit -- there was a problem with the patch that they
         need to fix before sending it to us. He'll send it as
         soon as it's ready.
2/22/90: E-mail from Rohit -- did we get any patches yet? What
         type Suns do we have? Replied -- see
         His e-mail address is rohit@sun.com
From: David Collier-Brown <davecb@nexus.yorku.ca>
  Hmmn. We have bursts of the same problem, also over NFS, but this is on
SunOS 3.5. Specifically, when a file is being written by Gnu Emacs (or any
other editor that creates and copies to a "new" file from an "old" one), it
may suddenly become holey... And the user is usually not impressed with the
replacement of her work by digital holes (:-().
   It's currently considered a serious risk item here, so it looks like it's
time to bug my poor field engineer once again. Monday, when he comes for his
--dave (any other information would be appreciated) c-b
