SUMMARY: Sendmail/popd/NIS/mh problem - LONG

From: Rasana Atreya (Rasana.Atreya@library.ucsf.edu)
Date: Tue Aug 27 1996 - 17:38:56 CDT


Hi!

I still do not have a resolution (my call with Sun Tech support is still open,
but it seems they do not have answers either), but I did get lots of good
suggestions, which might help other people.

I'm including my post and the responses below.

Thanks a lot to:

From: Chris Seip <seip@CS.UCDAVIS.EDU>
From: blymn@awadi.com.au (Brett Lymn)
From: ahill@lanser.net (Alan Hill)
From: JOHNSON G L <gjx@DSGJX4.DSRD.ORNL.GOV>
From: kpischke@cadence.com (Karlheinz Pischke)
From: Mary Riley <Mary.Riley@INFORES.COM>
From: daves@adnc.com (David Schiffrin)
From: Ye-Fee Liang <yliang@BCHYDRO.BC.CA>
>From matthew@imonics.com Thu Aug 15 21:19:17 1996
From: Milt Webb <milt@iqsc.com>
>From jackg@calfp.com Fri Aug 16 09:52:15 1996
From: Mark Bergman <bergman@phri.nyu.edu>
From: mshon@sunrock.east.sun.com (Michael J. Shon{*Prof Services} Sun Rochester)

Rasana

---------------------------------------------------------------------------
>Setup:
> Mail server: (with popd through mh mail - does that make sense?)- Sparc 10
> : sendmail DVIDA-1.5
> : SunOS 4.1.3_U
> : name: athena
> : is also our NIS master
>
> Problems
> -- Email corruption (mail being appended to). Some people use mh,
> Eudora.

        Sequentially arriving emails were getting appended to the first email
        one after the other. I was asked to check if my mail spool partition
        was NFS mounted (it was) and if I was using the "actimeo=0" option to
        the mount command (I wasn't).

Other responses:

        - I also use MH mail on a Solaris 2.5 system, and my sendmail
        config file required a slight change before the MH's "inc" command would
        reliably separate individual messages. I don't know whether this could
        also affect Eudora, but it's worth a try.

        In your /etc/mail/sendmail.cf file, presuming it's a plain-vanilla kind
        of a mail setup, look for a line like this:

        Mlocal, P=/usr/lib/mail.local, F=flsSDFMmnP, S=10 ... and so on

        You want to add a capital "E" option to those "F=..." options, like so:

        Mlocal, P=/usr/lib/mail.local, F=EflsSDFMmnP, S=10 ... etc.

        Me: I did the above this morning. Haven't heard any complaints as yet.
        Too soon to tell, though.
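
        A minimal sketch of why the "E" flag matters, assuming a
        POSIX shell with sed and grep. A Unix mailbox file separates
        messages by lines that begin with "From ", so a body line that
        starts the same way can confuse a reader such as "inc". The "E"
        flag makes sendmail escape such lines with ">". The function
        names below are made up for illustration and are not part of
        sendmail or MH.

```shell
# Escape body lines that start with "From ", as the "E" mailer flag does,
# so they cannot be mistaken for mailbox message separators.
escape_from_lines() {
    sed 's/^From />From /'
}

# Count message separators ("From " at the start of a line) in a mailbox file.
count_messages() {
    grep -c '^From ' "$1"
}

# Demo: the second body line would otherwise look like a new message.
printf 'Hi,\nFrom now on the mail server is athena.\n' | escape_from_lines
```

        Running count_messages on a spool file before and after a test
        delivery is one rough way to see whether a reader and the
        delivery agent agree on where messages start.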

        - This is the symptom of file locking failure. You don't give the
        details of your mailing system, so I can't give you any particular
        hints, except to look into the final delivery mechanism, and how your
        mail readers and their support programs implement file locking.

        Me: Well, I do not understand enough to give details.
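
        For readers unfamiliar with mailbox locking, here is a rough
        sketch of the traditional "dot lock" convention, assuming a
        POSIX shell. A writer creates MAILBOX.lock with a hard link
        (an operation that either succeeds or fails as a whole) before
        touching the mailbox, and removes it afterwards. The names
        with_dotlock, append_message and DOTLOCK_TRIES are hypothetical;
        real delivery agents and readers implement this in C, and it
        only protects the mailbox if every program touching it follows
        the same convention.

```shell
# Run a command while holding MAILBOX.lock.
# Gives up after DOTLOCK_TRIES attempts (default 5, an arbitrary choice).
with_dotlock() {
    mbox=$1; shift
    lock="$mbox.lock"
    tmp="$lock.$$"
    tries=${DOTLOCK_TRIES:-5}
    : > "$tmp" || return 1
    # ln fails if the lock already exists, so only one writer can win.
    while ! ln "$tmp" "$lock" 2>/dev/null; do
        tries=$((tries - 1))
        if [ "$tries" -le 0 ]; then
            rm -f "$tmp"
            return 1
        fi
        sleep 1
    done
    rm -f "$tmp"
    "$@"
    rc=$?
    rm -f "$lock"
    return "$rc"
}

# Hypothetical delivery step: append standard input to the mailbox.
append_message() {
    cat >> "$1"
}
```

        A delivery would then look something like
        "with_dotlock /var/spool/mail/user append_message /var/spool/mail/user"
        with the message on standard input.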

        - Don't know if this'll help, but I've seen something similar on a
        machine I have here that does this when its mail spool directory fills
        up. The write of a message fails and leaves it corrupt; then, after
        some space is available, the next message is 'glued' to the previous
        corrupt one. When you view it you see both.

        Me: Nope, everything was fine.
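
        A quick way to rule out the full-spool case, sketched for a
        system whose df accepts the POSIX -P option (an assumption; it
        may not hold on older SunOS releases). The function names and
        the 95 percent default are arbitrary choices for illustration.

```shell
# Print the percentage of used space on the filesystem holding DIR.
spool_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage reaches LIMIT percent (default 95).
check_spool() {
    dir=$1
    limit=${2:-95}
    pct=$(spool_usage_pct "$dir")
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $dir is ${pct}% full"
        return 1
    fi
    echo "OK: $dir is ${pct}% full"
}
```

        For example, "check_spool /var/spool/mail 90" could be run from
        cron so that a nearly full spool is noticed before deliveries
        start failing.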

        - I'll take a stab at it. Is the filesystem where the mail gets written
        automounted? If so, add the mount option "noac".

        Me: We do not auto-mount anything.

        - Maybe your mail box is corrupted; try this:
        # mv /var/mail/rasana /var/mail/bac.rasana

        Me: Did this, didn't work.

>
> -- We have aliases of the format
> admin:a, b
> admin: :include:/var/mailalias/admin
> When email is sent to admin in format 1, the people on the list
> receive email *all* the time. When the second format (include) is used,
> they get it only some of the time

I was informed that this makes sense...if the client machine doesn't recognize
the alias, it won't deliver mail. Was asked to check the mail headers on mail
sent via the second form (the "included" alias) to see which clients were able
to expand the alias. But this error "mysteriously" stopped (for ever, I
hope).

>
> -- Some machines do not recognize the second format anymore, example:
> On client1:
> /usr/lib/sendmail -v -bv admin
> admin... aliased to :include:/var/mailalias/admin
> :include:/var/mailalias/admin... including file /var/mailalias/admin
> :include:/var/mailalias/admin... Cannot open /var/mailalias/admin:
> No such file or directory
> admin... aliasing/forwarding loop broken

Me: I was asked to check if the file /var/mailalias/admin existed on the machine
client1 and if it was readable by the sendmail process.

Me: /var/mailalias/* exists only on our mail host, and it is not NFS mounted by
our clients. All our outgoing mail is directed to the mailhost by all
clients, and sent out from there. All incoming email comes to the mail host.
I did notice that if I NFS mount this directory onto a client, it works fine.
What I'm not sure about is why it broke suddenly. I'm told this was set up in
1993 and has been working fine since.
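
The check suggested above can be scripted. This is only a sketch, assuming a
POSIX shell; check_include is a made-up helper, not a sendmail command. It
takes an alias expansion as printed by "sendmail -bv" and reports whether the
include file can be read on the machine where it runs.

```shell
# Report whether the file behind an ":include:" alias is readable locally.
# $1 is an alias expansion such as ":include:/var/mailalias/admin".
check_include() {
    case $1 in
        :include:*)
            file=${1#:include:}
            if [ -r "$file" ]; then
                echo "readable: $file"
            else
                echo "missing or unreadable: $file"
                return 1
            fi
            ;;
        *)
            echo "not an :include: alias: $1"
            ;;
    esac
}
```

Running "check_include :include:/var/mailalias/admin" on client1 and on the
mail host would show which machines can expand the alias by themselves.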

Other response(s):

        - if you have included the new database in sendmail 8.7.x
        (compiled it yourself), NIS can NOT handle the new database.
        For NIS you will have to use the "original" sendmail (8.6.x) of
        Solaris 2.x. The hack is made in the NIS makefile:

        [ < marks the original
          > marks the modified version ]
        <
        aliases.time: $(ALIASES)
        ...
        @/usr/lib/sendmail -bi -oA$(YPDBDIR)/$(DOM)/mail.aliases;
        ...
        <
        >
        aliases.time: $(ALIASES)
        ...
        @/usr/lib/sendmail.ORG -bi -oA$(YPDBDIR)/$(DOM)/mail.aliases -C/etc/mail/sendmail.cf.ORG;
        ...
        >
        Take care: if you don't have the same situation as I have, this hack
        will not help anything ...

        Me: We use NIS and we have Solaris 2.5 clients, but our mailhost/NIS
        server is still SunOS (4.1.3_U1), so the above will not work. Besides,
        this has worked for years.

>
> On client2:
> /usr/lib/sendmail -v -bv admin
> admin... deliverable: mailer TCPrelay, host athena.library.ucsf.edu,
> user admin

Me: Was asked if there was a user admin on client2, or if client2 had
/var/mailalias/admin. No to both.

Was told that client2 is doing something entirely different; forwarding the
mail to a server which (presumably) DOES have access to the alias file.
sendmail.cf may be different on these two clients.

It isn't. Also, we were having the problem of aliases not being recognized on Macs and PCs using popd.

>
> -- /var/spool/mail on mail server, has an additional file
> ".username.map". I've never seen this before. Yes, these users are the
> ones having problems with the include format, and email corruption
> mentioned above. I deleted them, and problems continue.

Response: Those files are from the pop program (which pop are you using?). If the
files are 0 length, you can delete them; if they are non-zero, they
have the user's e-mail that was transferred from their "regular" mail
spool file, but not delivered (usually due to an interruption in the
POP retrieval process--a modem disconnect, client workstation crash,
etc.). If the .map files are not empty, do not delete them, as those
messages may have been removed from the regular mail spool file, but
not delivered. I doubt that the .map files are related in any way to
the problem with expansion of the alias.
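
The advice above (delete only the empty .map files) can be checked before
anything is removed. A small sketch, assuming a POSIX shell and the
".username.map" naming from the original post; list_map_files is a
hypothetical helper that only reports and never deletes.

```shell
# List pop ".username.map" files in a spool directory and say which are empty.
list_map_files() {
    for f in "$1"/.*.map; do
        [ -f "$f" ] || continue          # skip the unmatched glob pattern
        if [ -s "$f" ]; then
            echo "non-empty (keep): $f"
        else
            echo "empty (safe to delete): $f"
        fi
    done
}
```

For example, "list_map_files /var/spool/mail" prints one line per .map file,
so the non-empty ones can be set aside before any cleanup.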

>
> -- /var/adm/messages on my mail server:
> Aug 22 08:23:01 athena sendmail[1366]: gethostbyaddr: kingcrab.informix
> .com != 1 58.58.16.26
> Aug 22 08:23:01 athena sendmail[1366]: gethostbyaddr: kingcrab.informix
> .com != 1 58.58.16.26

Response: This is an alert that the machine identifying itself as
"kingcrab.informix.com" doesn't have the IP address "1 58.58.16.26".
Note the space between the 1 and the 5. I don't know where that's
coming from (bad DNS data, sendmail bug, gethostbyaddr() problem,
etc.), but it's true--kingcrab.informix.com does not have that
address...it's really 158.58.16.26.

> Aug 22 12:19:21 athena popd[5546]: lost connection

Response: The lost connections indicate that a pop process ended unexpectedly--a
modem dropped, a client workstation crashed, someone exited from the
shell under which they were running mh, etc. These lost connections
result in the .map files being left behind.
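
To see how often this happens, the popd entries can be counted per day. A
rough sketch, assuming a POSIX shell with awk and syslog lines in the format
quoted above; lost_connections_per_day is a made-up name.

```shell
# Count popd "lost connection" entries per day in a syslog-style file.
# Prints lines such as "Aug 22 3" (month, day, count).
lost_connections_per_day() {
    awk '/popd\[[0-9]+\]: lost connection/ { n[$1 " " $2]++ }
         END { for (d in n) print d, n[d] }' "$1" | sort
}
```

For example, "lost_connections_per_day /var/adm/messages" gives a quick view
of whether the dropped sessions cluster on particular days.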

> Aug 22 13:11:00 athena mail_not: error No such file or directory

Response: This may be an error from the attempt to find a nonexistent
/var/mailalias/admin file.

>Someone attempted to reply to my email from outside our domain and got these
>messages:
>
>mail /var/spool/mail/atreya cannot append
>(-rw------- 1 atreya ccs 820 Aug 26 10:43 /var/spool/mail/atreya)
>
>mail cannot open dead.letter
>(-rw------- 1 atreya ccs 0 Aug 26 09:39 dead.letter)
>
>554 atreya service unavailable
>
>I *am* getting mail from outside (from some domains, at least). My mail also
>goes out, but when someone attempts to reply, it bounces.
>

Me: No response.

>This is from my /var/spool/mail/core:
>
>strings core | grep -i Fail
>
>
>DBM store of %s (size %d) failed, alias rebuild failed
>%s: arpatounix: asctime failed: %s
>%s: getservbyname(smtp) failed, sleeping
>%s: getservbyname(smtp) failed, sleeping
>makeconnection: funny failure, addr=%lx, port=%x
>%s: gethostbyname(%s) failed, sleeping
>flock failed for db %c [%s], fd %d
>Non-authoritative answer or name server failure
>previous server failure for %s, skipping lookup
>getmxrr: res_search failed (errno=%d, h_errno=%d)
>%s: getmxrr: Name server lookup failed for %s
>getcanonname: res_search failed (errno=%d, h_errno=%d)
>%s: getcanonname: Name server lookup failed for %s
>%s: getcanonname: dn_expand failed for %s
>Host Name Lookup Failure
>nlist %s failed: %s
>open %s failed: %s
>read %s failed: %s
>----- rule fails
>no match, failing...
>ambiguous match, failing...
>%s: unlink-fail %d
>res_search failed
>res_query failed
>res_query: mkquery failed
>socket failed
>connect failed
>write failed
>read failed
>read failed
>SERVFAIL
>load_dom_binding: malloc failure.
> _times_power failed due to exponent %d %d %d leftover: %d
>Failed. Return.
>Failure - remove markers.
>Failure - remove markers.
>Failed - remove mark

Responses

        - Looking at words in the strings output I'd say you're running some
        kind of socket/ pop mail that isn't configured/installed right...but
        that's a WILD guess and even if correct won't actually help you to fix
        whatever the problem is.

        Me: No idea.

        - looks like a core generated by a crashing sendmail. strings may not
        be the right tool to find what you're looking for. I am unsure myself
        what you are looking for. perhaps you would have some better success if
        you were a bit more specific. most of the 'strings' you've pulled out
        are from error messages I recognize from sendmail. They're static
        strings compiled into sendmail. If indeed you're trying to find out why
        sendmail crashed, dbx would be a better tool. It allows you to see the
        contents of the stack as the process dumped core.

        Me: Unfortunately, I have this habit of blindly removing core files.
        Now that I've learnt about dbx, I won't. But for this time, it's too
        late.

        - Is your DNS running?

        Me: Yup

        - This, unfortunately, tells us nothing. A core dump is a copy of the
        memory image of the program at the time that it failed. You need to
        run something like dbx on the core to attempt to locate the problem.
        Do you have any idea what sendmail was doing when it dumped this core?

        Me: Nope.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ Rasana Atreya Voice: (415) 476-3623 ~
~ System Administrator Fax: (415) 476-4653 ~
~ Library & Ctr for Knowledge Mgnt, Univ. of California at San Francisco ~
~ 530 Parnassus Ave, Box 0840, San Francisco, CA 94143-0840 ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


