SUMMARY: Problem with aspppd(1) on Solaris 2.6

From: Jeffery Small (jeff@cjsa.cjsa.com)
Date: Fri Oct 24 1997 - 11:45:10 CDT


I apologize for taking such a long time producing this summary, but
the problem proved to be difficult to solve and I have just nailed down
the complete solution.

The original problem:
-------------------------------------------------------------------------------
I have been using a PPP setup successfully on Solaris 2.4, 2.5 & 2.5.1.
I recently upgraded to Solaris 2.6 and started having a problem where
aspppd repeatedly redials and reconnects to my ISP.

I have a 5-minute inactivity timeout set, which works as expected. However,
10-15 minutes after the link is dropped, my system reconnects. I have been
unable to determine what is causing this, but the result is that I am
racking up huge amounts of totally unnecessary daily connect time. I have
had to resort to stopping aspppd and then manually starting and stopping it
each time I need a connection. This is a real pain, and I would like to get
the system back to its old reliable self, where aspppd sits quietly until
an explicit connection request is made, times out as expected, and then
sits quietly until the next explicit request.

The configuration:
        SPARCstation 20/61
        Running the standard aspppd(1M) bundled with Solaris 2.6
        DNS (in.named) not active on this system
        Connecting to a WorldBlazer modem on /dev/ttya

[...more description deleted...]
-------------------------------------------------------------------------------

The solution:
-------------------------------------------------------------------------------
The problem turned out to be a combination of the following two things:

1: Thanks to Patrick Bigos <patrick.bigos@sun.com> at Sun Customer Support
    and to Joe Garbarino <jgarb@erim-int.com> for suggesting this solution.

    I started monitoring nscd(1M) activity by uncommenting the "logfile"
    line, setting "debug-level" to 10 in /etc/nscd.conf, and restarting
    the daemon (cd /etc/init.d ; ./nscd stop ; ./nscd start). I also ran
    snoop(1M) on the connection (snoop -d ipdptp0 | tee logfile). What I
    found was that nscd was generating "keepalive" operations on various
    remote sites that had recently been contacted. These generated DNS
    queries, which required the link to be reconnected.

    Apparently the name service cache daemon (nscd(1M)) has the hosts cache
    enabled by default. In the /etc/nscd.conf file the line:

            # enable-cache hosts no

    is commented out. By removing the hash, and then restarting the daemon
    as discussed above, most of the spurious aspppd connections were halted.
    Why this works is still a mystery to me. If anyone knows why host
    caching is causing keepalive connections to remote hosts, I would really
    appreciate hearing the reasons so I can have a better understanding of
    what nscd(1M) is actually doing. Also, what changes were made between
    Solaris 2.5.1 and 2.6 that caused this problem to show up now?

2: After implementing the above, I noticed that there were still a few
    asppp connections occurring for no obvious reason. I tracked the
    problem down to a couple of find(1) commands that were being run
    nightly from root's crontab. After many hours of manual investigation,
    I discovered a new Solaris 2.6 directory, "/xfn/_x500", which I had
    never noticed before. Any access to this directory, such as listing
    its contents with ls(1), caused a remote connection. The xfn directory
    is part of the Federated Naming System (SUNWfns), and the _x500
    subdirectory is probably created by the FNS Support For X.500
    Directory Context (SUNWfnsx5) package.
    
    The immediate solution was to stop the find commands from descending
    into this directory. A better solution may be to remove this package
    if it is not required.
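The two changes above can be sketched as shell functions. This is a minimal sketch, not the author's actual procedure: the function names are hypothetical, the sed expression assumes the stock line is spelled "enable-cache hosts no" with only whitespace after the "#", and the find job that lists files named "core" is an invented stand-in for the real crontab entries, which the post does not show.

```shell
#!/bin/sh
# Sketch of the two fixes, written as shell functions (names are hypothetical).
# Neither function edits anything in place: each writes its result to
# stdout so the output can be reviewed before it is installed.

# Fix 1: print the nscd.conf named in $1 with the stock
# "# enable-cache hosts no" line uncommented, which disables the hosts
# cache. All other lines pass through unchanged.
# Assumes a sed that understands POSIX classes such as [[:space:]].
disable_hosts_cache() {
    sed 's/^#[[:space:]]*\(enable-cache[[:space:]][[:space:]]*hosts[[:space:]][[:space:]]*no\)/\1/' "$1"
}

# Fix 2: walk the tree rooted at $1 the way a nightly find(1) job might,
# but prune every directory named "xfn" so find never reads /xfn or
# /xfn/_x500. Printing files named "core" is a hypothetical stand-in for
# whatever the real cron job does.
find_cores_skipping_xfn() {
    find "$1" -type d -name xfn -prune -o -type f -name core -print
}
```

After checking the output of disable_hosts_cache /etc/nscd.conf, it can be copied over /etc/nscd.conf and nscd restarted (cd /etc/init.d ; ./nscd stop ; ./nscd start). In root's crontab the find line itself would be used, e.g. find / -type d -name xfn -prune -o -type f -name core -print. Because -name matches by name only, any other directory called xfn is skipped as well.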

With these two changes, my machine has been sitting quietly for over 36 hours
now, so it appears that the problem is solved.
-------------------------------------------------------------------------------

Credits:
-------------------------------------------------------------------------------
I really want to extend my sincere thanks to the following people for
responding to my request for help. Many of the suggestions were very
useful in getting me pointed in the right directions to finally track
down the ultimate solution. (In particular, I discovered that the
normal output from snoop is *not* the same as the packet data saved to
a file when using the -o flag. This realization turned snoop into a
useful monitoring tool! :-))

    Joe Garbarino <jgarb@erim-int.com>
    Cheryl L. Southard <cld@astro.caltech.edu>
    Jonathan Loh <jloh@futon.sfsu.edu>
    Erwin Fritz <efritz@glja.com>
    John W. Funk <jwf@ccuc.on.ca>
    Daniel R. Falconer <drf@dedalus.net>
    Daniel Kluge <danielk@tibco.com>
    Richard Skelton <rich@brake.demon.co.uk>
    Casper Dik <casper@holland.Sun.COM>
    Martin Huber <hu@garfield.m.isar.de>
    Bob Bridgham <robert_bridgham@b-e-s-t.com>
    David Crane <david.crane@east.sun.com>
    Ken Corum <Sun Customer Support>
    Patrick Bigos <patrick.bigos@sun.com>
-------------------------------------------------------------------------------

Summary of suggestions:
-------------------------------------------------------------------------------
Here, I will quickly summarize the suggestions submitted by the above
people.

Joe Garbarino <jgarb@erim-int.com>

    * Set the "keep-hot-count" entry to 0 in /etc/nscd.conf. The man
      page says:

        keep-hot-count cachename value
          This attribute allows the administrator to set the number of
          entries nscd(1M) is to keep current in the specified cache.
          value is an integer number which should approximate the number
          of entries frequently used during the day.
      
      This suggestion would probably also work, since it would likely be
      equivalent to disabling the cache.

Cheryl L. Southard <cld@astro.caltech.edu>

    * Touch the file /etc/notrouter. As the AnswerBook says:

        When the machine reboots, the startup script looks for the presence
        of the /etc/notrouter file. If the file exists, the startup script
        does not run in.routed -s or in.rdisc -r, and does not turn on IP
        forwarding on all interfaces configured "up" by ifconfig. This
        happens regardless of whether an /etc/gateways file exists.

    * Add a "default_route" line to the /etc/asppp.cf file.

Jonathan Loh <jloh@futon.sfsu.edu>

    * Turn on verbose aspppd logging by setting "debug_level" to a higher
      value (8 or 9) in /etc/asppp.cf.

Erwin Fritz <efritz@glja.com>

    * Reported having similar problems.

John W. Funk <jwf@ccuc.on.ca>

    * Try disabling the routing daemons /usr/sbin/in.rdisc and
      /usr/sbin/in.routed.
      
      [I believe the presence of the /etc/notrouter file accomplishes this.]

Daniel R. Falconer <drf@dedalus.net>

    * Try stopping named(1M). [It was already disabled.]

    * Check for DNS resolution requests made to the remote nameserver
      (configured in /etc/resolv.conf).

    * Consider removing the "domain" line from /etc/resolv.conf to stop
      local machine name lookups from going off site.

    * Set up named to perform local name resolution only, and run it in
      debug mode to monitor DNS traffic.

Daniel Kluge <danielk@tibco.com>

    * Use netstat and snoop to track down a hanging TCP connection, since
      TCP normally sends keepalive packets every 15 minutes, even if the
      connection has not been shut down correctly.

Richard Skelton <rich@brake.demon.co.uk>

    * Check that the network router discovery daemon, /usr/sbin/in.rdisc,
      is not running. You can comment it out in the file
      /etc/init.d/inetinit.

Casper Dik <casper@holland.Sun.COM>

    * Run snoop on the link; perhaps it's a DNS request or some such?

Martin Huber <hu@garfield.m.isar.de>

    * Add "norip ipdptp0" to the file /etc/gateways. This prevents the
      transmission of RIP information over the PPP line, which could
      cause unwanted connections. [This was already in place.]

    * Try 'snoop ipdptp0' to see what is transmitted over the link.

Bob Bridgham <robert_bridgham@b-e-s-t.com>

    * Look for software that regularly checks a host or makes DNS
      requests; any of these will bring up the PPP link.

    * Use a sniffer to see what is causing packets to be sent off the
      machine.
-------------------------------------------------------------------------------

Thanks again to everyone for their help. All of it was greatly appreciated.

--
Jeff Small                 C. Jeffery Small & Associates    (206) 232-3338
jeff@cjsa.com              7000 E Mercer Way,  Mercer Island, WA     98040


