SUMMARY (No Answers) Solaris Networking (Application Layer Snooping?)

From: Brett Thorson <bthorson_at_ekosystems.com>
Date: Wed Oct 10 2001 - 10:21:04 EDT
Well, I got two replies:
1) Check for interrupts.  They aren't sharing interrupts
2) Check auto-negotiation / Bad Hub

Neither of those really hit the nail on the head, but I did start to
notice that the problem occurred consistently after about 15 days of uptime.
The fix (yeah, this is a hack!): I put in a cron job to "down/up" iprb0 (the
external network card, not the card that appeared to be having the problem),
and the system immediately responded.
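
The post does not include the cron job itself, so the following is only a
minimal sketch of what such a "down/up" script might look like. The script
path, the two-second pause, the "run" argument and the weekly schedule are
all assumptions made for illustration; only the interface name iprb0 and the
use of ifconfig come from the message.

```shell
#!/bin/sh
# Sketch: bounce a network interface (down, short pause, up).
# Assumed values: IFACE default, PAUSE length, and the ifconfig path.
IFACE="${IFACE:-iprb0}"
IFCONFIG="${IFCONFIG:-/usr/sbin/ifconfig}"

bounce_iface() {
    # $1 = interface name; stop if the "down" step fails
    "$IFCONFIG" "$1" down || return 1
    sleep "${PAUSE:-2}"
    "$IFCONFIG" "$1" up
}

# Only touch the interface when explicitly asked to, e.g. from cron.
# Hypothetical root crontab entry (Sundays at 03:00):
#   0 3 * * 0 /usr/local/sbin/bounce-iface.sh run
if [ "${1:-}" = "run" ]; then
    bounce_iface "$IFACE"
fi
```

Since the problem reportedly showed up after about 15 days of uptime, any
interval comfortably shorter than that would presumably do; the weekly
schedule above is just one such choice.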

After doing this, I noticed that the number of idle connections (reported by
netstat -a) dropped significantly.
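
One way to put a number on that drop is to tally connections by state before
and after the interface is bounced. The helper below is a hedged sketch, not
something from the original message: the function name count_states is made
up, and it assumes the state is the last column of each TCP line and is
written in upper case (IDLE, ESTABLISHED, TIME_WAIT and so on).

```shell
#!/bin/sh
# Sketch: count netstat lines per TCP state, reading netstat output on stdin.
# Assumption: the state is the last field and looks like an upper-case word;
# header lines and mixed-case UDP states are skipped by the pattern.
count_states() {
    awk '$NF ~ /^[A-Z][A-Z0-9_]*$/ { n[$NF]++ }
         END { for (s in n) print n[s], s }' | sort -k2
}
```

Typical use would be something like "netstat -an | count_states", run once
before and once after the down/up, and then comparing the two tallies.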

Kinda strange, totally a hack, but it works!

--Brett

----- Original Message -----
From: "Brett Thorson" <bthorson@ekosystems.com>
To: <sunmanagers@sunmanagers.org>
Sent: Monday, October 01, 2001 10:52 AM
Subject: Solaris Networking (Application Layer Snooping?)


> Preamble: Sorry if this is a repeat; first version didn't go through.
> -------
>
> I've seen this twice now, and I am not even sure where to begin looking.
>
> Consider a Solaris x86 box (BOX) with two network cards.  No routing
> between the two.
> iprb0 goes to the outside world
> iprb1 goes to an internal hub connected to devices (DEVS) that get their
> address via DHCP from BOX.
>
> I come up to one of these machines, and find that the application that
> communicates between the DEVS and the BOX is no longer communicating. But
> DEVS does have a DHCP address from BOX.
>
> I try telnetting to DEVS from BOX.  No response.
> So I ping DEVS from BOX, no reply.
> Start snooping on iprb1.
>
> Ping DEVS from BOX. Ping (the little program) receives no reply, but I can
> see the packet going from BOX to DEVS, and I can see the packet replying
> from DEVS to BOX.  However ping (the program) says no response.
>
> Unplug & reset DEVS.  DEVS gets a DHCP address no problem (Confirmed: it
> is not re-using an old address, it actually gets assigned a new address
> from BOX).  This leads me to believe (along with the initial communication
> architecture between BOX & DEVS) that UDP / Broadcast stuff is working and
> moving around.
>
> So that means (here is my jump/stab at it) that there is something going
> on in the Session/Presentation/Application layer for things to not be
> working right?!?!
>
> The routing tables haven't changed.  (I checked netstat -rn) The IP
> addresses haven't changed.
> Nothing changed (as far as I have determined) on the box for this to
> precipitate this problem.
>
> The first time that this occurred, I solved the problem (after these
> diagnostics) by rebooting BOX.
>
> The second time, I brought the card down and then back up with ifconfig,
> and the system flew, it ran just fine.  I watched the snoop traffic on
> iprb1, and it looked exactly the same as when the system was failing.  The
> routing looked the same.
>
> There weren't any errors in syslog or dmesg.  The only errors I really saw
> were java socket timeout failures (When it was trying to open a TCP/IP
> Socket after finding DEVS via broadcast)
>
> There isn't a whole lot of traffic in iprb1, but I did see a few Oerrs
> (about .1%) in a netstat -i
>
> If the packets are actually moving, but the apps aren't seeing them,
> where do I start poking my nose to see what's going on?
>
> And even so, would this mean that broadcast (UDP) stuff works, but peer to
> peer (TCP) stuff doesn't for some reason?  And why would ping not work?
>
> Any help, advice, or whatever would be great for this one.
>
> --Brett
>
>
Received on Wed Oct 10 15:21:04 2001
