SUMMARY: help w/ 2 DIFF subnets on one eth interface - routing and 4min startup

From: Adam Singer (adam-s@pacbell.net)
Date: Mon Oct 11 1999 - 23:49:19 CDT


Dear Sun Managers:

Sorry for the very, very late summary - [apply usual lame excuses here].
I am posting a summary both to thank the five fine Managers who
responded and to explain the situation for anyone searching the
archives who might be interested. Below, in this order, you will
find: my thank-you list, the summary itself, some specific replies to
various posters, copies of the posters' own suggestions/comments, and
at the end my original question.

Those to thank are, in order of appearance in my mailbox:
Michael Cunningham <malice@exit109.com>
eric <eric@outlook.net>
Darren Dunham <ddunham@taos.com>
Nickolai Zeldovich <kolya@zepa.net>
"Leonard, Roger" <rleonard@cvty.com>

There are several answers, the first of which is to never change
several things at once (duh, I knew that!). I even had this sucker
mirrored, so I could have just broken the mirror and played with half
of it, but no, I was *in a hurry*. The second answer is that I am in
the San Francisco Bay Area of California, and our new ISP *and* old ISP
were having *major* router/traffic issues in the San Jose/Mountain
View area. While the server and our new ISP are nowhere near there,
all our traffic was getting routed down there! This problem lasted
over 3 weeks. Sometimes the response times were excellent, but
80% of the time the pings showed packet loss and the traceroutes
showed high latency. So we had just got on a new ISP and had a stroke
of *really* bad luck. The timing was such that we could not know
whether the server was at fault until service got better. Things have
since been rerouted; our old ISP is still having problems, but the new
ISP made some changes (briefly all our traffic was going several
hundred miles north to some router in Seattle). Times are now excellent.

However, we did make two changes: we applied the GLM patch and the TCP
patch. I believe we had the TCP patch but not the GLM patch, and the
TCP patch was behind in revision. I am sorry I do not have the exact
numbers, but they are the patches that patch the stuff in
/kernel/drv... so a SunSolve search will get them. Applying these two
patches seems to have made the one- to four-minute delay before the
interfaces became active at boot go away. The servers now work
immediately upon boot, and this is the only other thing we changed. I
am pretty sure the cause was the missing or out-of-date patches,
because we were able to replicate the problem on our development LAN
and make it go away with the patches.

As to my question about the tweaking necessary to bind multiple
virtual IPs on *different* subnets to the same single hme0 interface:
while I received no replies either here or in the newsgroups, I did
find one Sun Infodoc on FDDI whose examples had IPs on different
subnets. All other docs just showed the same subnet. Regardless, this
just continues to work without any routing tweaking or anything. As
the saying goes, it just works. I'd still kind of like to know
how/if/why, but the web server has been going fine for the last several
weeks, so it is the least of my many worries.
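
For archive searchers, here is a minimal sketch of bringing up one of
the logical interfaces by hand, roughly as it would be done on Solaris
2.6. The 255.255.255.0 netmask is an assumption (the real netmasks are
not given in this post), and the function name add_logical_if is only
for illustration. It needs root.

```shell
# Sketch only: configure one logical interface by hand.
# ASSUMPTION: 255.255.255.0 netmask; substitute the one your ISP gives you.
add_logical_if() {
    # $1 = logical interface (e.g. hme0:1), $2 = IP address
    # "broadcast +" derives the broadcast address from the netmask.
    ifconfig "$1" "$2" netmask 255.255.255.0 broadcast + up
}

# Example (as root), using the addresses from the original post:
#   add_logical_if hme0:1 206.206.3.100
#   add_logical_if hme0:2 206.206.3.101
```

At boot, Solaris takes the same information from /etc/hostname.hme0:1
(and so on) and the default gateway from /etc/defaultrouter. No extra
route is added by hand here: outbound packets that are not for a
directly attached subnet simply follow the default route out hme0.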

thanks again for your replies and patience,
Adam Singer

>Here are some replies to specific queries
To Eric: yes, I did use something similar to tcpdump, namely snoop,
which I like better and which is native to Suns.

To Nickolai about fix-modes: I am now pretty sure the problem had
nothing to do with fix-modes; it was just bad luck that this had to be
the first server we tried the program on (ouch ouch, don't ever do that
again, what are test machines for!). From all I have read and continue
to read about fix-modes in the various newsgroups, it is and continues
to be a very good program. And the author, Casper Dik, is one great and
helpful guru who has probably been doing this stuff longer than I've
been alive :)

To Leonard, Roger about auto-neg vs. forced full duplex:
Your suggestion is correct for most setups - this is a common gotcha -
but curiously (or not) our ISP only offers a 10baseT connection to
their switch. My fault for not having specified that, since the
interface itself is 100baseT (hme0).

>Here are the responders' replies:
>From Michael Cunningham:
If it's 4 minutes later.. that probably means that your routes
are updating correctly so it works. You might want to do a
netstat -r when it first comes up and after it starts working
and note the differences.
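
A minimal sketch of that comparison, assuming /var/tmp is writable;
the function name and file names are only illustrative. (In my case
netstat -nr looked the same both times, so the diff came up empty.)

```shell
# Sketch: save the routing table right after boot and again once
# traffic flows, then compare the two snapshots.
snapshot_routes() {
    # $1 = output file
    # -n skips name lookups, which can stall while the network is down.
    netstat -rn > "$1"
}

# Right after boot:       snapshot_routes /var/tmp/routes.boot
# Once networking works:  snapshot_routes /var/tmp/routes.later
# Then compare:           diff /var/tmp/routes.boot /var/tmp/routes.later
```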

>From: eric
Have you done a tcpdump on the machine?
...
Possibly the virtual IP's are taking the static as the default
gateway (something like 0.0.0.0).

>From Darren Dunham (my Q is preceded by >)
> The problems are:
> 1. I don't understand how this is working except that the system
> sees each virtual IP and since it is virtual just sends data off on
> hme0 even if it is theoretically a different subnet.

In general, yes.. Incoming and outgoing traffic are not mapped.
Regardless of where incoming data arrives, the outgoing traffic will go
through the routing table. If the table decides that the fastest way
is to go through the 206.123.1.254 gateway, then it'll have to send it
via hme0.
...
Hm.. I'm wondering how data is arriving at the 206.206 subnet.
Perhaps the ISP side is doing some strange dynamic configurations..
I'd snoop and see what the gateway to the 206.206 interfaces is.
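
One way to do that with snoop, as a rough sketch: capture traffic for
one of the virtual addresses to a file, then read it back verbosely to
see which Ethernet address is delivering the packets. The function
name and capture file path are only illustrative; run it as root and
stop the capture with Ctrl-C.

```shell
# Sketch: capture traffic to/from one virtual IP on hme0.
capture_vip() {
    # $1 = address to watch, $2 = capture file
    snoop -d hme0 -o "$2" host "$1"
}

# Example:
#   capture_vip 206.206.3.100 /var/tmp/vip.snoop
# Read it back and look at the Ethernet headers:
#   snoop -i /var/tmp/vip.snoop -v | grep ETHER
# Compare the source Ethernet address with the arp -a entry for the
# known gateway (206.123.1.254) to see whether it is the same router.
```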

>From: Nickolai Zeldovich
What's this fix-modes script you keep referring to? Sounds like
something just asking for trouble if it magically does things :)

>From: "Leonard, Roger" <rleonard@cvty.com>

Sounds like an auto-negotiation problem. Try setting both the switch
and the hme ports to full duplex versus auto-negotiate and see if that
cures the lag problem.
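
A sketch of forcing the hme side with ndd, for anyone who wants to try
this. It assumes instance 0 (hme0) and a 10 Mbit full-duplex link,
since our ISP's switch port is 10baseT; whether that port really runs
full duplex is an assumption, and the switch must be set to match. The
function name is only illustrative. Run as root.

```shell
# Sketch: turn off auto-negotiation on hme0 and advertise only
# 10 Mbit full duplex. ASSUMPTION: the switch port is set the same way.
force_hme_10fdx() {
    ndd -set /dev/hme instance 0         # select hme0
    ndd -set /dev/hme adv_100fdx_cap 0
    ndd -set /dev/hme adv_100hdx_cap 0
    ndd -set /dev/hme adv_10fdx_cap 1
    ndd -set /dev/hme adv_10hdx_cap 0
    ndd -set /dev/hme adv_autoneg_cap 0  # set last so the link renegotiates once
}

# Check the result afterwards:
#   ndd /dev/hme link_status   (1 = up)
#   ndd /dev/hme link_speed    (0 = 10 Mbit, 1 = 100 Mbit)
#   ndd /dev/hme link_mode     (0 = half duplex, 1 = full duplex)
```

These settings do not survive a reboot, so they would have to be
reapplied from a startup script.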
.
.
.
MY ORIGINAL POST:
> Greetings,
>
> I have just set up a webserver which has one main IP on one subnet and
> then 5 IPs on a completely different subnet. I have to do this
> because the ISP only has one IP on the same net as its gateway. The
> other IPs are on the ISP's virtual LAN. They said we need to make
> the single IP the "main" IP since that is the net the gateway is on.
> But they had no idea as to what, if anything special, needed to be done
> for the virtual interfaces with IPs not even on the same subnet.
>
> The server is an Ultra 60 running Solaris 2.6 newly patched.
> hme0 206.123.1.20
> hme0:1 206.206.3.100
> hme0:2 206.206.3.101 etc.
> Gateway: 206.123.1.254
>
> By defining the one IP to be hme0 and then the other 5 IPs of the
> different subnet to be hme0:1 thru hme0:5, it *seems* to work (however
> it takes four minutes from a reboot to come alive). By work I mean
> that both sets of IP's see the outside world and are accessible from
> the outside.
>
> The problems are:
> 1. I don't understand how this is working except that the system
> sees each virtual IP and since it is virtual just sends data off on
> hme0 even if it is theoretically a different subnet.
>
> 2. The server comes up fine (configuring: hme0, hme0:1, etc.) and once
> the system is up, NOTHING works! Then after about 4 minutes it
> magically works - this scares me. ifconfig -a and netstat -nr show
> all to be good - both while networking doesn't work (in those first
> four minutes) and when it magically starts working.
>
> Any ideas what is wrong? The only thing we did to the server (other
> than change the virtual ip's to ip's not on the same subnet) is run
> fixmodes to tighten the file permissions. We then undid the fixmodes
> changes, thinking they might have had something to do with it, but
> alas, the problem remains. I checked permissions in /devices against
> another system, as well as /etc/host*. I am not sure what the problem
> is.
>
> Is it possible the network is having trouble figuring out what is
> going on but finally something times out? Is there some process or
> state I can truss to see what is going on? All this is happening on
> boot up so I am not sure what to do. Doing the interfaces manually
> causes it to work fine, but it may be that we are just doing each
> manually and so are slow enough that the interface "rights itself".
>
> Any clues or pointers to infodocs in sunsolve, previous posts to
> usenet, or sunmanagers that I failed to find in searching would be
> greatly appreciated.
>
> thanks,
>
> Adam
>
> email: a d a m - s (a t) p a c b e l l d o t n e t
>

email: a d a m - s (a t) p a c b e l l d o t n e t


