SUMMARY: Very Slow NFS access

From: jhartzen@csc.com
Date: Tue Oct 03 2000 - 06:10:04 CDT


The problem turned out to be caused by forcing the HME interfaces to
100Mb/s Full Duplex, as most of the responses I received suggested.
Fortunately this happened to be the very next item on my list as I was
backing out my changes one by one and testing.

I received the following set of responses, all of which may be useful to
other people even though not all of them applied to me. Thank you all for
assisting me with so little information:

Joe Flechter:
My first inclination would be to check for a duplex mismatch on the
ethernet interface. If you've manually fixed the speed at the server end,
the switch may not have followed suit. The quick and gross way to fix this
is to set the speed on the server, then unplug the cable, wait about 10
seconds, then plug it back in again. This will "encourage" your switch to
use 100MbFD. Better still, if you have management software on your switch,
set both ends of the link.

Michael Arndt:
Try mounting with so-called hard mounts, if not done already.
It seems to me that your measures slowed down TCP performance,
so after some timeout a retry is started to get the NFS data.
If the data is still not ready, the access time grows according to
some exponentially working algorithm. If you mount hard, the NFS
client will wait ...

Somewhat strongly oversimplified, but it seems you're in too much of a
hurry for long explanations.
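
A rough sketch of the retry behaviour described above, in Python. It
assumes a client that starts from some initial timeout and doubles it after
every unanswered request, up to a cap. The function names and the numbers
are illustrative assumptions, not values taken from any particular NFS
implementation.

```python
def retry_timeouts(initial_s, retries, cap_s):
    """Return the wait before each retransmission, doubling each time.

    initial_s, retries and cap_s are hypothetical parameters chosen for
    illustration; a real client takes its values from mount options and
    implementation defaults.
    """
    waits = []
    wait = initial_s
    for _ in range(retries):
        waits.append(min(wait, cap_s))  # never wait longer than the cap
        wait *= 2                       # exponential backoff
    return waits


def total_stall(initial_s, retries, cap_s):
    """Total time spent waiting if every attempt times out."""
    return sum(retry_timeouts(initial_s, retries, cap_s))


if __name__ == "__main__":
    # Assumed example values: 1 s initial timeout, 5 retries, 60 s cap.
    print(retry_timeouts(1.0, 5, 60.0))
    print(total_stall(1.0, 5, 60.0))
```

On a link that drops packets, each lost request costs at least one of these
waits, which is why throughput collapses. A soft mount returns an error once
the retries are used up, while a hard mount keeps retrying.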

Justin Clift:
Did you alter the /etc/services file?

I would guess that your problem is in one of the services in
/etc/inetd.conf. I can't think of any other network layer settings which
would affect NFS, unless you are running in a NIS+ environment and turned
off rpc (a bad thing to do).

A quick guess would be to re-enable ALL the services in /etc/inetd.conf (I
know most of these are redundant), paying careful attention to the rpc
ones, then restart inetd (don't forget this!). IF that works, then disable
them one at a time, when you have time.

Tom Davidson:
 If you use Cisco Catalyst switches like I do, I would suspect a
speed/duplex auto-negotiation problem. On SGIs, Suns and NT boxes I've
seen the negotiation with a Cisco switch end up screwed up. If you
manually force the speed and/or duplex on the switch, the computer may not
change. Or if you force it on the computer, the switch may not negotiate
properly. Sometimes even reboots will not fix the problem if you have
manually set the speed and/or duplex on either device. I've usually had to
look at the computers to see what they say they negotiated, and then look
at the switch with the show ports command. When a machine has horrible
throughput, this is my first suspect. The computer will work okay on
small networking stuff, but when you try to move a file any bigger than
a meg, you will see that it seems to take forever and you will get
errors. Make sure the switch and computer report the same duplex and
speed. You can try forcing and rebooting, but sometimes you have to force
both the computer and the switch to a setting and check whether you can
ftp a 1 meg file fast. If not, the settings are most likely mismatched. On
SGIs, I don't believe you can force the speed and duplex. The models I
have worked with didn't have drivers that supported that. I could only see
what the SGI said it negotiated by running # ifconfig -av. For your
version of Solaris you may have to read the ifconfig man page to see what
flags you can pass to ifconfig to see the link information. Once the link
information is the same for a machine and its particular port on the
switch, it should transfer a 1 meg file in a flash. Of course, that's
assuming your network isn't dog slow already.

 I hope this information may help you. I can't imagine hardening a
machine will kill network speed. I do see you said you forced the speed
and duplex, which I believe is your problem.
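
The "move a 1 meg file and see how long it takes" check above can also be
done with a few lines of Python instead of ftp. This is only a sketch: the
function names are made up for illustration, and the path you pass in is
assumed to be a file on the NFS mount you want to test.

```python
import time


def kb_per_second(num_bytes, seconds):
    """Throughput in KB/s, taking 1 KB as 1024 bytes."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes / 1024.0 / seconds


def timed_read(path, chunk_size=64 * 1024):
    """Read the file at `path` to the end.

    Returns (bytes_read, elapsed_seconds). The path is whatever test file
    you choose; nothing here is specific to NFS.
    """
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start
```

Read a file that has not been read recently, otherwise the client may serve
it from its local cache and the number will say nothing about the network.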

Michael Maciolek:
Hopefully you've solved this problem by now...

I'd focus on your network switches. If they autonegotiate, while
the Suns are forced to 100M/full, it's quite possible that you
have a negotiation failure and the switches are trying to talk
100/half.

That would readily account for the poor performance. I've seen
this with Cisco switches mixed with Suns, and I *always* force
the Cisco to 100/Full whenever I force the Sun's settings.
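
The failure mode described above can be written down as a small Python
model. It is a deliberately simplified sketch and rests on these
assumptions: both ends are capable of 100 full duplex, a forced port takes
no part in negotiation, and a negotiating port facing a forced peer can
sense the speed but falls back to half duplex. The names Port, resolve and
mismatch are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    autoneg: bool             # True if the port negotiates
    speed: int = 100          # forced speed in Mb/s (unused when negotiating)
    full_duplex: bool = True  # forced duplex (unused when negotiating)


def resolve(port, peer):
    """Return the (speed, full_duplex) that `port` ends up using."""
    if not port.autoneg:
        # A forced port simply uses what it was told to use.
        return (port.speed, port.full_duplex)
    if peer.autoneg:
        # Both ends negotiate: assume they agree on 100 full duplex.
        return (100, True)
    # Peer is forced: speed can be sensed, duplex falls back to half.
    return (peer.speed, False)


def mismatch(a, b):
    """True if the two ends of the link disagree on speed or duplex."""
    return resolve(a, b) != resolve(b, a)


if __name__ == "__main__":
    sun = Port(autoneg=False, speed=100, full_duplex=True)
    switch = Port(autoneg=True)
    print(resolve(sun, switch), resolve(switch, sun), mismatch(sun, switch))
```

In this model the link is consistent when both ends are forced to the same
setting or both ends negotiate, and inconsistent only in the mixed case.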

Matt Palmieri:
Make sure your portmapper is still running. I think it starts out of a
script in /etc/rc2.d.

Michael Hill:
The first thing I suspect in cases of network slowdown is name
resolution. Did the hardening possibly alter /etc/nsswitch.conf?

Theodore Tickell:
I've not done anything like that with an E10K in multiple domains, but I
have had two SPARCs side by side with a similar problem.

In my case it turned out that when I forced the interface to 100full, the
switch (in this case a Catalyst - I was surprised) stopped sensing
right. That, in combination with my TCP/IP customizations, caused havoc on
the box that I had 'finished' setting up. I was using a qfe card, by the
by, and everything else was pretty standard for web server tuning. I
got okay performance by reversing the tunings, but the big gain was
obviously from resetting the interface - then the tunings helped.

Casper Dik:
What type of switches do you use?

My experience is that certain switches will fall back to 100Mbps half-duplex
when Suns are forced to 100Mbps full (forced means no negotiation;)

Try using auto negotiation instead.

Kris:
Verify that you have auto-negotiation off and force full duplex!

* Turn auto-negotiation off and force 100 full duplex on quad fast ethernet 0
set qfe:qfe_adv_autoneg_cap=0
set qfe:qfe_adv_100T4_cap=0
set qfe:qfe_adv_100fdx_cap=1
set qfe:qfe_adv_100hdx_cap=0
set qfe:qfe_adv_10fdx_cap=0
set qfe:qfe_adv_10hdx_cap=0

If you are using hme, replace qfe with hme in the variable names.
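
To make the substitution concrete, here is a small Python sketch that
writes out the same six lines for a given driver name. The helper name
force_100fdx_lines is made up for this example, and it assumes the driver
follows the <driver>_adv_<mode>_cap naming seen in the qfe lines; it only
builds the text, it does not touch /etc/system.

```python
# Capability names, in the same order as the settings listed above.
CAPS = ("autoneg", "100T4", "100fdx", "100hdx", "10fdx", "10hdx")


def force_100fdx_lines(driver):
    """Build the 'set' lines that advertise only 100 full duplex.

    `driver` is the interface driver name, e.g. "qfe" or "hme".
    """
    lines = []
    for cap in CAPS:
        value = 1 if cap == "100fdx" else 0  # only 100 full duplex is enabled
        lines.append(f"set {driver}:{driver}_adv_{cap}_cap={value}")
    return lines


if __name__ == "__main__":
    print("\n".join(force_100fdx_lines("hme")))
```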

The settings which I did use, and which I removed in order to solve the
problem, are:

* Ethernet - Force Full duplex and Auto Negotiate off
set hme:hme_adv_autoneg_cap=0
set hme:hme_adv_100T4_cap=0
set hme:hme_adv_100fdx_cap=1
set hme:hme_adv_100hdx_cap=0
set hme:hme_adv_10fdx_cap=0
set hme:hme_adv_10hdx_cap=0
* End of Ethernet mode settings

Finally, I have not yet tried to re-implement these settings.
Unfortunately, because of workload, this will have to wait until after
these domains have been productionalised, and it will then need to pass
through change control.

Johan Hartzenberg/GIS/CSC
02/10/2000 12:46 PM

Sent by: Johannes J Hartzenberg/GIS/CSC

To: sun-managers@sunmanagers.ececs.uc.edu
cc:
Subject: Very Slow NFS access

Hi,

I have just set up 4 domains on E10K. On 2 of these I performed a
"security hardening" and a set of customization steps.

The security hardening includes turning on TCP/IP logging, STRONG TCP=2,
process auditing, etc. I also stopped almost all the redundant and
obsolete services from /etc/inetd.conf. The customization includes a horde
of steps, including forcing network ports to 100Mbps and Full Duplex.

I mention what looks to me like the most prominent changes made to
Networking setup.

Now I am busy reversing the steps I have performed, 2 or 3 at a time. On
those domains where I have not performed the hardening and customizing, I
get approx 1200Kb/s when copying a CD from an NFS share. On those where I
have performed the customization I get terrible speed and the cp -r command
starts to give errors like

NFS time out "still trying"

(This is not the correct error, sue me, I am in a hurry)

Does anybody KNOW what is causing this and can save me many many hours
worth of un-doing changes?

Thanx for your urgent responses, the DBAs are yelling at me because their
new domains are not available at the promised time

  _Johan



