SUMMARY: (1) How much of a benefit are multi-cpus in a SS10?

From: O'Neal,Chris (onealwc@agedwards.com)
Date: Wed Sep 02 1998 - 15:44:37 CDT


SYNOPSIS: Study performance benefits of multi-cpus in a SS10.

HARDWARE: Sun SPARCstation 10 w/ 2 Ross 180 cpus, 256mb RAM, (2) 2gb
7200rpm SCSI-2 disks.

OS: SunOS 4.1.4 (Hopefully study of Solaris 2.6 <SunOS 5.6> will
follow)

ORIGINAL QUESTION:
How much of a benefit are multi-cpus in a SS10 on Solaris 2.6 running
Netscape?

SUMMARY:
The responses I received were very good and stimulated further thinking.
As parts to rebuild the second machine came in, I decided to do some
benchmarking before proceeding with a new OS: first, the current machine
as is; second, the current machine with a second CPU; third, the current
machine with a second CPU and additional RAM. Not all benchmarking is
completed yet, but from the perspective of myself, my supervisor, and my
users the results so far are favorable.

A SS10 running SunOS 4.1.4 with two (2) Ross 180 cpus experienced a 4.1%
improvement in average overall system load and a 30.7% increase in
average overall percentage of cpu idle time. Though no single job got
done faster, more jobs got done in the same period of time, and the
general slowness of application response during times of high load
decreased noticeably for the general user.
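
For anyone who wants to check the two percentages, here is a minimal
Python sketch. It assumes the daily "Sys Load Avg" and "Cpu %i" values
from the BENCH MARK DATA tables in this summary; the function and
variable names are only illustrative. It lands close to the quoted 4.1%
and 30.7% (the load figure depends slightly on rounding).

```python
# Daily averages copied from the BENCH MARK DATA tables in this summary.
single_load = [1.61, 1.95, 2.20, 1.84, 2.03]   # Sys Load Avg, single cpu
dual_load = [1.69, 2.04, 1.76, 1.60, 2.14]     # Sys Load Avg, dual cpu
single_idle = [53, 51, 46, 48, 46]             # Cpu %i (idle), single cpu
dual_idle = [62, 64, 70, 68, 55]               # Cpu %i (idle), dual cpu


def mean(values):
    """Arithmetic mean of a list of daily values."""
    return sum(values) / len(values)


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# A negative load change means the load average went down (an improvement).
load_change = percent_change(mean(single_load), mean(dual_load))
idle_change = percent_change(mean(single_idle), mean(dual_idle))

print(f"load avg: {mean(single_load):.2f} -> {mean(dual_load):.2f} "
      f"({load_change:+.1f}%)")
print(f"cpu idle: {mean(single_idle):.1f} -> {mean(dual_idle):.1f} "
      f"({idle_change:+.1f}%)")
```

The same caveat as in the QUALIFIER section applies: these are two
different weeks of real-world load, not a controlled test.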

Many respondents pointed out that other areas such as RAM, disk, or
network might be a greater bottleneck than cpu utilization. In this
case I do not believe that this is so. I am increasing RAM to reduce
the Paging, but since Swap Out and Process Swaps are below 1, I do not
expect a performance gain, as I do not see the machine as memory starved.
At this time the company's Internet needs are fed by a single T-1 to a
Firewall, then to a switched ATM cloud where the machine is fed by a
dedicated 10mbit line. The network people tell me that I have as good as
I can get and that the network traffic on this machine's line is not
heavy.

Benchmarking yet to be done...
- Increase RAM by 64mb to 320mb (running this week).
- Increase RAM by 256mb to 512mb maximum (do next week).
- Change OS to Solaris 2.6 (want to do before Halloween).

QUALIFIER:
The benchmarking done which the above stated improvements are based on
were not done in a repeatable scientific conditions. It was based on
comparing one weeks worth of "real world" data with another. It does
not take into account such items as; changes in user habits, changes to
web sites, changes in the financial markets (Stock market drop greatly
increased Internet information demand beyond average weekly load), or
fluctuations in Internet traffic. However we are finding the results of
valuable in our decision making process.

BENCH MARK DATA:
Daily averages for a five-day week with a single cpu; selected columns
are averaged for the weekly result.
HOST:spider OS:4.1.4 HARD:sun4m RAM:261696K SWAP:623200K
UP:10:56 CPU:SUNW,SPARCstation-10 (Single Ross180)

Proc    Mem Swap  Page Swap  Cpu  Sys Usr Msht Surf Proc Inod File Swap  Net   Net   Net   Net   Disks
Swap   Free  Out   Out    -    - Load  on   on   on  Tbl  Tbl  Tbl  Tbl sErr  iErr  oErr  Coll  sd0  sd3
   #     Kb  #Pg  Kb/s   %a   %i  Avg   #    #    #   %u   %u   %u   %u    #     %     %     %     %
---- ------ ---- ----- ---- ---- ---- --- ---- ---- ---- ---- ---- ---- ---- ----- ----- ----- ---- ----
0.00  68970 0.00  7.56   73   53 1.61   2    0   30    5   59   11   26    0 0.000 0.000 0.012  0.2  2.6
0.00  72485 0.00  7.11   71   51 1.95   2    0   32    5   59   12   28    0 0.000 0.000 0.013  0.4  4.0
0.02  65774 0.00  9.07   69   46 2.20   3    0   36    6   60   13   30    0 0.000 0.000 0.014  0.5  5.0
0.00  69774 0.00 17.33   68   48 1.84   4    0   36    6   59   13   31    0 0.000 0.000 0.014  0.5  5.6
0.00  65332 0.00  9.33   71   46 2.03   2    0   33    5   59   12   28    0 0.000 0.000 0.015  0.6  6.0
     ------      ----- ---- ---- ----          ----                                            ---- ----
      68467      10.08   70 48.8 1.93          33.4                                             0.4  4.6

Daily averages for a five-day week with dual cpus; selected columns are
averaged for the weekly result. The stock market drop on 8/27 and 8/29
caused greater office use of the Internet, beyond the average weekly
load. This is indicated by the increase of disk % from 4.6 to 7.98. The
monthly average for the past three months is somewhere around 5.0.
HOST:spider OS:4.1.4 HARD:sun4m RAM:261696K SWAP:623200K
CPU:SUNW,SPARCstation-10 (Dual Ross180)

Proc    Mem Swap  Page Swap  Cpu  Sys Usr Msht Surf Proc Inod File Swap  Net   Net   Net   Net   Disks
Swap   Free  Out   Out    -    - Load  on   on   on  Tbl  Tbl  Tbl  Tbl sErr  iErr  oErr  Coll  sd0  sd3
   #     Kb  #Pg  Kb/s   %a   %i  Avg   #    #    #   %u   %u   %u   %u    #     %     %     %     %
---- ------ ---- ----- ---- ---- ---- --- ---- ---- ---- ---- ---- ---- ---- ----- ----- ----- ---- ----
0.00  65623 0.00  4.27   72   62 1.69   2    0   34    5   60   12   27    0 0.000 0.000 0.019  0.5  5.8
0.02  64861 0.00  9.24   70   64 2.04   2    0   34    6   60   13   29    0 0.000 0.000 0.024  0.7  8.3
0.00  62216 0.00  8.44   70   70 1.76   2    0   32    5   60   12   30    0 0.000 0.000 0.023  0.7  8.4
0.00  69922 0.00 13.24   70   68 1.60   2    0   32    5   60   12   30    0 0.000 0.000 0.022  0.8  8.5
0.00  78107 0.00  9.33   71   55 2.14   2    0   32    5   61   12   28    0 0.000 0.000 0.022  0.8  8.9
     ------      ----- ---- ---- ----          ----                                            ---- ----
      68245       8.90   70 63.8 1.84          32.8                                             0.7 7.98

ALSO SEE:
Performance Computing, October 98, The Migration From Sbus to PCI Local
Bus, by James Hwang (Clarified for me many misunderstandings I had about
the Sbus and PCI buses).
Sun Performance and Tuning, Sparc & Solaris, by Adrian Cockcroft (Have
hardware, how do I optimize for my task list).
Configuration and Capacity Planning for Solaris Servers, by Brian Wong
(Have task list, what hardware do I need).
System Performance Tuning, by Mike Loukides (general unix overview,
some material in my copy is a little dated).

THANK YOU:
I would like to thank the following people for their responses;

Rick Fincher <rnf@spitfire.tbird.com>
David Schiffrin <daves@adnc.com>
Kelly Setzer <setzer@telalink.net>
Matti <matti@fugue.jpl.nasa.gov>

DATE:
09/02/1998

RESPONSES ARE AS FOLLOWS:
........................................................................
Rick Fincher wrote,

>
> QUESTION:
> How much of a benefit are multi-cpus in a SS10 on Solaris 2.6 running
> Netscape?

Several issues here. For one, SunOS runs the operating system on 1 cpu
and other jobs on the least loaded CPU.

Adding one cpu will increase speed for lots of jobs by more than 2X
because the cpu not running the OS can run Netscape jobs.

The problem with this is that OS calls like the TCP routines used by
Netscape will have to run on the OS cpu. This creates a bottleneck.

With Solaris, processes are distributed across all cpus, including OS
processes. Also, dynamic libraries make better use of memory. Adding lots
of CPUs means you need to add the memory to support the extra jobs, or
swapping will bog the machine down.

I'm not sure why you are seeing such a degradation with Solaris 2.6
except it is a memory hog compared to SunOS. You may be memory starved.

We had a SunOS (4.1.3) multi cpu machine that we upgraded to Solaris. It
ran smoother under Solaris and a little faster. The granularity of
Solaris is much finer.

We had single cpu SPARC 10s running simulations. When we went to 4
cpus we could run 4 jobs just as fast (a little faster actually because
of OS overhead) as 1 job on 1 cpu. They required about 100 meg of memory
each so we put 512 meg in the machine to avoid swapping. This was enough
RAM to run Solaris and 4 jobs without swapping.

These simulations didn't do network access so you may get a bottleneck
running a bunch of Netscape jobs all accessing the network port.

You may be able to increase speed by adding network cards to the machine
and adding more IP numbers and have your users log into multiple names
using separate network adapters on the same physical machine.

You may also be able to increase throughput by adding a faster network
adapter, if possible in your environment.

Check perfmeter to see how much swapping and collisions are happening
both before and during peak times to try to identify your bottleneck.

Hope this helps!

........................................................................
David Schiffrin wrote,
I've seen considerable benefit moving from single to dual/quad
processors in SS20 machines (similar to the SS10). I've _ONLY_ done this
using Solaris 2.x.

The multi-processor benefit you're likely to see is mostly handled by
the OS rather than Netscape itself. Since the OS scheduler has more
processor resources to hand out to various users' Netscape processes,
which can all share code pages in RAM, there should be considerable
advantage to such a setup.

I have seen a penalty comparing 4.1.[3|4] to Solaris 2.3, I've stopped
comparing since the multi-processor benefit under Solaris is so
tremendous over SunOS.

The key issue here is one of sizing your machine for the load you're
going to present it with. If your server is expected to be an NFS server
and/or Xclient application host to a large number of Xserver devices,
your biggest bottleneck is most likely the disk or network I/O. This is
why PCI/EIDE machines don't make good servers. PCI in 66MHz/64bit trim
is considerably faster than Sbus, but EIDE doesn't support things SCSI
does, like command queueing and asynchronous bus communications, which
are well suited to multiple process/user loads. Things which can help a
machine better handle these sorts of problems are faster/more network
adapters and faster/more SCSI adapters and drives.

An excellent source is "Configuration and Capacity Planning for Solaris
Servers" by Brian Wong. This book covers the machines you're considering
in depth, as well as the first generation of Ultra machines.

I'm a bit curious about the testing you've done which shows Solaris 2.6
to be "much - much - much - much slower". As I said, I've seen a penalty
on single processor boxes, but everything sun4m and newer performs
reasonably well, when tuned for the environment, and the multi-processor
tuning in the Solaris kernel is a tremendous piece of work. [Our 2.6
testing was done on the only free machine we had at that time, an SS2;
CDE seemed to be the real killer. Our 2.5.1 experience has only been on
a single dual-cpu E3000 (good) and a single single-cpu SS20 (not so
fast).] I get more benefit from a single processor upgrade in a Solaris
box than under hp-ux, irix, or NT, and the benefits are scalable to more
and more processors (I've not setup/run more than 14 or so, but I've no
reason to believe that it runs out anywhere near where I need to go).

I think you'd be pleasantly surprised with 2 or more processors in your
SS10 under 2.6 provided you can get enough RAM in the machine to keep
from stalling on pageins from disk. Some sar data from your current
machine would be helpful to figure out how to help the new one. I
wouldn't be surprised to see that at those peak times your cpu is
blocking on disk reads to retrieve from swap portions of processes which
were paged out due to real memory constraints.

Hope this helps

David also wrote,

I'll try and snip out the redundant bits, and reply to what I can of your
questions.
Well, all I managed to snip was our previous correspondence, but here are
my thoughts.
>
> Hi David,
>
> Really good response thanks!
> ( I started writing this Friday, but am just now finishing it... sorry
> for the lag. I did not make it to the book store this last weekend...
> will try again shortly. Other responses I have received so far are not
> as well thought-out as yours... again thank you)
>
> Some points:
> - "which can all share code pages in RAM,"
> In 2.6 how do I force "can" to "will" share code pages in RAM?

Solaris will initially share a code page; should a process change the
page's state to writeable [! code ?!], the OS will fault and copy the
page transparently to the app.
Conveniently, Solaris will also do this with data. Should two (or more)
processes access the same data from disk, the OS will map the filesystem
inode to a vnode; as long as no process has opened the vnode for
writing, everyone uses the same copy.
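
The share-until-written behaviour described above can be tried out from
user space. The following is only an illustrative Python sketch, not
Solaris kernel code: it assumes Python's standard mmap module, whose
ACCESS_COPY mode gives a private copy-on-write mapping of a file. The
function name cow_demo and the file contents are made up for the
example.

```python
import mmap
import os
import tempfile


def cow_demo():
    """Map one file twice copy-on-write, write through one mapping only."""
    # Create a small scratch file standing in for a shared page on disk.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"shared page contents")
        path = f.name
    try:
        with open(path, "r+b") as fa, open(path, "r+b") as fb:
            # Two private (copy-on-write) mappings of the same file.
            a = mmap.mmap(fa.fileno(), 0, access=mmap.ACCESS_COPY)
            b = mmap.mmap(fb.fileno(), 0, access=mmap.ACCESS_COPY)
            a[0:6] = b"CHANGE"        # the write lands in a's private copy
            seen_by_a = bytes(a[:6])
            seen_by_b = bytes(b[:6])  # b still sees the original data
            a.close()
            b.close()
        with open(path, "rb") as f:
            on_disk = f.read(6)       # the file itself is untouched
    finally:
        os.remove(path)
    return seen_by_a, seen_by_b, on_disk


print(cow_demo())
```

Only the mapping that wrote sees its change; the other mapping and the
file on disk keep the original bytes, which is the property that lets
many Netscape processes share one copy until someone writes.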

>
> - "I have seen a penalty comparing 4.1.[3|4] to Solaris 2.3, I've
> stopped"
> Over the years, I have seen time test reports on Solaris 2.x where it
> started out slower than SunOS 4.x, got faster, and is now slower than
> SunOS 4.x on single CPU boxes. The latest return to slower speeds than
> SunOS 4.x may be due to the GUI of CDE, as the 2.6 testing was not done
> in Openlook, so my latest info is not apples to apples.
>
> - "PCI in 66MHz/64bit trim is considerably faster than Sbus"
> Yes, it's faster, but considerably faster? I am still thinking about
> that. My current understanding is that:
> + on new Sun small to mid-size boxes Sun is providing just one (1) of
> the PCI slots at 66mhz; the rest are at 33mhz.
> + not many cards are compatible with the single 66mhz slot.
> + when the 66mhz slot runs at 66mhz the remaining PCI slots are
> adversely affected.

Perhaps considerably is an overstatement. There is, however, more to bus
speed than width and clock. PCI implements multi-cycle burst modes
which far exceed the 2 cycle burst of Sbus (64bits, 128 on 64bit Sbus). I
am unaware of problems using the 66/64 PCI slots in small boxes. My
Sun/PCI experiences are limited to the E450 and E250, and E[34]000 with
PCI I/O boards. I can't imagine that Sun would have underengineered the
smaller boxes the way you've implied. Given the UPA bus, it should be
trivial to implement a 66/64 PCI bus. I think you are correct about the
single slot in the Ultra 5. And last on this bit, not many Sbus cards
work at 64 bits either. :) (source: Wong, 1997 p240, 249)

> If I had SS20 (or SS10SX) then I would be more likely to move to
> Solaris 2.4 or higher because this would give me four 64bit Sbus slots
> each at 50mhz. [Source Sun Performance and Tuning by Adrian Cockcroft]

Hmm, I'll buy 64 bit Sbus in a SS20/SS10SX but how do we clock them up
to 50Mhz ? My understanding is that the SS20 clocks the Mbus at 50Mhz,
but then divides by 2 for a 25MHz max Sbus speed. :)

> It is also my understanding that the much stated fact "SunOS 4.1.4
> does not support multi-processors" is not true. Ross sells such
> configs now and Sun sold such configs in the past (Sun MP360), and
> SunOS 4.1.4 /sys/`arch -k`/conf/GENERIC even has the following lines:
>
> #
> # This kernel may run on a multi-processor system;
> # include multi-processor support. This option is mandatory.
> #
> options MULTIPROCESSOR # Multi-Processor support
>
> What SunOS 4.1.4 will not do is multi-thread or split a single process
> between two processors. The OS scheduler places the next process onto
> whichever cpu is least used at time of placement (my understanding
> is).

This is my understanding as well. Not that SunOS didn't support multiple
processors, but rather that the scheduling and resource usage was much
more efficient as you add processors to Solaris.
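
To make the "least used cpu at time of placement" idea concrete, here
is a toy Python sketch. It is a simplified model built only from the
description above, not the actual SunOS scheduler; the function name
place_processes and the per-process "cost" numbers are hypothetical.

```python
def place_processes(costs, n_cpus):
    """Assign each process, in arrival order, to the least-loaded cpu.

    costs  -- hypothetical amount of cpu work each process needs
    n_cpus -- number of cpus in the box
    A process is never split: all of its work lands on one cpu.
    """
    loads = [0.0] * n_cpus
    placement = []
    for cost in costs:
        # Pick the cpu with the smallest accumulated load (ties: lowest id).
        cpu = min(range(n_cpus), key=lambda i: loads[i])
        loads[cpu] += cost
        placement.append(cpu)
    return placement, loads


jobs = [3, 1, 2, 2]
print(place_processes(jobs, 1))  # everything queues on the one cpu
print(place_processes(jobs, 2))  # the same jobs spread over two cpus
```

In this model no single job finishes faster with a second cpu, but the
busiest cpu carries less total work, which matches the "more jobs got
done in the same period" observation in the summary.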

>
> - "An excellent source is Configuration and Capacity Planning for
> Solaris Servers by Brian Wong."
> I have not seen this book. It sounds like you recommend it? I will
> look for it this weekend. Do you have "Sun Performance and Tuning by
> Adrian Cockcroft"? If so, how do you compare the two books?

Yes, I recommend it. The Cockcroft book has practical useful information
on how to make it better, the Wong book goes into a different set of
details about why this or that works. The Cockcroft book seems aimed at
"I've got this pile of hardware, what's the most I can get out of it"
while Wong aims at "I've got all this to do, which pile of hardware
should I use".

>
> - "but everything sun4m and newer performs reasonably well, when tuned
> for the environment"
> Do you mean hardware configured to task or OS configured to task? I
> ask this because my understanding of Solaris 2.x is that there is not
> much you can 'tune' with the OS as compared to the old
> /sys/`arch -k`/conf/GENERIC files of SunOS 4.x. If this is not true
> and Solaris 2.x is 'tunable' then it's pretty well hidden. Three of my
> 30 machines are 2.x. The only way I seem to be able to 'tune' them is
> to throw hardware at them. SunOS has saved my hardware budget many
> times.
>
Both hardware and software. Solaris _IS_ tunable, but not as much as the
/sys/arch stuff, and it isn't documented as well. In general the default
configuration Solaris picks, based on what it finds in available hardware
on startup, is pretty reasonable. Look for /etc/system and
/kernel/drv/*conf for a start. I'll agree that what goes in these files
is very very well hidden. :)

>
> - "provided you can get enough RAM in the machine to keep from
> stalling on pageins from disk."
> I already have 256mb of ram in my existing Netscape server and I will
> be updating the second one to 256mb also. My current understanding is
> that the max ram you can put in a SS10 or SS20 is 256mb. Do you have
> any SS10 or SS20 with more than 256mb ram? If so, how are you doing
> this?

Yeah, 256 sounds like the right limit. I've got www servers here
running in 3/4GB of ram. Yeesh. As I recall, Netscape expects some 1-4MB
of memory per process and 4-16 processes per site. This can run out
pretty quickly if you use the bigger numbers: (4*16*4 sites) + OS
overhead + networking buffers; you get the idea.
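
The arithmetic above, written out as a small Python sketch. The ranges
(1-4MB per process, 4-16 processes per site) and the 4 sites are the
figures quoted in this reply; the function name is only illustrative,
and OS overhead and networking buffers are deliberately left out.

```python
def netscape_memory_mb(mb_per_process, processes_per_site, sites):
    """Memory needed by the Netscape processes alone, in MB."""
    return mb_per_process * processes_per_site * sites


SITES = 4  # from the "(4*16*4 sites)" example above

low = netscape_memory_mb(1, 4, SITES)    # smallest figures quoted
high = netscape_memory_mb(4, 16, SITES)  # "the bigger numbers"

print(f"low estimate:  {low} MB")
print(f"high estimate: {high} MB (before OS overhead and network buffers)")
```

With the bigger numbers the Netscape processes alone already account
for all of a 256mb machine, so anything the OS needs on top of that
would have to come from paging.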

>
> - "Some sar data"
> SunOS does not have sar.

Ooops, I've been SVR4'd. No really, I swear I used to be a BSD kind of
guy, really I did. It's only in this job that I've gotten used to the
AT&T way... I meant to say vmstat.

Anyway, I hope I've helped clear this up more than I've muddied the
waters. I'm interested to hear how it goes for you, let me know if
there's anything else I can help with, or just how it all turns out, if
it's not too much trouble.

........................................................................
Kelly Setzer wrote,
We had a uniprocessor SS20 with a 50MHz processor. It was serving mail
via pop for ~2000 users. We added a second 50MHz processor and response
time was reduced tenfold. Load average dropped from ~5 to <<1 during
peak times. This is with Solaris 2.5.1.

Considering the nature of your load, additional RAM may be more
beneficial than a second CPU. Also, have you examined disk performance?
The cache activity of 70 instances of netscape must be pretty intense -
your disk (or even your network) may not be able to keep up.

You can buy nice SCSI controllers for these things [New Sun PCI boxes]
and get the best of both worlds ... SCSI+PCI. Use the ide drive as a
bookend.

I hate to even suggest this on sun-managers, but... For $4200 you could
purchase/build a PC and run linux or a BSD variant. Such a machine would
have substantially better performance than even 2 Ross processors. The
price/performance is pretty compelling.

best of luck,

........................................................................
..................
Matti wrote,
You actually want to use something like 'sar' to determine what is
happening with your machines.

The thing which may actually improve your performance the most may be
RAM. Typically slowdowns are caused by the system needing to swap
memory.

Typically, multiuser unix machines benefit greatly from additional
CPUs (esp. since each user is running a separate instance of
netscape).

thx
matti siltanen

ps - TEST and watch with netscape4 -> so far I've encountered some odd
cases where one instance had hogged up 50% of one's CPU resources while
idle. I don't know why yet.

Matti also wrote,

I don't have an SS10 - so I don't know how much memory you can put on it
:(

Sounds like you've done just about everything except add the CPU. I
don't know SunOS 4.x well, but you're gonna have to figure out if swap
is your problem, because IF it is, an extra CPU will not help enough.

In general you can expect another CPU to increase your CPU horsepower
1.8x for unix boxes. (2x maybe??) NT boxes gain less for each CPU - BTW

========== Original Posting ===============

QUESTION:
How much of a benefit are multi-cpus in a SS10 on Solaris 2.6 running
Netscape?

WANTED:
Mainly...
1.) Input from Sun Managers who had applications running on a box with a
single cpu and then upgraded said box with an additional cpu. What
performance gains did you or your users notice? Was it worth it?
2.) Input from any Sun Managers with multi-cpu experience.

MACHINE:
box: Sun Sparc 10
cpu: ???
mem: 256mb
hdd: (2) 2gb. 7200rpm fast nears w/ 1mg cache
os: ???

DETAILS:
Currently I have a SS10 with a ROSS 180 on SunOS 4.1.4 running Internet
services for 70 Netscape users via Xterminals. It is handling the load
pretty well except during peak loads, which happen one to two times a
day and last 15 - 30 minutes. When it is loaded, it does not crash or
kill processes; things just slow down (they don't stop).

Recently another SS10, with a Sun 50 cpu on SunOS 4.1.3, was freed up
from its previous use. I want to rebuild it and use it as a second
Internet server, splitting my existing Netscape users between the two
boxes.

The question I am wrestling with is: How to best rebuild the box for
Internet/Netscape/Xterminals?

* I like SunOS 4.1.4 but am open to using Solaris if it has an overall
benefit to the company (None found so far).
* I want to upgrade the cpu from a Sun 50 to a Ross 180.
* Do I go to Solaris 2.6? In my environment 2.6 is much - much - much
- much slower than SunOS 4.1.4.
* How well does Netscape 3.0 & 4.0.5 use multi-cpu in a single box?
How much faster does Netscape run on two cpus?
* Because Solaris 2.6 is so slow do I go with two (2) cpus? Will two
Ross 180s get me back to or beyond SunOS 4.1.4 performance levels?

All input via email is welcome. I think a Ross 180 listed for $2,100
and dual Ross 180s listed for $4,200. New Sun boxes with PCI bus and
EIDE hard disk drives need not apply; they just don't handle the
multi-user load of Xterminals as well as older Sun SCSI equipment.

Thank You

Chris O'Neal
Bond System Administrator
Sales & Investment Banking Technology
Marketing Services
A. G. EDWARDS & SONS, INC.
One North Jefferson, St. Louis, MO 63103
voice: 314-955-6178
fax: 314-955-4897
wire: SRN
email: onealwc@agedwards.com


