Summary (long): Perf. probs SC2000/Sol 2.3/Oracle7.0.16

Date: Wed Jan 11 1995 - 02:36:27 CST

Just before Christmas, I posted a question about the above problems we
had on a SparcCenter 2000. I received a lot of helpful hints, many of
them suggesting going to Solaris 2.4 or Oracle 7.1. Since this was not
possible for reasons on the application side, we had to find other
ways around it, and up until now it looks rather promising.

Many thanks to all who responded (if I missed somebody, please
forgive me, our mail system is not in the very best state ;-).

Here is the original posting:

: Hi all,

: we have set up a rather large DB machine and experience heavy
: performance problems. Maybe someone has helpful suggestions. Here the
: configuration:

: Sun SC 2000:
: - 12 processors (currently 11, one blew up ;-)
: - 1.8 GB RAM
: - 6 x 2.9 GB F/W differential SCSI for OS uses
: - 10 x SparcStorageArray with 30 GB (30 x 1GB) for DB
: 2 way mirrored, so 150 GB are there

: - Solaris 2.3 w/ patches (101318-54)
: - Online Disk Suite 2.01
: - Oracle 7.0.16

: First we tried with DB on filesystems and saw a heavy 'pumping effect'
: where the machine was running fine for 5 seconds, and then idling for
: about 20. It looks like this had something to do with an interaction
: of fsflush and the Oracle db writer, so we changed to raw devices.

: Now we don't see the pumping effect anymore, but we have a lousy read
: performance (i.e. a full table scan lasts about 10 times longer than
: on the Filesystem). Oracle db_block_size is 4 k and db_multi_block_read
: is set to 8, so Oracle should read in 32k blocks. iostat reports reads
: of 4k on the devices. And though no disk is more than 20-30% busy
: (delivering about 200-300kB/s), we see a *LOT* (about 60%) of CPU time
: spent in wio (wait for io). If we use dd on the devices with various
: block sizes, we see the expected results (about 1MB/s with 4 k blocks,
: at max 3.6 MB/s with 32k blocks read performance).

: This is also what we see when the application is running
: normally. About 15% user CPU, 15% system CPU, 60% wait for IO. No disk
: really busy (max 40% for 1 or 2 seconds on some disk). BTW: Also on
: writing in normal operation Oracle doesn't nearly peak out any disk,
: but we can see the same (~60% wait for io).

: We are now a little bit clueless what to do, because we don't really
: see the bottleneck. It is just that the machine is too slow. We will
: try fiddling with the db_block_size and db_multi_block_read, and with
: the mirroring parameters. We are just now setting up a test machine,
: to see whether disk striping will help.

: Help will be *MUCH* appreciated.

: Thanks and merry Xmas to all
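The dd throughput test described in the posting can be sketched as
follows. This is a hedged, self-contained sketch: a scratch file in
/tmp stands in for the raw devices (/dev/rdsk/...), so absolute
numbers will differ from the SparcStorageArray figures quoted above,
but the block sizes match the ones in the posting.

```shell
# Sequential-read test at different block sizes, in the spirit of the
# dd test above. A temporary file substitutes for the raw devices.
FILE=/tmp/ddtest.$$
dd if=/dev/zero of="$FILE" bs=1024k count=8 2>/dev/null  # 8 MB scratch file

for BS in 4 32; do
    # Read the whole file with the given block size and count the bytes.
    BYTES=`dd if="$FILE" bs=${BS}k 2>/dev/null | wc -c`
    echo "bs=${BS}k: read $BYTES bytes"
done
rm -f "$FILE"
```

On the real machine one would time these runs against the /dev/rdsk
nodes themselves to reproduce the 1 MB/s vs. 3.6 MB/s comparison.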

And here are the results up to now:

After going to raw devices and seeing the bad read performance, we
ran various tests, fiddling with the Oracle db_block_size and with
disk striping. Changing db_block_size didn't help much.

As many of the replies suggested, disk striping could solve the read
problem. We ended up distributing the whole database across the 150
mirrored disks (300 spindles in all). We used a striping factor of 32
KBytes, and always placed chunks of 150 MB onto the same mirrored disk
(so for example a 900 MB tablespace would be striped across 2x6 disks).

This was a major win! It scaled read performance up almost linearly
with the number of disks used and didn't hurt write performance at
all. Now the machine runs happily and reaches the needed I/O rate. We
still see (from the UNIX point of view) room for improvement, since we
still cannot measure any single bottleneck. The machine is now using
about 10-20% system time, 30+% user time, the rest idling (or
sometimes waiting for IO). No single disk is near 100% busy.
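As a quick sanity check on the geometry we ended up with (a sketch
using our numbers; the parameter names are Oracle's, the interlace is
the ODS striping factor mentioned above): one Oracle multiblock read
should be a whole multiple of the interlace, so sequential scans walk
across successive stripe members.

```shell
# Check that an Oracle multiblock read (db_block_size *
# db_file_multiblock_read_count) lines up with the ODS interlace.
# Values are the ones from this summary: 4k blocks, 8-block reads,
# 32 KByte striping factor.
DB_BLOCK=4096
MULTIBLOCK=8
INTERLACE=32768

READ_SIZE=`expr $DB_BLOCK \* $MULTIBLOCK`
echo "multiblock read size: $READ_SIZE bytes"
if [ `expr $READ_SIZE % $INTERLACE` -eq 0 ]; then
    echo "aligned with the ${INTERLACE}-byte interlace"
fi
```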

As reply 12 suggests (from Rajesh of Oracle), we also tried turning
the use_readv parameter off (it was on before) and surprise, surprise,
performance went up drastically! A win of at least 30% in reading! Now
could anybody please explain to me why the use of readv has such an
impact?
We assume that the application itself now has room for improvement,
especially since a lot of the buffering has gone and Oracle seems to
be limited to a SGA size of about 520 MB. Normally we have around
600MB of unused RAM ;-) Does anybody have an idea how we could use
this memory?

Well anyway, the application runs now satisfactorily from the
performance point of view. We will try out various things though and
hope that we get it even to run faster than now.

Again, many thanks to all who responded!

Joe Ammann, EDBS-AMJ | Tel: +41 1 236 5897
Union Bank of Switzerland | Fax: +41 1 236 8120
CH - 8021 Zuerich | E-Mail:

Reply 1:

From: RFC-822/
Organization: Tauw Milieu BV


Agreed, the pumping effect could be caused by interaction between the
operation system caching and the caching done by Oracle. When you
changed to raw devices, did you use block devices or character
devices? In the latter case you disable any operating system caching.
Did you reallocate memory from the operating system cache to the
Oracle cache? Maybe you can tune the frequency of the syncs or
whatever they are called in Oracle.
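The block/character distinction asked about here can be checked from
the shell. A small sketch (devtype is a hypothetical helper, not a
standard command) classifies a node by the first character of its ls
listing; on Solaris the same disk appears as /dev/dsk/... (block,
cached) and /dev/rdsk/... (character, raw).

```shell
# Classify a device node: 'b' = block device (goes through the OS
# buffer cache), 'c' = character/raw device (bypasses it).
devtype() {
    case `ls -lL "$1" 2>/dev/null | cut -c1` in
        b) echo "$1: block device (cached)" ;;
        c) echo "$1: character device (raw)" ;;
        *) echo "$1: not a device" ;;
    esac
}

devtype /dev/null    # a character device on any Unix
```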


: performance (i.e. a full table scan lasts about 10 times longer than
                      ^^^^^^^^^^^^^^^ is this a realistic test?


: to see whether disk striping will help.

We use disk striping. I think it helps but I don't know how much.


Hope this helps. Happy New Year


Reply 2:

From: mig/crl/
Organization: CRL Dialup Internet Access (415) 705-6060 [Login: guest]

I don't know too much about Oracle as I am a Sybase user. But if you
are seeing 60% wio times striping should definitely help. It looks
like the bulk of your time is spent waiting on your I/O requests.
With the amount of memory that you have I certainly would consider
caching as much as possible to cut down on the physical I/O's. Have
you tried using proctool to collect data on what the machine is
actually doing? I wasn't aware that Oracle would take advantage of
the symmetrical multi-processing built into the kernel of 2.3. Also
having just completed a Sybase Performance and Tuning class I was told
by the Sybase rep that they had found that with the current release of
Sybase 10.0.x that performance improves between 1-4 processors. But
adding additional processing power beyond 4 will not yield the
expected gains. Well, I jumped on him to say the least having written
Solaris Multi-Threaded programs which yield almost a linear increase
in run time performance for each additional processor deployed.
Proctool will let you selectively shut down processors. Maybe Oracle
has the same lag in development for multi-processor systems that
Sybase has. In fact the trade rags have been touting Informix for
massively parallel systems. I'd be curious to know what you find. We
are currently using Sparc 1000's with a look ahead to the 2000 series
for future development efforts on a 50 G database but we are still
using 690's for dataservers in production. A bit conservative but it
is the government.

Mark Gross

Reply 3:

From: Goodwin_Thomas/Sun_Eng/
Organization: Sun Microsystems Computer Corporation

A couple of thoughts:

1. it will be generally worth going to Solaris 2.4, performance is
generally improved for databases.

2. that probably won't help your case, as it is more configuration
limited than cpu limited.

3. Do you in fact have a performance problem? are response times
bad? On MP machines, you will often see wait i/o when the machine is
essentially idle. If even one thread is waiting for i/o, all cpus
will report their idle time as wait i/o. There may not be enough
active users to keep all the cpus busy.

4. the problem may well be in the layout of your database. seeing
lots of wait i/o (remember it can be like idle) can mean that there
are locking problems in the database tables. looking at the oracle
monitor can help with this (look for lock/latch contention).


Reply 4:

(my answers to Thom's questions are marked with >>. Unluckily I never
got an answer from Thom, but then again, as you know, our mail system
is not in the very best state ;-)

From: tness/cray_atlanta/


First, let me introduce myself.....

The first suggestion is that you find out when and how you can get
Solaris 2.4. We have been using this in our benchmark center for
several months and it is a major improvement. I do not know what
the availability is in Europe for Solaris 2.4 from Sun.

In your first posting you told a lot about your installation, but
I did not understand how you are using your database. I would
like to know:

        Is it for OLTP?

>> Mainly batch. Each night a file of about 2 GB is downloaded from
>> the host. These files are then processed and several operations
>> done on them. It was planned that a normal so-called
>> TagesEndVerarbeitung TEV (daily end processing) would take around 10
>> hours. With the DB on filesystems it took 30-35 hours ;-) with DB on
>> raw devices it is around 20 hours.

>> We still need some improvement, since backup takes about 4 hours,
>> leaving nothing for unexpected events.

        How many users?

>> Negligible. Some 5 or so doing some small queries. Every quarter a
>> general report is generated and then distributed to zillions of
>> people. But almost no OLTP load.

        Is all of the data accessed randomly?

>> Almost, yes. The records that are downloaded each day mainly contain
>> financial information on customer transactions. We can not really
>> guess which users will be doing transactions on which days. Of course,
>> over time a pattern will establish, but we don't know yet how we
>> should make use of this information.
        What is the ratio of reads to writes?

>> About 4:1.

        How complex are the queries?

>> Not very. If a customer transaction record arrives, the customer
>> record has to be retrieved, then the record containing booking
>> information about this transaction, and finally the customer record is
>> updated, and the transaction record inserted in the database. There is
>> more processing done afterwards (interest calculating, averages etc)
>> but there is no real problem there.
        What is the total data transfer rate that you are getting?

>> Depends heavily on the current state. In normal processing, we get
>> only some 600-700KB/s read (from about 5-7 disks). Write is much
>> worse, about 200-300KB/s on one single disk. We do "striping" by
>> distributing tablespaces onto various datafiles, but we see that
>> this isn't enough. We have made tests with striping with ODS and
>> see dramatic improvement.

>> Since setting up the DB takes about 2 days, we have now only 1 try
>> left ;-) We will setup the database for production striped over all
>> disks, with a striping factor of 8KB (the smallest we can get).

        What is the total data transfer rate that you need?

>> About twice what we get.

You are correct in the observation that the more spindles the better.
When we consider a customer application, we try to match the data
layout with the access patterns so that there are as many active
disks as possible. Striping is a major win for reads.

When you run iostat, how many io operations do you typically see on
the active drives? What are the typical service times? You said that
iostat reports 2k reads even though you have attempted to set it
for 8k. Is this still the current situation? If so, I can check with
our database support people to try to find out why. This would take
some time, so I won't do it unless you are still having problems with
it.
>> We see about 50-60 operations/s of 4kB each (the Oracle block
>> size). Although the Oracle documentation says db_multiblock_read
>> should cluster reads, we see no such effect. Reads and writes
>> happen in 4kB blocks.
I consider a 150 GB database to be fairly large. Obviously, without
knowing your usage, I don't know just how large. If I met a potential
customer with a 150 GB database I would propose a much larger machine
than the 12 cpu/1.5 GB that you are using. We have installed base
machines with Oracle on 24-32 cpus and much more memory.

>> Well, we still seem to see a mostly idle machine!! It really looks
>> like the whole problem is due to Oracle reading/writing one thing
>> after the other and not parallelizing things. So we try to solve the
>> problem with ODS striping. The DB writer of Oracle is the only
>> process that is doing something.

I hope some of this helps!

Thom Ness
Cray Research, Inc.
Atlanta, Georgia USA

Reply 5:

From: plovill/Lanier/
Organization: Lanier Inc., Tucker, GA


I suggest you find out if Solaris supports "direct I/O", and if it and
Oracle both support this on your platform, enable it. (Direct I/O is
a method of I/O to Unix files on Unix filesystems, but bypassing the
normal Unix kernel buffer cache mechanism.)

--  -or-  +1 (404)493-2255 (work)
Perry Lovill   Consultant/System Administrator   Lanier Worldwide, Inc.
 #define   DISCLAIMER_TXT    "I neither speak for, nor work for, Lanier, Inc."

-------------------------------------------------------- Reply 6:

From: scip3170/nus_leonis/

Try putting these parameters in /etc/system:

set nhbuf = 256     /* p_nbuf / nhbuf = between 4 and 8 */
set p_nbuf = 1024   /* roughly 128 * no of disk drives */
set nbuf = 1024

I/O performance may improve as much as 20 %
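The rules of thumb in the comments can be checked with a little shell
arithmetic (values as given above):

```shell
# Check the suggested ratio: p_nbuf / nhbuf should fall between 4 and 8.
NHBUF=256
P_NBUF=1024

RATIO=`expr $P_NBUF / $NHBUF`
echo "p_nbuf / nhbuf = $RATIO"
if [ $RATIO -ge 4 ] && [ $RATIO -le 8 ]; then
    echo "within the suggested 4-8 range"
fi
```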

Happy New Year

-------------------------------------------------------- Reply 7:


: ... we have a lousy read
: performance (i.e. a full table scan lasts about 10 times longer than
: on the Filesystem).

No mention that your database administrators have placed tables and
indices that would be accessed simultaneously on different drives.
You certainly have enough drives to do this. Is this too obvious?

You know, if someone else came to you saying that something runs 10
times slower than it used to, you would ask them "What did _you_
change?". Something definitely happened when you went from unix files
to raw devices. I suspect table and index placement changed, perhaps
to minimize effort or time for the switch to raw devices.

: This is also what we see when the application is running
: normally. About 15% user CPU, 15% system CPU, 60% wait for IO. No disk
: really busy (max 40% for 1 or 2 seconds on some disk). BTW: Also on
: writing in normal operation Oracle doesn't nearly peak out any disk,
: but we can see the same (~60% wait for io).

Have a look at the last five columns of iostat -x. I say this because
you didn't mention disk service time (svc_t). My guess is that this
will be way too high (about 20 milliseconds is 'normal') on some
disks. Some man page reading on the other columns will also suggest
things to investigate.
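The svc_t check can be scripted. A sketch follows; the sample lines
imitate the Solaris 'iostat -x' column layout with hypothetical
numbers, and the awk filter flags disks above the 20 ms rule of thumb
mentioned above:

```shell
# Flag disks whose average service time (svc_t, in ms) exceeds ~20 ms.
# Column order mimics Solaris 'iostat -x':
#   device r/s w/s kr/s kw/s wait actv svc_t %w %b
SAMPLE='device   r/s  w/s   kr/s  kw/s wait actv svc_t  %w  %b
sd0     12.0  3.0  350.0  90.0  0.0  0.4  18.2   0   9
sd3     55.0  8.0 1900.0 300.0  0.2  2.1  61.5   3  48'

echo "$SAMPLE" | awk 'NR > 1 && $8 > 20 { print $1 " svc_t=" $8 " ms: too high" }'
```

In real use one would pipe live `iostat -x 5` output through the same
filter instead of the canned sample.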

: ... We will
: try fiddling with the db_block_size and db_multi_block_read, and with
: the mirroring parameters. We are just now setting up a test machine,
: to see whether disk striping will help.

I presume that the 60% wait for io is the 'wt' column from iostat.
Since unix is reporting this big number I doubt if tuning Oracle
parameters will be fruitful at this point, although having a look at
Oracle's statistics and ODS is a good idea.

The stripes will have to be narrow and cover many disks to avoid
having to know what the access patterns of the application are and
working out good table and index placement. :-).


-------------------------------------------------------- Reply 8:


(unluckily was truncated by our mail system, thanks anyway)

-------------------------------------------------------- Reply 9:

(obviously there was another reply, which went somewhere near /dev/null ;-)


[ Stuff about performance on Sun/Oracle deleted ]

: 1. don't use the 2.9 drives; they're way too big. use 1gb drives
:    instead (yeah, real nice to know after you put up a lot
:    of $ for them).

This is a case of not thinking on the part of the buyer. A database
needs lots of I/Os per second (commonly called iops). Iops are a
function of the number of disk arms you have, not the number of
gigabytes you have. If you can handle the admin load (use a logical
volume manager) then your best bet is a large array of fast small
drives, like 200-500 MB drives.

: 2. there's a paper out (not sure if it's white) that sun uses
:    internally for optimizing oracle to run on
:    contains a lot of info for setting parameters in the
:    /etc/system file. Some of the parameters have to do
:    with mouse/monitor interaction...important on a sparc 5
:    or 20, but a moot point on a 1000 or 2000.

: set slowscan=100
: set fastscan=16000
: set lotsfree=512
: set desfree=256
: set minfree=128

These guys control the pager's thinking. I'm a little confused by the
values here - I think the algorithm hasn't changed much since the BSD
days. Take a look at the 4.3 kern/vm_page.c at the function pageout()
and you'll have a good idea of what is going on. The 4.4 stuff is
different.

--- Larry McVoy (415) 390-1804

-------------------------------------------------------- Reply 10:


The SC2000 hardware has inherent bottlenecks which you will
potentially see with your configuration. As one user has pointed out,
the A+ edition kernel may help a little bit. The Cray Superservers
CS6400 overcame the hardware bottlenecks inherent in the SC2000
design, and gives you as many or more kernel enhancements as the A+
edition...

You may want to consider contacting your local Cray sales representative for more information.

Good Luck.


--
John Falkenthal                          Voice (619)625-3797
Cray Research Superservers, Inc.         Fax   (619)625-0641

-------------------------------------------------------- Reply 11:

(via another inquiry, but might be interesting)

From: Steven Schuettinger from SUN, who was with us for some days to
help us

> Problem:
> A full table scan within Oracle is about 3 times slower than using
> Oracle with UFS instead of raw devices.
> I just called UBS and they told me that the read performance
> sometimes is o.k. and sometimes is about 3 times slower.

This is undoubtedly due to the fact that UFS clusters reads into 56K
blocks, while Oracle is probably set up with dbblocksize=2k. As such
it is clustering up to (I think) 8K, which is around 8x less
efficient.

My recommendation for optimizing table scans: set dbblocksize=8k, and
set multiblockread to something like 4 or 8. That will fetch 32K or
64K from the disk in each go. It can't hurt in this case because you
already were running on UFS, so we know large blocks don't hurt too
badly.

If you stripe your tablespace, you can optimize this even further by
setting multiblockread to 16 and putting it onto a 4-wide stripe with
32K interlace. This aligns the stripe width and the multiblock read
width to be 128K and has been shown to be much faster than the
defaults.

Note: you have to dump and restore your DBMS to make the 8K option
effective.

-------------------------------------------------------- Reply 12:

(via another inquiry, but might be interesting)
(from an Oracle employee)

The reason why you lost read efficiency is mainly due to UFS, where
the OS does readahead of disk blocks for mainly sequential accesses,
but this as you know comes with the extra baggage that causes the
"pumping effect" that you experienced on a heavily loaded system.
When you moved to raw devices you gave up this readahead capability
provided by UFS. Please do not expect Oracle to match the speed of
dd, since there is a lot more processing going on between the
Oracle/OS layers.
You should try the following:

1) First, try increasing the db_file_multiblock_read_count to 32,
   then 64 and then 128....I think the max is 512. Remember that the
   read chunk size is also constrained by the OS at the device driver
   level.

2) I'm not sure what your SQL statements look like, but make sure
   that the data in the rows that you are referencing are not chained
   ...consult the Admin Guide for more info on this topic.

3) Does your query do sorting in memory and/or the temp tablespace?
   If so, have the properties associated with the temp tablespace or
   sort_area_size etc. changed when you made the transition from UFS
   to raw devices?

4) Try turning off the use_readv parameter; this could explain the
   discontiguous rowid phenomenon.

5) Try using ODS to stripe raw devices which correspond to "hot
   datafiles" using a stripe size of, say, 32K. This will help by:
   - improving your current read/write throughput
   - laying out your physical database to take advantage of Oracle
     7.1 parallel features in the future

Thanks,
Rajesh
---------------------
Oracle Corporation
Sun Products Division
Performance Group
(415) 506-5537
---------------------

--
Joe

This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:10:13 CDT