A week or so ago I put out the following plea:
> We are on track to bring on line some new E420R machines and
>during our testing we have noticed something rather puzzling. One of
>our business systems performs a batch job to crunch a thing called a
>MRP. The problem we are seeing is that this MRP is taking just as
>long on the new hardware (4x450MHz processors) as it does on our old
>systems (2x200MHz Ultra2). This just does not gel with me - it should
>be faster, the disk is faster, the processors are faster so we should
>be seeing at least a halving of the time to run the MRP.
>I have watched things using vmstat, iostat and top. What I am seeing
>is that a lot of time on the new box is spent sitting in iowait state
>with near to nothing spent in sys or usr. I cannot see what this
>process is waiting on - vmstat shows a little bit of page in/out but
>nothing spectacular (there is lots of free memory on the box, the page
>scan rate is mostly very low), the iostat shows the disk i/o is not
>very high and the disk service times are very low (sub 7ms), speaking
>of the disks - we have tested the throughput on the disks and we are
>seeing 10Mb/s to the disks (as seen by iostat) with a svc_t lower than
>10ms so I believe the disk subsystem is performing correctly. In
>short, I can see no reason for the machine to be waiting on anything.
>I have done a truss on the process that is talking to the oracle
>database when the MRP is in progress - it sends an IPC message,
>receives an IPC message and then does some small reads and writes
>(presumably to a pipe which has the oracle process at the other end of
>it) and just loops over that sequence.
>One thing I have tweaked was the maxpgio to increase the page rate
>because of the 10000 rpm disks but that had no noticeable effect.
>The machine is running Solaris 2.5.1 (sorry, no option to upgrade this
>at the moment) with the latest recommended patch cluster and Y2k
>patches. Has anyone seen anything like this? Anything I have missed?
I got back some thoughtful replies. The suggestions were to check the
shared memory parameters in /etc/system (I had), look at the disk i/o
using iostat (I had), consider that the MRP is written badly (it is,
but we cannot change that, alas), and check the Oracle SGA size (the
dba did that).
In the end, with a lot of help from our local Sun SE and some
inspiration from the admins of the app, we tracked it down to a bad
tnsnames.ora setup. The machine was running in "test" mode in
parallel with the machine it was to replace, and we had given it a
different name to allow us to do testing on the new box. From what I
understand, the entry for the test machine name was set up so that
oracle was using the listener process (thereby making a network
connection) even though the database and app were on the same box.
The app does lots and lots of little reads and writes, the
performance of which sucks big time over a network connection - hence
the big slowdown. Once we fixed tnsnames.ora to make sure the app
connected directly to the oracle database, things went much faster.
It is still not great but the performance is a definite improvement
over the old U2.
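
For anyone who wants to see why lots of tiny request/reply exchanges
are so sensitive to the transport underneath them, here is a minimal
Python sketch. It is not what we ran and it does not model Oracle or
the listener at all. It just times the same chatty send/receive loop
over a loopback TCP connection and over a local socket pair. All of
the function names and the 64-byte message size are made up for
illustration, and the numbers you get will depend entirely on your OS
and hardware.

```python
import socket
import threading
import time


def recv_exact(conn, n):
    """Read exactly n bytes, or return None if the peer closed first."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def echo_server(conn, msg_size):
    """Echo fixed-size messages back until the peer closes the connection."""
    with conn:
        while True:
            data = recv_exact(conn, msg_size)
            if data is None:
                break
            conn.sendall(data)


def time_round_trips(client, server, count, msg_size=64):
    """Return seconds taken for `count` small request/reply exchanges."""
    worker = threading.Thread(target=echo_server, args=(server, msg_size))
    worker.start()
    payload = b"x" * msg_size
    start = time.perf_counter()
    for _ in range(count):
        client.sendall(payload)          # small write ...
        recv_exact(client, msg_size)     # ... then wait for the small reply
    elapsed = time.perf_counter() - start
    client.close()                       # lets the echo thread exit
    worker.join()
    return elapsed


def tcp_pair():
    """Connected (client, server) sockets over loopback TCP."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def local_pair():
    """Connected (client, server) sockets that stay on the local machine."""
    return socket.socketpair()


if __name__ == "__main__":
    n = 2000  # illustrative count only
    print("loopback TCP :", time_round_trips(*tcp_pair(), n))
    print("local pair   :", time_round_trips(*local_pair(), n))
```

The point of the loop is that each exchange has to complete before the
next one starts, so the total time is roughly the number of exchanges
multiplied by the per-exchange latency of whatever connection is in
use. Faster processors and disks do little for a job shaped like that.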
Many thanks to:
Nelson T. Caparrosso
--
Brett Lymn, Computer Systems Administrator, BAE SYSTEMS