SUMMARY: high iowait with raid5

From: Shane Hickey <shane_at_howsyournetwork.com>
Date: Thu Jan 30 2003 - 14:17:27 EST
Well, I got a ton of very helpful information.  Thanks to everyone who
responded.  I've posted the answers/comments that I received below and
the original message is at the bottom.  

In summary, it seems that software RAID5 (especially on a slower box)
probably isn't a great idea.  It is also, apparently, a worse idea to
have your RAID filesystem be /var on a mailserver.  What helped me,
though, was to create a spool directory on a non-RAID filesystem and
then symlink /var/spool to this new directory.  Now the box seems a tad
bit pokey, but it's able to keep up with incoming messages.  I was
seeing upwards of 3000 sleeping sendmail processes and now I only see
20-50 (which is more like it used to be).
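For anyone wanting to replicate the fix, the relocate-and-symlink
mechanics look roughly like this.  This is a sketch demonstrated in a
scratch directory (on the real box the paths were /var/spool and a
directory on a non-RAID filesystem):

```shell
# Demonstrate the relocate-and-symlink approach in a scratch directory.
# On a real box the source would be /var/spool and the target a
# directory on a non-RAID filesystem.
root=$(mktemp -d)
mkdir -p "$root/var/spool/mqueue" "$root/export"   # fake layout
echo test > "$root/var/spool/mqueue/qf123"         # fake queue file

mv "$root/var/spool" "$root/export/spool"          # move spool off the RAID fs
ln -s "$root/export/spool" "$root/var/spool"       # symlink the old path back

cat "$root/var/spool/mqueue/qf123"                 # still reachable via old path
```

On a live mailserver you would stop sendmail before moving the spool,
and restart it afterwards.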

Many people suggested ditching SW RAID entirely, or at least switching
to mirroring/striping.  Here are the individual comments.

-------------------------------------------------------------------
Jay Lessert: 
SW RAID5 is very, very slow on writes, no matter what platform it's on.

sendmail in particular does a *lot* of create-small-file-then-discard
operations.  Even on a fast file system this bottlenecks sendmail.
A SW RAID5 for sendmail /var/spool would not be a good thing.

Striped and mirrored would be OK.  Just not RAID5, no free lunch.

If it is an option, you might consider changing MTAs to postfix
(can be very sendmail-like) or qmail (not very sendmail-like).

Either will be much less of a resource hog than sendmail.


Joe Fletcher:
Software RAID5 is always going to put quite a load on
the system, especially in write-intensive environments.
The parity calculations are what's killing you: for every
logical write you have to compute parity and then perform
multiple physical I/Os across the disks in the RAID set.
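As a rough illustration (not from the original thread), the standard
accounting for RAID5's small-write penalty is four physical I/Os per
logical write (read old data, read old parity, write new data, write
new parity), versus two for a mirror:

```python
# Back-of-the-envelope RAID5 small-write penalty: each logical write
# costs read-old-data + read-old-parity + write-new-data +
# write-new-parity = 4 physical I/Os, regardless of disk count.
def raid5_small_write_ios(logical_writes):
    return logical_writes * 4

# A mirror, by contrast, costs 2 physical writes per logical write.
def mirror_write_ios(logical_writes):
    return logical_writes * 2

print(raid5_small_write_ios(100))  # 400 physical I/Os
print(mirror_write_ios(100))       # 200 physical I/Os
```

This is why a write-heavy workload like a sendmail spool hurts so much
more on RAID5 than on a mirror or stripe.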

An E250 is not a particularly quick box to start with.
I know from experience that something like an old 
Proliant 3000 with 2x PIII-550s and a hardware RAID 
card running SuSE can run rings round a 420R with 
2x450MHz UltraIIs talking to A1000s.

I'd suggest either hardware RAID or moving to a simple
striped config for performance, though obviously you
lose your resilience.

Martin Hepworth:
How many disks are in the RAID5 set, and over how many controllers?
Using RAID5 on fewer than 5 disks can result in this problem.  Also,
increasing the number of controllers can help, as you spread the load.

Glen:
With a mail server you should have bumped down the stripe size!
Mail server I/O is small-block I/O, which is faster with a smaller
stripe size.  Also, a DNS server will not affect I/O.  Are you using
Veritas?  If so, convert to a striped-pro volume.
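For reference, with Solstice DiskSuite/SVM the interlace (stripe unit)
is set when the metadevice is created, via `-i`.  A sketch only, with
hypothetical device names and a 16k interlace chosen for small-block
mail I/O; metainit destroys existing data, so this is not something to
run on a live volume:

```shell
# Hypothetical: rebuild the RAID5 metadevice with a smaller interlace
# for small-block mail I/O.  WARNING: destroys data on these slices.
metaclear d5
metainit d5 -r c1t1d0s0 c1t2d0s0 c1t3d0s0 -i 16k
newfs /dev/md/rdsk/d5
```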

John Timon: 
RAID 5 in software is always bad; you will get very poor performance
from a software RAID 5 setup.  If you must do RAID 5, do it in
hardware.

If not hardware, then do mirror+stripe for good performance and fault
tolerance.
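Under DiskSuite, mirror+stripe (RAID 1+0) means mirroring two striped
submirrors.  A sketch with hypothetical device names, ideally with each
submirror on its own controller; again, metainit destroys existing
data on the slices it uses:

```shell
# Hypothetical DiskSuite RAID 1+0: mirror two 2-disk stripes.
# WARNING: destroys data on these slices.
metainit d11 1 2 c1t1d0s0 c1t2d0s0    # first striped submirror
metainit d12 1 2 c2t1d0s0 c2t2d0s0    # second stripe, other controller
metainit d10 -m d11                   # create a one-way mirror
metattach d10 d12                     # attach the second submirror
newfs /dev/md/rdsk/d10
```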

Kevin Buterbaugh:
     I/O wait has meaning only on single CPU boxes.  On MP servers, it's
irrelevant since the CPU issuing the I/O request is free to do other
work while "waiting" for the I/O to complete.  In fact, it's quite
possible that when the I/O completes the process will resume on a
different CPU.

     I would also discourage the use of top.  It's not a Sun tool, they
don't support it, and I've personally seen it give incorrect
information.  If you're running Solaris 8 or later, then you can use
prstat instead.

     I'd recommend continuing to monitor your box.  Use sar or vmstat /
mpstat / iostat.  With iostat, the most useful options are "-xMn."  Look
for disks with service times > 30 ms or %busy > 25%.  Ignore statistics
for metadevices and concentrate on the statistics for the disks
themselves.  HTH...
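Kevin's thresholds are easy to script.  A sketch that filters
`iostat -xn`-style output for disks over 30 ms service time or 25%
busy, run here against canned sample lines (the field positions assume
Solaris `iostat -xn` column order, with asvc_t in field 8, %b in field
10, and the device name last; on a real box you would pipe
`iostat -xMn 5` in instead):

```shell
# Flag disks whose average service time exceeds 30 ms or whose %busy
# exceeds 25.  The two printf lines stand in for live iostat -xn output.
printf '%s\n' \
  '    0.2   15.3    1.6  122.9  0.0  0.9    0.0   58.4   0  31 c0t0d0' \
  '    0.1    2.0    0.8   16.4  0.0  0.1    0.0    5.2   0   3 c0t1d0' |
awk '$8 > 30 || $10 > 25 { print $11, "svc_t=" $8, "%b=" $10 }'
```

Here only c0t0d0 is flagged; c0t1d0 is under both thresholds.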
-----------------------------------------------------------------------

Thanks to everyone for your help!

Shane



-----Original Message-----
From: Shane Hickey [mailto:shane@howsyournetwork.com] 
Sent: Thursday, January 30, 2003 12:09 PM
To: sunmanagers@sunmanagers.org
Subject: high iowait with raid5


Howdy all,
        I apologize if this has been covered.  I did some searching and
found similar questions, but not answers, and I'm in a bit of a bind.
Anyway, I have an E250 with a gig of RAM and two 296MHz processors.  I
can give more specifics if needed (I don't have it in front of me).
I'm trying to migrate the services of another E250 onto this box
(mainly sendmail, DNS, POP3).  The difference is that I've set up
RAID5 on the new box using the instructions I found at
http://www.pennasoft.com/articles/SolarisRAID.shtml.
        Sadly, I didn't document the process as well as I should have,
and I'm not very familiar with RAID on Solaris.  I do recall that I
bumped the stripe size up a bit.
        Anyway, what I'm seeing is tons of sleeping sendmail processes
and a fairly high load.  It seems like a bunch of CPU is getting tied
up in I/O.  Also, I'm watching my tmpfs dwindle away to nothing.
        Here are some particulars; please let me know if I can provide
more information.  I'm more of a Linux/FreeBSD person than a Solaris
person, so I don't think I know all the diagnostic commands that I
should.

df -k 
--------------
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0    16468538 4207068 12096785    26%    /
/proc                      0       0       0     0%    /proc
fd                         0       0       0     0%    /dev/fd
mnttab                     0       0       0     0%    /etc/mnttab
/dev/md/dsk/d5       69585470 4677568 59341065     8%    /var
swap                 2033024      32 2032992     1%    /var/run
swap                 2033008      16 2032992     1%    /tmp

iostat
---------------
   tty        md5           sd0           sd6           sd7            cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
   0   74 414  37   65  497  15   55    0   0    9  760  87   28   11 12 57 20

top
---------------
load averages:  0.65,  0.55,  0.46                              10:07:27
2606 processes: 2603 sleeping, 2 zombie, 1 on cpu
CPU states: 27.3% idle,  6.3% user, 13.3% kernel, 53.0% iowait,  0.0% swap


Thanks in advance for any assistance,
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Thu Jan 30 14:20:53 2003

This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:43:02 EST