[SUMMARY] System locks up on heavy i/o

From: J. A. Landamore <jal_at_mcs.le.ac.uk>
Date: Fri Nov 19 2010 - 11:15:27 EST
Sorry about the delay in posting a summary.

Thanks to those who replied.

I should have mentioned that the filesystem was UFS, however I think the
underlying problem is independent of the filesystem.

Two respondents have had very similar experiences, both with LSI chipset
RAID controllers, one with X4150s.
Under heavy load, caused by either heavy I/O or cable/disk problems
causing a lot of retries, the controllers just give up.  The solution is to
ensure cables and disks are good, replacing disks when they start to show
errors and not waiting for them to fail, and if necessary to add another
RAID controller to spread the load.

Original post below, thanks for all your help (oh and btw we're going to
add a sleep in the loop to give everything a chance to catch its breath)
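
A minimal sketch of what the loop might look like with a pause added.  The
function name remove_accounts, the HOME_BASE and PAUSE_SECONDS variables and
the 60 second default are illustrative assumptions, not the actual script:

```shell
#!/bin/sh
# Sketch only: names and defaults below are assumptions for illustration.
PAUSE_SECONDS=${PAUSE_SECONDS:-60}     # assumed pause between deletions
HOME_BASE=${HOME_BASE:-/export/home}   # parent directory of the user homes

# remove_accounts FIRST LAST
# Deletes $HOME_BASE/userFIRST .. $HOME_BASE/userLAST one at a time,
# sleeping between deletions so queued I/O has a chance to drain.
remove_accounts() {
    i=$1
    last=$2
    while [ "$i" -le "$last" ]; do
        /bin/rm -rf "$HOME_BASE/user$i"
        # No need to pause after the final deletion.
        if [ "$i" -lt "$last" ]; then
            sleep "$PAUSE_SECONDS"
        fi
        i=`expr "$i" + 1`
    done
}
```

Calling "remove_accounts 1 30" would then cover the same 30 accounts as the
original loop.  How long a pause is enough under load is something we would
have to find by experiment.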

John

-- original post --

We have 2 X4150 that act as NFS file servers.

Each has 8GB of memory and six 140GB disks configured as RAID10 with an
internal Intel SAS RAID controller.  The OS is Solaris 10u8.

Under "normal" operation everything seems fine; each server supports ~100
attached NFS clients running Eclipse.

Last Friday the user space reached 100%, everything was OK until we tried
to delete some expired user accounts.  This is done with a script that, in
effect, does:

# remove the home directories user1 .. user30
i=1
while [ "$i" -le 30 ]; do
    /bin/rm -rf "/export/home/user$i"
    i=`expr "$i" + 1`
done

When lightly loaded this script works fine; however, on this occasion the
system was working fairly hard, and after the first couple of accounts had
been deleted everything stopped.
Disk lights stopped flashing, existing NFS connections stopped working and
you could log in but never got a prompt.

It required a power cycle to recover.

This is all indicative of losing the connection to the disks in some form.

There is nothing in any log to indicate a problem.  Has anyone come across
anything similar, or would anyone like to guess what may be happening?

-- 
John Landamore

Department of Computer Science
University of Leicester
University Road, LEICESTER, LE1 7RH
J.Landamore@mcs.le.ac.uk
Phone: +44 (0)116 2523410       Fax: +44 (0)116 2523604
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Fri Nov 19 11:16:36 2010