SUMMARY: Hitachi SAN throughput Assistance - slight corrections

From: Johnson, Chad <cmjohnson_at_uslec.com>
Date: Wed Jul 07 2004 - 15:27:51 EDT

I made some mental typos when writing this and have corrected them in this
version (EMC to Emulex, LP90003 to LP9002L-E).

First, thank you for the many replies I received; there were far too many to
list here without leaving someone out.

The problem is in the use of the Emulex LP9002S (sbus) 2Gb Fibre Channel HBA.
The sbus version of this card has a throughput limitation of approximately
42MB/second.  Apparently this information is known within Emulex but has not
been generally released to the public.  The information stated that there is
some sort of hardware limitation on the sbus version of the card.

Our testing of this card against a JNI 1Gb sbus card (FC-1063) in the same
server showed a throughput of 90MB/second using the JNI card.  This test was
run against the same LUNs, which removed any doubt about the capability of the
switching equipment in the SAN or of the test server in use.

The 2Gb PCI EMC card (LP9002L-E), tested in a V480, was able to attain a
sustained rate of 137MB/second.  As yet we do not have a PCI 2Gb JNI card
to test against.
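
To put the three results side by side, here is a small Python sketch that
expresses each measured rate as a fraction of the nominal link rate.  It
assumes the figures above are megabytes per second and uses the commonly
quoted per-direction payload rates for Fibre Channel (about 100MB/second for
1Gb and 200MB/second for 2Gb); those nominal values and all names in the code
are my own, not something supplied by Emulex, JNI or Hitachi.

```python
# Nominal per-direction payload rates commonly quoted for Fibre Channel.
# Assumption for illustration: 100 MB/s for a 1Gb link, 200 MB/s for 2Gb.
NOMINAL_MB_PER_S = {1: 100.0, 2: 200.0}

# (card, link speed in Gb, measured MB/s) taken from the results above.
RESULTS = [
    ("Emulex LP9002S (sbus)", 2, 42.0),
    ("JNI FC-1063 (sbus)", 1, 90.0),
    ("LP9002L-E (pci)", 2, 137.0),
]


def utilization(link_gb, measured_mb_s):
    """Return measured throughput as a fraction of the nominal link rate."""
    return measured_mb_s / NOMINAL_MB_PER_S[link_gb]


def report():
    """Build one summary line per tested card."""
    lines = []
    for card, link_gb, measured in RESULTS:
        pct = 100.0 * utilization(link_gb, measured)
        lines.append("%-24s %dGb link  %6.1f MB/s  %5.1f%% of nominal"
                     % (card, link_gb, measured, pct))
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

On these assumptions the sbus LP9002S reaches only about a fifth of its
nominal link rate, while the 1Gb JNI card in the same server gets close to
its own.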

~~~~~~~~~~~~~~~~~ Original Message ~~~~~~~~~~~~~~~~~~
Hi all,

We are using a Hitachi 9500V series SAN Array (model 9585).  In our evaluation
testing we are only able to achieve 42MB/second sustained writes (peaks at
50MB/second) to this unit.  The host HBAs are Emulex LP9002S (2Gb) using the
5.01 rev driver (as per Hitachi).  The HBAs are running through Brocade
switches and we can mount LUNs from the array just fine.  Everything is
running at 2Gb.

Our testing currently consists of the following:
1.  'mkfile' to create a single large (500GB) file on the mounted LUN.
2.  Multiple 'mkfile's (10) each writing a 1GB file to the mounted LUN.
3.  dd if=/dev/zero bs=8192k count=100000 of=<path to mounted LUN>.
4.  Create a large file in memory (mkfile 2g /tmp/bigfile) then using the
above dd command write it to the mounted LUN.
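
For anyone who wants to repeat this kind of sequential write test without
mkfile or dd, the following Python sketch does roughly the same thing as
test 3: it writes zero-filled blocks to a file and reports MB/second.  It is
only an illustration; the function name, the default block size and block
count are my own choices, and it was not part of the original testing.

```python
import os
import time


def timed_write(path, block_size=8 * 1024 * 1024, count=16):
    """Write `count` zero-filled blocks to `path`; return throughput in MB/s.

    Roughly analogous to: dd if=/dev/zero bs=<block_size> count=<count> of=<path>
    """
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        # Force the data out of the page cache before stopping the clock,
        # otherwise small runs mostly measure memory speed.
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    total_mb = block_size * count / (1024.0 * 1024.0)
    return total_mb / elapsed if elapsed > 0 else float("inf")
```

Calling timed_write() with a path on the mounted LUN and a count large enough
to exceed host memory should give a number comparable to what iostat reports
for the device.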

The problem seems to be at the point of the LP9002S adapter.  In our testing
it was discovered that if test #1 was run alone it would hit the 42MB/second
mark as expected, but if another test #1 was run at the same time through the
same controller, writing to the same LUN or to another mounted LUN on that
controller, the result was 21MB/second for each test #1.  The same pattern
held for 3 and 4 concurrent runs of test #1, and was verified using tests 2,
3 and 4.

In other words:

Single test #1 writing to c10t0d2s0  -->  42MB/second on the device as
reported by iostat and verified by the SAN switch monitoring.

Two concurrent test #1's to c10t0d2s0 --> 42MB/second on the device.

Single test #2 to c10t0d2s0 --> 42MB/second on the device.

Now here is where it gets interesting...

Two concurrent test #1's, one to c10t0d2s0 and one to c10t0d3s0 -->
21MB/second for d2 and 21MB/second for d3.  The two disks here are on the same
HBA, which means that 42MB/second is the most that can go through this HBA at
any one time.
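
The pattern in these results can be written down as a very simple model: each
HBA has a fixed ceiling, and concurrent writers on that HBA share it.  The
Python sketch below reads the reported rates as MB/second and assumes the
ceiling is split evenly between writers, which matches the 21/21 result but is
otherwise my assumption; the function and variable names are made up for
illustration.

```python
from collections import defaultdict

HBA_CAP_MB_S = 42.0  # observed per-HBA ceiling from the tests above


def per_device_throughput(streams, cap=HBA_CAP_MB_S):
    """Estimate MB/s seen on each device.

    streams: list of (hba, device) pairs, one per concurrent writer.
    Assumes each HBA splits its cap evenly across its writers
    (a modeling assumption, not a measured property of the card).
    Returns {(hba, device): MB/s}.
    """
    writers_per_hba = defaultdict(int)
    for hba, _device in streams:
        writers_per_hba[hba] += 1

    rates = defaultdict(float)
    for hba, device in streams:
        # Writers hitting the same device add up on that device.
        rates[(hba, device)] += cap / writers_per_hba[hba]
    return dict(rates)
```

For example, per_device_throughput([("c10", "t0d2"), ("c10", "t0d3")]) gives
21.0 for each device, while two writers to ("c10", "t0d2") alone give 42.0 on
that one device.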

The SAN switches were eliminated as a problem by bypassing them and connecting
the HBAs directly to the HDS Array.  Still maxed out at 42MB/second.

There are two HBAs in the testing host; the same tests were run on the second
HBA with the same results.  The only increase came from the fact that each of
the HBAs could sustain 42MB/second.

Does anyone have experience with the Hitachi SAN Arrays?  Has anyone had a
throughput problem like this before?  Any suggestions / solutions would be
greatly appreciated.  I will (hopefully) summarize my solution.  Below is my
lpfc.conf file for the adapter.

TIA,

Chad Johnson

~~~~~~~~~~~~~~~ begin lpfc.conf ~~~~~~~~~~~~~~~~~~
#
# COPYRIGHT 2002, EMULEX CORPORATION
# 3535 Harbor Boulevard, Costa Mesa, CA 92626
#
# All rights reserved.  This computer program and related documentation
# is protected by copyright and distributed under licenses restricting
# its use, copying, distribution and decompilation.  This computer
# program and its documentation are CONFIDENTIAL and a TRADE SECRET
# of EMULEX CORPORATION.  The receipt or possession of this program
# or its documentation does not convey rights to reproduce or disclose
# its contents, or to manufacture, use, or sell anything that it may
# describe, in whole or in part, without the specific written consent
# of EMULEX CORPORATION.  Any reproduction of this program without
# the express written consent of EMULEX CORPORATION is a violation
# of the copyright laws and may subject you to criminal prosecution.
#
# $Id: lpfc.conf 1.19 2002/06/03 16:08:49 mks Exp $
#
# Solaris LightPulse lpfc (SCSI) / lpfn (IP) driver: global initialized data.
#

# Verbosity:  only turn this flag on if you are willing to risk being
# deluged with LOTS of information.
# You can set a bit mask to record specific types of verbose messages:
#
# 0x1    ELS events
# 0x2    Device Discovery events
# 0x4    Mailbox Command events
# 0x8    Miscellaneous events
# 0x10   Link Attention events
# 0x20   IP events
# 0x40   FCP events
# 0x80   Node table events
# 0x1000 FCP Check Condition events
log-verbose=1;

# Setting log-only to 0 causes log messages to be printed on the
# console and to be logged to syslog (which may send them to the
# console again if it's configured to do so).
# Setting log-only to 1 causes log messages to go to syslog only.
log-only=1;

#
# +++ Variables relating to FCP (SCSI) support. +++
#

# Setup FCP persistent bindings,
# fcp-bind-WWPN binds a specific WorldWide PortName to a target id,
# fcp-bind-WWNN binds a specific WorldWide NodeName to a target id,
# fcp-bind-DID binds a specific DID to a target id.
# Only one binding method can be used.
# WWNN, WWPN and DID are hexadecimal values.
# WWNN must be 16 digits with leading 0s.
# WWPN must be 16 digits with leading 0s.
# DID must be 6 digits with leading 0s.
# The SCSI ID to bind to consists of two parts, the lpfc interface
# to bind to, and the target number for that interface.
# Thus lpfc0t2 specifies target 2 on interface lpfc0.
# NOTE: Target ids, with all luns supported, must also be in sd.conf.
# scan-down must be set to 0 or 1, not 2 which is the default!!
#
# Here are some examples:
#                WWNN             SCSI ID
# fcp-bind-WWNN="2000123456789abc:lpfc1t0",
#               "20000020370c27f7:lpfc0t2";
# Actual Bindings:
# BEGIN: LPUTIL-managed Persistent Bindings
   fcp-bind-WWPN="50060e8000c3b0e0:lpfc0t0",
                 "50060e8000c3b0e5:lpfc0t1",
                 "50060e8000c3b0e1:lpfc1t0",
                 "50060e8000c3b0e4:lpfc1t1";
#CMJfcp-bind-WWPN="50060e8000c3b0e0:lpfc0t16",
#CMJ              "50060e8000c3b0e5:lpfc0t17",
#CMJ	      "50060e8000c3b0e1:lpfc1t16",
#CMJ              "50060e8000c3b0e4:lpfc1t17";

#
#                DID   SCSI ID
# fcp-bind-DID="0000ef:lpfc0t3";
# BEGIN: LPUTIL-managed Persistent Bindings

# If automap is set, SCSI IDs for all FCP nodes without
# persistent bindings will be automatically generated.
# If new FCP devices are added to the network when the system is down,
# there is no guarantee that these SCSI IDs will remain the same
# when the system is booted again.
# If one of the above fcp binding methods is specified, then automap
# devices will use the same mapping method to preserve
# SCSI IDs between link down and link up.
# If no bindings are specified above, a value of 1 will force WWNN
# binding, 2 for WWPN binding, and 3 for DID binding.
# If automap is 0, only devices with persistent bindings will be
# recognized by the system.
#CMJautomap=2;
automap=0;

# fcp-on:  true (1) if FCP access is enabled, false (0) if not.
fcp-on=1;

# lun-queue-depth:  the default value lpfc will use to limit
# the number of outstanding commands per FCP LUN.  This value is
# global, affecting each LUN recognized by the driver, but may be
# overridden on a per-LUN basis (see below). RAID arrays may want
# to be configured using the per-LUN tunable throttles.
lun-queue-depth=20;
#CMJlun-queue-depth=30;
#CMJlun-queue-depth=128;

# tgt-queue-depth:  the default value lpfc will use to limit
# the number of outstanding commands per FCP target.  This value is
# global, affecting each target recognized by the driver, but may be
# overridden on a per-target basis (see below). RAID arrays may want
# to be configured using the per-target tunable throttles. A value
# of 0 means don't throttle the target.
#CMJtgt-queue-depth=0;
tgt-queue-depth=512;

# lpfcNtM-lun-throttle:  the maximum number of outstanding commands to
# permit for each LUN of an FCP target that supports multiple LUNs.
# The default throttle for the number of commands outstanding to a single
# LUN of a multiple-LUN target is lun-queue-depth. For a target that
# can support multiple LUNs, it may be useful to specify a LUN throttle
# that differs from the default.
# Example: lpfc0t17-lun-throttle=48;
# says that each LUN on target 17, interface lpfc0 should be allowed
# up to 48 simultaneously outstanding commands.
#lpfc1t39-lun-throttle=10;
#lpfc0t40-lun-throttle=30;

# lpfcNtM-tgt-throttle:  the maximum number of outstanding commands to
# permit for a FCP target.
# By default, target throttle is disabled.
# Example: lpfc0t17-tgt-throttle=48;
# says that target 17, interface lpfc0 should be allowed
# up to 48 simultaneously outstanding commands.
#lpfc1t39-tgt-throttle=10;
#lpfc0t40-tgt-throttle=30;

# no-device-delay [0 to 30] - determines the length of
# the interval between deciding to fail back an I/O because there is no way
# to communicate with its particular device (e.g., due to device failure) and
# the actual fail back.  A value of zero implies no delay whatsoever.
# Cautions:  (1)  This value is in seconds.
# (2)  Setting a long delay value may permit I/O to build up,
# each with a pending timeout, which could result in the exhaustion of
# critical Solaris kernel resources.  In this case, you may see a fatal
# message such as
#           PANIC:  Timeout table overflow
#
# Note that this value can have an impact on the speed with which a
# system can shut down with I/Os pending and with the HBA not able to
# communicate with the loop or fabric, e.g., with a cable pulled.
no-device-delay=1;

#
# +++ Variables relating to IP networking support. +++
#

# network-on:  true (1) if networking is enabled, false (0) if not
# This variable will be set during the installation of the driver
# via pkgadd.
network-on=0;

# xmt-que-size:  size of the transmit queue for mbufs (128 - 10240)
xmt-que-size=256;

#
# +++ Variables common to both SCSI (FCP) and IP networking support. +++
#

# Some disk devices have a "select ID" or "select Target" capability.
# From a protocol standpoint "select ID" usually means select the
# Fibre channel "ALPA".  In the FC-AL Profile there is an "informative
# annex" which contains a table that maps a "select ID" (a number
# between 0 and 7F) to an ALPA.  If scan-down is set to a value of 0,
# the lpfc driver assigns target ids by scanning its ALPA map
# from low ALPA to high ALPA.
#
# Turning on the scan-down variable (on = 1,2, off = 0) will
# cause the lpfc driver to use an inverted ALPA map, effectively
# scanning ALPAs from high to low as specified in the FC-AL annex.
# A value of 2, will also cause target assignment in a private loop
# environment to be based on the ALPA (hard addressed).
#
# Note: This "select ID" functionality is a PRIVATE LOOP ONLY
# characteristic and will not work across a fabric.
#CMJscan-down=2;
scan-down=0;

# Determine how long the driver will wait to begin linkdown processing
# when a cable has been pulled or the link has otherwise become
# inaccessible, 1 - 255 secs.  Linkdown processing includes failing back
# cmds to the target driver that have been waiting around for the link
# to come back up.  There's a tradeoff here:  small values of the timer
# cause the link to appear to "bounce", while large values of the
# timer can delay failover in a fault tolerant environment. Units are in
# seconds. A value of 0 means never failback cmds until the link comes up.
#CMJlinkdown-tmo=30;
linkdown-tmo=60;

# If set, nodev-holdio will hold all I/O errors on devices that disappear
# until they come back. Default is 0, return errors with no-device-delay.
nodev-holdio=0;

# If set, nodev-tmo will hold all I/O errors on devices that disappear
# until the timer expires. Default is 0, return errors with no-device-delay.
nodev-tmo=0;

# Use no-device-delay to delay FCP RSP errors and certain check conditions.
delay-rsp-err=0;

# Treat certain check conditions as an FCP error.
check-cond-err=0;

# num-iocbs:  number of iocb buffers to allocate (128 to 10240)
#num-iocbs=1024;
num-iocbs=2048;

# num-bufs:  number of ELS buffers to allocate (128 to 4096)
# ELS buffers are needed to support Fibre channel Extended Link Services.
# Also used for SLI-2 FCP buffers, one per FCP command, and Mailbox commands.
num-bufs=1024;

# topology:  link topology for initializing the Fibre Channel connection.
#          0 = attempt loop mode, if it fails attempt point-to-point mode
#          2 = attempt point-to-point mode only
#          4 = attempt loop mode only
#          6 = attempt point-to-point mode, if it fails attempt loop mode
# Set point-to-point mode if you want to run as an N_Port.
# Set loop mode if you want to run as an NL_Port.
topology=2;

# Set a preferred ALPA for the adapter, only valid if topology is loop.
# lpfc0-assign-alpa=2;  Request ALPA 2 for lpfc0

# ip-class:  FC class (2 or 3) to use for the IP protocol.
ip-class=3;

# fcp-class:  FC class (2 or 3) to use for the FCP protocol.
fcp-class=3;

# Use ADISC for FCP rediscovery instead of PLOGI.
use-adisc=0;

# Extra FCP timeout for fabrics (in seconds).
fcpfabric-tmo=0;

# Number of 4k STREAMS buffers to post to IP ring.
post-ip-buf=128;

# Set to 1 to decrement lun throttle on a queue full condition.
dqfull-throttle=1;

# Use dqfull-throttle-up-time to specify when to increment the current Q
# depth.
# This variable is in seconds.
dqfull-throttle-up-time=1;

# Increment the current Q depth by dqfull-throttle-up-inc
dqfull-throttle-up-inc=1;

# Use ACK0, instead of ACK1 for class 2 acknowledgement.
ack0=0;

# cr-delay: Coalesce Response Delay
# This value specifies a count of milliseconds after which an interrupt
# response is generated if cr-count has not been satisfied. This value is
# set to 0 to disable the Coalesce Response feature as default.
cr-delay=0;

# cr-count: Coalesce Response Count
# This value specifies a count of I/O completions after which an interrupt
# response is generated. This feature is disabled if cr-delay is set to 0.
cr-count=0;

# Used only by i386 FCP (SCSI)
flow_control="duplx" queue="qfifo" disk="scdk" tape="sctp";

# Solaris/x86 only:  select allocation of memory for DMA.  THIS VARIABLE
# CAN AFFECT WHETHER LPFC RUNS CORRECTLY ON AN X86 PLATFORM.  The Solaris
# DDI specification mandates the use of ddi_dma_mem_alloc when allocating
# memory suitable for DMA.  This memory comes from a pool reserved at
# boot-time and sized by a variable called "lomempages"; this variable
# may be set in /etc/system.  The variable defaults to a small value, e.g.,
# 36 pages, which isn't nearly enough for LPFC when running IP.  Typically,
# we've cranked the value up to 1100 pages or so.  But this pool represents
# precious "low memory" on a PC -- memory below the 16M address boundary.
# This memory is also needed by the OS and other drivers.
#
# On some machines, we can get away with using kmem_zalloc instead of
# ddi_dma_mem_alloc, thus avoiding the requirement to use lomempages.
# However, this trick is NOT portable!  Some x86 systems absolutely need
# to use lomempages for their DMA.
#
# So... if you think your x86 system is one of those that requires the
# use of lomempages, set this variable to one.  Be sure to pick a suitable
# value for lomempages in /etc/system; the value depends on how many of
# the various kinds of buffers you allocate for IP and SCSI.  Otherwise,
# set this variable to zero and relax, as then lpfc can allocate the
# memory it needs without further input from you.
use-lomempages=0;

# Old Open Boot Prom (SPARC): if your SPARC doesn't have a sufficiently
# recent version of OBP, it may be unable to probe and identify a
# LightPulse adapter.  You will need to use the following workaround.
# Important note:  you can't just use the following three lines "as is"!
# Refer to the Solaris LightPulse Device Driver documentation for details.
#reg =  0x00801000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, # PCI
#       0x02801010, 0x00000000, 0x00000000, 0x00000000, 0x00001000, # SLIM
#       0x02801018, 0x00000000, 0x00000000, 0x00000000, 0x00000100; # CSRs

# link-speed:  link speed selection for initializing the Fibre Channel
# connection.
#       0 = auto select (default)
#       1 = 1 Gigabaud
#       2 = 2 Gigabaud
link-speed=2;

# MultiPulse configuration
#
# multipulse-fcp="lpfcXtYdZ:tr1: ... :lpfcXtYdZ:trN:route_flags";
# Where X, Y, and Z are the devices lpfc DDI interface, target, and LUN
# numbers.
# tr1 thru trN, when N is up to 4, are the paths component of the traffic
# ratio
# route_flags MUST be a 4 digit hex number for the following flags:
#  Load balancing flags (1 means cold standby)
#  MPL_PC_TYPE_FAILOVER           0x0001 /* cold standby */
#  MPL_PC_TYPE_TRAFFIC_RATIO      0x0002 /* paths balanced by traffic ratio */
#  MPL_PC_TYPE_DYNAMIC_LUNQ       0x0003 /* Dynamic balancing algorithm used */
#  MPL_PC_TYPE_DYNAMIC_TGTQ       0x0004 /* Dynamic balancing algorithm used */
#  MPL_PC_TYPE_DYNAMIC_HBAQ       0x0005 /* Dynamic balancing algorithm used */
#
#  General flags
#  MPL_PC_CFG_AUTO_FAILBACK       0x0010 /* failback to primary path is auto */
#  MPL_PC_CFG_VALIDATE_INQUIRY    0x0020 /* Validate all paths with inquiry */
#  MPL_PC_CFG_INQUIRY_HEARTBEAT   0x0040 /* Validate path inquiry heartbeat */
#  MPL_PC_CFG_FAIL_ABORT_TASK_SET 0x0080 /* Send ABORT_TASK_SET on failover */
#  MPL_PC_CFG_FAIL_LUN_RESET      0x0100 /* Send LUN_RESET on failover */
#  MPL_PC_CFG_FAIL_TARGET_RESET   0x0200 /* Send TARGET_RESET on failover */
#
# The following example uses cold standby as path control and automatic
# failback
#
# Here is a sample configuration to setup lpfc0 target 0 lun 1 failover
# to the MultiPulse device, lpfc1 target 0 lun 1 and lpfc0 target 0 lun 2
# failover to the MultiPulse device, lpfc1 target 1 lun 4.
# multipulse-fcp="lpfc0t0d1:0:lpfc1t0d1:0:0010",
#                "lpfc0t0d2:0:lpfc1t1d4:0:0010";
#
# BEGIN: MultiPulse managed entries
#
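
Since the file above is long and mostly comments, here is a rough Python
sketch for pulling out just the active settings (for example lun-queue-depth,
topology and link-speed) and sanity-checking the WWPN bindings against the
"16 hex digits, then lpfcNtM" rule stated in the file's own comments.  It is
not an Emulex tool; the function names are invented, and it assumes every
comment line starts with '#' and that no value contains a semicolon.  The
i386-only flow_control line, which packs several assignments into one
statement, is not handled specially.

```python
import re


def parse_conf(text):
    """Return {name: value} for the uncommented `name=value;` statements.

    Values given as a comma-separated list of quoted strings (such as
    fcp-bind-WWPN) come back as a list; everything else as a string.
    """
    body = " ".join(
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
    settings = {}
    for statement in body.split(";"):
        name, sep, value = statement.strip().partition("=")
        if not sep:
            continue  # nothing that looks like an assignment
        parts = [p.strip().strip('"') for p in value.split(",")]
        settings[name.strip()] = parts if len(parts) > 1 else parts[0]
    return settings


def check_bindings(bindings):
    """Return the entries that do not look like '<16 hex digits>:lpfcNtM'."""
    pattern = re.compile(r"^[0-9a-fA-F]{16}:lpfc\d+t\d+$")
    return [b for b in bindings if not pattern.match(b)]
```

Feeding it the contents of lpfc.conf and printing the resulting dictionary
gives a quick view of which tunables differ from what a vendor recommends.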