SUMMARY: too many disks on one controller?

From: Marina Daniels (Marina.Daniels@ccd.tas.gov.au)
Date: Sun Dec 15 1996 - 18:44:46 CST


Thanks to:

Anthony Vialle
Jon Anderson
Russ Poffenberger
Brad Young
Greg Polanski
David Theroff
Ric Anderson

*** ORIGINAL QUESTION ****

>> I have a SPARC-20 running Solaris 2.3.
>> (and yes, we will be upgrading to solaris 2.5 within a couple of months)
>>
>> I get the following error messages sometimes when running backups with
>> ufsdump -
>> my understanding is that there is not a problem with the disk, it's just that
>> there is too much traffic on the controller - is this correct? Do I need to do
>> anything about it?
>>
>> Are there any rules for 1) how many disks 2)order you have disks, on a
>> particular controller? We have 5 disks on controller 0 (no error messages)
>> and 4 on controller 1 which is producing the error messages
>>
>> (It's disk 8 below, that's giving the errors)
>>
>> AVAILABLE DISK SELECTIONS:
>> 0. c0t0d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80> new1
>> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@0,0
>> 1. c0t1d0 <SUN1.05 cyl 2036 alt 2 hd 14 sec 72>
>> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@1,0
>> 2. c0t2d0 <SEAGATE-ST15150N-0017 cyl 3696 alt 2 hd 21 sec 108>
>> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@2,0
>> 3. c0t3d0 <SUN1.05 cyl 2036 alt 2 hd 14 sec 72>
>> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0
>> 4. c0t5d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80> new2
>> /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@5,0
>> 5. c1t1d0 <SEAGATE-ST42100-7614 cyl 2549 alt 2 hd 15 sec 97>
>> /iommu@f,e0000000/sbus@f,e0001000/dma@2,81000/esp@2,80000/sd@1,0
>> 6. c1t2d0 <SEAGATE-ST32550N-0016 cyl 3495 alt 2 hd 11 sec 109>
>> /iommu@f,e0000000/sbus@f,e0001000/dma@2,81000/esp@2,80000/sd@2,0
>> 7. c1t5d0 <SEAGATE-ST41650-6050 cyl 2070 alt 2 hd 15 sec 89>
>> /iommu@f,e0000000/sbus@f,e0001000/dma@2,81000/esp@2,80000/sd@5,0
>> 8. c1t6d0 <SEAGATE-ST42100-8224 cyl 2549 alt 2 hd 15 sec 97>
>> /iommu@f,e0000000/sbus@f,e0001000/dma@2,81000/esp@2,80000/sd@6,0
>>
>>
>> ***The errors:
>>
>>
>> Dec 13 05:02:10 kite unix: esp: pkt_state=0xb<XFER,SEL,ARB> pkt_flags=0x4000
>> pkt_statistics=0x3
>> Dec 13 05:02:10 kite unix: esp: cmd_flags=0x10422 cmd_timeout=60
>> Dec 13 05:02:10 kite unix: WARNING:
>> /iommu@f,e0000000/sbus@f,e0001000/dma@2,81000/esp@2,80000 (esp1):
>> Dec 13 05:02:10 kite unix: Target 6.0 reducing sync. transfer rate
>> Dec 13 05:02:10 kite unix: polled command timeout
>> Dec 13 05:02:10 kite unix: esp: State=DATA Last State=DATA_DONE
>> Dec 13 05:02:10 kite unix: esp: Latched stat=0x91<IPND,XZERO,IO> intr=0x10<BUS> fifo 0x83
>> Dec 13 05:02:10 kite unix: esp: last msg out: <unknown msg>; last msg in:
>> IDENTIFY
>> Dec 13 05:02:10 kite unix: esp: DMA csr=0x40040010<INTEN>
>> Dec 13 05:02:10 kite unix: esp: addr=fc01d000 dmacnt=0 last=fc01b000
>> last_cnt=2000
>> Dec 13 05:02:10 kite unix: esp: Cmd dump for Target 6 Lun 0:
>> Dec 13 05:02:10 kite unix: esp: cdblen=10, cdb=[ 0x28 0x0 0x0 0x2a 0x64 0xc0
>> 0x0 0x0 0x10 0x0 ]
>> Dec 13 05:02:10 kite unix: esp: pkt_state=0xb<XFER,SEL,ARB> pkt_flags=0x4001
>> pkt_statistics=0x3
>> Dec 13 05:02:10 kite unix: esp: cmd_flags=0x10422 cmd_timeout=60
>> Dec 13 05:02:10 kite unix: WARNING:
>> /iommu@f,e0000000/sbus@f,e0001000/dma@2,81000/esp@2,80000/sd@6,0 (sd21):
>> Dec 13 05:02:10 kite unix: SCSI transport failed: reason 'data_ovr': r
>> Dec 13 05:02:10 kite unix: etrying command
>> Dec 13 05:02:10 kite unix: WARNING:
>> /iommu@f,e0000000/sbus@f,e0001000/dma@2,81000/esp@2,80000/sd@6,0 (sd21):
>> Dec 13 05:02:10 kite unix: SCSI transport failed: reason 'reset': retr
>> Dec 13 05:02:11 kite unix: ying command
>>

ANSWERS: (in no particular order)

1) I'd check cable lengths and termination (especially if non-sun
disks are involved; some OEMs ship the drives with internal
termination enabled which causes all kinds of weird errors
when mixed into a string with an external terminator).

Regular SCSI is limited to 6 meters of cable on paper, but I find
12 feet to be a more realistic limit in practice. And remember
to consider the length of cable inside any enclosures as well.

**********

2)
The number of disks is not limited other than by the obvious target-address
limitation, i.e. targets 0 - 6 (7 disks). I have operated a dual-CPU SPARC 10
under 2.4 with five disks on a single controller plus an Exabyte and a CD-ROM,
so I don't think this is your problem.

The SCSI bus is what I would be suspicious of; most likely a marginal
cable (or possibly a terminator), which causes problems under load but not in
general use. The first thing I would try is swapping cables. (Is the problem
disk last in the chain?)

****

3)
If you are using narrow SCSI-2, the rule is 8 devices on the bus, including the
controller. If you use wide SCSI, you can put 16 devices on the bus. There is
no designated order on any bus, including the location of the controller. Any
device can be anywhere.

Since you have 5 disks on c0, you have 6 devices, which is fine. The maximum
bus length for a narrow SCSI-2 bus is 6 meters.

I think you should probably check the bus length, including internal cabling,
and also check your connectors and cables.

4)
Sun recommends no more than 4 disks per controller. Controller 0, with its 5
disks, would be the expected trouble spot given your config, but the problem
appears to occur on controller 1.

Even with just 4 (I assume external), SCSI cable length and termination become
critical factors. I would use the highest quality shielded cables and forced
perfect terminators. Try to keep the cables as short as possible. Keep in mind
that external enclosures have internal cabling that must be taken into account.

5)
I'd check the cable length and the termination for that SCSI chain. See if
there is any way to reduce the total length. Also, check the termination and
power to the drives.
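(Editor's note: as a quick sanity check on which target is actually producing
these transport errors, a short script like the one below can tally the
warnings per SCSI target. This is only a sketch: the log path
/var/adm/messages and the message format are assumed to match the error
excerpt quoted above.)

```shell
#!/bin/sh
# Tally SCSI errors per target from a syslog file (sketch; assumes the
# /var/adm/messages format shown in the quoted error excerpt).
LOG=${1:-/var/adm/messages}

# Total number of "SCSI transport failed" events:
grep -c 'SCSI transport failed' "$LOG"

# Breakdown by sd@target,lun pulled from the WARNING device paths:
grep 'WARNING' "$LOG" |
  sed -n 's/.*\(sd@[0-9],[0-9]\).*/\1/p' |
  sort | uniq -c | sort -rn
```

If the counts cluster on one target (sd@6,0 here, i.e. c1t6d0), that points
at that drive's cabling/termination segment rather than the whole bus.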



This archive was generated by hypermail 2.1.2 : Fri Sep 28 2001 - 23:11:18 CDT