SUMMARY: SCSI hang

From: Rashad Al-Yawir (rashad@ii.uni.wroc.pl)
Date: Tue Mar 28 1995 - 18:41:41 CST


Hi all,
        Two weeks ago I posted the following question:

> We keep receiving the following message on the console of our NFS
>server. The machine is a SPARCstation 10 running Solaris 2.3 with the following
>patches installed:
> 101378-10 101331-03 101615-01 101556-01 101615-02
> 102034-01 101331-05 101317-12
>
> WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0):
> Connected command timeout for Target 2.0
> WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000 (esp0):
> Target 2.0 reducing sync. transfer rate
> polled command timeout
> esp: State=DATA_DONE Last State=DATA
> esp: Latched stat=0x91<IPND,XZERO,IO> intr=0x10<BUS> fifo 0x80
> esp: last msg out: <unknown msg>; last msg in: IDENTIFY
> esp: DMA csr=0xa6240310<EN,IN,INTEN>
> esp: addr=fc008f08 dmacnt=8000 last=fc008d08 last_cnt=2000
> esp: Cmd dump for Target 2 Lun 0:
> esp: cdblen=6, cdb=[ 0x8 0x13 0x6b 0xec 0x10 0x0 ]
> esp: pkt_state=0x7<CMD,SEL,ARB> pkt_flags=0x4001 pkt_statistics=0x41
> esp: cmd_flags=0x10422 cmd_timeout=60
> WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@2,0 (sd2):
> SCSI transport failed: reason 'timeout'
> retrying command
> WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/esp@f,800000/sd@3,0 (sd3):
> SCSI transport failed: reason 'reset':
> retrying command
>
>Can someone tell me what is causing this?
>
>There are 3 disks and an Exabyte tape drive connected to the SCSI bus.
>
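
The "cdb=" line in the dump above can be read by hand. Opcode 0x08 is
READ(6) in the SCSI-2 command set, so the command that timed out was an
ordinary disk read. The following is only a minimal sketch in Python of
how the six bytes break down; the function name decode_read6 is made up
for illustration and is not part of any Solaris tool.

```python
def decode_read6(cdb):
    """Split a 6-byte SCSI READ(6) command descriptor block into fields."""
    if len(cdb) != 6:
        raise ValueError("READ(6) CDB must be exactly 6 bytes")
    if cdb[0] != 0x08:
        raise ValueError("opcode is not READ(6)")
    # Byte 1: top 3 bits are the LUN (SCSI-1/2), low 5 bits are LBA bits 20..16.
    lun = cdb[1] >> 5
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    # Byte 4: transfer length in blocks; a value of 0 means 256 blocks.
    blocks = cdb[4] if cdb[4] != 0 else 256
    return {"lun": lun, "lba": lba, "blocks": blocks, "control": cdb[5]}


if __name__ == "__main__":
    # CDB bytes copied from the console message quoted above.
    fields = decode_read6([0x08, 0x13, 0x6B, 0xEC, 0x10, 0x00])
    print(fields)
```

For the bytes in the console message this gives LUN 0, logical block
0x136bec and a transfer length of 16 blocks, which agrees with the
"Target 2 Lun 0" line in the same dump.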

Thanks to
Dan Stromberg <strombrg@hydra.acs.uci.edu>
Kevin Sheehan <Kevin.Sheehan@uniq.com.au>
Jean Y. Edgar <jye@meaddata.com>
Pat McMillan <pdm@pivot.sbi.com>
Daniel W. Fitzwilliam <dwfitzwi@ccifl.com>

I received the following answers:

-----------------------------------------------------
From: Kevin.Sheehan@uniq.com.au (Kevin Sheehan {Consulting Poster Child})

a) cables - they should be short, and ideally impedance matched
b) termination.
c) some third party disks need to have tagged queueing turned off.

----------------------------------------------------
From: jye@meaddata.com (Jean Y. Edgar)

Our Sun representative recommended that on our Sparc10s that have a
lot of SCSI traffic as well as a lot of disk I/O we use le1 instead
of le0. I've only made that change on two of my servers so far, so
I don't know yet if it really makes a difference.

---------------------------------------------------
From: pdm@pivot.sbi.com (Pat McMillan)

How much cable are you using? I have heard that you should try to keep it under
3 meters (about 10 ft). I was getting a lot of resets before I installed patch
101378-10. It appears as if I had a usage problem, but you might want to look at
shortening the cables since you have four devices. You might also want to spread
things out to another controller.

---------------------------------------------------
From: dwfitzwi@ccifl.com (Daniel W. Fitzwilliam)

Sounds like a SCSI termination problem. Make sure the SCSI bus is terminated. Also, make sure it is terminated only once, preferably at the end of the SCSI bus or on the last SCSI device.

---------------------------------------------------
From: Dan Stromberg - OAC-CSG <strombrg@hydra.acs.uci.edu>

Try turning off "tagged queuing" first. If that doesn't help, try
turning off "fast scsi".

Turning off "tagged queuing" isn't that big a deal, but if you need to
turn off "fast scsi" to get the error to stop, check your cables. If
fixing up cabling issues doesn't allow you to run "fast scsi", you
might want to speak with your hard drive vendor about getting a
replacement.

# SCSI option             value to set the corresponding bit to 1
# Disconnect/reconnect    0x008 (bit 3 = 1, counting from bit 0)
# Linked commands         0x010 (bit 4 = 1)
# Synchronous transfer    0x020 (bit 5 = 1)
# Parity                  0x040 (bit 6 = 1)
# Tagged queuing          0x080 (bit 7 = 1)
# Fast SCSI               0x100 (bit 8 = 1, or bit 9 if counting from 1)
# Wide SCSI               0x200 (bit 9 = 1)

# 3f8 wide, fast, tagged, linked, synch, parity, disconnect
# 378 wide, fast, linked, synch, parity, disconnect
# 1f8 fast, tagged, linked, synch, parity, disconnect
# 178 fast, linked, synch, parity, disconnect
# 078 linked, synch, parity, disconnect
# 058 linked, parity, disconnect

# inspect with:
# adb -k /kernel/unix /dev/mem
# scsi_options/X
# $q (to exit adb)
----------------------------------------------------
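
The option masks in Dan's table are plain bit sums, so they can be
checked or built mechanically. The following is a minimal sketch in
Python, assuming only the bit values listed in that table; the names
SCSI_OPTION_BITS, decode_scsi_options and clear_option are made up for
illustration.

```python
# Bit values copied from the scsi_options table quoted above.
SCSI_OPTION_BITS = {
    0x008: "disconnect/reconnect",
    0x010: "linked commands",
    0x020: "synchronous transfer",
    0x040: "parity",
    0x080: "tagged queuing",
    0x100: "fast scsi",
    0x200: "wide scsi",
}


def decode_scsi_options(mask):
    """Return the names of all options whose bit is set in mask."""
    return [name for bit, name in sorted(SCSI_OPTION_BITS.items()) if mask & bit]


def clear_option(mask, name):
    """Return mask with the bit for the named option cleared."""
    for bit, option in SCSI_OPTION_BITS.items():
        if option == name:
            return mask & ~bit
    raise ValueError("unknown option: " + name)


if __name__ == "__main__":
    mask = 0x1F8
    print(hex(mask), decode_scsi_options(mask))
    # First step suggested above: turn off tagged queuing.
    mask = clear_option(mask, "tagged queuing")
    print(hex(mask), decode_scsi_options(mask))
    # Second step, only if the errors persist: turn off fast SCSI.
    mask = clear_option(mask, "fast scsi")
    print(hex(mask), decode_scsi_options(mask))
```

Starting from 0x1f8, clearing tagged queuing gives 0x178 and then
clearing fast SCSI gives 0x078, matching the rows of the table. None of
the replies say how to apply a new value; on Solaris 2.x this is usually
done with a line such as "set scsi_options=0x178" in /etc/system
followed by a reboot, but check that against your own documentation.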

When I checked the length of the cables, I found that it exceeded 3 meters. I
think that this was the problem. I will move one of the disks to another machine.

Thank you again.

Rashad Al-Yawir


