SUMMARY (update): SCSI errors

From: Mark C. Farone (farone@gainesville.fl.us)
Date: Tue Nov 25 1997 - 11:23:42 CST


At 12:30 PM -0500 11/20/97, Mark C. Farone wrote:
>SUMMARY OF PROBLEM:
>
>SCSI tagged queuing cmd timeout errors (see original post below).
>
>ATTEMPTS AT A SOLUTION:
>
> o Disable tagged queuing (TQ) for the entire system:
> -Add this to /etc/system:
> set scsi_options=0x80

Apparently, "=0x80" leaves only the TQ bit set and so turns off many other
SCSI options such as fast, wide, and disconnects. With disconnects disabled,
tagged queuing is unavailable. To turn off only TQ (as per the FAQ), the
line should be:
   set scsi_options & ~0x80
However, this didn't work in my situation, and it was a coincidence that
"=0x80" worked. Another possibility might be to try "=0x178" instead.
Some scsi_options are listed in /usr/include/sys/scsi/conf/autoconf.h.
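
To make the mask arithmetic concrete, here is a rough Python sketch that
decodes the values mentioned in this thread (0x80 and 0x178, plus the 0x58
from the esp.conf attempt quoted below). The bit values are as I recall them
from autoconf.h, so check them against the header on your own system. The
names decode() and clear_tag() are made up for illustration, and the 0x3f8
starting value is only an assumed example, not a confirmed default.

```python
# Bit values as recalled from <sys/scsi/conf/autoconf.h>; this is an
# assumption, so verify against the header on the machine in question.
SCSI_OPTIONS = {
    0x008: "DR",      # disconnect/reconnect
    0x010: "LINK",    # linked commands
    0x020: "SYNC",    # synchronous transfer
    0x040: "PARITY",  # parity checking
    0x080: "TAG",     # tagged queuing (TQ)
    0x100: "FAST",    # fast SCSI
    0x200: "WIDE",    # wide SCSI
}

TAG_BIT = 0x080


def decode(value):
    """Return the names of the option bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SCSI_OPTIONS.items()) if value & bit]


def clear_tag(current):
    """Mimic 'set scsi_options & ~0x80': clear only the TQ bit."""
    return current & ~TAG_BIT


for value in (0x80, 0x178, 0x58):
    print(hex(value), decode(value))

# 0x3f8 is a hypothetical starting value with every bit above switched on.
print(hex(clear_tag(0x3f8)), decode(clear_tag(0x3f8)))
```

Read this way, "=0x80" enables TQ and nothing else (no disconnects, so TQ
cannot actually be used), while "=0x178" keeps disconnects, sync, parity,
and fast but leaves TQ and wide off.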

Thanks to Casper Dik <casper@holland.Sun.COM>.

>Temporarily turning off TQ provided a quick solution, but substantially
>degraded performance. Solaris also sent 10 Warning messages to politely
>acknowledge that TQ was disabled.
>
> o Throttle the number of TQ commands:
> -Add this to /etc/system:
> forceload: drv/esp
> set sd:sd_max_throttle=10
>
>This reduced the number of allowed TQ commands, but I still received
>timeout errors.
>
> o Disable TQ for a specific target or controller:
> -Add this to /kernel/drv/esp.conf for a specific target
> name="esp" parent="/iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000"
> reg=0xf,0x800000,0x40
> target1-scsi-options=0x58
> scsi-options=0x178;
>
>This option was supposed to turn off TQ for the specific target, but it
>turned off TQ for the entire controller. I tried tweaking it, but I wasn't
>able to turn off TQ for a specific target. I don't know if it was my poor
>kung-fu or the extent of the problem.
>
>BOTTOM LINE:
> 1. One of the disks did not seem to properly support tagged queuing.
>Turning off TQ on the entire controller was necessary to support this
>not-fully-SCSI-2 disk.
> 2. The DAT drive (which I didn't suspect at first) is having SCSI-level
>hardware trouble. To say it another way, this DAT causes the same TQ
>errors on *other targets* when attached to my test system (a SS10).
>
>KUDOS TO:
>David Schiffrin <daves@adnc.com> (thanks for the resend, too)
>Joel Lee <jlee@thomas.com>
>Sanjay Srivastava <sanjays@netcom.com>
>bismark@alta.Jpl.Nasa.Gov (Bismark Espinoza)
>
>
>At 2:35 PM -0500 11/12/97, Mark C. Farone wrote:
>>I have a SparcStation20 running Sol2.5.1, primarily as a host for Sybase
>>SQL Server.
>>
>>Periodically for the past 2 weeks, when writing into the raw disk used by
>>Sybase at c1t5d0s5, I get this message:
>>
>>Nov 12 13:42:23 sun3 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@0,81000/esp@0,80000 (esp1):
>>Nov 12 13:42:23 sun3 unix: Disconnected tagged cmds (8) timeout for Target 5.
>>Nov 12 13:42:24 sun3 unix: 0
>>Nov 12 13:42:24 sun3 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@0,81000/esp@0,80000/sd@5,0 (sd20):
>>Nov 12 13:42:24 sun3 unix: SCSI transport failed: reason 'timeout': re
>>Nov 12 13:42:25 sun3 unix: trying command
>>Nov 12 13:42:25 sun3 unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/dma@0,81000/esp@0,80000/sd@5,0 (sd20):
>>Nov 12 13:42:25 sun3 unix: SCSI transport failed: reason 'reset': retr
>>Nov 12 13:42:25 sun3 unix: ying command
>>
>>
>>What I have tried:
>> 1. Upgraded to new harddisks (which I had planned to do anyway).
>> 2. Tried new cables.
>> 3. Tried new active terminators.
>> 4. Tried reseating the card.
>> 5. Tried putting the disks on another controller (c0). In this case, I
>>get the same error, just specific to c0. In fact, I moved everything off c1
>>and put them all on c0 (which, btw is where the / fs lives).
>>
>>For what it's worth, currently I have a DAT drive at c1t4, and harddisks at
>>c0t3, c1t1 and c1t5.
>>
>>It appears that regardless of the controller, disks, or cables, I get this
>>error which points to the raw disk used by Sybase.
>>
>>An impact of this problem is that Sybase blocks all other spid's until the
>>SCSI times out (between 1-2 minutes!) during which the spid is unkillable.
>>
>>Of course, this isn't happening on any other machines with exactly the same
>>hardware and software setup.
>>
>>Thanks for *any* help.

--
Mark C. Farone                               Why read when you can
Systems Analyst, Gainesville Sun             Just sit and stare at things?
farone@gainesville.fl.us

The above message is not a corporate endorsement, edict or policy.


