SUMMARY - SCSI Bus Errors

From: Ralph Howard (rahjr@orca.tpsinc.com)
Date: Fri Nov 25 1994 - 22:21:04 CST


Sun Managers,

Sorry for the delay in summarizing; I was out sick for a week.
I received a number of responses to my post but no direct answer
to the question I was asking, though Chris Lawrence did point me
in the right direction. My original post follows:

------

I am receiving the following errors on my SPARCstation 10:

Nov 10 09:47:12 hopey vmunix: esp0: bad sequence step (0x6) in selection
Nov 10 09:47:12 hopey vmunix: sd0: SCSI transport failed: reason 'reset': retrying command

I have one internal disk and 3 external disks on this machine.
From what I have read on this list in the past, I have noted that

        - the SS10 supports fast, narrow SCSI-2, but it has
            a "single-ended" SCSI bus.
        - if any one device on the "single-ended" SCSI
            bus supports fast SCSI then the entire bus
            becomes sensitive.
        - the SCSI bus will only operate as fast as the
            slowest drive on it.

I know that at least one of my disks is fast SCSI-2. So, I assume
that the errors are due to the fast disk trying to synchronize its
speed down. My question is:

        What exactly does "bad sequence step (0x6) in selection"
        mean? What is (0x6)?

Thank you in advance.

        - ralph howard

------

I was looking for an explanation of what "(0x6)" is as opposed to, say,
"(0x7)", which I have also seen in the past. Thanks go to the following
for responding (their posts are included at the end):

        frode@read-well.no (Frode Stromsvag)
        Christian Lawrence <cal@soac.bellcore.com>
        Richard_Ravich@internet.microp.com (Richard Ravich)
        Birger.Wathne@vest.sdata.no (Birger A. Wathne)
        "David M. Di Gioia" <David.Digioia@gain.com>
        hanson@pogo.fnal.gov (Steve Hanson)

---------------

From: frode@read-well.no (Frode Stromsvag)

I've experienced exactly the same problem and would like to know
what the message means:

Nov 10 09:47:12 hopey vmunix: esp0: bad sequence step (0x6) in selection
Nov 10 09:47:12 hopey vmunix: sd0: SCSI transport failed: reason 'reset': retrying command

So, please make a summary.

I experienced this when we got an SS20 clone (Tatung COMPstation 20) with
a 50 MHz SuperSPARC in it. Attached to it were 3 disks. One of these
triggered these messages. This also happened when I put this disk and
a boot disk internally (to reduce cable length). When I moved this
disk to another workstation (whether SS10 (Axil 311), IPC or SS2) the
problem did not appear.

So the problem occurs only with this Seagate Wren 8 disk (ST41651) on this SS20.

I did not find an explanation. My workaround was to put this disk
on the SS10 and take another from the SS10 to put on the SS20.

Please keep me informed!

Frode Stromsvag email: frode@read-well.no
READ Well Services a.s.
Ravnsborgveien 56
P.O. Box 25
1364 Hvalstad Phone number: +47-66982240
NORWAY FAX number: +47-66982022

--------------

From: Christian Lawrence <cal@soac.bellcore.com>

        - the SS10 supports fast, narrow SCSI-2, but it has
            a "single-ended" SCSI bus.

yes, as opposed to differential. single ended has one physical
wire (transmission line) to propagate each signal.

        - if any one device on the "single-ended" SCSI
            bus supports fast SCSI then the entire bus
            becomes sensitive.

yes, this is true because the higher frequency demands better impedance
matching on the lines. I always use "forced perfect terminators" on these
adapters...they essentially adjust impedance levels (resistance/capacitance)
to compensate for many cable variants, etc.

        - the SCSI bus will only operate as fast as the
            slowest drive on it.

not true. fast drives can go fast if they want to (it's negotiable and
depends on firmware-driver interaction). what does happen, however, in mixed
environments is that bus cycles are lost and the bus is blocked as the controller
waits for settling time to access the slow guys.

    Nov 10 09:47:12 hopey vmunix: esp0: bad sequence step (0x6) in selection

        What exactly does "bad sequence step (0x6) in selection"
        mean? What is (0x6)?

this is related to bus arbitration targeted at adjusting the synchronous
period (i.e. speed). this proceeds as a series of steps (like a state machine)
and as part of the handshake an invalid code was returned - probably from
noise floating on the line.
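
To see which step values actually turn up (0x6, 0x7, or others), the
messages can be tallied straight from the system log. The following is a
minimal Python sketch, not a Sun tool: the regular expression is modeled
only on the message format quoted in this thread, and the function name
tally_sequence_steps is made up for illustration. It counts occurrences per
controller and step value; it does not explain what the values mean.

```python
import re
from collections import Counter

# Pattern modeled on the message quoted in this thread, e.g.
# "esp0: bad sequence step (0x6) in selection". Other driver versions
# may word the message differently (assumption).
STEP_RE = re.compile(
    r"(esp\d+): bad sequence step \((0x[0-9a-fA-F]+)\) in selection"
)


def tally_sequence_steps(lines):
    """Count (controller, step value) pairs found in syslog lines."""
    counts = Counter()
    for line in lines:
        match = STEP_RE.search(line)
        if match:
            controller = match.group(1)
            step = int(match.group(2), 16)
            counts[(controller, step)] += 1
    return counts


# The two log lines from the original post.
sample = [
    "Nov 10 09:47:12 hopey vmunix: esp0: bad sequence step (0x6) in selection",
    "Nov 10 09:47:12 hopey vmunix: sd0: SCSI transport failed: "
    "reason 'reset': retrying command",
]

for (controller, step), n in sorted(tally_sequence_steps(sample).items()):
    print(f"{controller} step {step:#x}: {n}")
```

Feeding it the whole messages file instead of the sample list would show
whether one step value dominates or whether several appear.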

You might want to get your hands on some of those terminators (mine are from
NuData).

---------------

From: Richard_Ravich@internet.microp.com (Richard Ravich)

Ralph,

I have been thru this problem a couple of times. The solution is that there is
a SUN patch that you need to install to solve it. Unfortunately, I'm in the
process of moving offices and I can't get to my old notes to give you the exact
patch number, but I'm sure if you contact your local SUN sales/technical person,
they can tell you the patch number.

I believe (though I'm not sure this is accurate) that some bad SCSI
controllers were made that showed this problem.

Best regards from Micropolis.

Richard Ravich
Sr. Applications Engineer
Micropolis Corporation

---------------

From: Birger.Wathne@vest.sdata.no (Birger A. Wathne)

The SCSI chain negotiates speed with each device. So your fast device should
run in fast mode. The problem may be that you have some old
device that doesn't negotiate properly, in which case you should
perhaps force your chain to synchronous mode, but I would rather start
looking for cabling problems, terminators, etc. If sd0 is an internal disk
delivered with the system, it should definitely run at 10 MB/s.

Birger

---------------

From: "David M. Di Gioia" <David.Digioia@gain.com>
>
> I am receiving the following errors on my SPARCstation 10:
>
> Nov 10 09:47:12 hopey vmunix: esp0: bad sequence step (0x6) in selection
> Nov 10 09:47:12 hopey vmunix: sd0: SCSI transport failed: reason 'reset': retrying command
>
> I have one internal disk and 3 external disks on this machine.
> From what I have read on this list in the past, I have noted that
>
> - the SS10 supports fast, narrow SCSI-2, but it has
> a "single-ended" SCSI bus.

True.

> - if any one device on the "single-ended" SCSI
> bus supports fast SCSI then the entire bus
> becomes sensitive.

True.

> - the SCSI bus will only operate as fast as the
> slowest drive on it.
>

False; a speed is negotiated with each drive on the bus; a slow drive will
transfer at a slow rate, and a fast drive will transfer at a fast rate
UNLESS there are problems with the SCSI bus or device which cause the
device to "synch up" at a lower speed than it is capable of.
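
The per-drive negotiation described above can be modeled roughly as follows.
This is a sketch under stated assumptions, not the esp driver's logic: it
assumes each side offers a synchronous transfer period and a REQ/ACK offset,
that the agreed period is the slower of the two and the agreed offset the
smaller, and that an offset of 0 means asynchronous. The names SyncParams,
negotiate and rate_mb_per_s, and all the numbers in the example, are
hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncParams:
    period_ns: int  # synchronous transfer period in nanoseconds
    offset: int     # REQ/ACK offset; 0 means asynchronous


def negotiate(initiator: SyncParams, target: SyncParams) -> SyncParams:
    """Agree on the slower period and the smaller offset of the two sides."""
    offset = min(initiator.offset, target.offset)
    if offset == 0:
        return SyncParams(period_ns=0, offset=0)  # fall back to asynchronous
    return SyncParams(max(initiator.period_ns, target.period_ns), offset)


def rate_mb_per_s(params: SyncParams) -> Optional[float]:
    """Peak rate on an 8-bit (narrow) bus: one byte per transfer period."""
    if params.offset == 0:
        return None  # asynchronous: no fixed rate in this simple model
    return 1000.0 / params.period_ns


# Hypothetical host adapter and drives (values are illustrative only).
host = SyncParams(period_ns=100, offset=15)
targets = {
    0: SyncParams(period_ns=100, offset=8),  # a fast SCSI-2 disk
    1: SyncParams(period_ns=200, offset=8),  # an older, slower disk
    3: SyncParams(period_ns=0, offset=0),    # a device that stays asynchronous
}

for target_id, caps in sorted(targets.items()):
    agreed = negotiate(host, caps)
    print(target_id, agreed, rate_mb_per_s(agreed))
```

With these made-up numbers the fast disk still negotiates 10 MB/s while the
slow disk sits at 5 MB/s on the same bus, which is the point being made:
the agreement is per drive, not per bus.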

> I know that at least one of my disks is fast SCSI-2. So, I assume
> that the errors are due to the fast disk trying to synchronize its
> speed down. My question is:

No. Many times, errors (due to noise on the bus, usually from mismatched
cable impedances, cables too long, incorrect terminator types, etc.) CAUSE
the devices on the bus to resynch at lower rates (rates low enough to be
"reliable"). They may even drop down to asynch.

> What exactly does "bad sequence step (0x6) in selection"
> mean? What is (0x6)?

You need a patch; I don't know the number, but the problem is fixed by the
patch. Check SunSolve, or call ANDATACO for the patch if the disks came
from them.

Good luck.
-DMD

---------------

From: hanson@pogo.fnal.gov (Steve Hanson)

In info.sun-managers you write:
You don't say what version of the OS you're running. This is likely an OS bug,
and most likely you need a patch. Let me know what OS you're running.

_______________________________________________________________________________

Ralph Howard
Telecommunications Premium Services
120 Wood Avenue South, Suite 404 Voice/Fax: (908)632-3855
Iselin, New Jersey 08830 rahjr@tpsinc.com


