Summary: nfs server not responding - SCSI transport failed

From: Dan Penrod (
Date: Mon Feb 12 1996 - 14:51:16 CST

Here is my original query; the solution is below...

>Sun Managers:
>I'm having a problem with one of our nfs servers, named 'elroy', and Sun Support is,
>as usual, totally worthless. Machines remotely using elroy's disk get the
>following message.
> NFS server elroy not responding still trying
> NFS server elroy ok
> NFS server elroy not responding still trying
> NFS server elroy ok
>...resulting in abysmally slowwww performance. I've tried rebooting both
>elroy and its client machines with no improvement. The SunSolve database
>shows no such known problem under Solaris 2.4, which elroy currently runs.
>Looking at elroy:/var/adm/messages I notice the following error messages...
> Feb 5 14:01:03 elroy unix: eout for Target 0.0
> Feb 5 14:01:03 elroy unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/
> espdma@f,400000/esp@f,800000/sd@0,0 (sd0):
> Feb 5 14:01:03 elroy unix: SCSI transport failed: reason 'timeout':
> retrying command
> Feb 5 14:01:03 elroy unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/
> espdma@f,400000/esp@f,800000 (esp0):
> Feb 5 14:01:03 elroy unix: Disconnected tagged cmds (3) timeout for Target
> 0.0
> Feb 5 14:01:03 elroy unix: WARNING: /iommu@f,e0000000/sbus@f,e0001000/
> espdma@f,400000/esp@f,800000/sd@0,0 (sd0):
> Feb 5 14:01:03 elroy unix: SCSI transport failed: reason 'timeout':
> retrying command
>This might explain the inability to nfs serve the disk at target 0.
>The SunSolve database shows a reported bug #1194263 which appears to be
>identical. I've attached that html document. It offers no fix but does
>suggest one possible workaround...
> "set sd:sd_max_throttle=10"
>Any idea where I can make this configuration change?
>Anyone know if this problem is hardware or software?

The answer is that the changes can be made to the /etc/system file. After
you edit the file you must reboot. I received a lot of different suggestions
as to what to put in that configuration file which I will describe below.

The other answer is that it's both hardware and software. Software
configuration can change the way the hardware is accessed. No, there doesn't
seem to be a patch... but there is a hardware solution.

(Dotty Pon) writes... "shorten your scsi cables."
I tried that... no go.

(Bismark Espinoza) writes... "check network load
and NFS parameters, also tagged command queueing."
Well, it's not the network, it's definitely a scsi problem. Later I discuss
how to handle command queueing.

(Mike Salehi) writes... "You can put those changes
in /etc/system and reboot."
Right. Thanks.

(James Coby) writes... "take a look at /etc/system
file."
Right again. Thanks.

Casper Dik <casper@holland.Sun.COM> writes...
>The solaris FAQ says:
>3.29) I have all kinds of problems with SCSI disks under Solaris 2.x
> They worked fine under SunOS 4.x.
> Append this line to /etc/system and reboot:
> set scsi_options & ~0x80
> This turns off Command Queuing, which upsets rather a lot
> of SCSI drives.
> In Solaris 2.4 and later you can set those options per SCSI
> bus. See isp(7) and esp(7).
> For some disks, all you need to do is decrease the maximum number of
> queued commands:
> forceload: drv/esp
> set sd:sd_max_throttle=10
He also says to check the scsi cables and terminators. This was a really
good answer so I copied the whole thing here.

(John Baldwin) writes...
>This message indicates that the system sent data over the SCSI bus, but the
>data never reached its destination because of a bus reset..
>I have seen this problem before when you mix different devices on the same
>scsi chain ex. fast 10 Mb/s and 5 Mb/s scsi disks or devices...check and make
>sure the devices on the chain are consistent, with controller...check length
>of cable, check termination...make sure your power supply is consistent,
>check target addresses for conflicts...I am also sending you additional info
>in a separate e-mail...The configuration that you mentioned should be added
>to the /etc/system file and then reboot....
Yea, good point. I'm sure this is common these days.

Kent R Arnott <> writes...
>I'm getting the same problems. If you find a solution let me know. I have
>tried several things and cannot get them to work...
Here you go Kent, hope one of these things helps you.

(Glenn Satchell) writes...
>If possible the fast scsi-2 devices (ie most disks less than two years
>old) should be the first things on the bus, and slower scsi-1 devices
>at the far end (relative to the cpu).
>The possible workaround goes in the file /etc/system, and the system
>then needs to be rebooted for it to take effect. But I'd investigate
>the cables and hardware first.
I tried changing the order of devices. No good. I've also swapped out all
the cables and terminators. Nope. (Kevin W. Thomas) also writes... "check
cables and termination." Nope, that's not it.

"Daniel M. Quinlan" <> writes...
>Well, I'd take a look at the length of the scsi chain that disk is
>on and also what other scsi devices are on that chain. I believe
>you're not supposed to mix certain kinds of scsi devices, and that might
>be some of the problem. There's a very interesting program "scsiinfo"
>which you can get from which might tell you something
>useful about the other things on the chain. Another possibility of
>course is that you're just looking at a hardware failure.
I did download the scsiinfo utility. It's a small text based unix command
which I found to be very interesting. I recommend everyone keep a copy
of scsiinfo on every machine. Thanks Daniel.

Henry Katz <> writes...
>the set command goes in /etc/system and configures a kernel parameter
>in the sd driver, you may also want to turn off SCSI tagged command queueing:
>set scsi_options = 0x378
This is interesting. I wonder how this is different from...
 set scsi_options & ~0x80 ??? They both claim to do the same thing. Hmmm.

Jens Fischer <> writes... "turn off tagged command queuing in
/etc/system. Can't remember the entry..." Thanks Jens... Apparently it's
set scsi_options = 0x378 or
set scsi_options & ~0x80

(Vahsen Rob) writes... "we installed patches but they
didn't help..." Well Vahsen, maybe something here will help. Good luck.

Anderson McCammont <> writes... "configure in /etc/system...
check cables and terminators..." Thanks.

Roger Salisbury <> writes...
>It may be your scsi_options have the drive set to tagged fast scsi-2
>which is the default in 2.4, does the drive support tagged queuing??
>Does it tell the O.S. it does when in fact it doesn't do the sun tagged
>queuing?? You can disable tagQ with set scsi_options = 0X178 in /etc/system
>The FAQ has a more in depth explanation.
Huh? What? ...set scsi_options = 0X178. I thought it was 0x378 or ~0x80...
Ok. I'd still like to know the difference.

(Sean McInerney) writes... "It's a cable or terminator"
Good guess but wrong. Thanks anyway.
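
On the 0x378 versus 0x178 versus & ~0x80 question, a little bit arithmetic
may help. The sketch below is illustrative only. It assumes the scsi_options
bit meanings as they are commonly documented for Solaris 2.x (esp(7) is the
place to verify them) and assumes a starting default of 0x3f8. Neither the
bit table nor the default comes from the replies above, and the names
decode() and disabled() are made up for this example.

```python
# Bit names for scsi_options.
# ASSUMPTION: values as commonly documented for Solaris 2.x (see esp(7));
# verify them against your own release before relying on this.
SCSI_OPTION_BITS = {
    0x008: "disconnect/reconnect",
    0x010: "linked commands",
    0x020: "synchronous transfer",
    0x040: "parity",
    0x080: "tagged command queuing",
    0x100: "fast SCSI",
    0x200: "wide SCSI",
}

# ASSUMPTION: the default has every bit in the table above enabled.
ASSUMED_DEFAULT = 0x3F8


def decode(value):
    """Return the names of the option bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SCSI_OPTION_BITS.items()) if value & bit]


def disabled(value, default=ASSUMED_DEFAULT):
    """Return the names of bits that are set in default but clear in value."""
    return decode(default & ~value)


if __name__ == "__main__":
    # "set scsi_options & ~0x80" clears one bit of whatever the value was.
    masked = ASSUMED_DEFAULT & ~0x80
    for label, value in [("& ~0x80", masked), ("= 0x378", 0x378), ("= 0x178", 0x178)]:
        print(f"{label}: 0x{value:x} turns off {disabled(value)}")
```

Under those assumptions, "& ~0x80" applied to 0x3f8 gives exactly 0x378, so
the two forms agree: both clear only the tagged queuing bit. The difference
is that "= 0x378" overwrites whatever value was there, while "& ~0x80" clears
one bit and leaves the rest alone. 0x178 clears the 0x200 bit as well, which
in the table above is wide SCSI.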

Ashish Parikh <ashish@Savantage.Com> writes... "I ran into a similar problem
last Friday. The problem was that my SCSI cables were not properly seated."
I guess this is very common. It wasn't my problem though.

vitec! (John R. Sutton (214-997-4123)) writes...
"I have seen the same problem corrected by adding the following line to
/etc/services [sic: /etc/system] set scsi_options & ~0x80 which turns off command queuing"
Another vote for ~0x80. Okay.

vitec! (John R. Sutton (214-997-4123)) writes...
"The 'set sd.......' line needs to be added to the /etc/system file"
Apparently I was the only one who didn't know about /etc/system. Not anymore!

Akile Sahin <> writes...
>You asked a question about the above subject on 7th February. You haven't
>sent a SUMMARY yet. I am looking forward to the solution to this problem,
>because I have come across this problem on our system.
Sorry for the delay. Hope something here helps you out.

(Niall O Broin - Gray Wizard) writes...
>You set the variable in /etc/system, and to answer your question, the
>problem can be both hard and software i.e. your OS (soft) and the disk(hard)
>are not happy together. The variable change suggested may help to bring them
>to harmony - I've seen similar suggestions for various SCSI problems with SCSI
>under Solaris before now.
Yea, since Solaris has come out the old problems are gone and new problems
have replaced them! Figures.

I found that I had a Sony SCSI CD-ROM burner on the scsi chain and that when
I removed it from the chain all my problems went away. I haven't bothered
to try putting it back yet. I'm happy to leave it off. I may try turning
off the command queuing later if/when I need to put it back on. On another
machine with similar problems my solution was to add a second SBus SCSI
card to distribute the load, shorten the scsi cable, and isolate problems.
This has helped. Both nfs servers that have given me similar trouble have
two 9GB single-partition disk drives on them. This must be a contributor,
which is sort of a side-effect of Solaris since SunOS doesn't support the

Thanks Sun Managers!


