SUMMARY: ufsdump breaks nfs server

From: Gérard Henry <ghenry_at_cmi.univ-mrs.fr>
Date: Thu Dec 02 2004 - 09:25:19 EST
Many thanks to the people who responded (6 responses). All responses were very 
detailed and helpful (I hope my vocabulary is good!?).
This summary is late, but I had to read, understand, run some tests and help 
users...
My configuration is not fully stabilized yet, but following the suggestions, I chose:
1) to continue to use ufsdump, with metaoffline/metaonline to dump the 
submirror only.
2) to plug the LTO2 directly into the E250 (I will do this at the end of this year)

Thanks to
Graham Wood
Vincent Cojot
E (?)
Ric Anderson
Russell Page
Govind Kulkarni

I am including all the responses because I think they will be useful to other people.
Some comments:
- use GNU tar instead of Sun tar (why not star from J. Schilling?)
- fssnap could be a solution (look at 
http://www.sun.com/bigadmin/content/submitted/backup_filesystem.html), 
but the man page says: "it will fail if the file system can't be 
write-locked", and I don't understand why?
- add a 2nd network between the E250 and the U10: not possible at this time, I have 
a second card in the E250 but not in the U10
- don't use "autoneg": a guy from Sun said the same thing, I have to take 
some measurements

Graham Wood
 >Do you think that ufsdump is the cause of problem?

Whether it is or it isn't, your dump will be incorrect if the filesystem 
is active.  If any of the files change inodes between the start of the 
dump and when that inode is dumped, you'll get the wrong data.

This (probably) will only cause that file data to be lost.  However, it 
could also cause data to be moved between users (during the dump, file 
for user 'a' is deleted, and another file is created for user 'b' that 
re-uses the same inode(s)).  It's quite unlikely that this would affect 
a directory, but it's possible.

ufsdump should not be used on a live filesystem.  A "better" answer 
might be to use tar (I prefer gnu tar for various reasons that I've 
forgotten, except that sun's tar barfs on some links & fifos).


 >> When I detached a submirror, it took 9 hours to reconstruct the
 >> mirror, so it's not a solution for daily backups.

Yeah - the lack of a DRL or other "quick resync" means that this is 
likely to be a major issue.

The LTO can perform at much higher speeds than the network (15MB/s is 
easy for the tape, 10MB/s is pushing the network), with the result that 
the network will be your "bottleneck".  If you're trying to use nfs over 
the same link, then there will be contention.  If you've got a duplex 
mismatch (or indeed are just on half duplex) then you'll lose packets, 
and in the case of a UDP (rather than TCP) mount, this could easily 
cause that problem.  If you've got full duplex throughout, you should 
only see slow access - not failures, and switching to TCP should help 
with this too.
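
As a rough check of the numbers above, the sketch below computes the average
rate implied by the original post (120 GB dumped in 8 hours) and the raw
ceiling of a 100 Mb/s link. The function names are illustrative only, and
1 GB is assumed to be 1024 MB.

```shell
# mb_per_sec GIGABYTES HOURS: average throughput in MB/s (1 GB = 1024 MB assumed)
mb_per_sec() {
    awk -v gb="$1" -v h="$2" 'BEGIN { printf "%.2f\n", gb * 1024 / (h * 3600) }'
}

# link_mb_per_sec MEGABITS: raw link ceiling in MB/s, ignoring protocol overhead
link_mb_per_sec() {
    awk -v mbit="$1" 'BEGIN { printf "%.2f\n", mbit / 8 }'
}

mb_per_sec 120 8        # figures from the original post
link_mb_per_sec 100     # 100 Mb/s Ethernet
```

The observed average comes out well under both the 10 MB/s network figure and
the 15 MB/s tape figure quoted above, which would be consistent with a problem
on the link (such as a duplex mismatch) rather than plain saturation, though
the arithmetic alone cannot prove that.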


 >> - sol1: plug LTO2 directly on E250, anybody knows if it will solve the
 >> "NFS server scr not responding still trying" problem?

Almost definitely - depending on whether the ufsdump is locking the 
partition (which I don't think it is), but as I said, I'd avoid using 
ufsdump if possible.  This is the best option, since it removes the main 
bottleneck - the backplane of the E250 shouldn't be stressed, and 
although the load average will go up (since it can work faster) the 
machine shouldn't be unresponsive at all.


 >> - sol2: don't use ufsdump. fssnap?

I've not used fssnap, but if you've got enough space to use it, then it 
is probably the best answer (combined with the above), since it will 
give you a consistent view of the disks at the time the backup is made, 
rather than the possible mess you can get with the other methods.
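
A minimal sketch of how a snapshot could be combined with ufsdump, following
the pattern shown in the fssnap_ufs man page. The function names, the backing
store path and the tape device are placeholders, the backing store must live
on a different file system from the one being snapshotted, and DRY_RUN=1 (the
default here) only prints the commands instead of running them.

```shell
# run CMD...: print the command when DRY_RUN=1 (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# snapshot_dump FILESYSTEM BACKING_STORE TAPE
snapshot_dump() {
    fs=$1 store=$2 tape=$3
    # With -o raw, fssnap prints the raw snapshot device name on success.
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "fssnap -F ufs -o raw,bs=$store $fs"
        snapdev="SNAPSHOT_DEVICE"      # stand-in for the real device name
    else
        snapdev=$(fssnap -F ufs -o "raw,bs=$store" "$fs") || return 1
    fi
    # Level 0 (full) dump of the frozen snapshot rather than the live fs.
    run ufsdump 0f "$tape" "$snapdev"
    status=$?
    # Delete the snapshot whether or not the dump succeeded.
    run fssnap -F ufs -d "$fs"
    return $status
}
```

For example, `snapshot_dump /export/home /var/tmp/snap /dev/rmt/0n` would
print the three commands in order; set DRY_RUN=0 only after checking them
against the local man pages.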


 >> - sol3: trying ufsdump from U10, on nfs mounted partition?

Won't work.  ufsdump can only dump a local ufs partition.
-------------------------------------------------------------------------------

Vincent Cojot
I would choose solution 1 unless it puts too much load on the E250. You 
will probably need to add a SCSI card to it, so that the LTO-2 sits alone 
on its own SCSI bus, to avoid performance problems and spurious bus 
resets (take the Sun X6758A).

Solutions 2 and 3 are not feasible because:

2) even with fssnap you still have to run ufsdump.

3) It will not work because ufsdump will refuse to dump a non-local FS.

-------------------------------------------------------------------------------
E:
I am not an NFS expert, but ...

"When i detached a submirror, it took 9 hours to reconstruct.."

Rather use metaoffline. This keeps track of changes and does not require a
full reconstruct: metaonline only resyncs what changed while offline.
	metaoffline <mirr> <submirr>
	mount -o ro /dev/md/dsk/<submirr> /<tmp_mount>
	ufsdump ...
	umount /<tmp_mount>
	metaonline <mirr> <submirr>
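
The sequence above could be wrapped in a small script, sketched here under a
few assumptions: the metadevice names and tape device are placeholders, the
raw device of the offlined submirror is dumped directly instead of going
through a read-only mount, and DRY_RUN=1 (the default here) only prints the
commands.

```shell
# run CMD...: print the command when DRY_RUN=1 (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# backup_submirror MIRROR SUBMIRROR TAPE
backup_submirror() {
    mirror=$1 submirror=$2 tape=$3
    # Take one half of the mirror offline; writes to the mirror are tracked.
    run metaoffline "$mirror" "$submirror" || return 1
    # Level 0 (full) dump of the now-static submirror.
    run ufsdump 0f "$tape" "/dev/md/rdsk/$submirror"
    status=$?
    # Always bring the submirror back so the resync can start.
    run metaonline "$mirror" "$submirror"
    return $status
}
```

For example, `backup_submirror d10 d11 /dev/rmt/0n` (hypothetical names)
prints the three commands in order. The point of the structure is that
metaonline runs even when the dump fails, so the mirror is never left
degraded by a bad tape.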

"sol3: trying ufsdump from U10, on nfs mounted partition?"
Ufsdump does not work on an NFS mounted partition  :-(

Of course, the most efficient way of doing this would be to attach the tape
directly to the E250. I do not believe you can fully feed an LTO2 via 100 Mb/s
ethernet. I believe the network "chokes" before the tape reaches its full
backup speed.

To prove it, add a 2nd network between the E250 and the U10, using this as the
backup network.
-------------------------------------------------------------------------------
Ric Anderson
I use ufsdump (and have for years) with no
problems.  with that kind of dump performance,
I'd bet on duplex mismatch someplace along
the path.  Ultra-10s are infamous for auto
negotiating to 100/half when the network port
autos to 100/full.  You might want to run
	ndd -set /dev/hme instance 0	# select hme0
	ndd /dev/hme link_speed		# 0=10, 1=100
	ndd /dev/hme link_mode		# 0=half, 1=full
and also check the network devices (assuming they
provide the ability to do so) to make sure they have
the same speed and duplex as the Suns on both ends
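
To make the two ndd values easier to read at a glance, a small helper along
these lines could decode them. The function name is made up for this sketch;
the value meanings are the ones given above.

```shell
# describe_link SPEED MODE: decode the values reported by
# "ndd /dev/hme link_speed" and "ndd /dev/hme link_mode".
describe_link() {
    case $1 in
        0) speed="10 Mb/s" ;;
        1) speed="100 Mb/s" ;;
        *) speed="unknown speed" ;;
    esac
    case $2 in
        0) duplex="half duplex" ;;
        1) duplex="full duplex" ;;
        *) duplex="unknown duplex" ;;
    esac
    echo "$speed, $duplex"
}

# On the Solaris host (not run here):
#   describe_link "$(ndd /dev/hme link_speed)" "$(ndd /dev/hme link_mode)"
describe_link 1 0    # the 100/half case mentioned above
```

If the host reports half duplex while the switch port reports full, that is
the mismatch to look for.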
-------------------------------------------------------------------------------
Russell Page :
Ufsdump is a very bad choice for an active filesystem! The manual is 
quite clear on this: "When running ufsdump, the file system must be 
inactive; otherwise, the output of ufsdump may be inconsistent and 
restoring files correctly may be impossible." ufsdump(1M). If you watch 
ufsdump running, you will notice it makes four passes over the 
filesystem. If the filesystem changes while ufsdump is running, the 
passes become inconsistent with each other and the archive becomes 
corrupted. I know of a number of cases where this has happened.

Try find and cpio:

# cd /mount_point
# find . -depth -print | cpio -ocC 131072 > /dev/rmt/0

You can also save a lot of time by doing incremental backups. Do a full 
backup, say once a month, and then only back up changed files on other 
days. If you want to continue using ufsdump, read the man page for full 
details. The man page for "find" tells you how to select files that have 
changed recently.
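
One possible way to select changed files, sketched with find's -newer test
against a timestamp file touched at the time of the last full backup. The
function name and the stamp file path are assumptions for illustration, not
part of the advice above.

```shell
# changed_since DIR STAMP: list non-directory entries under DIR that were
# modified more recently than the file STAMP, as "./relative/path" names
# suitable for piping into cpio.
changed_since() {
    ( cd "$1" && find . -depth ! -type d -newer "$2" -print )
}

# Hypothetical usage on the server (not run here):
#   touch /var/tmp/full.stamp          # just before each full backup
#   changed_since /export/home /var/tmp/full.stamp |
#       ( cd /export/home && cpio -ocC 131072 > /dev/rmt/0 )
```

Directories are skipped because their modification time changes whenever an
entry is added or removed; cpio can recreate missing directories on restore
when given -d.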


-------------------------------------------------------------------------------

Govind Kulkarni
As per your setup, you need to run nfs server on U10 connected to LTO.

If you connect the LTO directly to the E250, as in your option, you will 
definitely not need the NFS service. So you can go ahead and connect the 
LTO to the E250's onboard SCSI controller. This may also improve your 
backup time; to what extent, I won't be able to predict.

If your LTO drive is part of a tape library, you need to install its 
corresponding HBA/SCSI connector.


Gérard Henry wrote:
> hello all,
> i hope my message is clear:
> 
> when ufsdump starts, NFS clients get "NFS server scr not responding 
> still trying"
>  -------     ------                           -----    ------
> | D1000 |---| E250 |--- Ethernet 100 Mb/s -- | U10 |--| LTO2 |
>  -------     ------                           -----    ------
> 
> every night, ufsdump starts to save a complete partition, /export/home
> /export/home is on the D1000, configured with SDS RAID 0+1 (200GB usable of 
> 400GB total)
> It takes 8 hours to dump 120 GB.
> I'm using ufsdump because I think it's the best tool to dump an active 
> filesystem
> When I detached a submirror, it took 9 hours to reconstruct the 
> mirror, so it's not a solution for daily backups.
> 
> Do you think that ufsdump is the cause of problem?
> 
> I wonder what will be the best solution, but as this server is in prod, 
> i cannot do more tests:
> - sol1: plug LTO2 directly on E250, anybody knows if it will solve the 
> "NFS server scr not responding still trying" problem?
> - sol2: don't use ufsdump. fssnap?
> - sol3: trying ufsdump from U10, on nfs mounted partition?
Received on Thu Dec 2 09:27:46 2004
