SUMMARY: Copy many small files

From: Ahmed F. Al Twaijiry <a.altwaijiry_at_mobily.com.sa>
Date: Mon Jun 11 2007 - 08:42:48 EDT
I received many solutions and answers from everyone, so THANK YOU.

Here are some of the nice solutions I received, followed by how we decreased
the copy time from ~12 hours to ~6 hours :)

Steve replied with:
Have you tried using rsync?

> Rsync is good when the same files are copied repeatedly, because it checks
whether each file is already present and unchanged on the destination and
skips it if so. For us it would increase the time, since it would check every
file and then copy it anyway (we always copy different files).


Casper replied with:
There are two possible causes of slow copies: inefficiencies in the "scp"
protocol and the expense of creating so many files on Windows.

> Yes, this is why we use rsh (rcp) now (see my solution below).


Aaron replied with:
Tarring and copying a single tar file should be MUCH faster than trying to do
this with 140000 files...

> Creating the tar file will take a long time, and untarring it will take
time too.


Mehran replied with:
Use Samba to share the directory with the PC; then it's a local
copy on the PC. Have lots of memory on the PC.

> This is good if the two machines are near each other, but ours are far
apart.

Matthew replied with:
Can you transfer via "sneakernet", that is copy the files to an external
device (SCSI, Firewire, USB2) and then move them physically?

> We tried this and there was not much improvement, but we discovered heavy
I/O load on the server, so we decided to do it another way (see below).


I also got many solutions from other people. I really want to thank them all
(and sorry if I didn't mention your name; I just selected the names randomly).


Now, the solution we used to cut the time from 12 hours to 6 hours is this:

The problem (again :) is that many files are generated every month on a Unix
server and we must copy them as soon as we can to a Windows server in a
different location. There are around 400k files, each very small, with a
total size of 150GB.
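
To put those numbers in perspective, here is a rough back-of-the-envelope
calculation. It is only a sketch: it assumes 1 GB = 10**9 bytes and uses the
approximate figures quoted in this summary (400k files, 150GB, runs of 12, 9
and 6 hours); the function and variable names are made up for illustration.

```python
# Figures quoted in the summary; decimal units (1 GB = 10**9 bytes) are an
# assumption, since the post does not say which definition of GB is meant.
FILE_COUNT = 400_000
TOTAL_BYTES = 150 * 10**9


def transfer_stats(hours):
    """Return (files per second, megabytes per second) for a run of `hours`."""
    seconds = hours * 3600
    return FILE_COUNT / seconds, TOTAL_BYTES / seconds / 10**6


# Average file size in kB, assuming the total is spread over all files.
avg_file_kb = TOTAL_BYTES / FILE_COUNT / 1000

if __name__ == "__main__":
    print(f"average file size: {avg_file_kb:.0f} kB")
    for hours in (12, 9, 6):
        files_per_s, mb_per_s = transfer_stats(hours)
        print(f"{hours:>2} h: {files_per_s:5.1f} files/s, {mb_per_s:4.1f} MB/s")
```

Under these assumptions even the 6 hour run moves only about 7 MB/s, which
would be consistent with per-file overhead, rather than raw bandwidth, being
the limiting factor.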

So what we did is this:

1. We noticed a lot of I/O on the server, so we bought a small server and
created a metaset shared between the two servers. When the user wants to
generate the files, we switch the filesystem to the first server; when he
wants to transfer them, we switch the filesystem to the new server and he
starts the copy from the new server.

With this method the time went from 12 hours to 9 hours.


2. Instead of using ssh to copy, we installed Cygwin on the Windows server and
started using rcp with multiple sessions (around 12 rcp processes copying at
the same time).

And this way we got it done in 6 hours :)
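
The multi-session idea in step 2 can be sketched roughly as follows. This is
not the script we used, and the summary does not say how the files were
divided among the sessions. The sketch assumes a plain round-robin split, that
rcp is on the PATH, and that it is acceptable to pass a few hundred files per
rcp invocation. The helper names and the chunk size of 200 are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(paths, sessions):
    """Deal `paths` into `sessions` batches of near-equal size."""
    return [paths[i::sessions] for i in range(sessions)]


def copy_batch(batch, dest, copy_cmd=("rcp",), chunk=200):
    """Copy one batch sequentially, at most `chunk` files per command.

    Chunking keeps each command line short enough for the OS limit.
    Returns the number of invocations that exited with a non-zero status.
    """
    failures = 0
    for start in range(0, len(batch), chunk):
        cmd = [*copy_cmd, *batch[start:start + chunk], dest]
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return failures


def parallel_copy(paths, dest, sessions=12, copy_cmd=("rcp",), chunk=200):
    """Run `sessions` copy workers at once; return failures per worker."""
    batches = split_round_robin(list(paths), sessions)
    with ThreadPoolExecutor(max_workers=sessions) as pool:
        return list(
            pool.map(lambda b: copy_batch(b, dest, copy_cmd, chunk), batches)
        )
```

A call might look like parallel_copy(file_list, "winhost:/incoming"), where
the host name and target directory are placeholders. Threads are enough here
because each worker spends its time waiting on an external rcp process.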

There is another test we want to do: tarring the files on the fly and
untarring them as they are received. I will tell you about it when we do it.
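
As an illustration of what "on the fly" means here, the sketch below writes a
tar stream straight into a pipe, so no intermediate archive file is created on
either side. We have not run this test yet, so this is an assumption about how
it could look, not a result. It uses Python's tarfile stream modes ("w|" and
"r|") as a stand-in for a classic tar pipeline; all function names are
hypothetical, and roundtrip_demo only exercises the idea locally.

```python
import io
import os
import subprocess
import tarfile
import tempfile


def stream_tar(src_dir, out_stream):
    """Write the contents of `src_dir` as a tar stream to `out_stream`."""
    # "w|" writes sequentially and never seeks, so a pipe or socket works.
    with tarfile.open(fileobj=out_stream, mode="w|") as tar:
        tar.add(src_dir, arcname=".")


def unpack_stream(in_stream, dest_dir):
    """Extract a tar stream into `dest_dir` while it is still arriving."""
    with tarfile.open(fileobj=in_stream, mode="r|") as tar:
        tar.extractall(dest_dir)


def send_via_command(src_dir, receiver_cmd):
    """Pipe the tar stream into `receiver_cmd`; return its exit status."""
    proc = subprocess.Popen(receiver_cmd, stdin=subprocess.PIPE)
    try:
        stream_tar(src_dir, proc.stdin)
    finally:
        proc.stdin.close()
    return proc.wait()


def roundtrip_demo(file_count=5):
    """Local check: pack a temp directory into memory and unpack it again."""
    with tempfile.TemporaryDirectory() as src, \
            tempfile.TemporaryDirectory() as dst:
        for i in range(file_count):
            with open(os.path.join(src, f"file{i}.txt"), "w") as fh:
                fh.write(f"payload {i}\n")
        buffer = io.BytesIO()
        stream_tar(src, buffer)
        buffer.seek(0)
        unpack_stream(buffer, dst)
        return sorted(os.listdir(dst))
```

For a real transfer, receiver_cmd could be something like
["rsh", "winhost", "tar", "xf", "-", "-C", "/incoming"], where the host name
and the target directory are placeholders and a tar binary is assumed to exist
on the receiving side (for example under Cygwin).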


Again, THANK YOU ALL FOR YOUR HELP ;-)


-------
Ahmad F. AlTwaijiry
IT / DC Unix Administrator
Mobily (www.mobily.com.sa)
 
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Mon Jun 11 08:43:12 2007
