From: Hall, Johnny (johnny@pahv.xerox.com)
Date: Fri Jul 23 1999 - 12:46:36 CDT

Original Question below..

Thanks to all, here are the replies I got. System calls for
opening and closing files seem to be the popular answer... :)

<Rahul Roy>
One reason can be the number of system calls you are making for
read/write/stat .... A system call can be "expensive" as far as system
resources go. If you run truss on "tar cvf <tarfile> <source>" you will
notice a number of system calls such as fstat, lstat and read.
So if you have many smaller files, more time will be spent on those
calls than when tarring up fewer larger files, even though the total
amount of data is the same or comparable ....

Hope that helped ..

Rahul Roy
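
A rough way to see the per-file cost described above is to archive the
same number of bytes twice, once as many small files and once as a single
large file. The sketch below uses Python's tarfile module as a stand-in
for the tar command; the function names, file counts and sizes are
made up for illustration, and the timings will vary with the filesystem
and cache state.

```python
import os
import tarfile
import tempfile
import time


def make_files(directory, count, size):
    """Create `count` files of `size` bytes each in `directory`."""
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        with open(os.path.join(directory, f"f{i:05d}.dat"), "wb") as fh:
            fh.write(b"x" * size)


def timed_tar(source_dir, archive_path):
    """Archive source_dir; return (seconds taken, number of archive members)."""
    start = time.perf_counter()
    with tarfile.open(archive_path, "w") as tar:
        tar.add(source_dir, arcname="data")
    elapsed = time.perf_counter() - start
    with tarfile.open(archive_path) as tar:
        members = len(tar.getnames())
    return elapsed, members


def compare(total_bytes=1_000_000, small_count=1000):
    """Tar the same byte count as many small files and as one big file."""
    with tempfile.TemporaryDirectory() as tmp:
        many = os.path.join(tmp, "many")
        one = os.path.join(tmp, "one")
        make_files(many, small_count, total_bytes // small_count)
        make_files(one, 1, total_bytes)
        t_many, n_many = timed_tar(many, os.path.join(tmp, "many.tar"))
        t_one, n_one = timed_tar(one, os.path.join(tmp, "one.tar"))
    return {"many": (t_many, n_many), "one": (t_one, n_one)}


if __name__ == "__main__":
    for label, (seconds, members) in compare().items():
        print(f"{label:>4}: {members} members in {seconds:.4f}s")
```

The member count includes the top-level directory entry, so it is one
more than the number of files.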

<Nickolai Zeldovich // nickolai@zepa.net>
every file requires a directory lookup. large directories are
slow on many unices.

<Brett Lymn>
Yes, it will take longer. There would be a minimal impact due to the
overhead of looking up the name and so forth for lots of little
files. Probably the biggest hit is that, with lots of little files, the
tape will stop, back up and then start writing again. With the old QIC
drives you could actually
hear this happening :-)

<Kevin Sheehan>
It's real - doing a bunch of smaller files means dealing with a lot
more metadata related stuff (e.g. open, getting the node info, close &c)
instead of just dealing with data.

<Jamie Lawrence>
Lots of reasons.

 - In a UFS volume, large files will typically
be in contiguous blocks. Different small files will be all
over the disk. Seek times go up a lot (look at your disk specs
for comparisons of sustained vs. random access).

 - Each file is an inode access (at least one, but I'm keeping
this simple). There is OS overhead for each inode. Tar probably
has its own overhead for file access.

 - Tar is writing metadata to the archive for each file as well.

I'm not sure why you're mentioning packets and whatnot, unless
you're tarring a file over NFS. If that's the case, you'll also
see NFS file manipulation overhead.

Hope this helps.
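
The per-file archive metadata mentioned above can be measured directly.
In the tar format each member gets a 512-byte header, and its data is
padded up to a multiple of 512 bytes. The sketch below builds archives in
memory with Python's tarfile module (a stand-in for the tar command) and
compares many small members against one large member; the helper names
and sizes are illustrative only.

```python
import io
import tarfile

BLOCK = 512  # tar header size and data padding unit, in bytes


def minimum_size(file_sizes):
    """Header plus padded data for each member, ignoring end-of-archive padding."""
    return sum(BLOCK + -(-size // BLOCK) * BLOCK for size in file_sizes)


def archive_size(file_sizes):
    """Build an in-memory ustar archive and return its total size in bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for i, size in enumerate(file_sizes):
            info = tarfile.TarInfo(name=f"f{i}.dat")
            info.size = size
            tar.addfile(info, io.BytesIO(b"x" * size))
    # tarfile leaves a caller-supplied file object open, so buf is still readable
    return len(buf.getvalue())


if __name__ == "__main__":
    small = [100] * 1000   # 100,000 bytes of data in 1000 members
    large = [100_000]      # the same data in a single member
    print("many small:", archive_size(small), "bytes")
    print("one large: ", archive_size(large), "bytes")
```

With 100-byte files, each member occupies at least 1024 bytes in the
archive, so the archive of small files comes out roughly ten times larger
than the data it holds.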


<Max Trummer>
> Is this just boredom on my part (watching paint dry) or is it real?
well, it *could* be both!! :-)

but then again, i'm replying, so i'm *sure* it's not the former!! :-)

anyhow, opening/closing files takes time, no doubt about it!

i once had a 9 GB disk on my net that used to take over 24 hours to
back up. and it didn't matter if it was a full backup or not, almost!

but there were over half a million files on the disk, so all the
time was spent looking up inodes and opening files...
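
Taking the figures above at face value (24 hours and half a million
files, both quoted as "over"), the backup averaged about 0.17 seconds per
file. The small sketch below does that arithmetic; the average file size
additionally assumes the 9 GB disk was full, which the post does not say.

```python
def seconds_per_file(backup_hours, file_count):
    """Average wall-clock seconds spent per file over the whole backup."""
    return backup_hours * 3600 / file_count


def average_file_size(total_bytes, file_count):
    """Average bytes per file; assumes total_bytes is the data actually stored."""
    return total_bytes / file_count


if __name__ == "__main__":
    print("seconds per file:", seconds_per_file(24, 500_000))
    # Assumption: the disk was full (9e9 bytes of data)
    print("average file size:", average_file_size(9e9, 500_000), "bytes")
```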



> Can anyone tell me why it takes so much longer to tar up many smaller
> files than a few bigger files given that the actual data amount is the same
> or nearly the same. I thought maybe it was something to do with
> read/write but then the data (say 25Mb) is the same, then I thought it
> may be something to do with tcp/ip and packets but again a packet is a
> packet. The only thing I can think of is something along the lines of
> loading each file into memory, processing it and then spitting output.
> Is this just boredom on my part (watching paint dry) or is it real?
> Johnny

"Nothing is more difficult than the art of maneuvering for advantageous
positions." - Sun Tzu
