SUMMARY: Deleting Millions of Files

From: Bryan Pepin <bpepin_at_emc.com>
Date: Mon Jan 26 2004 - 10:38:52 EST
Hello,

Thanks to everyone who responded. There are really too many of you to
mention individually, but here are the results from most of the suggestions:

1) cd /top-level-dir ; find . -type f -print | xargs /bin/rm -f  --->
same results; no gain in the rate at which files were deleted. This was
a popular suggestion.

2) I was not in a position to back up the good data in the other
directories, newfs the filesystem, and put the good stuff back. In my
opinion this may have been my quickest and safest option, but under my
circumstances it could not be done.

3) One suggestion was to use WS FTP and delete the files with it, but I
did not get a chance to try that either.

4) find . -type f -exec rm {} \; ---> no gain in the rate at which files
were deleted.

5) One suggestion was to enable noatime on the FS mount. I did not test
this, although I did have logging enabled, which did not seem to help
much.

6) cd /to-offending-dir ; ls | perl -ne 'chomp; unlink ;'  ---> this
still took a long time on each file, and the unlinking, it turned out,
had another side effect (see below).

7) Another interesting suggestion was to use fastfs.c
(http://www.science.uva.nl/pub/solaris/fastfs.c.gz). Unfortunately I did
not get a chance to use it, because it puts the rest of the data on the
FS at risk. Some people claimed it could improve the removal rate by 500%
or more! I may run some tests with it in our lab for future problems
like this.

8) cd /up-one-dir ; unlink ./dir-with-files  --> This worked
instantly, but did not free up any inodes. So we stopped the
application, unmounted the FS, ran fsck (which took only a few minutes),
and remounted. The inodes were still not free: it turned out that fsck
had put all the files back in lost+found, so the inodes were still in
use. For whatever reason, though, removing the "lost" files from
lost+found was much quicker! It took less than 1 hour to clean out
around 1 million files from lost+found. I may do some testing with this
method as well.
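Suggestions 1 and 4 differ mainly in how many rm processes they start: "-exec rm {} \;" starts one per file, while xargs and the POSIX "-exec ... {} +" form pack many names into each process. The sketch below counts invocations in a throwaway directory, with echo standing in for rm. It assumes a POSIX shell (ksh or bash, not Solaris 8's Bourne /bin/sh), "mktemp -d", and a find that accepts "{} +", which Solaris 8's find may not; make_files is a hypothetical helper. Since the deletion rate did not change between these methods, the cost here apparently lay in the filesystem's handling of the huge directory rather than in starting processes.

```shell
#!/bin/sh
# Sketch: count how many processes each deletion style starts.
# Assumptions (not from the original post): a POSIX shell, mktemp -d,
# and a find that supports "-exec ... {} +".
set -e

# make_files DIR COUNT: hypothetical helper that fills DIR with COUNT empty files.
make_files() {
  i=0
  while [ "$i" -lt "$2" ]; do
    : > "$1/file$i"
    i=$((i + 1))
  done
}

scratch=$(mktemp -d)
make_files "$scratch" 500

# One child process per file, like suggestion 4 ("-exec rm {} \;").
# "echo run" stands in for rm so that each invocation prints one line.
per_file=$(find "$scratch" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# Many files per child process, like "-exec ... {} +" or xargs.
batched=$(find "$scratch" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "per-file invocations: $per_file"
echo "batched invocations:  $batched"

# Now actually delete, letting xargs batch the names (suggestion 1).
# Safe here only because the generated names contain no blanks or quotes.
find "$scratch" -type f -print | xargs rm -f
remaining=$(find "$scratch" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"

# The directory is empty now, so rmdir can remove it.
rmdir "$scratch"
```

xargs splits its input on blanks and treats quotes specially, so the xargs form is only safe for names like the generated ones above.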

In summary, the sequence of unlink, unmount, fsck, and then removing the
files from lost+found was the best option for us. A newfs would have
been a better solution if it were not for the rest of the good data on
that filesystem.

FYI, this was a Solaris 8 environment.

Thanks again for all of the suggestions.

-Bryan


-------- Original Message --------
Subject: Deleting Millions of Files
Date: Thu, 22 Jan 2004 14:14:29 -0500
From: Bryan Pepin <bpepin@emc.com>
Organization: EMC Corporation
To: sunmanagers@sunmanagers.org



Hello,

We had an application "lose its brain" and create millions of tiny
files all in one directory on a UFS filesystem. We have since fixed the
application, but now we are trying to clean up the directory, because it
used up all the available inodes on the UFS filesystem.

We have tried many different techniques for removing the files, but
every one of them is taking forever.

Here is a sample of what we tried:

1) rm * -> the shell could not handle that expansion
2) cd to the upper directory and rm -rf directory_of_all_files --> this
is taking forever; on one server it has been running for about 12 hours
and is only halfway done.
3) run a for loop over the output of ls, removing each file
individually --> same results as above.
4) run a for loop over the output of ls, removing each file
individually in the background --> this caused severe performance issues
on the box and had to be killed, because it spun off so many rm's so
quickly, and they were all hanging around waiting.
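Item 1 most likely fails because the expanded list of names has to be handed to rm through exec(), which rejects argument lists larger than ARG_MAX bytes (this assumes the usual "Arg list too long" error; the post does not quote the message). The sketch below prints the local limit and measures a lower bound on what "rm *" would need in a throwaway directory of 2000 files. It assumes a POSIX shell whose printf is a builtin, plus "mktemp -d". At around ten bytes per name, a million names alone would need roughly 10 MB.

```shell
#!/bin/sh
# Sketch: compare ARG_MAX with the size of the argument list "rm *" would need.
# Assumptions: a POSIX shell whose printf is a builtin, and mktemp -d.
set -e

# Maximum bytes of arguments plus environment accepted by exec().
limit=$(getconf ARG_MAX)
echo "ARG_MAX: $limit bytes"

# Fill a throwaway directory with 2000 empty files.
scratch=$(mktemp -d)
cd "$scratch"
i=0
while [ "$i" -lt 2000 ]; do
  : > "name_$i"
  i=$((i + 1))
done

# The builtin printf expands * without calling exec(), so it is not subject
# to ARG_MAX. One byte per name stands in for each argument's terminating NUL,
# giving a lower bound: rm's own name, the environment and the per-argument
# pointers are not counted.
needed=$(printf '%s\n' * | wc -c | tr -d ' ')
echo "bytes the names alone need: $needed"

# Count the names the glob matched.
set -- *
count=$#
echo "names matched by *: $count"

# Clean up the throwaway directory.
cd /
rm -rf "$scratch"
```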

We cannot just nuke the filesystem, since the other directories on it
hold valuable information.

Has anyone out there come up with a better way to remove this many
files? There is no disk, CPU, or memory contention at all, except when
we ran the for loop and sent all the rm's to the background.

Thanks in advance, and I will summarize.

-Bryan Pepin
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers




-- 
************************************************
Bryan Pepin
Unix Enterprise Systems

EMC Corporation
171 South Street
Hopkinton, MA 01748
508-249-3543
bpepin@emc.com
Received on Mon Jan 26 10:38:45 2004

This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:43:26 EST