SUMMARY: Creating a searchable filesystem index

From: Roland Gabriel <>
Date: Tue Mar 09 2004 - 11:10:46 EST
Sorry for the late summary :-( The initial message is at the end:


Many thanks go to Rich Kulawiec, Russel Page, Anthony Talltree, Warren
Powell, David Knight, Kelly Setzer, John Leadeham and Paul Greidanus for

Several replies were along the expected lines: do a system-wide find and
use subsequent "greps" to extract the data that I want. Several others
mentioned the GNU find/locate/slocate/updatedb toolset, but 'locate/updatedb'
do not store or retrieve attribute info. Some even suggested loading the
initial 'database' into an actual RDBMS (Oracle/MySQL) or something like
Sleepycat to make the queries faster. This is a good idea, but maybe a bit
of overkill, and anyway, I'm not too skilled in database topics.
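
For anyone curious what the database suggestion could look like, here is a
minimal sketch. It is not what I ended up using, and it swaps in SQLite (via
Python's standard sqlite3 module) for the Oracle/MySQL/Sleepycat options the
replies actually named. The table name 'files', the function names and the
sample rows are all made up for illustration.

```python
import sqlite3


def build_db(rows, db_path=":memory:"):
    """Load (owner, mode, path) rows into a SQLite table indexed by owner.

    The schema is an assumption for illustration, not a recommended layout.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (owner TEXT, mode INTEGER, path TEXT)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    # The index lets owner lookups avoid scanning the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS files_owner ON files (owner)")
    conn.commit()
    return conn


def owned_by(conn, owner):
    """Return all paths recorded for the given owner, sorted by path."""
    cur = conn.execute(
        "SELECT path FROM files WHERE owner = ? ORDER BY path", (owner,)
    )
    return [row[0] for row in cur]


def world_writable(conn):
    """Return paths whose permission bits include 'write for others' (0o002)."""
    cur = conn.execute(
        "SELECT path FROM files WHERE (mode & 2) != 0 ORDER BY path"
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    # Made-up sample rows; a real run would load them from a filesystem scan.
    sample = [
        ("alice", 0o644, "/data/a"),
        ("bob", 0o666, "/data/b"),
        ("alice", 0o600, "/data/c"),
    ]
    conn = build_db(sample)
    print(owned_by(conn, "alice"))
    print(world_writable(conn))
```

The appeal is that attribute queries become one-liners once the table is
loaded; the cost is having to learn and maintain the database side.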


In the end I decided to go with the simplest (brute force and inelegant)
solution, viz. a periodic find run from cron (in this case GNU find with
the -printf "%u %p\n" option), which dumps the username and full path of
every entry to a flat file. I then use greps to grab the data that I want.
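
The same dump-then-filter idea can be sketched in Python for anyone without
GNU find handy. This is only an approximation of the find/grep approach above,
not the script I actually run: the function names (owner_name, iter_entries,
dump_index, paths_owned_by) are hypothetical, and it assumes a Unix system,
user names without spaces and paths without embedded newlines.

```python
import os
import pwd


def owner_name(uid):
    """Map a numeric uid to a user name; fall back to the number as text
    when the uid has no passwd entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def iter_entries(root):
    """Yield root and every path below it, without following symlinks."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def dump_index(root, index_path):
    """Walk root once and write one "owner path" line per entry.

    Returns the number of lines written.
    """
    count = 0
    with open(index_path, "w", errors="surrogateescape") as out:
        for path in iter_entries(root):
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            out.write(f"{owner_name(st.st_uid)} {path}\n")
            count += 1
    return count


def paths_owned_by(index_path, user):
    """Grep-style filter: paths whose owner field is exactly `user`."""
    prefix = user + " "
    with open(index_path, errors="surrogateescape") as f:
        return [
            line[len(prefix):].rstrip("\n")
            for line in f
            if line.startswith(prefix)
        ]
```

The expensive walk happens once per cron run; every later query only reads
the flat file, which is the whole point of the approach.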


FYI, this data eventually feeds another script that takes this collection of
(user, path) tuples and operates on each one. The key thing I wanted to
achieve was a degree of parallelism coupled with efficient use of system
calls (a minimum of 'greps', 'awks' and their ilk). It's not the best
solution, but deadlines await :-)
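
To make the "read once, then work in parallel" idea concrete, here is a rough
sketch under my own assumptions; it is not the actual downstream script. The
flat file is read a single time, paths are grouped by owner, and each group is
handed to a worker thread. The names load_tuples, process_owner and
run_parallel are hypothetical, and process_owner is a placeholder that only
counts entries.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def load_tuples(index_path):
    """Read the "owner path" flat file once and group paths by owner."""
    by_owner = defaultdict(list)
    with open(index_path, errors="surrogateescape") as f:
        for line in f:
            # Split on the first space only, so paths may contain spaces.
            owner, sep, path = line.rstrip("\n").partition(" ")
            if sep:  # skip malformed lines that have no separator
                by_owner[owner].append(path)
    return by_owner


def process_owner(owner, paths):
    """Placeholder per-owner action; here it just counts the entries."""
    return owner, len(paths)


def run_parallel(index_path, workers=4):
    """Process each owner's paths in a thread pool; return {owner: result}."""
    by_owner = load_tuples(index_path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda item: process_owner(*item), by_owner.items())
        return dict(results)
```

Grouping in memory replaces one grep per user with a single pass over the
file. Threads suit I/O-bound per-owner work; CPU-heavy work would probably
want processes instead.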




> I am looking into creating an index/database of the entries in a filesystem
> such that this index can be queried for things like file attributes (owner,
> permissions etc). I know a tool like 'glimpse' creates an index that can be
> used in doing text searches of files within a filesystem, but what I am
> after is not just that, but something that can allow me to search (large)
> filesystems for files that meet certain attribute criteria. The 'find'
> command would not be an option, as I am talking about filesystems of the
> order of n-terabytes, and doing repeated searches using find would
> (obviously) be too costly.




> Any ideas out there? Any help would be appreciated.
