PerlMonks  

Re: (OT) should i limit number of files in a directory

by merlyn (Sage)
on Sep 11, 2008 at 15:57 UTC ( [id://710659] )


in reply to (OT) should i limit number of files in a directory

I wonder why you have 3 million files.

Do you really need each of those chunks of data to have a name accessible to all other applications, and the metadata of last access, modified, and inode-changed, and permissions and ownership maintained by the operating system?

Or perhaps, what you should do instead is create a database that stores only the metadata you need for each item, along with the item itself.

PostgreSQL's "binary" columns should handle any data that you might stick into a file, and will scale and replicate nicely. And with a unique index on the data column, you won't even need to "MD5" the data to ensure only one version... just try to insert it, and if it fails, it's already there. Nice atomic test.
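A minimal sketch of the "just try to insert it" idea, written in Python with the standard-library sqlite3 module standing in for PostgreSQL so that it runs without a server. The table name chunks, the column name data, and the helper store_once are all hypothetical, picked only for illustration. Whether a unique index can cover very large values depends on the database, so treat this as a shape to adapt rather than a recipe.

```python
import sqlite3


def store_once(conn: sqlite3.Connection, data: bytes) -> bool:
    """Try to insert the blob; return True if it was new, False if already stored."""
    try:
        # "with conn" commits on success and rolls back on error.
        with conn:
            conn.execute("INSERT INTO chunks (data) VALUES (?)", (data,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the row: this content is already present.
        return False


# In-memory database used here as a stand-in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks ("
    " id INTEGER PRIMARY KEY,"
    " data BLOB NOT NULL UNIQUE"  # uniqueness is enforced by the database itself
    ")"
)

first = store_once(conn, b"quarterly report")
second = store_once(conn, b"quarterly report")  # same bytes again
count = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
```

The point of the design is that the duplicate check and the insert are one statement, so there is no window between "is it there?" and "store it" for another writer to slip into.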

Re^2: (OT) should i limit number of files in a directory
by leocharre (Priest) on Sep 11, 2008 at 16:38 UTC

    Ah! Very interesting point... the extra inode metadata! I didn't even consider that waste. This is something that could indeed amount to something with that many data chunks!

    Some of it could be useful: mtime, atime, and ctime could have a use, possibly for trash collection and update purposes.

    I have 3 million files because I'm tracking documents in an office environment- for a buncha users who do bureaucratic work fo tha man. So.. there are a lot of freaking pdfs, docs, "excel spreddshits", hard copy doc scans... etc. A lot.

    I'm indexing everything about everything, running OCR and deet on every speck of junk here... Makes everybody's life easier and way more interesting.
