PerlMonks  

Re^2: (OT) should i limit number of files in a directory

by leocharre (Priest)
on Sep 11, 2008 at 16:27 UTC ( [id://710673] )


in reply to Re: (OT) should i limit number of files in a directory
in thread (OT) should i limit number of files in a directory

I do have a database keeping track of sums and using ids.

I am not using this system merely to check existence. The files actually hold something: data that, as it is, does not belong in a database.

It makes sense, what merlyn and others said about storing in a database.
Let's not forget, though, that the filesystem *is* a form of database system. It's a data storage discipline.
Some things are more appropriate on a fs than on a db server.

A million text files ranging in size from 1k to 486k would probably cripple a db system; it's too much variation. Maybe I'm wrong about that.

There's no searching, no comparing, and the size of each element is wildly varied... It feels like a fs thing.


Replies are listed 'Best First'.
Re^3: (OT) should i limit number of files in a directory
by tilly (Archbishop) on Sep 11, 2008 at 18:44 UTC
    In your original post you said that you were just using the filename to check existence. If it has data, then a file is more reasonable. However I would still suggest looking at something like DB_File's interface to Berkeley DB.

    That's designed to store data of exactly this type. Its data limits are 4 GB per entry, and 256 terabytes for the entire dataset.
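    The tie-based interface suggested above might look like this. A minimal sketch, assuming the documents are keyed by the ids already tracked in the poster's database; the file name and keys here are hypothetical.

```perl
# Tie a Perl hash to a Berkeley DB file via DB_File (bundled with Perl).
# Reads and writes on the hash become keyed lookups in the DB file,
# avoiding a directory with a million entries.
use strict;
use warnings;
use DB_File;
use Fcntl;

my %store;
my $dbfile = 'documents.db';    # hypothetical path
tie %store, 'DB_File', $dbfile, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $dbfile: $!";

# Store a text payload under its id, then fetch it back by key.
$store{'doc-12345'} = 'contents of the text file...';
my $text = $store{'doc-12345'};

untie %store;
```

    `$DB_HASH` gives constant-time keyed access; `$DB_BTREE` could be substituted if ordered traversal of the ids ever matters.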

    If you want to store the data on one system and use it on another, then you might want to move up to a database. Sure, there are things like NFS. But if someone goes innocently looking at a directory like that using standard tools over a networked filesystem, you'll be putting everything through an "interesting" stress test. Plus, even though it works today on ext3, that's no guarantee that in 2 years someone won't migrate the system to another filesystem without understanding that that directory really, really needs to be on a specific filesystem.

    While I agree that there are things that belong on filesystems, this feels to me like something that would be happier not living on a filesystem. But if you put it there, then I'm going to suggest that your disks will be happier if you turn off maintenance of last access time in that directory. That information is almost never used, and causes every read of a file to write to the directory. If you're under load this can be a significant cause of overhead.
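    Turning off last-access-time maintenance as suggested above is a mount option; a sketch, assuming Linux/ext3 and a hypothetical mount point `/var/data`:

```shell
# Remount with noatime so file reads stop triggering inode writes.
mount -o remount,noatime /var/data

# Or make it permanent in /etc/fstab:
# /dev/sdb1  /var/data  ext3  defaults,noatime  0  2
```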

Re^3: (OT) should i limit number of files in a directory
by Illuminatus (Curate) on Sep 11, 2008 at 16:47 UTC
    I still think merlyn is right -- a db is the way to go. blobs are not the most elegant/efficient mechanisms, but they are very easy to find based on a key. As long as your blobs stay below about 1MB, mysql or postgres should be fine. Trying to find a single file in a directory hierarchy of millions of entries is going to suffer significantly worse performance.
      What about a Berkeley DB? That seems like it would be a good backend here.
        From everything I have heard, BerkeleyDB would probably be fine as well. I did not mention it because I have not personally used it. I have used both mysql and postgres, so I feel more comfortable recommending them. BerkeleyDB is fast, but has a couple of drawbacks. I don't believe it has network access, so if you want to access the database from a different system, it might not be the best choice.
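The keyed-blob lookup described in the replies above might be sketched as follows. This uses DBI with SQLite so the example is self-contained; mysql or postgres would use the same DBI calls with a different DSN, and the table and column names here are hypothetical.

```perl
# Store text payloads as blobs keyed by id, so retrieval is a single
# indexed lookup rather than a scan of a huge directory.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=blobs.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE IF NOT EXISTS documents (
              id   TEXT PRIMARY KEY,
              body BLOB
          )');

# Insert (or replace) a document under its id.
my $sth = $dbh->prepare(
    'INSERT OR REPLACE INTO documents (id, body) VALUES (?, ?)');
$sth->execute('doc-12345', 'contents of the text file...');

# Fetch it back by key.
my ($body) = $dbh->selectrow_array(
    'SELECT body FROM documents WHERE id = ?', undef, 'doc-12345');

$dbh->disconnect;
```

Unlike Berkeley DB, a server-backed DBD driver (mysql, postgres) gives network access from other machines with no change to this code beyond the connect string.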
