PerlMonks  

Re: (OT) should i limit number of files in a directory

by kyle (Abbot)
on Sep 11, 2008 at 16:12 UTC ( [id://710669]=note )


in reply to (OT) should i limit number of files in a directory

If it were me, I'd put them in directories based on the hashes. I'd put "d41d8cd98f00b204e9800998ecf8427e" in "d/4/1/d/8/d41d8cd98f00b204e9800998ecf8427e" (for example). Each level fans out 16 ways, one per hex digit, so five levels deep gives 16^5 (about a million) leaf directories, and each would have an average of three files in it (for three million files). Maybe you want just four levels (65,536 leaves) with an average of about 45 files each. The deeper you go, the more room to grow.
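The layout above can be sketched in a few lines. This is only an illustration, assuming one hex character per directory level as in the example path; the function names `sharded_path` and `average_files_per_leaf` and the root directory `store` are made up for the sketch.

```python
import os

def sharded_path(root, digest, levels=4):
    """Build root/d/4/1/d/<digest> (for levels=4), one hex character per level."""
    if len(digest) < levels:
        raise ValueError("digest is shorter than the number of levels")
    # Unpacking the first `levels` characters yields one path component each.
    return os.path.join(root, *digest[:levels], digest)

def average_files_per_leaf(total_files, levels, fanout=16):
    """Average files per leaf directory, assuming hashes spread evenly."""
    return total_files / fanout ** levels

if __name__ == "__main__":
    print(sharded_path("store", "d41d8cd98f00b204e9800998ecf8427e", levels=5))
    print(average_files_per_leaf(3_000_000, levels=5))  # a bit under 3
    print(average_files_per_leaf(3_000_000, levels=4))  # a bit under 46
```

Changing `levels` is the only knob: fewer levels means fewer, fuller directories.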

I find it hard to believe you'll never do a directory listing. Eventually someone will do one by accident. We had a Linux machine where I work brought to its knees by an 'ls' in a directory with too many files. We thought it had died completely, but it eventually came back.

It's possible that ext3 doesn't have this problem (I don't know), but on some filesystems even a check for existence involves a brute force search through the contents of the directory.

Having looked just now, I see there's an option for 'mke2fs' called "dir_index" which "uses hashed b-trees to speed up lookups in large directories." Also, a "tune2fs -l /dev/sda1" tells me that my filesystem has this feature even though I don't recall asking for it. Maybe it's the default. It might be worth your while to look.
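To check for the feature from a script rather than by eye, something like the following might do. It is a sketch that assumes the `tune2fs -l` output contains a line beginning with "Filesystem features:" followed by a space-separated list; the function name `has_dir_index` is invented here.

```python
def has_dir_index(tune2fs_output):
    """Return True if 'dir_index' is listed on the 'Filesystem features:' line."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            # Everything after the first colon is a whitespace-separated list.
            features = line.split(":", 1)[1].split()
            return "dir_index" in features
    # No features line found at all: treat it as "not enabled".
    return False
```

The text to feed it would come from running "tune2fs -l" on the device in question (usually as root), for example via `subprocess.run(..., capture_output=True, text=True).stdout`.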

Replies are listed 'Best First'.
Re^2: (OT) should i limit number of files in a directory
by leocharre (Priest) on Sep 11, 2008 at 16:52 UTC

    Yes, doing an ls on ext3 still slows things down. I've not seen a slowdown that looks like a crash, but I have seen a pause.

    I keep my sshd running just in case of stuff like that.

    Your multilevel system makes more sense. I had the idea that if I had, say, two levels, I would pre-create 256 dirs at level 1, then 256 more inside each one of those, so that all the possible dirs would already exist. That helps the system along, and I wouldn't have to check whether the target directory is there before writing to it.
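A rough sketch of that pre-creation idea, assuming 256 dirs per level means two hex characters per level ("00" through "ff"). The function name `precreate_dirs` and its parameters are hypothetical.

```python
import os

def precreate_dirs(root, levels=2, chars_per_level=2):
    """Create every possible hex-named directory under root, `levels` deep."""
    width = chars_per_level
    # With two characters per level this is "00", "01", ..., "ff" (256 names).
    names = [format(i, "0{}x".format(width)) for i in range(16 ** width)]

    def build(parent, depth):
        for name in names:
            path = os.path.join(parent, name)
            os.makedirs(path, exist_ok=True)  # safe to re-run
            if depth + 1 < levels:
                build(path, depth + 1)

    build(root, 0)
```

With the defaults this makes 256 + 65,536 directories, so it is worth trying first with `chars_per_level=1` (16 + 256 directories) in a scratch location.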

    (I'm so glad I asked about this- really impressed by the responses and ideas.)

    Hm... may be time to start scripting some .t files .. :-)
