
Given that you have a database keeping track of things (one record per file, right?), and you have data that belongs in files, another idea to consider is: just concatenate your current "file-unit" data chunks into a smaller number of larger files. The database record for each unit can take a couple extra columns to hold the byte offset info (start and end, or start and length).

When you are storing each data unit to disk, just append to an existing file until that file reaches some maximum reasonable size, and keep track of the file name and byte offsets for that unit. Once one file gets big enough (one or two gigs would be good), start a new one. This could get a bit less simple if you have multiple processes or threads writing different data units at the same time, but it won't be that much harder -- just set up a way of apportioning or assigning output files to each process (that's where directory trees would be handy).
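To make the append-and-track idea concrete, here is a rough sketch in Python (the same shape carries over to Perl with open/print/tell). The class name `UnitWriter`, the `chunk-NNNNNN.dat` naming scheme, and the 2 GB default are all made up for illustration, and it assumes one writer per directory, which is the "assign output files to each process" arrangement mentioned above.

```python
import os


class UnitWriter:
    """Append data units to container files, rolling over at max_bytes.

    Hypothetical helper: the names and the file-naming scheme are assumptions.
    Assumes a single writer per directory.
    """

    def __init__(self, directory, max_bytes=2 * 1024**3):
        self.directory = directory
        self.max_bytes = max_bytes
        self.index = 0

    def _path(self):
        return os.path.join(self.directory, "chunk-%06d.dat" % self.index)

    def append(self, data):
        """Write one unit; return (file_name, offset, length) for the DB row."""
        while True:
            path = self._path()
            size = os.path.getsize(path) if os.path.exists(path) else 0
            # Use this file if it is empty (an oversized unit still gets a
            # file of its own) or if the unit fits under the cap.
            if size == 0 or size + len(data) <= self.max_bytes:
                break
            self.index += 1
        with open(path, "ab") as fh:
            fh.write(data)
        # In append mode the unit starts where the file previously ended.
        return os.path.basename(path), size, len(data)
```

The returned triple is what would go into the extra columns of the database record for that unit.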

You'll be using a lot fewer inodes and your directories will be smaller. When you go to fetch data back from the files, there will be less filesystem navigation, fewer file open/close operations on average, and more use of seek(), which would be a Good Thing.
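Fetching a unit back is then one open, one seek, and one read. A minimal Python sketch, assuming the database row hands you a file name, a start offset, and a length (the function name `read_unit` and its parameters are hypothetical):

```python
import os


def read_unit(directory, file_name, offset, length):
    """Return the bytes of one unit stored at [offset, offset + length)."""
    with open(os.path.join(directory, file_name), "rb") as fh:
        fh.seek(offset)          # jump straight to the start of the unit
        data = fh.read(length)
    if len(data) != length:
        # The stored offsets point past the end of the file.
        raise IOError("short read: wanted %d bytes, got %d" % (length, len(data)))
    return data
```

If you fetch many units from the same container, keeping the filehandle open and just seeking between reads saves the repeated open/close as well.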

(Update: it would be prudent to worry about the risk of lost or corrupted byte offset info, so you might want to supplement that with some sort of distinctive record delimiter as part of the concatenation process -- but this would depend on your data: how confident can you be about coming up with some sort of pattern that you know will never occur as data within a given record (leading to a "false-alarm" boundary)? If you can be completely confident about that, then there won't be any problem. It could be something as simple/silly as a 128-byte record with even values 00-FE in ascending order.)
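Here is one way the delimiter idea might look in Python, using the 128-byte even-values pattern as the marker. The names `DELIM`, `frame`, and `recover_offsets` are invented for the sketch, and writing the delimiter after each unit (rather than before) is just one possible choice. It refuses to store a unit that contains the pattern, since that is exactly the "false-alarm" case.

```python
# 128 bytes: 00 02 04 ... FE, the even values in ascending order.
DELIM = bytes(range(0, 256, 2))


def frame(data):
    """Return the unit followed by the delimiter, ready to append."""
    if DELIM in data:
        raise ValueError("unit contains the delimiter pattern")
    return data + DELIM


def recover_offsets(blob):
    """Rebuild (offset, length) pairs by scanning a container for delimiters."""
    units = []
    start = 0
    while True:
        end = blob.find(DELIM, start)
        if end < 0:
            break
        units.append((start, end - start))
        start = end + len(DELIM)   # the next unit begins after the delimiter
    return units
```

The offsets stored in the database stay the primary index; a scan like this is only the recovery path for when they are lost or suspect. For containers in the gigabyte range you would scan in buffered pieces rather than load the whole file into memory.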

(Second and final update: Bear in mind that the above idea really just amounts to implementing your own little BLOB attachment on your existing database. If you are already using a database that doesn't have good BLOB support -- and if there's inertia that disfavors changing the DB server -- then concatenating files is not such a bad fall-back approach. But actually, I'd go with merlyn's advice on this, if you happen to have the DB for it.)

When you say:

    "I will not be searching for files, or doing a dir listing operation."

Well, maybe you personally won't be doing that, but what about everybody else? (Like maybe the nightly backup job? Your system has one of those, doesn't it?) There tend to be a fair number of routine sysadmin tasks that involve traversing whatever directory tree you assemble, and this will usually involve "find" and other tools that are surprisingly bad at scaling up beyond a certain order of magnitude, especially when it comes to the number of file entries in a single directory. I've seen it happen, and I assure you, you do not want to go there.

In reply to Re: (OT) should i limit number of files in a directory by graff
in thread (OT) should i limit number of files in a directory by leocharre
