
Re: Parse data into large number of output files.

by tachyon (Chancellor)
on Sep 29, 2004 at 02:34 UTC ( #394827=note )

in reply to Parse data into large number of output files.

The problem is that there can be hundreds of original senders. Having that many filehandles open is certain to be problematic....Should I just risk opening a zillion filehandles?

On what do you base your supposition that having lots of file descriptors open is a problem? What risks do you perceive? You assert, but did you test? By default you can have 509 on Win2K and 1021 on Linux; three descriptors are taken by STDIN, STDOUT, and STDERR, giving totals of 512 and 1024 respectively.

    C:\tmp>perl -e "open ++$fh, '>', $fh or die qq'$fh $!\n' for 1..$ARGV[0]" 512
    510
    [root@devel3 tmp]# perl -e 'open ++$fh, ">", $fh or die "$fh $!\n" for 1..$ARGV[0]' 1024
    1022 Too many open files

But so what? Just increase the number if you need to. On Linux:

    [root@devel3 tmp]# ulimit -n 65535
    [root@devel3 tmp]# perl -e 'open ++$fh, ">", $fh or die "$fh $!\n" for 1..$ARGV[0]' 2048
    [root@devel3 tmp]# ls 204?
    2040 2041 2042 2043 2044 2045 2046 2047 2048
    [root@devel3 tmp]#

It is not actually the number of open file handles that will cause an issue. Depending on the underlying filesystem, you will start to see problems once you go over 10-20,000 files in a single directory with ext2/3; ReiserFS does not care.
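If you do expect tens of thousands of output files on ext2/3, one common workaround is to spread them across subdirectories keyed on a prefix of the file name. This is a minimal sketch, not from the post above; the `out` base directory, the `.txt` suffix, and the two-character bucket scheme are all illustrative choices:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(mkpath);
use File::Spec;

# Map a sender name to a path like out/al/alice.txt, so that no single
# directory ever accumulates more than a small slice of the files.
sub bucketed_path {
    my ($base, $sender) = @_;
    my $bucket = substr $sender, 0, 2;              # first two chars pick the subdirectory
    my $dir    = File::Spec->catdir($base, $bucket);
    mkpath($dir) unless -d $dir;                    # create the bucket on first use
    return File::Spec->catfile($dir, "$sender.txt");
}
```

With, say, 256 well-distributed buckets, 50,000 output files work out to roughly 200 per directory, comfortably below the range where ext2/3 starts to hurt.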



Replies are listed 'Best First'.
Re^2: Parse data into large number of output files.
by BrowserUk (Pope) on Sep 29, 2004 at 03:10 UTC

    The biggest problem I've encountered with keeping large numbers of file handles open is that it tends to make the filesystem cache work against you rather than for you.

    On NTFS, you can use the native CreateFile() API and provide extra information about the type of use you intend to make of the file. Opening with FILE_FLAG_NO_BUFFERING and doing your own buffering with multi-sector-sized writes can help alleviate this.

    Most of the limitations are embodied in the (almost POSIX-compliant) C-runtime semantics. It's quite probable that bypassing these on other filesystems could also be beneficial, but that probably requires fairly detailed knowledge of the filesystem concerned.

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re^2: Parse data into large number of output files.
by pg (Canon) on Sep 29, 2004 at 03:40 UTC

    Two comments here:

    • First, you assumed that we were talking about the physical limitation here, i.e. how many filehandles the OS allows. That's one thing, but not the only thing we are talking about. I agree it is good to keep it in mind.
    • Second, with this physical limitation in mind, you most likely don't want to go all the way to the limit, but rather stay somewhere below it. I am not saying you suggested going there; this is rather a comment for the OP in general. He has to test and find out a reasonable number.
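One way to stay below whatever number that testing suggests is to cap the handles open at once and reopen files in append mode on demand; the core FileCache module implements this same idea. A minimal sketch, not from the thread; the cap of 16, the `%fh`/`@lru` names, and the `.out` suffix are invented for illustration:

```perl
use strict;
use warnings;

my $MAX_OPEN = 16;    # illustrative cap, chosen well below the ulimit
my %fh;               # name => open append handle
my @lru;              # names in the order their handles were opened

# Return an append handle for $name, closing the oldest open handle
# once the cap is reached (a simple FIFO cap; true LRU eviction or
# the core FileCache module are refinements of the same idea).
# Reopening with '>>' means a re-visited file keeps its earlier contents.
sub handle_for {
    my ($name) = @_;
    return $fh{$name} if $fh{$name};
    if (@lru >= $MAX_OPEN) {
        my $old = shift @lru;
        close delete $fh{$old};
    }
    open my $h, '>>', "$name.out" or die "$name.out: $!";
    push @lru, $name;
    return $fh{$name} = $h;
}
```

The trade-off is extra open/close churn when the input interleaves many senders, so the right cap is exactly the kind of number the OP has to measure for his own data.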
