http://qs321.pair.com?node_id=698594

xaprb has asked for the wisdom of the Perl Monks concerning the following question:

I have a program that may need to estimate its completion time on large files. Any thoughts on the best way to quickly estimate the line count of a very large text file? My idea is to get the file size and, if it is less than 100MB, just use wc -l. Otherwise, take 100 aligned 4 KiB samples by seeking to pre-calculated offsets in the file and reading 4096 bytes at each, count the bytes between successive newlines, and take the average as the line length. The number of lines is then roughly $filesize / ($avg_line_len + length("\n")).

Update: replaced "seeking through" with "seeking to pre-calculated offsets in"
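
A minimal sketch of the sampling idea above, not a tested production routine. The name estimate_line_count and its $samples and $block parameters are hypothetical; the defaults are the 100 samples of 4096 bytes from the question. Instead of measuring each line, it counts newlines per sampled byte, which gives the same estimate ($filesize divided by the average bytes per line, newline included) without having to deal with lines cut off at block boundaries. It assumes "\n" line endings and a regular, seekable file.

```perl
use strict;
use warnings;

# Hypothetical helper: estimate the number of lines in $path by sampling.
sub estimate_line_count {
    my ($path, $samples, $block) = @_;
    $samples ||= 100;     # number of blocks to sample (figure from the question)
    $block   ||= 4096;    # bytes per sample

    my $size = -s $path;
    return 0 unless $size;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # Small file: the samples would cover it anyway, so count exactly.
    if ($size <= $samples * $block) {
        my ($count, $buf, $last) = (0);
        while (read($fh, $buf, 65536)) {
            $count += ($buf =~ tr/\n//);
            $last = substr($buf, -1);
        }
        close $fh;
        $count++ if defined $last && $last ne "\n";   # unterminated last line
        return $count;
    }

    # Evenly spaced, block-aligned offsets; every read stays inside the file.
    my $stride = int(int($size / $block) / $samples);
    my ($bytes, $newlines) = (0, 0);
    for my $i (0 .. $samples - 1) {
        seek($fh, $i * $stride * $block, 0) or die "seek failed: $!";
        my $got = read($fh, my $buf, $block);
        die "read failed: $!" unless defined $got;
        $bytes    += $got;
        $newlines += ($buf =~ tr/\n//);
    }
    close $fh;

    # No newline seen: lines are longer than the sampled data, so guess one.
    return 1 unless $newlines;

    # $bytes / $newlines is the average line length including its newline.
    return int($size * $newlines / $bytes + 0.5);
}
```

Called as estimate_line_count($file), it reads at most about 400 KiB regardless of file size. The estimate is only as good as the assumption that the sampled regions have line lengths typical of the whole file.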

Re: Estimate line count in text file
by GrandFather (Saint) on Jul 18, 2008 at 13:12 UTC

    Why not use -s to find the file size then use tell from time to time to determine how far through you are and make a time remaining estimate from that?


    Perl is environmentally friendly - it saves trees
      That's a great idea!
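
A rough sketch of the -s plus tell suggestion, assuming the file is a regular file read line by line and that bytes are consumed at a roughly constant rate. The names eta_seconds and process_with_progress, the $callback parameter, and the reporting interval $every are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: seconds remaining, extrapolated linearly from the
# bytes consumed so far. Returns undef when there is nothing to go on yet.
sub eta_seconds {
    my ($pos, $size, $elapsed) = @_;
    return if $pos <= 0 || $elapsed <= 0;
    return 0 if $pos >= $size;
    return $elapsed * ($size - $pos) / $pos;
}

# Hypothetical driver: call $callback on each line and report progress on
# STDERR every $every lines. Returns the number of lines processed.
sub process_with_progress {
    my ($path, $callback, $every) = @_;
    $every ||= 100_000;

    my $size = -s $path;
    open my $fh, '<', $path or die "Cannot open $path: $!";

    my $start = time;
    my $lines = 0;
    while (my $line = <$fh>) {
        $callback->($line);
        next if ++$lines % $every;

        my $pos = tell($fh);                       # bytes consumed so far
        my $eta = eta_seconds($pos, $size, time - $start);
        printf STDERR "%.1f%% done, about %d s remaining\n",
            100 * $pos / $size, $eta
            if defined $eta;
    }
    close $fh;
    return $lines;
}
```

Because time has one-second resolution, the first report may be skipped on fast runs; Time::HiRes could be substituted if finer timing matters. This approach never needs the line count at all, only the byte offset.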
Re: Estimate line count in text file
by marto (Cardinal) on Jul 18, 2008 at 12:51 UTC
      Sure. I saw all of these. (Though I do not see any reply by davorg). They are all exact, not estimated. The key here is "estimated because the file is Very, Very Large." Reading the whole file may be unacceptable.
        xaprb,

        My mistake, I was referring to davidrw's reply. I would suggest (if you have not already done so) benchmarking their Tie::File solution with some 'large' files, since people's definitions of what constitutes a large file differ :)

        Martin