xaprb has asked for the wisdom of the Perl Monks concerning the following question:
I have a program that may need to estimate its progress toward completion when working on large files. Any thoughts on the best way to quickly estimate the line count in a very large text file?
My idea is to get the file size and, if it's less than 100 MB, just use wc -l. Otherwise, take 100 aligned 4 KiB samples by seeking to pre-calculated offsets in the file and reading 4096 bytes at each one, count the bytes between consecutive newlines, and average those counts to get the line length; the estimated number of lines is then $filesize / ($avg_line_len + length("\n")).
Update: replaced "seeking through" with "seeking to pre-calculated offsets in"
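For illustration, the sampling idea might be sketched roughly as below. This is only a sketch under a few assumptions: the name estimate_line_count and its parameters are hypothetical, newlines are assumed to be single "\n" bytes, and instead of measuring each line it counts newlines per sampled byte. That ratio works out to the same thing as $filesize / ($avg_line_len + length("\n")), and it sidesteps the partial lines at the edges of each sample.

```perl
use strict;
use warnings;

# Hypothetical helper: estimate the number of lines in $path by sampling.
# The defaults (100 samples of 4096 bytes) are the figures from the post.
sub estimate_line_count {
    my ($path, $samples, $block) = @_;
    $samples //= 100;
    $block   //= 4096;

    my $filesize = -s $path;
    return 0 unless $filesize;    # empty or missing file

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    # Number of whole blocks available; at least one so small files still work.
    my $blocks_in_file = int($filesize / $block) || 1;
    $samples = $blocks_in_file if $samples > $blocks_in_file;

    my ($bytes_read, $newlines) = (0, 0);
    for my $i (0 .. $samples - 1) {
        # Evenly spaced, block-aligned offsets computed up front from $i.
        my $offset = int($i * $blocks_in_file / $samples) * $block;
        seek($fh, $offset, 0) or die "seek failed: $!";
        my $got = read($fh, my $buf, $block);
        die "read failed: $!" unless defined $got;
        last unless $got;
        $bytes_read += $got;
        $newlines   += ($buf =~ tr/\n//);    # tr returns the match count
    }
    close $fh;

    return 0 unless $newlines;    # no newline seen in any sample

    # $bytes_read / $newlines is the average line length including "\n".
    return int($filesize * $newlines / $bytes_read + 0.5);
}
```

Calling estimate_line_count($file) would be the branch for files over the 100 MB threshold; below that, wc -l gives the exact count. Files with very uneven line lengths may need more samples for a usable estimate.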
Replies are listed 'Best First'.
Re: Estimate line count in text file
by GrandFather (Saint) on Jul 18, 2008 at 13:12 UTC
by xaprb (Scribe) on Jul 18, 2008 at 13:16 UTC
Re: Estimate line count in text file
by marto (Cardinal) on Jul 18, 2008 at 12:51 UTC
by xaprb (Scribe) on Jul 18, 2008 at 13:16 UTC
by marto (Cardinal) on Jul 18, 2008 at 13:24 UTC