http://qs321.pair.com?node_id=1160777


in reply to Re^2: How to optimize a regex on a large file read line by line ?
in thread How to optimize a regex on a large file read line by line ?

Thanks for showing your comparison of the unzip pipeline vs. reading uncompressed text. I had said that the former would be faster (because of less reading from disk), but without actually testing it. (I think I must have encountered at least a couple situations in the past where some process finished more quickly if I read compressed data from disk, rather than uncompressed, but I don't know what may have been different in those cases.)

Having now tested it for this situation (multiple times in quick succession to check for consistency), the difference in timing was negligible or slightly favoring reading the uncompressed file, so it seems my initial idea about the role of disk access was wrong: either it really doesn't make any difference, or else whatever difference it makes is washed out by the added overhead of the extra unzip process and/or the pipeline itself.

(The perl one-liner was still faster than the compiled "grep" utility on my machine, but YMMV - different machines will have different versions / compilations of both Perl and grep.)

  • Comment on Re^3: How to optimize a regex on a large file read line by line ?