http://qs321.pair.com?node_id=1162827


in reply to Best way to search large files in Perl

Based on this part of your description:

I first get a list of unique things I'm interested in, similar to a product id (the list contains about 7000 unique items). Then, I iterate the big log file one time using regex to find lines I'm interested in for gathering additional data about each product id. I write those lines out to a few new (smaller) files. Then, I loop through the product ID list one time, and execute several different grep commands on the new smaller files I created.

I can't really tell (a) how many different primary input files you have, or (b) how many times you read each primary input file from beginning to end. If you are reading a really big file many times over in order to match some number of different patterns, you can probably speed things up by doing a slightly more complicated set of matches in a single pass over the data -- something like the sketch below.
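
Here is a minimal sketch of that single-pass idea. The file names ("product_ids.txt", "big_log.txt") and the "id=XXXX" line format are just assumptions -- adjust the open calls and the regex to your actual data:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Load the ~7000 product IDs into a hash for O(1) lookups.
    my %wanted;
    open my $ids, '<', 'product_ids.txt' or die "product_ids.txt: $!";
    while (my $id = <$ids>) {
        chomp $id;
        $wanted{$id} = 1;
    }
    close $ids;

    # One pass over the big log: capture the product ID from each line
    # (hypothetical "id=XXXX" field) and collect the lines per ID.
    my %lines_for;
    open my $log, '<', 'big_log.txt' or die "big_log.txt: $!";
    while (my $line = <$log>) {
        next unless $line =~ /\bid=(\w+)/;   # assumed field layout
        push @{ $lines_for{$1} }, $line if $wanted{$1};
    }
    close $log;

    # Every later "grep" now becomes a hash lookup instead of a file scan.
    for my $id (sort keys %lines_for) {
        printf "%s: %d matching lines\n", $id, scalar @{ $lines_for{$id} };
    }

The point is that the expensive part (reading the big file) happens exactly once, and all the per-ID work afterward is done against an in-memory hash.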

RonW gave you a really useful suggestion: PerlIO::gzip -- I second that. Read the gzipped data directly into your Perl script so you can use regex matches on each (uncompressed) line, because (1) Perl gives you a lot more power in matching things efficiently and flexibly, (2) you can use the matches directly in your script to build useful data structures, and (3) you save some overhead by not launching sub-shells to run Unix commands. Something like this:
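
This is only a sketch -- "big_log.gz" and the ERROR/WARN patterns are placeholders for your own file and patterns -- but it shows the documented ":gzip" layer usage:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use PerlIO::gzip;

    # Read the compressed log directly; no zcat/gunzip subshell needed.
    open my $fh, '<:gzip', 'big_log.gz' or die "big_log.gz: $!";
    while (my $line = <$fh>) {
        # Each $line arrives already uncompressed, so ordinary regexes apply.
        if ($line =~ /ERROR|WARN/) {   # assumed patterns of interest
            print $line;
        }
    }
    close $fh;

You could fold the single-pass hash-building from the earlier sketch into this loop, so the compressed file is also read only once.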