PerlMonks  

Re: Best way to search large files in Perl

by graff (Chancellor)
on May 12, 2016 at 02:46 UTC


in reply to Best way to search large files in Perl

Based on this part of your description:

I first get a list of unique things I'm interested in, similar to a product ID (the list contains about 7000 unique items). Then, I iterate the big log file one time using regex to find lines I'm interested in for gathering additional data about each product ID. I write those lines out to a few new (smaller) files. Then, I loop through the product ID list one time, and execute several different grep commands on the new smaller files I created.

I can't really tell: (a) how many different primary input files you have, or (b) how many times you read each primary input file from beginning to end. If you read a really big file many times to match a number of different patterns, you may be able to speed things up by doing a slightly more complicated set of matches in a single pass over the data.
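A minimal sketch of the single-pass idea, under hypothetical assumptions: each relevant log line carries its product ID as `id=<digits>`, and the checks that were separate grep runs can be written as named regexes. `scan_once`, `%patterns` and the sample log are illustrative, not taken from the original question. The wanted IDs live in a hash, so each line costs one hash lookup plus a few regex tests instead of one grep run per ID.

```perl
use strict;
use warnings;

# Read the log once; test each line against every named pattern and file the
# matches under the product ID the line mentions.
# ASSUMPTION (hypothetical): the ID appears in the line as "id=<digits>".
sub scan_once {
    my ($fh, $wanted, $patterns) = @_;
    my %found;    # $found{$id}{$pattern_name} = [ matching lines ]
    while (my $line = <$fh>) {
        next unless $line =~ /\bid=(\d+)/;    # hypothetical ID format
        my $id = $1;
        next unless exists $wanted->{$id};    # one hash lookup, not one grep per ID
        for my $name (keys %$patterns) {
            push @{ $found{$id}{$name} }, $line if $line =~ $patterns->{$name};
        }
    }
    return \%found;
}

# Usage with an in-memory "file" standing in for the big log.
my $log = "id=1 status=ERROR\nid=2 status=OK\nid=1 price=9.99\nid=3 status=ERROR\n";
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
my %wanted   = map { $_ => 1 } qw(1 3);
my %patterns = (error => qr/status=ERROR/, price => qr/price=/);
my $found    = scan_once($fh, \%wanted, \%patterns);
close $fh;
```

The result is already a per-ID data structure, so the follow-up step that looped over the ID list and ran grep on intermediate files becomes a walk over `%$found`.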

RonW gave you a really useful suggestion: PerlIO::gzip -- I second that. Read the gzip data directly into your Perl script so that you can apply regex matches to each uncompressed line, because (1) Perl gives you a lot more power in matching things efficiently and flexibly, (2) you can use the matches directly in your script to build useful data structures, and (3) you save some overhead time by not launching sub-shells to run unix commands.
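A sketch of reading compressed data line by line inside Perl. `open_gz` is a hypothetical helper: it uses the `:gzip` layer from PerlIO::gzip (a CPAN module, so it may not be installed) when available, and otherwise falls back to IO::Uncompress::Gunzip, which ships with Perl. The sample file and the `status=ERROR` pattern are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::Gzip qw(gzip $GzipError);    # only used to build a sample file

# Return a handle that yields uncompressed lines from a .gz file.
sub open_gz {
    my ($path) = @_;
    if (eval { require PerlIO::gzip; 1 }) {
        # PerlIO::gzip provides the ":gzip" layer for open().
        open my $fh, '<:gzip', $path or die "Cannot open $path: $!";
        return $fh;
    }
    # Fallback: core module whose object also supports <$fh> and close.
    require IO::Uncompress::Gunzip;
    my $fh = IO::Uncompress::Gunzip->new($path)
        or die "Cannot open $path: $IO::Uncompress::Gunzip::GunzipError";
    return $fh;
}

# Build a small sample file so the sketch runs on its own.
my ($tmp_fh, $gz_path) = tempfile(SUFFIX => '.gz', UNLINK => 1);
close $tmp_fh;
my $text = "id=1 status=ERROR\nid=2 status=OK\n";
gzip(\$text => $gz_path) or die "gzip failed: $GzipError";

# No zcat/grep sub-shell: the regex runs on each uncompressed line in Perl.
my $fh = open_gz($gz_path);
my @errors;
while (my $line = <$fh>) {
    push @errors, $line if $line =~ /status=ERROR/;
}
close $fh;
```

Either branch gives an ordinary line-oriented handle, so the same loop body (for example the single-pass matcher above) works unchanged on compressed input.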
