Re: Optimization tips

Good remarks so far, but I don't think anyone has yet pointed out that you're slurping your input file into an array and then iterating over it. For a large multi-megabyte file that causes a good bit of overhead in and of itself, to say nothing of inflating your process's memory footprint, which can hurt performance if it ends up causing extra paging by the OS. Unless you need the full file for context (which, from my admittedly perfunctory skim of the code, it doesn't look like you do), there's no reason not to read that input file line by line.
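
A minimal sketch of the difference, assuming the per-line work can be expressed as a callback. The `stream_transform` name and its arguments are hypothetical, not taken from your code. The commented-out slurp keeps every line in memory at once; the `while` loop only ever holds the current line:

```perl
use strict;
use warnings;

# Slurping (what to avoid for big inputs): every line lives in @lines at once.
#   my @lines = <$in>;
#   for my $line (@lines) { ... }

# Streaming: read, transform and write one line at a time, so memory use
# stays roughly constant no matter how large the input file is.
# $transform is a hypothetical callback standing in for your per-line logic.
sub stream_transform {
    my ($in, $out, $transform) = @_;
    while (my $line = <$in>) {    # implicit defined() test on each read
        print {$out} $transform->($line);
    }
    return;
}

# With real files, usage might look like:
#   open my $in,  '<', $input_path  or die "Can't read $input_path: $!";
#   open my $out, '>', $output_path or die "Can't write $output_path: $!";
#   stream_transform($in, $out, sub { my $l = shift; $l =~ s/foo/bar/g; $l });
```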

Presumably your mapping files are the smaller inputs, so (as was mentioned) I'd suggest restructuring things to read those into data structures once, then work through the meat of the main input line by line. If you can rework things to rely on a hash lookup instead of the multiple substitutions, then even if the mappings are "large" you can use something like GDBM_File or DB_File to keep them out of memory and look them up from disk instead.
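
One way to swap the chain of substitutions for a hash lookup is sketched below. It assumes a hypothetical mapping file format of `from<TAB>to` per line, and `load_mapping`, `build_matcher` and `apply_mapping` are illustrative names only. The mapping is read once, all keys are compiled into a single alternation (longest first, so a longer key wins over its prefix), and each match is replaced via the hash:

```perl
use strict;
use warnings;

# Read a mapping file once into a hash. The "from<TAB>to" format is an
# assumption for illustration; adapt the split to your real mapping files.
sub load_mapping {
    my ($fh) = @_;
    my %map;
    while (my $line = <$fh>) {
        chomp $line;
        my ($from, $to) = split /\t/, $line, 2;
        next unless defined $to && length $from;    # skip blank/malformed lines
        $map{$from} = $to;
    }
    return \%map;
}

# Compile one alternation of all keys (longest first, so "catalog" is tried
# before "cat") instead of running a separate s/// per mapping entry.
sub build_matcher {
    my ($map) = @_;
    return qr/(?!)/ unless %$map;    # never matches if the mapping is empty
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b } keys %$map;
    return qr/($alt)/;
}

# One pass per line; every match is replaced by its hash lookup.
sub apply_mapping {
    my ($line, $map, $re) = @_;
    $line =~ s/$re/$map->{$1}/g;
    return $line;
}
```

Note that this does a single pass, so replaced text is never re-scanned, whereas a chain of `s///` can let a later substitution rewrite an earlier one's output; check which behaviour you actually want. If the mapping is too big for memory, the hash could instead be tied to DB_File or GDBM_File, although building one regex from every key then stops being practical and a per-token lookup fits better.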

Edit: s/it the/it then/; me no tipe gud tewday. Also, if you're really looking to speed things up, you could use MCE::Loop to split the work on the large file across multiple workers. But fix the structural problems first; that will make it easier, because you'll have a cleaner line-by-line processing loop to drop into the MCE bits.
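
A rough sketch of what that might look like once the per-line logic lives in its own sub, assuming MCE::Loop from CPAN is installed. `process_line` and `process_file_parallel` are hypothetical names, and `uc` is just a placeholder for your real per-line work. Gathered results come back in completion order rather than input order, so each chunk is keyed by its chunk id and reassembled:

```perl
use strict;
use warnings;
use MCE::Loop;

# Per-line work factored into one sub (hypothetical stand-in for your logic).
sub process_line {
    my ($line) = @_;
    return uc $line;
}

# Process a file in parallel chunks with MCE::Loop and return the
# transformed lines in their original order.
sub process_file_parallel {
    my ($path, $workers) = @_;

    MCE::Loop->init(
        max_workers => $workers,
        chunk_size  => 500,    # how many lines each worker gets per chunk
    );

    # Each worker gathers (chunk_id => [lines]); in list context the
    # gathered key/value pairs are returned and land in the hash.
    my %out_by_chunk = mce_loop_f {
        my ($mce, $chunk_ref, $chunk_id) = @_;
        my @out = map { process_line($_) } @{$chunk_ref};
        MCE->gather($chunk_id, \@out);
    } $path;

    MCE::Loop->finish;    # shut down the workers

    return map { @{ $out_by_chunk{$_} } } sort { $a <=> $b } keys %out_by_chunk;
}
```

Whether this pays off depends on how much CPU work each line needs; if the loop is mostly waiting on I/O, extra workers may not buy much.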

The cake is a lie.

	PerlMonks