Re: Efficient code.

by holli (Abbot)
on Jan 30, 2006 at 17:34 UTC


in reply to Efficient code.

A group of 400 alternations is something Perl's regex engine can surely handle. Remember to quotemeta your lines when it's possible they contain regex-relevant chars (braces, for example). Then you can have a look at Regexp::Assemble, which will do the regex construction and optimize the result.
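
Something along these lines might do (an untested sketch; @patterns stands in for the lines read from "a"):

    use strict;
    use warnings;
    use Regexp::Assemble;

    # Sketch only: @patterns stands in for the lines read from file "a".
    my @patterns = ( 'foo{2}', 'bar.baz', 'plain text' );

    my $ra = Regexp::Assemble->new;
    $ra->add( quotemeta $_ ) for @patterns;   # escape metachars before assembling
    my $re = $ra->re;                         # one combined, optimised regex

    while ( my $line = <STDIN> ) {
        print $line if $line =~ $re;
    }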

Personally I'd go for a hash-based solution and simply use the lines from "a" as keys of a hash to look up the contents of "b". Alternatively, you can sort the lines and use a binary search.
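
A minimal sketch of that lookup, assuming exact whole-line matches are wanted and the lines of "a" fit in memory (the file names a.txt and b.txt are placeholders):

    use strict;
    use warnings;

    # Build a lookup hash from the lines of "a".
    open my $a_fh, '<', 'a.txt' or die "a.txt: $!";
    my %wanted;
    while ( my $line = <$a_fh> ) {
        chomp $line;
        $wanted{$line} = 1;
    }
    close $a_fh;

    # Print every line of "b" that also occurs in "a".
    open my $b_fh, '<', 'b.txt' or die "b.txt: $!";
    while ( my $line = <$b_fh> ) {
        chomp( my $key = $line );
        print $line if exists $wanted{$key};   # O(1) hash lookup per line
    }
    close $b_fh;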


holli, /regexed monk/

Re^2: Efficient code.
by Aristotle (Chancellor) on Jan 30, 2006 at 17:55 UTC

    400 alternations is manageable, but don’t forget the match is executed 50,000 times.

    Regexp::Assemble helps by reducing the necessary backtracking, but don’t forget that Regexp::Assemble itself doesn’t work by magic and will need to do a fair amount of string munging. Most likely, that effort is easily amortised during the actual tests, but you don’t know that yet. Then the question that remains is how similar the lines are; if they all differ greatly, then Regexp::Assemble can’t do much.

    Without benchmarks I wouldn’t go on the record with any prediction as to what’s going to be faster.
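
    (A rough Benchmark::cmpthese skeleton for such a comparison, with made-up stand-in data; substitute the real lines:)

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Made-up stand-ins for the real data: 400 needles, 50,000 candidate lines.
        my @needles  = map { "line$_" } 1 .. 400;
        my @haystack = map { "line" . int rand 1000 } 1 .. 50_000;

        my $alt    = join '|', map { quotemeta } @needles;
        my $re     = qr/^(?:$alt)$/;
        my %wanted = map { $_ => 1 } @needles;

        cmpthese( -3, {
            regex => sub { my $n = grep { /$re/ }       @haystack },
            hash  => sub { my $n = grep { $wanted{$_} } @haystack },
        } );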

    And the problem remains that the code does not closely match the intention. The hash-based approach will be trivial to get right, and it will be much easier to understand in six months.

    Makeshifts last the longest.

      Just FYI: In Perl 5.10, alternations not containing regex metachars will not backtrack at all, as they are represented by a trie node that can traverse all of the alternations in a single pass. I'd say which version of perl contains this, but embarrassingly I appear to have forgotten.
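
      (On a perl new enough to have the trie optimisation you can watch it happen with the regex debugger; the exact dump format varies by version, but a metachar-free alternation should compile to a TRIE node:)

          use strict;
          use warnings;
          use re 'debug';   # dump the compiled regex program to STDERR

          # On perl >= 5.10 the compilation dump should contain a TRIE node
          # for this metachar-free alternation (exact output varies by version).
          'foobar' =~ /foo|for|fob|baz/;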

      ---
      $world=~s/war/peace/g

        Cool!

        Old geezer moment: sometimes these optimisations make me feel like I did when I was writing assembly code (inner loops of texture mappers slinging pixels; unrolling carefully, rejoicing when a clever trick got your loop down from 11 to 9 cycles per pixel; those were the days) and went from the Pentium to the PentiumPro. Suddenly it was no longer possible to measure how many clock cycles any one instruction took: an innocuous change somewhere up or down the line would sometimes dramatically change the timing of an unrelated instruction, occasionally a loop iteration would randomly take more or fewer cycles than average, and sometimes instructions would appear to take a fractional number of cycles. It was like wading through quicksand.

        Of course, there was no way to inspect the result of the PPro’s µ-Op translator. Thank goodness for use re 'debug'; :-)

        Makeshifts last the longest.
