http://qs321.pair.com?node_id=567889


in reply to Re: Multiple Regex's on a Big Sequence - Benchmark
in thread Multiple Regex's on a Big Sequence

For the cases where you match multiple regexps against the same target string, it may save time to call study($sequence) once before starting the matches.

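This does a single scan of the sequence, building a linked list of the locations of each distinct character in it. For each subsequent match, the regexp engine can then pick the rarest character in the pattern (judged from frequency data) and walk only that character's list to find candidate positions, rather than scanning the whole string. Note that this is a separate technique from the Boyer-Moore search that perl already uses for fixed substrings.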
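To make the idea concrete, here is a rough sketch in plain Perl of "index the positions of each character, then walk the list for the rarest one". It is only an illustration of the technique, not how perl implements study() internally: build_index and find_literal are made-up names, the sample sequence is invented, it handles literal strings only, and it takes its frequencies from the sequence itself, which is an assumption of the sketch.

```perl
use strict;
use warnings;

# Illustrative only: record the positions of every character in the sequence.
sub build_index {
    my ($seq) = @_;
    my %pos;
    my $i = 0;
    push @{ $pos{$_} }, $i++ for split //, $seq;
    return \%pos;
}

# Find all start offsets of a literal pattern by checking only the places
# where the pattern's rarest character occurs.
sub find_literal {
    my ($seq, $index, $pat) = @_;
    return () unless length $pat;

    # Pick the pattern character that occurs least often in the sequence.
    my ($best_off, $best_count);
    for my $off (0 .. length($pat) - 1) {
        my $c     = substr($pat, $off, 1);
        my $count = $index->{$c} ? scalar @{ $index->{$c} } : 0;
        return () if $count == 0;    # character never occurs: no match possible
        ($best_off, $best_count) = ($off, $count)
            if !defined $best_count || $count < $best_count;
    }

    # Walk only that character's position list and verify each candidate.
    my $rare = substr($pat, $best_off, 1);
    my @hits;
    for my $p (@{ $index->{$rare} }) {
        my $start = $p - $best_off;
        next if $start < 0;
        push @hits, $start if substr($seq, $start, length $pat) eq $pat;
    }
    return @hits;
}

my $sequence = 'ACGTTTGACCGTAACGT';    # made-up sample data
my $index    = build_index($sequence);
my @hits     = find_literal($sequence, $index, 'ACGT');
print "@hits\n";
```

With a 4-letter alphabet every position list is about a quarter of the string, so the candidate set stays large whichever character is chosen.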

Because the main benefit of this approach comes from rarity, it may not be a big win in a case like this, where the string uses only a 4-character alphabet and (presumably) each character appears roughly 1/4 of the time; I'd be interested to see how it affects the benchmarks.
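Something along these lines could be used to try it, as a minimal sketch only. The random sequence, the four motifs and the count_matches helper are all invented for illustration and are not the data or code from the benchmark upthread. Also note that on perl 5.16 and later study() is documented as a no-op, so any difference would only show up on older perls.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a random 4-letter sequence and a few made-up motifs.
my @alphabet = qw(A C G T);
my $sequence = join '', map { $alphabet[ rand @alphabet ] } 1 .. 200_000;
my @patterns = map { qr/$_/ } qw(GAATTC GGATCC AAGCTT TTTAAA);

# Count all matches of all patterns. The string is passed by reference so
# that we match against the very scalar that was (or was not) studied.
sub count_matches {
    my ($seq_ref) = @_;
    my $total = 0;
    for my $re (@patterns) {
        $total++ while $$seq_ref =~ /$re/g;
    }
    return $total;
}

# Each case works on a fresh copy, so the cost of study() itself is
# included in the "studied" timing.
cmpthese(20, {
    plain   => sub { my $s = $sequence; count_matches(\$s) },
    studied => sub { my $s = $sequence; study($s); count_matches(\$s) },
});
```

If the same studied string is reused for many more patterns than this, it would be fairer to move the study() call out of the timed sub.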

Hugo