Re^2: Multiple Regex's on a Big Sequence



Re^2: Multiple Regex's on a Big Sequence - Benchmark

by hv (Prior)

on Aug 17, 2006 at 11:21 UTC


in reply to Re: Multiple Regex's on a Big Sequence - Benchmark
in thread Multiple Regex's on a Big Sequence

For the cases where you compare multiple regexps against your target string, it may save time if you also study($sequence) before starting the matches.
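As a minimal sketch of what that looks like in practice (the sequence and the pattern list here are made-up placeholders, not the data from the thread), the call goes once before the loop over patterns. One caveat: according to perldoc, study has been a no-op since Perl 5.16, so any effect would only show up on older perls.

```perl
use strict;
use warnings;

# Hypothetical data: a short DNA-like string and a few literal patterns.
my $sequence = 'ACGTTGCAAGCTTGACCGGTACGATCGATTGCA';
my @patterns = ('AAGCTT', 'GGTACG', 'TTTTTT');

# Scan the string once, before any of the matches are attempted.
# (A no-op on Perl 5.16 and later, but harmless there.)
study($sequence);

# Count the matches of each pattern against the same studied string.
my %hits;
for my $pat (@patterns) {
    $hits{$pat} = () = $sequence =~ /$pat/g;
}

printf "%-8s %d\n", $_, $hits{$_} for @patterns;
```

Note that only one string can be studied at a time, so this pays off only when many patterns are run against the same target.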

This scans the sequence once so that subsequent matches can use the Boyer-Moore algorithm: it builds a linked list of the locations of each distinct character in the sequence, and then uses frequency data to pick the rarest character in the pattern and walk only that character's list.
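To make the idea concrete, here is a rough pure-Perl illustration of the same strategy for a fixed substring. It is not how perl implements study internally; index_positions and find_all are hypothetical helper names, arrays stand in for the linked lists, and "rarest" is judged by actual counts in the target string, which is a simplifying assumption.

```perl
use strict;
use warnings;

# Build a map from each character to the list of offsets where it occurs.
sub index_positions {
    my ($text) = @_;
    my %pos;
    my $i = 0;
    push @{ $pos{$_} }, $i++ for split //, $text;
    return \%pos;
}

# Find all offsets of a literal $needle in $text, checking only the places
# where the needle's rarest character occurs.
sub find_all {
    my ($text, $needle, $pos) = @_;
    return () unless length $needle;

    my ($rare_off, $rare_list);
    for my $off (0 .. length($needle) - 1) {
        # A character that never occurs in the text rules out any match.
        my $list = $pos->{ substr($needle, $off, 1) } or return ();
        ($rare_off, $rare_list) = ($off, $list)
            if !defined $rare_list || @$list < @$rare_list;
    }

    my @found;
    for my $p (@$rare_list) {
        my $start = $p - $rare_off;    # where the needle would have to begin
        next if $start < 0;
        push @found, $start
            if substr($text, $start, length $needle) eq $needle;
    }
    return @found;
}

my $text = 'hello world, hello perl';
my $pos  = index_positions($text);
print join(',', find_all($text, 'hello', $pos)), "\n";
```

The fewer times the chosen character occurs, the fewer candidate positions there are to verify, which is where the saving comes from.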

Because the benefit of this approach depends mainly on rarity, it may not be a big win in a case like this, where the string uses only a 4-character alphabet and (presumably) each character appears roughly 1/4 of the time; I'd be interested to see how it affects the benchmarks.

Hugo


Re^3: Multiple Regex's on a Big Sequence - Benchmark by bernanke01 (Beadle) on Aug 18, 2006 at 02:02 UTC
Great idea, I'll add it to the next round of benchmarks.
