Regex Word Pairs

logie17 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Regex Word Pairs by Joost (Canon) on Aug 15, 2007 at 23:20 UTC
Well, that solution is pretty direct. You could go (almost) without regexes and just get the pairs yourself, at the cost of some memory (since you need to store all the separate words):

```perl
#!/usr/local/bin/perl -w
use strict;
my $s = 'This is a test';
my @words = split / +/,$s;
my @pairs = map [@words[$_,$_+1]],0 .. @words-2;
```

update: I'm sure there's a way to get rid of the intermediate @words array, I'm just not sure the code would get any clearer.

"What should it profit a man, if he should win a flame war, yet lose his cool?"
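One possible way to drop the intermediate @words array, sketched here as an assumption rather than as what Joost had in mind: walk the string with a `/g` match in scalar context and remember only the previous word. The helper name `word_pairs` is hypothetical, and it splits on any whitespace (`\S+`) rather than on runs of spaces as `split / +/` does.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [word, next_word] array refs.
# Only the previous word is kept around, not the full word list.
sub word_pairs {
    my ($s) = @_;
    my (@pairs, $prev);
    while ($s =~ /(\S+)/g) {                       # scalar-context //g: one word per iteration
        push @pairs, [$prev, $1] if defined $prev; # pair the previous word with this one
        $prev = $1;
    }
    return @pairs;
}

print "@$_\n" for word_pairs('This is a test');
```

Whether this is any clearer than the `split` version is debatable, which seems to be the point of the update above.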
Re: Regex Word Pairs by GrandFather (Saint) on Aug 15, 2007 at 23:46 UTC
Depends what you call better:

```perl
use strict;
use warnings;

my $s = 'This is a test';
my @pairs = $s =~ /(?=(\w+ \s+ \w+))\w+ \s+/gx;
print join "\n", @pairs;
```

Prints:

```
This is
is a
a test
```

Update: or if you want @pairs as an AoA:

```perl
...
my @pairs = map [split], $s =~ /(?=(\w+ \s+ \w+))\w+ \s+/gx;
print "@$_\n" for @pairs;
```

prints:

```
This is
is a
a test
```

DWIM is Perl's answer to Gödel
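To see why the regex above has a consuming `\w+ \s+` after the lookahead, it may help to compare it with a lookahead-only pattern. The sketch below is illustrative only; the variable names `@pairs` and `@naive` are assumptions of this example. The lookahead captures a pair without moving the match position, so something else has to advance past one whole word.

```perl
use strict;
use warnings;

my $s = 'This is a test';

# Lookahead captures the pair; the trailing \w+\s+ consumes one word,
# so the next attempt starts at the beginning of the following word.
my @pairs = $s =~ /(?=(\w+\s+\w+))\w+\s+/g;

# Lookahead only: nothing is consumed, so the engine retries at every
# character position, including positions in the middle of a word.
my @naive = $s =~ /(?=(\w+\s+\w+))/g;

print "with consumption: ", join(' | ', @pairs), "\n";
print "lookahead only:   ", join(' | ', @naive), "\n";
```

The lookahead-only list picks up fragments such as `his is`, which is why the consuming part matters here.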
Re: Regex Word Pairs by graff (Chancellor) on Aug 16, 2007 at 01:37 UTC
Not using regex matches at all might be a better solution.

```perl
$_ = 'This is a test';
my @words = split;
my @wordpairs = map {[ $words[$_-1], $words[$_] ]} 1..$#words;
```

Benchmarking is left as an exercise... ;)

Update: oops! I failed to notice that Joost already provided this solution. (I should have expected that he would.) Apologies for being redundant.
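For anyone taking up the benchmarking exercise, a minimal sketch using the core Benchmark module might look like the following. The sub names `pairs_split` and `pairs_regex` are hypothetical wrappers around the two approaches from this thread, and no timings are claimed here; results will depend on the input and the perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $s = 'This is a test';

# split-based approach: build the word list, then pair up neighbours
sub pairs_split {
    my @w = split ' ', $s;
    return map { [ @w[ $_ - 1, $_ ] ] } 1 .. $#w;
}

# regex-based approach: lookahead capture, then split each captured pair
sub pairs_regex {
    return map { [split] } $s =~ /(?=(\w+\s+\w+))\w+\s+/g;
}

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    split => sub { my @p = pairs_split() },
    regex => sub { my @p = pairs_regex() },
});
```

A four-word string is a tiny input, so a longer text would probably give a more meaningful comparison.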
Re: Regex Word Pairs by goibhniu (Hermit) on Aug 16, 2007 at 00:41 UTC
For my cryptography hobby (old school cryptograms, not modern crypto), I wrote this to find frequencies of letter pairs. I don't know if it could be adapted.

```perl
#
#
# my $input = shift;
# my $input = "peon bookkeeper";
my $input;
#print $input."\n";

while (<>) {
    $input = $_;
    # first pass: pairs starting at even offsets
    @evenmatches = ($input =~ m/
        (.{2}?)
        (?{
            #print $^N . " found at " . ($tmppos = pos($input)) . "\n";
            $chars{$^N}++;
        })
    /xg);
    #print join(", ", @evenmatches)."\n";
    #print $#evenmatches + 1 ." matches found\n";
    #print $input."\n";

    # second pass: start at offset 1 to pick up the odd-offset pairs
    pos($input) = 1;
    @oddmatches = ($input =~ m/
        (.{2}?)
        (?{
            #print $^N . " found at " . ($tmppos = pos($input)) . "\n";
            $chars{$^N}++;
        })
    /xg);
    #print join(", ", @oddmatches)."\n";
    #print $#oddmatches + 1 ." matches found\n";
}

print "frequency of '$_' is $chars{$_}\n"
    foreach (sort {$chars{$b} <=> $chars{$a}} keys %chars);
print "\n";
print "frequency of '$_' is $chars{$_}\n" foreach (sort keys %chars);
```

I humbly seek wisdom.
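The lookahead idea used elsewhere in this thread can also be applied to letter pairs, collecting overlapping pairs in a single pass instead of separate even and odd passes. This is only a sketch under that assumption, not a drop-in replacement for the script above; the helper name `pair_counts` is hypothetical, and like `.{2}` it counts spaces as characters.

```perl
use strict;
use warnings;

# Hypothetical helper: count every overlapping two-character pair in $text.
sub pair_counts {
    my ($text) = @_;
    my %count;
    # (?=(..)) captures two characters without consuming them, so the
    # match position advances one character at a time.
    $count{$_}++ for $text =~ /(?=(..))/g;
    return %count;
}

my %chars = pair_counts('peon bookkeeper');
print "frequency of '$_' is $chars{$_}\n"
    for sort { $chars{$b} <=> $chars{$a} || $a cmp $b } keys %chars;
```

The same shape, with `\w+\s+\w+` in the lookahead and a consuming `\w+\s+` after it, gives the word pairs asked about in this thread.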

