PerlMonks  

Regex Word Pairs

by logie17 (Friar)
on Aug 15, 2007 at 23:11 UTC (#632882=perlquestion)

logie17 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I am slowly trying to understand all the power you can wield with regular expressions. What I have is a string: "This is a test". I want to take all of the words within this string and find all the adjacent word pairs: "This is", "is a", "a test".
I came up with this solution:
    my $s = 'This is a test';
    # Create an array
    my @words;
    while ($s =~ /(\w+)(?=\s+(\w+))/g) {
        # Push each found word pair into the array
        push @words, [$1, $2];
    }
Strictly for pedantic reasons, is there a better solution?
Thanks,
s;;5776?12321=10609$d=9409:12100$xx;;s;(\d*);push @_,$1;eg;map{print chr(sqrt($_))."\n"} @_;

Replies are listed 'Best First'.
Re: Regex Word Pairs
by Joost (Canon) on Aug 15, 2007 at 23:20 UTC
    Well that solution is pretty direct.

    You could go (almost) without regexes and just get the pairs yourself - at the cost of some memory (since you need to store all the separate words):

    #!/usr/local/bin/perl -w
    use strict;

    my $s = 'This is a test';
    my @words = split / +/, $s;
    my @pairs = map [@words[$_, $_+1]], 0 .. @words-2;
    update: I'm sure there's a way to get rid of the intermediate @words array, I'm just not sure the code would get any clearer.
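
One possible way to drop the intermediate array, offered only as a sketch: walk the string one word at a time and remember just the previous word. It assumes a "word" is a run of `\w` characters, which is not quite the same as splitting on spaces, and the name `$prev` is made up for this illustration.

```perl
use strict;
use warnings;

my $s = 'This is a test';

my @pairs;
my $prev;    # hypothetical helper: the only word kept between iterations
while ($s =~ /(\w+)/g) {
    # Pair the current word with the one seen just before it.
    push @pairs, [$prev, $1] if defined $prev;
    $prev = $1;
}

print "@$_\n" for @pairs;
```

For the sample string this produces the same three pairs as the split version. Whether it is any clearer is a matter of taste.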

Re: Regex Word Pairs
by GrandFather (Saint) on Aug 15, 2007 at 23:46 UTC

    Depends what you call better:

    use strict;
    use warnings;

    my $s = 'This is a test';
    my @pairs = $s =~ /(?=(\w+ \s+ \w+))\w+ \s+/gx;
    print join "\n", @pairs;

    Prints:

    This is
    is a
    a test

    Update: or if you want @pairs as an AoA:

    ...
    my @pairs = map [split], $s =~ /(?=(\w+ \s+ \w+))\w+ \s+/gx;
    print "@$_\n" for @pairs;

    prints:

    This is
    is a
    a test

    DWIM is Perl's answer to Gödel
Re: Regex Word Pairs
by graff (Chancellor) on Aug 16, 2007 at 01:37 UTC
    Not using regex matches at all might be a better solution.
    $_ = 'This is a test';
    my @words = split;
    my @wordpairs = map {[ $words[$_-1], $words[$_] ]} 1..$#words;
    Benchmarking is left as an exercise... ;)

    Update: oops! I failed to notice that Joost already provided this solution. (I should have expected that he would.) Apologies for being redundant.
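
The benchmark left as an exercise could be sketched roughly as follows, using the core Benchmark module. The sub names (`by_lookahead`, `by_split`, `by_capture`) are invented for this example, each wraps one of the approaches posted in this thread, and timings on a four-word string say little about behaviour on real input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $s = 'This is a test';

# logie17's approach: match a word, then look ahead for the next one.
sub by_lookahead {
    my @pairs;
    while ($s =~ /(\w+)(?=\s+(\w+))/g) {
        push @pairs, [$1, $2];
    }
    return \@pairs;
}

# The split-based approach from Joost and graff.
sub by_split {
    my @words = split ' ', $s;
    return [ map { [ @words[$_ - 1, $_] ] } 1 .. $#words ];
}

# GrandFather's approach: capture each pair inside a lookahead.
sub by_capture {
    return [ map { [ split ' ', $_ ] } $s =~ /(?=(\w+\s+\w+))\w+\s+/g ];
}

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, {
    lookahead => \&by_lookahead,
    split     => \&by_split,
    capture   => \&by_capture,
});
```

All three subs return an array of word pairs, so the comparison measures only how each one builds the same result.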

Re: Regex Word Pairs
by goibhniu (Hermit) on Aug 16, 2007 at 00:41 UTC

    For my cryptography hobby (old-school cryptograms, not modern crypto), I wrote this to find frequencies of letter pairs. I don't know if it could be adapted.

    #
    #
    # my $input = shift;
    # my $input = "peon bookkeeper";
    my $input;
    #print $input."\n";

    while (<>) {
        $input = $_;
        @evenmatches = ($input =~ m/
            (.{2}?)
            (?{
                #print $^N . " found at " . ($tmppos = pos($input)) . "\n";
                $chars{$^N}++;
            })
        /xg);
        #print join(", ", @evenmatches)."\n";
        #print $#evenmatches + 1 ." matches found\n";

        #print $input."\n";
        pos($input) = 1;
        @oddmatches = ($input =~ m/
            (.{2}?)
            (?{
                #print $^N . " found at " . ($tmppos = pos($input)) . "\n";
                $chars{$^N}++;
            })
        /xg);
        #print join(", ", @oddmatches)."\n";
        #print $#oddmatches + 1 ." matches found\n";
    }

    print "frequency of '$_' is $chars{$_}\n" foreach (sort {$chars{$b} <=> $chars{$a}} keys %chars);
    print "\n";
    print "frequency of '$_' is $chars{$_}\n" foreach (sort keys %chars);
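
The same lookahead trick used for word pairs elsewhere in this thread can probably be adapted here, so that overlapping letter pairs are counted in a single pass instead of two passes at offsets 0 and 1. This is only a sketch: it uses the sample string from the commented-out line above, and it assumes pairs of lowercase letters only, whereas the original `.{2}` also counts pairs containing spaces.

```perl
use strict;
use warnings;

my $input = 'peon bookkeeper';    # sample text from the commented-out line

my %chars;
# The lookahead consumes nothing, so each successful match moves the
# search forward by one character and every overlapping two-letter
# window is counted exactly once.
$chars{$1}++ while $input =~ /(?=([a-z]{2}))/g;

# Most frequent pairs first; ties broken alphabetically.
print "frequency of '$_' is $chars{$_}\n"
    for sort { $chars{$b} <=> $chars{$a} || $a cmp $b } keys %chars;
```

Swapping `[a-z]` for `.` would bring back the original behaviour of counting any two adjacent characters.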


    I humbly seek wisdom.

Node Type: perlquestion [id://632882]
Approved by Joost