Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Global regexp

by Anonymous Monk
on Jun 16, 2008 at 07:38 UTC ( [id://692218]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi I want to find all 2 consequent digits in a string, i.e. having string '1234' i need to get array of ('12', '23', '34'). But as appeared regexp doesn't makes what i need: @result = ('1234' =~ /\d\d/g); print "@result"; #prints 12 34 Do you know any way to find all matches without manipulations with pos()?

Replies are listed 'Best First'.
Re: Global regexp
by Corion (Patriarch) on Jun 16, 2008 at 07:55 UTC

    The Perl regular expression engine starts the next match always after the last matched character of the previous match. So you will need lookahead assertions (?= that don't adjust pos if you don't want to fiddle with pos yourself (see perlre on lookahead assertions):

    perl -le "$_='1234';@x=/(?=(\d\d))/g;print for @x"
      How would you modify that so that match only capture strings where the second digit is higher than the first? (Which is what I thought the OP meant, but I'm apparently wrong..) I'm scratching my head.

      throop

        In such cases, I mostly resort to enumeration:

        /(0[1-9] |1[2-9] |2[3-9] |3[4-9] |4[5-9] |5[6-9] |6[7-9] |7[8-9] |8[9-9])/

        will only capture pairs where the first digit is lower than the second digit. I'm not sure that there is a smarter/nicer version of that though ...

        It's usually best/simplest to just filter out the bad results.

        my @x; while (/(?=((\d)(\d)))/g) { push @x, $1 if $3 > $2; }

        You could also do that nicely using a lookup.

        my %ok; for my $i (1..8) { for my $j ($i+1..9) { $ok{"$i$j"} = 1; } } my @x = grep $ok{$_}, /(?=(\d\d))/g;

        You could build a complex pattern.

        use Regexp::List qw( ); my @ok; for my $i (1..8) { for my $j ($i+1..9) { push @ok, "$i$j"; } } my $re = Regexp::List->new()->list2re(@ok); my @x = /(?=($re))/g;

        In can be done programmatically in the regexp as well.

        my @x; push @x, $1 while /(?=((\d)(\d)(?(?{ $3 <= $2 })(?!))))/g;

        (It's pretty crazy that I was able to write the last one without errors and without checking perlre, so you probably shouldn't use that one.)

        All tested.

      Thank you.
      Actually, i read perlre doc, but i thought (?= ) wouldn't help me. So solution you gave looks for me like some perl hack :-)
        It does look a little bit hacky and I'm not sure you will find it explicitly discussed in either perlre or perlretut, but it is a kosher Perl regex expression and you need not worry about using it.
Re: Global regexp
by starbolin (Hermit) on Jun 16, 2008 at 15:35 UTC

    Not sure your parameters. Do you mean all pairs of numerically consecutive digits? Or just adjacent digits? Does the string contain alphabetics? Although everyone loves to show off their regex foo, not all solutions require a regex. Perhaps this:

    my $s = "abc12341xyz"; my @pairs = grep /\d\d/, map { substr $s, $_, 2 } (0..length($s)-2); print join " ", @pairs; 12 23 34 41


    s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}
      I meant adjacent digits.

      To be true my problem was more complex than just matching digits. I wanted to find all possible matches for any regexp.
      Take
      print $_, ", " for ('1aaab2' =~ /(a+b)/g)
      #prints aaab,

      It prints only one result, instead of list (what's needed): 'ab', 'aab', 'aaab'.
      Solution given by Corion works perfectly in this case:
      print $_, ", " for ('1aaab2' =~ /(?=(a+b))/g)
      #prints aaab, aab, ab,

      But it has one side effect. See an example:
      while ('1234' =~ /(\d\d)/g) {
      print "$`<$&>$'", ", ";
      }
      #prints <12>34, 12<34>,

      Extended regexp:
      while ('1234' =~ /(?=(\d\d))/g) {
      print "$`<$&>$'", ", ";
      }
      #prints <>1234, 1<>234, 12<>34,

      So side effect is that this extended regexp doesn't allow to use $`, $&, $' variables as usually.

        To be true my problem was more complex than just matching digits. I wanted to find all possible matches for any regexp.

        That's simple enough to.

        local our @results; # Not "my". /(\d\d)(?{ push @results, $1 })(?!)/;

        Which is to be expected, because my approach never matches anything in the "real body" of the regular expression. If you want different behaviour of the regex engine, you can only achieve that by making it match different things, which will result in the match variables containing different values. If you want to keep the behaviour of $`, $& and $', then you will need to fiddle with pos. You haven't stated why you don't want to do that.

        See Regexp::Exhaustive to get every possible match of a pattern against a string. It supports the use of $& et al (without global penalty).

        use Regexp::Exhaustive 'exhaustive'; my @matches = exhaustive( 'asdf' => qr/..??/, qw[ $` $& $' ], ); printf "%s<%s>%s\n", @$_ for @matches; __END__ <a>sdf <as>df a<s>df a<sd>f as<d>f as<df> asd<f>

        lodin

        So, have the regex engine do what it does best, that is, return the longest matching string. Then, have a routine that takes that string and provides the permutations.


        s//----->\t/;$~="JAPH";s//\r<$~~/;{s|~$~-|-~$~|||s |-$~~|$~~-|||s,<$~~,<~$~,,s,~$~>,$~~>,, $|=1,select$,,$,,$,,1e-1;print;redo}

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://692218]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (8)
As of 2024-04-25 11:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found