Global regexp

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Global regexp by Corion (Patriarch) on Jun 16, 2008 at 07:55 UTC
The Perl regular expression engine starts the next match always after the last matched character of the previous match. So you will need lookahead assertions `(?=` that don't adjust `pos` if you don't want to fiddle with `pos` yourself (see perlre on lookahead assertions): `perl -le "$_='1234';@x=/(?=(\d\d))/g;print for @x"` [download]	[reply] [d/l] [select]
Re^2: Global regexp by throop (Chaplain) on Jun 16, 2008 at 12:32 UTC
How would you modify that so that match only capture strings where the second digit is higher than the first? (Which is what I thought the OP meant, but I'm apparently wrong..) I'm scratching my head. throop	[reply]
Re^3: Global regexp by Corion (Patriarch) on Jun 16, 2008 at 12:49 UTC
In such cases, I mostly resort to enumeration: `/(0[1-9] \|1[2-9] \|2[3-9] \|3[4-9] \|4[5-9] \|5[6-9] \|6[7-9] \|7[8-9] \|8[9-9])/` [download] will only capture pairs where the first digit is lower than the second digit. I'm not sure that there is a smarter/nicer version of that though ...	[reply] [d/l]
Re^3: Global regexp by ikegami (Patriarch) on Jun 16, 2008 at 15:52 UTC
It's usually best/simplest to just filter out the bad results. `my @x; while (/(?=((\d)(\d)))/g) { push @x, $1 if $3 > $2; }` [download] You could also do that nicely using a lookup. `my %ok; for my $i (1..8) { for my $j ($i+1..9) { $ok{"$i$j"} = 1; } } my @x = grep $ok{$_}, /(?=(\d\d))/g;` [download] You could build a complex pattern. `use Regexp::List qw( ); my @ok; for my $i (1..8) { for my $j ($i+1..9) { push @ok, "$i$j"; } } my $re = Regexp::List->new()->list2re(@ok); my @x = /(?=($re))/g;` [download] In can be done programmatically in the regexp as well. `my @x; push @x, $1 while /(?=((\d)(\d)(?(?{ $3 <= $2 })(?!))))/g;` [download] (It's pretty crazy that I was able to write the last one without errors and without checking perlre, so you probably shouldn't use that one.) All tested.	[reply] [d/l] [select]
Re^4: Global regexp by throop (Chaplain) on Jun 16, 2008 at 22:36 UTC
Re^2: Global regexp by Anonymous Monk on Jun 16, 2008 at 09:09 UTC
Thank you. Actually, i read perlre doc, but i thought (?= ) wouldn't help me. So solution you gave looks for me like some perl hack :-)	[reply]
Re^3: Global regexp by Anonymous Monk on Jun 16, 2008 at 10:02 UTC
It does look a little bit hacky and I'm not sure you will find it explicitly discussed in either perlre or perlretut, but it is a kosher Perl regex expression and you need not worry about using it.	[reply]
Re: Global regexp by starbolin (Hermit) on Jun 16, 2008 at 15:35 UTC
Not sure your parameters. Do you mean all pairs of numerically consecutive digits? Or just adjacent digits? Does the string contain alphabetics? Although everyone loves to show off their regex foo, not all solutions require a regex. Perhaps this: `my $s = "abc12341xyz"; my @pairs = grep /\d\d/, map { substr $s, $_, 2 } (0..length($s)-2); print join " ", @pairs; 12 23 34 41` [download] s//----->\t/;$~="JAPH";s//\r<$~~/;{s\|~$~-\|-~$~\|\|\|s \|-$~~\|$~~-\|\|\|s,<$~~,<~$~,,s,~$~>,$~~>,, $\|=1,select$,,$,,$,,1e-1;print;redo}	[reply] [d/l]
Re^2: Global regexp by Anonymous Monk on Jun 17, 2008 at 09:21 UTC
I meant adjacent digits. To be true my problem was more complex than just matching digits. I wanted to find all possible matches for any regexp. Take print $_, ", " for ('1aaab2' =~ /(a+b)/g) #prints aaab, It prints only one result, instead of list (what's needed): 'ab', 'aab', 'aaab'. Solution given by Corion works perfectly in this case: print $_, ", " for ('1aaab2' =~ /(?=(a+b))/g) #prints aaab, aab, ab, But it has one side effect. See an example: while ('1234' =~ /(\d\d)/g) { print "$`<$&>$'", ", "; } #prints <12>34, 12<34>, Extended regexp: while ('1234' =~ /(?=(\d\d))/g) { print "$`<$&>$'", ", "; } #prints <>1234, 1<>234, 12<>34, So side effect is that this extended regexp doesn't allow to use $`, $&, $' variables as usually.	[reply]
Re^3: Global regexp (all possible) by ikegami (Patriarch) on Jun 17, 2008 at 16:35 UTC
To be true my problem was more complex than just matching digits. I wanted to find all possible matches for any regexp. That's simple enough to. `local our @results; # Not "my". /(\d\d)(?{ push @results, $1 })(?!)/;` [download]	[reply] [d/l]
Re^4: Global regexp (all possible) by Anonymous Monk on Jun 18, 2008 at 04:38 UTC
Re^5: Global regexp (all possible) by ikegami (Patriarch) on Jun 18, 2008 at 14:48 UTC
Re^3: Global regexp by Corion (Patriarch) on Jun 17, 2008 at 09:28 UTC
Which is to be expected, because my approach never matches anything in the "real body" of the regular expression. If you want different behaviour of the regex engine, you can only achieve that by making it match different things, which will result in the match variables containing different values. If you want to keep the behaviour of $`, `$&` and `$'`, then you will need to fiddle with `pos`. You haven't stated why you don't want to do that.	[reply] [d/l] [select]
Re^4: Global regexp by Anonymous Monk on Jun 17, 2008 at 12:24 UTC
Re^5: Global regexp by Corion (Patriarch) on Jun 17, 2008 at 12:29 UTC
Some notes below your chosen depth have not been shown here
Re^3: Global regexp (Regexp::Exhaustive) by lodin (Hermit) on Jun 17, 2008 at 18:09 UTC
See Regexp::Exhaustive to get every possible match of a pattern against a string. It supports the use of `$&` et al (without global penalty). use Regexp::Exhaustive 'exhaustive'; my @matches = exhaustive( 'asdf' => qr/..??/, qw[ $` $& $' ], ); printf "%s<%s>%s\n", @$_ for @matches; __END__ <a>sdf <as>df a<s>df a<sd>f as<d>f as<df> asd<f> [download] lodin	[reply] [d/l] [select]
Re^4: Global regexp (Regexp::Exhaustive) by Anonymous Monk on Jun 18, 2008 at 04:42 UTC
Re^3: Global regexp by starbolin (Hermit) on Jun 17, 2008 at 16:31 UTC
So, have the regex engine do what it does best, that is, return the longest matching string. Then, have a routine that takes that string and provides the permutations. s//----->\t/;$~="JAPH";s//\r<$~~/;{s\|~$~-\|-~$~\|\|\|s \|-$~~\|$~~-\|\|\|s,<$~~,<~$~,,s,~$~>,$~~>,, $\|=1,select$,,$,,$,,1e-1;print;redo}	[reply] [d/l]


Think about Loose Coupling
	PerlMonks