With a minor change to qr/(.)\1*/, so that non-repeated characters are picked up as well, it works fine.
Very elegant++.
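Here is what the * quantifier changes. With \1+ a character must repeat at least once to match, so a lone character like the 6 below is skipped. With \1* a single character forms a run of its own. This is a small sketch: the sample string comes from further down the thread, and the + variant is only there for contrast. It is not anyone's posted code.

```perl
use strict;
use warnings;

my $string = "xx556xx";

# \1+ needs at least one repeat, so only runs of length 2 or more match.
my $repeated_only = qr/(.)\1+/;
# \1* allows zero repeats, so single characters count as runs too.
my $any_run       = qr/(.)\1*/;

my @repeated = $string =~ m/((??{$repeated_only}))/g;
my @all_runs = $string =~ m/((??{$any_run}))/g;

print "@repeated\n";    # xx 55 xx
print "@all_runs\n";    # xx 55 6 xx
```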
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Ingenious!
And not only elegant++, but also by far the fastest solution so far, provided Benchmark isn't lying to me.
Since it is already very compact, this gives the most compact variant so far:
/((??{'(.)\1*'}))/g
This is not as fast as the precompiled regex, of course, but it is still faster than the other snippets posted so far.
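If you want to repeat the timing comparison yourself, here is a minimal sketch using cmpthese from the core Benchmark module. The input string is a made-up assumption, because the data behind the original timings isn't shown. Absolute rates will vary by machine and Perl version, so treat the numbers as indicative only. The sketch assumes a reasonably modern Perl (5.18 or later), where lexicals used inside regex code blocks behave as ordinary closures.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: the string used for the original timings isn't shown.
my $string = "xx556xx" x 100;

# Compiled once, outside the timed code.
my $re = qr/(.)\1*/;

my %candidates = (
    # Postponed subexpression fed a precompiled qr// object.
    precompiled => sub { [ $string =~ m/((??{$re}))/g ] },
    # Postponed subexpression fed a plain string, turned into a regex at match time.
    inline      => sub { [ $string =~ m/((??{'(.)\1*'}))/g ] },
);

# Sanity check: only time the variants if they agree on the result.
my ($pre_runs, $inline_runs) = map { $candidates{$_}->() } qw(precompiled inline);
die "variants disagree\n" unless "@$pre_runs" eq "@$inline_runs";

# Negative count: run each variant for at least 1 CPU second, then print a comparison table.
cmpthese(-1, \%candidates);
```

Checking that the variants agree before timing them guards against benchmarking two snippets that don't actually do the same job.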
That's really cool, but I'm confused as to why it works. From perlre about (??{ code }):
This is a "postponed" regular subexpression. The "code" is
evaluated at run time, at the moment this subexpression may
match. The result of evaluation is considered as a regular
expression and matched as if it were inserted instead of this
construct.
Maybe I'm thrown by the "matched as if it were inserted" part. What's being inserted is a regular expression, so if it were simply inserted, I don't understand why your approach behaves differently from putting the subexpression in directly. It looks like it's being evaluated somehow "separately" from the rest of the regular expression, which I didn't expect given the doc. I don't know if that makes sense, but this code sample illustrates where I'm thrown:
use strict;
use warnings;
my $string = "xx556xx";
my $re = qr/(.)\1*/;
my @matches = $string =~ m/($re)/g;
print "@matches\n"; # x x x x 5 5 5 5 6 6 x x x x
@matches = $string =~ m/((??{$re}))/g;
print "@matches\n"; # xx 55 6 xx
Could you please help me understand what's really going on that's different between these two cases?
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
Please try this to see more of what is going on here:
use warnings;
use strict;
use re 'debug';
my $re = qr/(.)\1*/;
my @matches;
my $string = "xx556xx";
@matches = $string =~ m/($re)/g;
print "@matches\n\n"; # x x x x 5 5 5 5 6 6 x x x x
@matches = $string =~ m/((??{$re}))/g;
print "@matches\n\n"; # xx 55 6 xx
It looks like a question of greedy star vs. returning the first possible match, and of whether \1 is evaluated before the match is returned to me, but I don't feel confident enough to guess any further.
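Part of the difference shows up even without the debugger. Interpolating a qr// object splices its text into the larger pattern, so its groups get renumbered. In m/($re)/ the outer parentheses become group 1 and (.) becomes group 2. The \1 inside $re therefore refers to the enclosing group, which hasn't finished matching yet, so \1* consumes nothing more. Each match is a single character, and m//g in list context returns it twice, once per group. A relative backreference, \g{-1} (Perl 5.10 and later), refers to the nearest preceding group and so survives interpolation. A small sketch using the same test string:

```perl
use strict;
use warnings;

my $string = "xx556xx";

# Once interpolated into m/($re)/, (.) is group 2 but \1 still means group 1.
my $re = qr/(.)\1*/;
my @absolute = $string =~ m/($re)/g;
print "@absolute\n";    # x x x x 5 5 5 5 6 6 x x x x

# \g{-1} means "the closest group opened before this point", i.e. (.),
# whatever absolute number it ends up with after interpolation.
my $relative = qr/(.)\g{-1}*/;
my @pairs = $string =~ m/($relative)/g;
print "@pairs\n";       # xx x 55 5 6 6 xx x

# Every match contributes (outer group, inner group), so keep the
# even indices to get just the runs.
my @runs = @pairs[ grep { $_ % 2 == 0 } 0 .. $#pairs ];
print "@runs\n";        # xx 55 6 xx
```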
Great suggestion! I forgot about regex debug mode. It looks like whatever the code in (??{ code }) evaluates to is compiled into a regex (if it isn't one already) and then that regex is applied in a separate scope from the rest of the regex -- so it has its own match variables and so on. I wouldn't have expected that given the documentation in perlre.
In the debug output, the telltales appear to be "Entering embedded..." and "restoring...", as seen below.
Entering embedded `(.)\1*'
Setting an EVAL scope, savestack=31
0 <> <xx556xx> | 1: OPEN1
0 <> <xx556xx> | 3: REG_ANY
1 <x> <x556xx> | 4: CLOSE1
1 <x> <x556xx> | 6: CURLYX[1] {0,32767}
1 <x> <x556xx> | 10: WHILEM[1/1]
0 out of 0..32767 cc=140fafc
Setting an EVAL scope, savestack=37
1 <x> <x556xx> | 8: REF1
2 <xx> <556xx> | 10: WHILEM[1/1]
1 out of 0..32767 cc=140fafc
Setting an EVAL scope, savestack=43
2 <xx> <556xx> | 8: REF1
failed...
failed, try continuation...
2 <xx> <556xx> | 11: NOTHING
2 <xx> <556xx> | 12: END
Setting an EVAL scope, savestack=47
restoring \1 to -1(0)..1(no)
restoring \1..\1 to undef
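The separate scope is also visible from ordinary Perl code. After a successful match, $#+ (see perlvar) holds the number of capture groups in the pattern that matched. In the postponed form, only the outer group belongs to the outer pattern. The (.) inside the embedded regex isn't counted, and $2 is never set. Here is a quick check, sketched under the assumption of the same test string used earlier in the thread:

```perl
use strict;
use warnings;

my $string = "xx556xx";
my $re     = qr/(.)\1*/;

# Interpolated: the groups of $re become part of the outer pattern.
$string =~ m/($re)/ or die "no match";
my $interpolated_groups = $#+;    # 2: the outer (...) plus (.)

# Postponed: $re runs as its own regex, so only the outer group is counted.
$string =~ m/((??{$re}))/ or die "no match";
my $postponed_groups = $#+;       # 1: just the outer (...)
my $run              = $1;        # the first run, "xx"
my $inner_visible    = defined $2 ? "yes" : "no";

print "interpolated: $interpolated_groups groups\n";
print "postponed: $postponed_groups group, \$1 = '$run', \$2 set? $inner_visible\n";
```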
-xdg