PerlMonks  

Re: Elegant way to split into sequences of identical chars?

by ysth (Canon)
on Nov 30, 2005 at 04:34 UTC ( [id://512845] )


in reply to Elegant way to split into sequences of identical chars?

I don't have a perl to test with right now, but I think this will work:
$repeater = qr/(.)\1+/;
@matches = $string =~ /((??{$repeater}))/g;

Re^2: Elegant way to split into sequences of identical chars?
by BrowserUk (Patriarch) on Nov 30, 2005 at 04:45 UTC

    With a minor change to qr/(.)\1*/ (so that non-repeated characters are picked up too), it works fine.
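A self-contained way to try both versions, as a minimal sketch: the sample string 'xx556xx' is borrowed from later in the thread, and the variable names are illustrative only. With \1+ a run needs at least one repeat, so the lone 6 is skipped; with \1* every character ends up in some run.

```perl
use strict;
use warnings;

my $string = 'xx556xx';    # sample input, borrowed from later in the thread

# Original version: \1+ demands at least one repeat, so the lone "6" is skipped.
my $repeater_plus = qr/(.)\1+/;
my @plus = $string =~ /((??{$repeater_plus}))/g;
print "@plus\n";    # xx 55 xx

# With the \1* change, a run of length one is accepted as well.
my $repeater_star = qr/(.)\1*/;
my @star = $string =~ /((??{$repeater_star}))/g;
print "@star\n";    # xx 55 6 xx
```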

    Very elegant++.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Ingenious!

      And not only elegant++, but also by far the fastest solution so far, provided Benchmark isn't lying to me.

      Since it is also very compact already, we get the most compact variant so far with

      /((??{'(.)\1*'}))/g

      This is not as fast as the precompiled regex, of course, but still faster than the other snippets posted so far.
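To check a speed claim like this, the core Benchmark module's cmpthese can time the variants side by side. This is a hedged sketch, not the benchmark the poster ran: the input string, the sub names, and the extra 'pairs' baseline (a hand-renumbered /((.)\2*)/g loop that avoids (??{}) entirely) are all hypothetical, and no timings are claimed here since they depend on the perl version and the input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $string = 'xx556xx' x 200;    # hypothetical input
my $re     = qr/(.)\1*/;

# Each sub returns an array ref of runs so the results can be compared.
my %subs = (
    precompiled => sub { my @m = $string =~ /((??{$re}))/g;      \@m },
    string      => sub { my @m = $string =~ /((??{'(.)\1*'}))/g; \@m },
    # Hypothetical baseline: no (??{}), backreference renumbered to \2.
    pairs       => sub {
        my @m;
        push @m, $1 while $string =~ /((.)\2*)/g;
        \@m;
    },
);

# Sanity check before timing: every variant must produce the same runs.
my $expected = join ',', @{ $subs{precompiled}->() };
for my $name (sort keys %subs) {
    my $got = join ',', @{ $subs{$name}->() };
    die "$name disagrees\n" unless $got eq $expected;
}

cmpthese(-1, \%subs);    # run each variant for at least 1 CPU second
```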

Re^2: Elegant way to split into sequences of identical chars?
by xdg (Monsignor) on Nov 30, 2005 at 12:08 UTC

    That's really cool, but I'm confused as to why it works. From perlre about (??{ code }):

    This is a "postponed" regular subexpression. The "code" is evaluated at run time, at the moment this subexpression may match. The result of evaluation is considered as a regular expression and matched as if it were inserted instead of this construct.

    Maybe I'm thrown by the "matched as if it were inserted" part, since what's being inserted is a regular expression: if it were simply inserted, I don't see why your approach would work differently from putting the subexpression in directly. It looks like it's being evaluated somehow "separately" from the rest of the regular expression, which I didn't expect given the doc. I don't know if that makes sense, but this code sample illustrates where I'm thrown:

    my $re = qr/(.)\1*/;

    @matches = $string =~ m/($re)/g;
    print "@matches\n";   # x x x x 5 5 5 5 6 6 x x x x

    @matches = $string =~ m/((??{$re}))/g;
    print "@matches\n";   # xx 55 6 xx

    Could you please help me understand what's really going on that's different between these two cases?

    -xdg

    Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
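One way to see what differs between xdg's two cases, offered as a sketch rather than an authoritative account of the regex engine: interpolating $re pastes its text into the outer pattern, so capture groups are numbered across the combined pattern and the \1 inside $re now refers to the outer parentheses. The \2 rewrite at the end is a hypothetical workaround, not something posted in this thread.

```perl
use strict;
use warnings;

my $string = 'xx556xx';
my $re     = qr/(.)\1*/;

# Interpolation pastes $re's text into the outer pattern, giving roughly
# /((?^:(.)\1*))/. Group 1 is now the OUTER parentheses, which have not
# finished capturing when \1 is tried, so each match is one character,
# reported once per group.
my @interpolated = $string =~ m/($re)/g;
print "@interpolated\n";    # x x x x 5 5 5 5 6 6 x x x x

# Hypothetical workaround: write the combined pattern by hand with the
# backreference renumbered to \2, then keep only group 1 from each pair.
my @both = $string =~ m/((.)\2*)/g;
my @runs = @both[ grep { $_ % 2 == 0 } 0 .. $#both ];
print "@runs\n";            # xx 55 6 xx
```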

      Please try this to see more of what is going on here:
      use warnings;
      use strict;
      use re 'debug';

      my $re = qr/(.)\1*/;
      my @matches;
      my $string = "xx556xx";

      @matches = $string =~ m/($re)/g;
      print "@matches\n\n";   # x x x x 5 5 5 5 6 6 x x x x

      @matches = $string =~ m/((??{$re}))/g;
      print "@matches\n\n";   # xx 55 6 xx
      It looks like a question of greedy star vs. returning the first possible match, and of whether \1 is evaluated before the match is returned to me, but I don't feel confident enough to guess any further.

        Great suggestion! I forgot about regex debug mode. It looks like whatever the code in (??{ code }) evaluates to is compiled into a regex (if it isn't one already) and then that regex is applied in a separate scope from the rest of the regex -- so it has its own match variables and so on. I wouldn't have expected that given the documentation in perlre.

        In the debug output, the telltales appear to be "Entering embedded..." and "restoring...", as seen below.

        Entering embedded `(.)\1*'
        Setting an EVAL scope, savestack=31
           0 <> <xx556xx>    |  1:  OPEN1
           0 <> <xx556xx>    |  3:  REG_ANY
           1 <x> <x556xx>    |  4:  CLOSE1
           1 <x> <x556xx>    |  6:  CURLYX[1] {0,32767}
           1 <x> <x556xx>    | 10:    WHILEM[1/1]
                                        0 out of 0..32767  cc=140fafc
        Setting an EVAL scope, savestack=37
           1 <x> <x556xx>    |  8:      REF1
           2 <xx> <556xx>    | 10:      WHILEM[1/1]
                                          1 out of 0..32767  cc=140fafc
        Setting an EVAL scope, savestack=43
           2 <xx> <556xx>    |  8:        REF1
                                            failed...
                                          failed, try continuation...
           2 <xx> <556xx>    | 11:          NOTHING
           2 <xx> <556xx>    | 12:          END
        Setting an EVAL scope, savestack=47
        restoring \1 to -1(0)..1(no)
        restoring \1..\1 to undef
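A small check of the "separate scope" observation, assuming it holds on current perls as it did in the debug run above: the outer pattern has only one capturing group, so after a successful match $1 holds the whole run and $2 stays undefined. The (.) inside the embedded regex never becomes an outer capture. Variable names here are illustrative only.

```perl
use strict;
use warnings;

my $re = qr/(.)\1*/;
my ($outer1, $outer2);

# Only the parentheses written in the outer pattern count as its groups;
# the (.) inside the embedded regex is matched in its own scope.
if ('xx556xx' =~ /((??{$re}))/) {
    ($outer1, $outer2) = ($1, $2);    # copy before leaving the block
}

print "outer \$1: $outer1\n";                                       # outer $1: xx
print 'outer $2: ', (defined $outer2 ? $outer2 : 'undef'), "\n";    # outer $2: undef
```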

        -xdg


        Another curiosity: it is a special feature of m//g that if you don't parenthesize anything, it will return the entire match, so removing the parentheses from your first example changes nothing.

        However, removing the outer parentheses from your second example causes a crash on my box (ActiveState 5.8.7).


        Caution: Contents may have been coded under pressure.
