PerlMonks  

Simple regex question

by Anonymous Monk
on Oct 23, 2003 at 12:57 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Howdy Monks

I'm suffering from a minor regexp mental breakdown:

Given a string such as: "aaabbccccdd eee"

I'd like to split it so that each run of a single repeated character is extracted:

( "aaa", "bb", "cccc", "dd", " ", "eee" )

I initially thought something like the following might be ok, but since (?:) prevents capturing, I can see that it won't work:

split /((?:.)\1*)/, $string

I'm sure the answer is staring me in the face, but there we go!

Replies are listed 'Best First'.
Re: Simple regex question
by gmax (Abbot) on Oct 23, 2003 at 13:15 UTC

    "\1" means the first opened parenthesis. You need to use "\2".

    $ perl -e '$_="aaabbccccddeee"; print $1,$/ while /((.)\2*)/g'
    #                                                  ^^
    #                                                  ||
    #                                                  |+--- \2
    #                                                  |
    #                                                  \1
    aaa
    bb
    cccc
    dd
    eee

    Or, if you want an array as a result,

    push @array, $1 while /((.)\2*)/g;

    Update
    Notice that you can't just assign the outcome of the regex to an array, because you'd get redundant results (both $1 and $2 for each match).

    # THIS IS WRONG
    @array = $_ =~ /((.)\2*)/g;
    print join ", " , map {"<$_>"} @array;
    __END__
    <aaa>, <a>, <bb>, <b>, <cccc>, <c>, <dd>, <d>, <eee>, <e>
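    A minimal sketch contrasting the two behaviours described above. The helper name `runs` is hypothetical (it does not appear in the thread); it simply wraps the while-loop idiom so that only $1 is collected, while the second match shows what list context hands back.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): return the runs of
# identical characters in $str using the while-loop idiom.
sub runs {
    my ($str) = @_;
    my @runs;
    # Group 1 is the whole run; group 2 is its first character,
    # which \2 then repeats zero or more times.
    push @runs, $1 while $str =~ /((.)\2*)/g;
    return @runs;
}

my @good = runs("aaabbccccdd eee");

# List-context match: returns both $1 and $2 for every match.
my @both = "aaabb" =~ /((.)\2*)/g;

print join(", ", map { "<$_>" } @good), "\n";   # <aaa>, <bb>, <cccc>, <dd>, < >, <eee>
print join(", ", map { "<$_>" } @both), "\n";   # <aaa>, <a>, <bb>, <b>
```

    Note that "." does not match a newline unless the /s modifier is added, so a multi-line string would need that flag for newline runs to be captured.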
     _  _ _  _  
    (_|| | |(_|><
     _|   
    
      push @array, $1 while /((.)\2*)/g;

      nicely demonstrates the difference between capturing for backreference and capturing for return value. I wonder if Perl 7 will make that distinction?

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

      Thanks for the double quick reply

      I can see it was a mistake trying to use split, which is why I was pratting about with the (?:) grouping in the first place.

      I'm getting older and wiser all the time (well, older anyway)...

Re: Simple regex question (ex.s)
by tye (Sage) on Oct 23, 2003 at 15:11 UTC

    A few ways to do it, for variety:

    my $str= "aaabbccccdd eee";
    my @list;
    my $pos= 0;
    while(  $str =~ /(?<=(.))(?!\1)/g  ) {
        push @list, substr( $str, $pos, pos($str)-$pos );
        $pos= pos($str);
    }

    or

    my @list= split /(?<=(.))(?!\1)/, "aaabbccccdd eee";
    @list= @list[ map 2*$_, 0..@list/2-1 ];

                    - tye
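    A hedged sketch of why the split variant above needs the slice: because the pattern contains a capture group, split returns the captured character after each field, so the runs sit at the even indices. The variable names `@raw` and `@list` and the grep-based index filter are illustrative choices, not tye's code.

```perl
use strict;
use warnings;

my $str = "aaabbccccdd eee";

# The pattern matches the empty string at each boundary between runs:
# the lookbehind captures the previous character, and the negative
# lookahead requires that the next character differ from it.
# Because of the capture group, split interleaves the captured
# characters with the fields it returns.
my @raw = split /(?<=(.))(?!\1)/, $str;

# Keep the even-indexed elements (the runs) and drop the captures.
my @list = @raw[ grep { $_ % 2 == 0 } 0 .. $#raw ];

print join(", ", map { "<$_>" } @list), "\n";   # <aaa>, <bb>, <cccc>, <dd>, < >, <eee>
```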
Re: Simple regex question
by tachyon (Chancellor) on Oct 23, 2003 at 13:26 UTC

    Scan string.....

    use Data::Dumper;
    my $str = "aaabbccccdd eee";
    my $last = '';
    my @res;
    my $i = -1;
    for my $chr ( split //, $str ) {
        $i++ unless $chr eq $last;
        $res[$i] .= $chr;
        $last = $chr;
    }
    print Dumper \@res;

    __DATA__
    $VAR1 = [
              'aaa',
              'bb',
              'cccc',
              'dd',
              ' ',
              'eee'
            ];

    OK so it looks like C without the pointers. So shoot me!

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Well, I did have something similar in reserve, but wanted to make it a bit more regexpy for a laugh!

      Thanks for the reply

        Why? Less written code == faster => no; some regexp solution is 'cooler' == you say; posting to PM for a 5 line answer in almost any lang is efficient......

        cheers

        tachyon

        s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        Then why waste our time? Why not go: "this does what I want but..." for some inexplicable reason I need TIMTOWTDI. Despite the fact that I have what I need, with no speed justification/benchmark at all, I would like you all to solve a problem I have already solved, for no particular reason, for free, for someone you don't know......

        I am surprised there is less enthusiasm.

        • Solving problems is one thing
        • Solving solved problems is another thing
        • Solving problems for someone who then goes on to go..... might possibly erode the milk of HK. JM2c.

        cheers

        tachyon

        s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Simple regex question
by delirium (Chaplain) on Oct 23, 2003 at 14:43 UTC
    Using command line split, and quick hack to glue a new array out of @F:

    echo 'aaabbccccdd eee' | perl -aF'(.)(?!\1)' -lne 'while(@F){push@M,shift(@F).shift(@F)}print for @M'

    Producing:

    aaa
    bb
    cccc
    dd
     
    eee

    Update:

    Shorter :

    echo 'aaabbccccdd eee'|perl -aF'(.)(?!\1)' -lne 'for(@F){$M[$c].=$_;$c+=$d++%2}print for @M'

    Not using split, and shorter still:

    echo 'aaabbccccdd eee'|perl -lne 'while($_){s/((.)\2*)//;push@F,$1}print for @F'

Re: Simple regex question
by Anonymous Monk on Oct 23, 2003 at 20:50 UTC
    Wow, I never realized there could be so many different ways of doing this (but I should have guessed, of course!).

    Thanks to everyone for the replies - they're most appreciated.

      Well here's another way, when you want only a part of a list, grep comes to my mind:
      grep { ++$i % 2 } /((.)\2*)/g
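      A small sketch of that grep filter as a complete script, assuming the counter starts at zero (the variable names `$str` and `@runs` are illustrative). The list-context match yields $1, $2, $1, $2, ... and the counter test keeps the 1st, 3rd, 5th, ... elements, which are the whole runs.

```perl
use strict;
use warnings;

my $str = "aaabbccccdd eee";

# Counter must start at 0 (and be reset before reuse), otherwise the
# filter would keep the single characters instead of the runs.
my $i = 0;

# ++$i % 2 is true for odd counts, i.e. for each $1 in the
# ($1, $2, $1, $2, ...) list returned by the global match.
my @runs = grep { ++$i % 2 } $str =~ /((.)\2*)/g;

print join(", ", map { "<$_>" } @runs), "\n";   # <aaa>, <bb>, <cccc>, <dd>, < >, <eee>
```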

      $anarion=\$anarion;

      s==q^QBY_^=,$_^=$[x7,print
Re: Simple regex question
by holo (Monk) on Oct 26, 2003 at 14:09 UTC

    Ok, I'm cheating but what the heck ...

    use Data::Dumper qw/Dumper/;
    $_ = "aaaabbbcccccdd eeef";
    print Dumper([ keys %{{ split /(?<=(.))(?!\1)/x }} ]);

    __END__
    $VAR1 = [
              'ccccc',
              'eee',
              'aaaa',
              ' ',
              'bbb',
              'dd',
              'f'
            ];

    Any comments ?

Node Type: perlquestion [id://301569]
Approved by Corion
Front-paged by rnahi