PerlMonks  

Regexp: can I do it in one go?

by moxliukas (Curate)
on Aug 22, 2002 at 11:13 UTC ( [id://191980]=perlquestion )

moxliukas has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have been writing a regexp that would transform this:

$s = 'aaaabababbbbaaaccccbbbbbbaadddd';

into

$s = '4ababa4b3a4c6b2a4d';

Basically, it is something similar to a mathematical series test (ummm... not sure if this is the correct translation from Lithuanian), where consecutive occurrences of the same character are counted (except that no number is inserted if the character occurs only once).

I have been trying to come up with a regexp that would do this transformation and I got to the point where everything works:

$s = 'aaaabababbbbaaaccccbbbbbbaadddd';
$s =~ s"($_{2,})"length($1).$_"ge for ('a'..'d');
print $s;

However, I am not very happy with the for loop. I wonder whether the same can be achieved in one regexp, without scanning the line once for each character. Can character classes somehow be used in the regexp to avoid the loop?

Thanks for any help in advance.
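
As a reading aid, the one-liner in the question can be unrolled into an explicit loop. This is a minimal sketch rather than the poster's own code: the sub name `count_runs_per_letter` and the variable `$c` are hypothetical, and `\Q...\E` is an added precaution so that each letter is always taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: one substitution pass per candidate letter,
# mirroring the `for ('a'..'d')` loop in the question.
sub count_runs_per_letter {
    my ($s, @letters) = @_;
    for my $c (@letters) {
        # Match two or more consecutive copies of $c and replace
        # the run with its length followed by a single $c.
        $s =~ s/((?:\Q$c\E){2,})/length($1) . $c/ge;
    }
    return $s;
}

print count_runs_per_letter('aaaabababbbbaaaccccbbbbbbaadddd', 'a' .. 'd'), "\n";
```

Each pass rescans the whole string, which is the cost the question hopes to avoid.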

Replies are listed 'Best First'.
Re: Regexp: can I do it in one go?
by jmcnamara (Monsignor) on Aug 22, 2002 at 11:34 UTC

    You can use a backreference to obtain a single regex:
    #!/usr/bin/perl -wl

    use strict;

    my $s = 'aaaabababbbbaaaccccbbbbbbaadddd';

    print $s;
    $s =~ s/((.)\2+)/length($1) . $2/eg;
    print $s;

    __END__
    Prints:
    aaaabababbbbaaaccccbbbbbbaadddd
    4ababa4b3a4c6b2a4d

    --
    John.
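
For readers who want the pattern spelled out, the sketch below rewrites the same substitution with the `/x` flag so each part can carry a comment. The sub names `count_runs` and `count_runs_in_class` are hypothetical. The second sub is an assumption about what the "character classes" part of the question might look like: it swaps `.` for `[a-d]` so that only those letters are counted.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the single-pass substitution.
sub count_runs {
    my ($s) = @_;
    $s =~ s{
        (        # group 1: the whole run
          (.)    # group 2: any one character other than a newline
          \2+    # one or more further copies of that same character
        )
    }{ length($1) . $2 }gex;
    return $s;
}

# Variant limited to the letters a to d via a character class.
sub count_runs_in_class {
    my ($s) = @_;
    $s =~ s/(([a-d])\2+)/length($1) . $2/ge;
    return $s;
}

print count_runs('aaaabababbbbaaaccccbbbbbbaadddd'), "\n";
print count_runs_in_class('aaxxbb'), "\n";
```

Because `\2` refers back to whatever group 2 just matched, a single scan handles every character without a loop over letters.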

      Thanks a lot. I can't believe that I didn't think about it this way ;)

      Thank you again

Re: Regexp: can I do it in one go?
by Arien (Pilgrim) on Aug 22, 2002 at 11:31 UTC

    What you want to do is globally match a character together with any repetitions of it, and replace what you've found with that character followed by the length of your match:

    $s =~ s/((.)\2*)/$2 . length $1/eg;

    — Arien

    Edit: It seems I misread the output you want. To only have sequences of two or more repeated letters replaced, change the star to a plus sign. (And after some sleep...) Also, you'd want to swap length $1 and $2 to have the length precede the letter.
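
The difference between the two forms can be seen on a short input. This is only an illustrative sketch: the sub names `with_star` and `with_plus` and the sample string `aaaabab` are made up for the comparison.

```perl
use strict;
use warnings;

# Star, letter first: every run is rewritten, including runs of length 1.
sub with_star {
    my ($s) = @_;
    $s =~ s/((.)\2*)/$2 . length $1/eg;
    return $s;
}

# Plus, length first: only runs of two or more characters are rewritten.
sub with_plus {
    my ($s) = @_;
    $s =~ s/((.)\2+)/length($1) . $2/eg;
    return $s;
}

print with_star('aaaabab'), "\n";   # a4b1a1b1
print with_plus('aaaabab'), "\n";   # 4abab
```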

Approved by simon.proctor