PerlMonks  

RegEx to filter \s not between labels

by gryphon (Abbot)
on May 23, 2002 at 00:17 UTC ( [id://168622]=perlquestion )

gryphon has asked for the wisdom of the Perl Monks concerning the following question:

Greetings fellow monks,

This seems like such a simple problem, and perhaps it is and my brain is just extra slow today, but I need your help. I've got a rather long text string in a scalar from which I'd like to filter out multiple white spaces, converting them into just a single space per instance. Simple enough:

$text =~ s/\s+/ /g;

However, I'd like to not do this between a starting label and ending label. Here's some example text:

A Bridge Too Far Hosted by Rod Stuart Friendly Skys 42 STARTPRESERVE Life, the universe... and Everything STOPPRESERVE X-Files HotWheels are cool More movies on Fox File server

Basically, I'd love to have a simple regex that does the /\s+/ /og except not do anything between STARTPRESERVE and STOPPRESERVE. Any thoughts? (My apologies if this is really basic. I'm having a bad brain-day.)

-gryphon
code('Perl') || die;

Replies are listed 'Best First'.
Re: RegEx to filter \s not between labels
by I0 (Priest) on May 23, 2002 at 00:30 UTC
    $text =~ s/\s+|(STARTPRESERVE.*?STOPPRESERVE)/${[$1,' ']}[!$1]/gs

    UPDATE: jeffa, If you think that's twisted, you should see the version for nested START STOP markers:-)

    to remove the markers:
    s/\s+|STARTPRESERVE(.*?)STOPPRESERVE/${[' ',$1]}[defined $1]/gs
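For readers puzzling over the `${[$1,' ']}[!$1]` idiom above: it builds an anonymous two-element array and indexes it with `!$1`, so a true `$1` selects element 0 (the preserved block) and anything else selects element 1 (a single space). The same idea can be sketched with the /e modifier. This is only an illustrative rewrite, and the sub name `collapse_outside` is made up for the example:

```perl
use strict;
use warnings;

# Hypothetical helper name, not from the thread. Collapses whitespace
# runs outside STARTPRESERVE ... STOPPRESERVE and keeps blocks verbatim.
sub collapse_outside {
    my ($text) = @_;
    # At each position the regex matches either a whole preserved block
    # (captured in $1) or a run of whitespace ($1 left undefined).
    $text =~ s{(STARTPRESERVE.*?STOPPRESERVE)|\s+}
              { defined $1 ? $1 : ' ' }gse;
    return $text;
}

print collapse_outside("a  b STARTPRESERVE x   y STOPPRESERVE c \n d"), "\n";
```

Because a preserved block is swallowed in a single match, the whitespace alternative never gets a chance to match inside it.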

      ++I0! Minor changes to remove the start/stop markers which I think was desired:

      # The golfish way:
      $text =~ s#(\s)\s+|STARTPRESERVE(.*?)STOPPRESERVE#$1$2#gs;
      # Update: Oops! This fails if a run
      # of whitespace doesn't start with ' '.
      # or, if you don't like warnings:
      $text =~ s#\s{2,}|STARTPRESERVE(.*?)STOPPRESERVE# defined $1 ? $1 : " " #seg;
      (:

              - tye (but my friends call me "Tye")

      Greetings I0,

      ++! That's amazingly cool code. I love one-liners like that. In my specific situation, I don't need the START/STOP markers removed, and I can't imagine a scenario where the data string would contain nested labels, however... I'm still curious. How would the nested labels version look?

      (Note that I did spend some time trying to figure this out for myself so I could at least post a few bits of code here for others to revise, but I've failed to figure out even the basic theory without going into many, many lines of basic code.)

      -gryphon
      code('Perl') || die;

        #Okay, you asked for it:
        $_ = '0 1 STARTPRESERVE STARTPRESERVE 2 STOPPRESERVE STOPPRESERVE 3 4 STARTPRESERVE STARTPRESERVE 5 STOPPRESERVE STOPPRESERVE 6 7';
        (my $re=$_)=~s/((STARTPRESERVE)|(STOPPRESERVE)|.)/${['(','']}[!$2]\Q$1\E${[')','']}[!$3]/gs;
        $re= join'|',map{quotemeta}eval{/$re/};
        die $@ if $@ =~ /unmatched/;
        s/\s+|($re)/${[$1,' ']}[!$1]/g;
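For comparison, nested markers can also be handled without building a regex from the data, by walking the tokens and keeping a depth counter. This is a different technique from the one-liner above, offered only as a sketch; the sub name `collapse_nested` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper name. Whitespace is collapsed only at depth 0,
# i.e. outside every STARTPRESERVE ... STOPPRESERVE pair.
sub collapse_nested {
    my ($text) = @_;
    my ($depth, $out) = (0, '');
    # The capturing group makes split return markers and whitespace
    # runs as tokens alongside the ordinary text.
    for my $tok (split /(STARTPRESERVE|STOPPRESERVE|\s+)/, $text) {
        if ($tok eq 'STARTPRESERVE') {
            $depth++;
        }
        elsif ($tok eq 'STOPPRESERVE') {
            die "unmatched STOPPRESERVE\n" if --$depth < 0;
        }
        $out .= ($depth == 0 && $tok =~ /\A\s+\z/) ? ' ' : $tok;
    }
    die "unmatched STARTPRESERVE\n" if $depth > 0;
    return $out;
}

print collapse_nested('0  1 STARTPRESERVE STARTPRESERVE  2  STOPPRESERVE STOPPRESERVE 3   4'), "\n";
```

It is many more lines than the regex version, but unbalanced markers are reported explicitly instead of surfacing as a regex compile error.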
Re: RegEx to filter \s not between labels
by merlyn (Sage) on May 23, 2002 at 00:26 UTC
    Untested, but we did this one here before, and I think this is the pattern I followed:
    $_ = "your long string";
    my $output = "";
    while (/\G(.*?)(STARTPRESERVE.*?STOPPRESERVE)/gcs) {
      my ($left, $right) = ($1, $2); # I wish I could shortcut that above :(
      $left =~ s/\s+/ /g;
      $output .= "$left$right";
    }
    # last bit:
    if (/\G(.+)/gcs) {
      my ($left) = $1;
      $left =~ s/\s+/ /g;
      $output .= $left;
    }

    -- Randal L. Schwartz, Perl hacker

      And as another strategy:
      my @temp = split /(STARTPRESERVE.*?STOPPRESERVE)/s;
      my $output = "";
      while (@temp) {
        local $_ = shift @temp;
        s/\s+/ /g unless @temp % 2;
        $output .= $_;
      }

      -- Randal L. Schwartz, Perl hacker

Re: RegEx to filter \s not between labels
by talexb (Chancellor) on May 23, 2002 at 00:24 UTC
    I'd simplify, instead of trying to do it all with a regex.
    • Split based on the STARTPRESERVE and STOPPRESERVE markers, stuff the results into an array;
    • Do the $text =~ s/\s+/ /g; on the first and last elements of the array; and
    • Reassemble using join.
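The steps above can be sketched roughly as follows. Two assumptions go beyond what the list says: the split is on the whole preserved block with a capturing group (so the block itself comes back as an array element), and every non-preserved element is cleaned rather than only the first and last, so more than one block is handled. The sub name `collapse_by_split` is made up for the example:

```perl
use strict;
use warnings;

# Hypothetical helper name; a sketch of the split / clean / join idea.
sub collapse_by_split {
    my ($text) = @_;
    # With a capturing group, split returns the separators too:
    # even indexes are ordinary text, odd indexes are preserved blocks.
    my @parts = split /(STARTPRESERVE.*?STOPPRESERVE)/s, $text;
    for my $i (0 .. $#parts) {
        next if $i % 2;             # leave preserved blocks alone
        $parts[$i] =~ s/\s+/ /g;    # collapse everything else
    }
    return join '', @parts;
}

print collapse_by_split("a  b STARTPRESERVE x   y STOPPRESERVE c  d"), "\n";
```

split keeps a leading empty field when the string starts with a block, so the even/odd bookkeeping still holds in that case.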

    --t. alex

    "Nyahhh (munch, munch) What's up, Doc?" --Bugs Bunny

(jeffa) Re: RegEx to filter \s not between labels
by jeffa (Bishop) on May 23, 2002 at 00:30 UTC
    Similar to talexb's idea, nowhere near as cool as merlyn's code, here is another way to do it:
    use strict;
    while(<DATA>) {
      my ($preserve) = $_ =~ /STARTPRESERVE(.*?)STOPPRESERVE/;
      s/\Q$preserve\E// if $preserve;
      s/\s+/ /og;
      s/STARTPRESERVESTOPPRESERVE/$preserve/ if $preserve;
      print;
    }
    __DATA__
    A Bridge Too Far Hosted by Rod Stuart Friendly Skys 42 STARTPRESERVE Life, the universe... and Everything STOPPRESERVE X-Files HotWheels are cool More movies on Fox File server
    UPDATE: you are one sick twisted puppy I0! ;)

    UPDATE UPDATE: that's it - i'm outta here ...

    jeffa travels to the highest mountain to meditate more perlre

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: RegEx to filter \s not between labels
by abstracts (Hermit) on May 23, 2002 at 17:31 UTC
    Okay, I think this one looks good too:
    s[(STARTPRESERVE.*?STOPPRESERVE)|(\s+)|(.)] [($1||'') . ($2?' ':'') . ($3||'')]eg;
    The ||'' is to avoid the warnings.

    Update: And this is of course wrong, since I'm testing with || and should do defined instead.

    Update2: This is a correct and shorter version:

    s#(STARTPRESERVE.*?STOPPRESERVE)|(\s+)|(.)#$1?$1:$2?' ':$3#eg;
    because $1 and $2 can both be tested for truth.

    Update3 (ouch):

    s#(STARTPRESERVE.*?STOPPRESERVE)|(\s+)#$1?$1:$2?' ':''#eg;
    Now this looks a lot like tye's suggestion except for elimination of defined :-(

Node Type: perlquestion [id://168622]
Approved by talexb
Front-paged by VSarkiss