PerlMonks  

Regex To remove text between parentheses

by beretboy (Chaplain)
on Jul 10, 2001 at 18:12 UTC ( [id://95302]=perlquestion )

beretboy has asked for the wisdom of the Perl Monks concerning the following question:

I need a regex that removes all text that is within parentheses, for example:
(*) PLEDGE AEROSOL LEMON Made By: S.C. Johnson Wax (046500)
would become:
PLEDGE AEROSOL LEMON Made By: S.C. Johnson Wax


"Sanity is the playground of the unimaginative" -Unknown

Replies are listed 'Best First'.
Re: Regex To remove text between parentheses
by davorg (Chancellor) on Jul 10, 2001 at 18:16 UTC
Re: Regex To remove text between parentheses
by japhy (Canon) on Jul 10, 2001 at 18:59 UTC
    Here's a grunky version.
    $re = qr{
      \(
      (?{ local $N = 1 })
      (?:
        (?(?{ !$N })(?!))
        (?:
          \( (?{ local $N = $N + 1 })
          |
          \) (?{ local $N = $N - 1 })
          |
          [^()]+
        )
      )+
      (?(?{ $N })(?!))  # fixed, thanks to Hofmator
    }x;
    $text =~ s/$re//g;


    japhy -- Perl and Regex Hacker

      Or another version of this nested-parens matching, from the Camel (3rd edition), which is a little bit easier to understand IMHO. But I'd guess that the recursion makes it slower.

      $re = qr{
        \(
        (?:
          (?> [^()]+ )    # Non-parens without backtracking
        |
          (??{ $re })     # Group with matching parens
        )*
        \)
      }x;
      $text =~ s/$re//g;
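
For readers on Perl 5.10 or later (which postdates this thread), the same balanced-parens idea can be written with the built-in recursion construct `(?R)` and a possessive quantifier instead of `(??{ $re })`. This is a swapped-in technique and not the code from the Camel; `strip_balanced` is a hypothetical helper name. A minimal sketch:

```perl
use strict;
use warnings;

# strip_balanced is a hypothetical helper name, not something from the thread.
sub strip_balanced {
    my ($text) = @_;
    $text =~ s{
        \(
        (?:
            [^()]++     # a run of non-paren characters, no backtracking into it
          |
            (?R)        # recurse into the whole pattern for a nested group
        )*
        \)
    }{}gx;
    return $text;
}

print '[', strip_balanced('keep (drop (nested) this) and (this)'), "]\n";
```

The whitespace on either side of a removed group is left in place, so two spaces remain where a group sat between words.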

      -- Hofmator

      Someone mentioned the other day that repeated nested recursions of the '?' produce hard-to-read code. Certainly it works, but my preference would be for some other way.

      I guess that is what you meant by 'grunky'.

Re: Regex To remove text between parentheses
by Hofmator (Curate) on Jul 10, 2001 at 18:21 UTC

    To give you some hints (if you haven't been quick enough to read the first version of this node :)

    • match a literal opening parenthesis - you have to escape it because of its special meaning
    • then match anything in between with non-greedy quantifiers
    • match the corresponding closing parenthesis
    • you hopefully don't have nested parens because then it gets tricky
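
One reading of the hints above, as a minimal sketch. It is not necessarily the code from the spoiler section, `strip_parens` is a hypothetical helper name, and it assumes there are no nested parens:

```perl
use strict;
use warnings;

# strip_parens is a hypothetical helper name used only for this sketch.
sub strip_parens {
    my ($text) = @_;
    $text =~ s{
        \(      # a literal opening paren, escaped
        .*?     # anything in between, as little as possible
        \)      # the first closing paren that follows
    }{}gx;
    return $text;
}

my $line = '(*) PLEDGE AEROSOL LEMON Made By: S.C. Johnson Wax (046500)';
print '[', strip_parens($line), "]\n";
```

The spaces that sat next to the removed groups stay behind, so the result still has a leading and a trailing space; trim them separately if that matters.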

    Update: OK, so now I probably spoiled the learning already, as davorg and jeroenes remarked. I generally agree with it, so I put the code now in the 'spoiler' section and altered the explanation into a hint list - but probably too late anyway ;-).

    -- Hofmator

Re: Regex To remove text between parentheses
by jeroenes (Priest) on Jul 10, 2001 at 18:22 UTC
Re: Regex To remove text between parentheses
by beretboy (Chaplain) on Jul 10, 2001 at 18:25 UTC
    what I have so far is:
    s/\((.*)\)//g;
    but this gets rid of the whole string :-(

    "Sanity is the playground of the unimaginative" -Unknown

      You might want to look at Death to Dot Star! to see why it is getting rid of the whole thing.

      s/\([^)]+\)//g;

      (Unless you have nested parentheses, as Hofmator pointed out.) And if you have (), the above won't remove it, unless you change the + to a *.
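
A small sketch of the + versus * difference described above. The helper names `strip_plus` and `strip_star` are hypothetical; the regexes are the one quoted in this reply and its * variant:

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.
sub strip_plus {
    my ($text) = @_;
    $text =~ s/\([^)]+\)//g;    # needs at least one character between the parens
    return $text;
}

sub strip_star {
    my ($text) = @_;
    $text =~ s/\([^)]*\)//g;    # also accepts an empty pair: ()
    return $text;
}

my $sample = 'a () b (c)';
print '[', strip_plus($sample), "]\n";   # the empty () survives
print '[', strip_star($sample), "]\n";   # both groups are removed
```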

      Also, any reason why you put the contents between the ()'s into $1? Are you going to use it later?

      Remember that * is *greedy* ... it will match the longest possible substring. So, given a string like

      (*) Microsoft Internet Exploder (which calls itself "Mozilla") - a product of a convicted violator of the Sherman Act (wow)

      That greedy quantifier will snatch up that * after the opening paren, and not stop till it gets to the ) that follows the "wow" (that IS the longest substring that matches your RE).
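
To see the greediness for yourself, here is a minimal sketch that captures what \(.*\) actually matches on that string. `greedy_match` is a hypothetical helper name:

```perl
use strict;
use warnings;

# greedy_match is a hypothetical helper: it returns the text that
# \(.*\) matches, or undef if there is no match.
sub greedy_match {
    my ($text) = @_;
    return $text =~ /(\(.*\))/ ? $1 : undef;
}

my $text = '(*) Microsoft Internet Exploder (which calls itself "Mozilla") '
         . '- a product of a convicted violator of the Sherman Act (wow)';

my $hit = greedy_match($text);
print "matched the whole string\n" if defined $hit && $hit eq $text;
```

Because the string starts with ( and ends with ), the longest possible match is the entire string, which is why the substitution leaves nothing behind.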

      As to the suggestion you use a character class (hint: match one or more things that aren't closing parentheses), see Death to Dot Star! for a discussion of why not to use .*

      HTH!

      perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); $rose = "smells sweet to degree $n"; *other_name = *rose; print "$other_name\n"'

      You're being bitten by the "greediness" of regular expressions. They try to match as much of the string as possible - which, in this case, is all of it.

      To make the regex "non-greedy", put a '?' after the greedy quantifier (so .* becomes .*?).
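
A minimal sketch of the non-greedy fix, next to the negated character class suggested elsewhere in the thread. Both helper names are hypothetical. On a single line the two behave the same; they differ when a group spans a newline, because . does not match a newline unless the /s modifier is used:

```perl
use strict;
use warnings;

# Both helper names are hypothetical, for illustration only.
sub strip_lazy {
    my ($text) = @_;
    $text =~ s/\(.*?\)//g;      # '?' makes .* stop at the first ')'
    return $text;
}

sub strip_class {
    my ($text) = @_;
    $text =~ s/\([^)]*\)//g;    # negated class: anything except ')'
    return $text;
}

my $one_line = 'keep (drop) keep (drop)';
print '[', strip_lazy($one_line),  "]\n";
print '[', strip_class($one_line), "]\n";

# A group that contains a newline: only the negated class crosses it.
my $two_lines = "keep (dr\nop) keep";
print "lazy left it unchanged\n" if strip_lazy($two_lines) eq $two_lines;
print "class stripped it\n"      if strip_class($two_lines) eq 'keep  keep';
```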

      You might also like to take a look at Death to Dot Star!

      --
      <http://www.dave.org.uk>

      Perl Training in the UK <http://www.iterative-software.com>

      As a quick, kludgy fix, try (.+) instead of (.*)
      Update: OK, so this headache is affecting my mental powers -- ignore that previous bit and instead look at the answers involving negated classes ([^)]*) and the like. In my defence, I said that was what you're supposed to do below:

      As a better fix, create a character class that doesn't include brackets, and use that instead...

      --
      RatArsed, in search of enlightment and asprin.

Re: Regex To remove text between parentheses
by the_slycer (Chaplain) on Jul 10, 2001 at 18:44 UTC
    This works fairly well. It can also get rid of nested parentheses like this: (begin(next(third))). Though not like (begin(next)third).
    s/\([^)]+\)+//g;

      No, it doesn't handle even your first case of nesting correctly. My preference is to use:
          s#\([^()]*\)##g
      and repeat if you need to worry about nested parens:
          0 while s#\([^()]*\)##g;
      I really hate complicated regexes.
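
A minimal sketch of the repeat-until-done approach, run against the (begin(next)third) case that the single-pass regex is said not to handle. The helper names are hypothetical; the regexes are the ones quoted in these two posts:

```perl
use strict;
use warnings;

# Hypothetical helper names; the regexes are the ones quoted in the thread.
sub strip_single_pass {
    my ($text) = @_;
    $text =~ s/\([^)]+\)+//g;
    return $text;
}

sub strip_repeated {
    my ($text) = @_;
    # Each pass removes the innermost groups; stop when a pass removes nothing.
    1 while $text =~ s/\([^()]*\)//g;
    return $text;
}

my $sample = '(begin(next)third)';
print 'single pass: [', strip_single_pass($sample), "]\n";  # leaves a stray fragment
print 'repeated:    [', strip_repeated($sample),    "]\n";  # everything is removed
```

The loop works from the inside out: a group with no parens inside it is always safe to delete, and deleting it exposes the next level.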

              - tye (but my friends call me "Tye")
