Searching and replacing a bunch of words simultaneously seems to be a fairly common question. I thought I might as well put up a few examples for reference.

First, the classical hash technique:
    my %ch = ('green' => 'lousy', 'blue' => 'cool', 'pink' => 'mini');
    my $str = 'I have a green hat, blue shirt, plus a pink jacket';
    print $str . "\n";
    $str =~ s/(green|blue|pink)/$ch{$1}/g;
    print $str;
    __END__
    I have a green hat, blue shirt, plus a pink jacket
    I have a lousy hat, cool shirt, plus a mini jacket

Very self-explanatory, right?
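When the word list grows, the alternation can be built from the hash keys instead of being typed by hand. Here is a minimal sketch, assuming the keys are literal words. The quotemeta call, the longest-first sort, the \b anchors, and the names $alt and $new are my additions, not part of the example above.

```perl
use strict;
use warnings;

my %ch = (green => 'lousy', blue => 'cool', pink => 'mini');

# Longest keys first, so a longer word wins over a shorter one that is
# its prefix; quotemeta keeps any regex metacharacters in the keys literal.
my $alt = join '|',
          map  { quotemeta($_) }
          sort { length($b) <=> length($a) }
          keys %ch;

my $str = 'I have a green hat, blue shirt, plus a pink jacket';

# \b restricts the match to whole words (an assumption about what is wanted).
(my $new = $str) =~ s/\b($alt)\b/$ch{$1}/g;

print "$new\n";   # I have a lousy hat, cool shirt, plus a mini jacket
```

With the \b anchors, a word like "greenish" would be left alone; drop them if you want substring replacement as in the original.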

Second, let's try the more flexible RegexpHash:
    use Tie::RegexpHash;

    my %sr;
    tie %sr, 'Tie::RegexpHash';

    $sr{qr/\bh+(a|@)+t+e+\b/i} = 'love';
    $sr{qr/\b(u|you|eww)\b/i}  = 'you';
    # - - - - - - - - - - - - - - - - - - - - - - - - - -
    $_ = "I hate you i HAte u i HH\@\@\@TTeE eww i HA\@AaaTTee u I HATE YOU!\n";
    print;
    my $s = join("|", keys %sr);
    s/($s)/$sr{$1}/g;
    print;
    __END__
    I hate you i HAte u i HH@@@TTeE eww i HA@AaaTTee u I HATE YOU!
    I love you i love you i love you i love you I love you!

It's certainly better than hardcoding all the variations of "hate" with the plain old hash.

What if you want the replacements to depend on what was matched, for instance by using $1 and friends? Let's try converting British spelling to American spelling:

    use Tie::RegexpHash;

    my %sr;
    tie %sr, 'Tie::RegexpHash';
    # - - - - - - - - - - - - - - - - - - - - - - - - - -
    # search and replace
    $sr{qr/\b(h)arbour\b/i}     = '$2arbor';
    $sr{qr/\b(h)onour(.*?)\b/i} = '$3onor$4';
    $sr{qr/\b(c)entre\b/i}      = '$5enter';
    # - - - - - - - - - - - - - - - - - - - - - - - - - -
    $_ = "Programmers Honoured at Harbour Centre\n";
    print;
    my $s = join("|", keys %sr);
    s/($s)/eval'"'.$sr{$1}.'"'/ge;
    print;
    __END__
    Programmers Honoured at Harbour Centre
    Programmers Honored at Harbor Center

You get upper/lowercase agreement, and you don't have to hardcode 'honour,' 'honourable,' 'honoured,' etc. Pretty good. (Thanks, Skeeve, for the eval hint.)

But wait. All the patterns get joined into one big regex, so their captures are numbered consecutively ($1 is the outer group, which is why the replacements above start at $2). That's a lot of $1 ... $n to track. What if I add, delete, or somehow reorder the key/value pairs? Well, let's see:

    use Tie::RegexpHash;

    my %sr;
    tie %sr, 'Tie::RegexpHash';
    # - - - - - - - - - - - - - - - - - - - - - - - - - -
    # search and replace
    $sr{qr/\b(c)entre\b/i}      = '$5enter';
    $sr{qr/\b(h)arbour\b/i}     = '$2arbor';
    $sr{qr/\b(h)onour(.*?)\b/i} = '$3onor$4';
    $sr{qr/(T|t)heatre/}        = 'theater';
    # - - - - - - - - - - - - - - - - - - - - - - - - - -
    $_ = "Programmers Honoured at Harbour Centre\n";
    print;
    my $s = join("|", keys %sr);
    s/($s)/eval'"'.$sr{$1}.'"'/ge;
    print;
    __END__
    Programmers Honoured at Harbour Centre
    Programmers onorH at arbor enter

Right, we're doomed. Keeping track of all the bracketing constructs and trying to put all the $1...$n in the right order is just too impractical. Let's look for some other modules...
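To see why the numbers shift, here is a small core-Perl sketch (no Tie::RegexpHash involved, and the names @patterns, $joined, and @groups are just for illustration). It joins two of the patterns the same way and prints which capture group each piece ends up in.

```perl
use strict;
use warnings;

my @patterns = (qr/\b(c)entre\b/i, qr/\b(h)arbour\b/i);
my $joined   = join '|', @patterns;

my @groups;
if ('Harbour' =~ /($joined)/) {
    # $1 is the outer group wrapped around the whole alternation;
    # $2 belongs to the "centre" pattern, $3 to the "harbour" pattern.
    @groups = map { defined $_ ? $_ : 'undef' } ($1, $2, $3);
}

print join(', ', @groups), "\n";   # Harbour, undef, H
```

The "H" lands in $3 only because "harbour" is the second pattern in the list. Move it and the number changes, which is exactly what broke the reordered example.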

Here comes the third example:

    use Regexp::Subst::Parallel;

    my @sr = (
        qr/\b(h)arbour\b/i     => '$1arbor',
        qr/\b(h)onour(.*?)\b/i => '$1onor$2',
        qr/\b(c)entre\b/i      => '$1enter',
        # a sub builds the replacement without clobbering $_
        qr/\b(L|l)ift\b/       => sub { ($_[1] =~ /L/ ? "E" : "e") . "levator" },
    );
    # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    $_ = "Man Honoured at Harbour Centre by the Lift\n";
    print;
    $_ = subst($_, @sr);
    print;
    __END__
    Man Honoured at Harbour Centre by the Lift
    Man Honored at Harbor Center by the Elevator

Since the expression in each replacement is independent of the others (unlike with RegexpHash), we're in good shape.

And notice how we can use a sub as the replacement for even more flexibility (which, incidentally, has a functional programming flavor).
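If you can't install a module, the two ideas (captures local to each pattern, and a sub as the replacement) can be approximated in core Perl. What follows is only a rough sketch under my own assumptions, not how Regexp::Subst::Parallel is implemented: replace_parallel and @rules are hypothetical names, the callbacks receive $1 as $_[0] (unlike the module, where $_[1] is $1), and rules are assumed never to match the empty string.

```perl
use strict;
use warnings;

# Each rule pairs a pattern with a callback. The callback gets that
# pattern's own captures, so numbering restarts for every rule.
my @rules = (
    [ qr/\b(c)entre\b/i      => sub { "$_[0]enter" } ],
    [ qr/\b(h)arbour\b/i     => sub { "$_[0]arbor" } ],
    [ qr/\b(h)onour(\w*)\b/i => sub { "$_[0]onor$_[1]" } ],
);

sub replace_parallel {
    my ($text, @rules) = @_;
    my $out = '';
    pos($text) = 0;
    POSITION: while (pos($text) < length $text) {
        for my $rule (@rules) {
            my ($re, $make) = @$rule;
            # Try this rule at the current position only; /c keeps pos on
            # failure, and zero-length matches are skipped.
            next unless $text =~ /\G$re/gc && $+[0] > $-[0];
            # Collect this rule's captures ($1, $2, ...) from @- and @+.
            my @caps = map {
                defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : undef
            } 1 .. $#+;
            $out .= $make->(@caps);
            next POSITION;
        }
        # No rule matched here: copy one character and move on.
        $text =~ /\G(.)/gcs;
        $out .= $1;
    }
    return $out;
}

my $result = replace_parallel("Programmers Honoured at Harbour Centre\n", @rules);
print $result;   # Programmers Honored at Harbor Center
```

Reordering @rules or adding new ones doesn't disturb the existing callbacks, since each one only ever sees its own captures. Walking the string one character at a time is slow on long input, so treat this as an illustration rather than a replacement for the module.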

In reply to 3 Examples of Multiple-Word Search n Replace by chunlou
