use strict;
use warnings;
my $sentence = 'kinase inhibitor SET6 activates p16(INK4A) in cell-wall.';
my @phrases = ('kinase i', 'inhibitor', 'tor SET6', 'SET6', 'p16(INK4A)', 'cell');
my $phrases_re = join '|', map { quotemeta } @phrases;
$sentence =~ s/(^| )($phrases_re)(?= |$)/$1#$2#/g;
print $sentence, "\n";
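You get the output below. 'kinase i', 'tor SET6' and 'cell' are left alone because they are not delimited by a space (or the string boundary) on both sides.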
kinase #inhibitor# #SET6# activates #p16(INK4A)# in cell-wall.
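Update: Just as a curiosity, there are ways to do this kind of thing without lookaheads or lookbehinds. The catch is that the match then consumes the trailing space, so two adjacent phrases cannot both match in a single pass. Replace the substitution statement above with either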
$sentence =~ s/(^| )($phrases_re)( |$)/$1#$2#$3/g for 0, 1;
or
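use 5.010;
# for ($sentence) { ... } works as a topicalizer too and avoids the
# "given is experimental" warning on newer perls
given ($sentence) {
    s/ /  /g;    # double every space so each phrase gets its own delimiters
    s/(^| )($phrases_re)( |$)/$1#$2#$3/g;
    s/  / /g;    # collapse the doubled spaces again
}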
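To see why the lookahead-free versions need the extra work, here is a small sketch that applies the consuming substitution once and then a second time to the same sentence. The variable names $one_pass and $two_pass are only for illustration.

```perl
use strict;
use warnings;

my $sentence   = 'kinase inhibitor SET6 activates p16(INK4A) in cell-wall.';
my @phrases    = ('kinase i', 'inhibitor', 'tor SET6', 'SET6', 'p16(INK4A)', 'cell');
my $phrases_re = join '|', map { quotemeta } @phrases;

# One pass: the space after "inhibitor" is consumed by the match, so
# " SET6" has no leading space left to match against.
(my $one_pass = $sentence) =~ s/(^| )($phrases_re)( |$)/$1#$2#$3/g;

# A second pass over the result picks up the phrase skipped the first time.
(my $two_pass = $one_pass) =~ s/(^| )($phrases_re)( |$)/$1#$2#$3/g;

print "$one_pass\n";   # kinase #inhibitor# SET6 activates #p16(INK4A)# in cell-wall.
print "$two_pass\n";   # kinase #inhibitor# #SET6# activates #p16(INK4A)# in cell-wall.
```

Already-tagged phrases are safe on the second pass because they are now preceded by '#' rather than a space.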
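Update: One more alternative is below. It looks up each space-separated token in a hash, so it only handles phrases that contain no spaces themselves.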
my %phrase; $phrase{$_}++ for @phrases;
my @sentence = split /( +)/, $sentence;
for (@sentence) { $_ = "#$_#" if $phrase{$_} }
$sentence = join "", @sentence;
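A quick sketch of how the token-lookup approach differs from the regex alternation when a phrase contains a space. The sentence and phrase list here are hypothetical, made up only to show the difference.

```perl
use strict;
use warnings;

# Hypothetical input: one of the phrases contains a space.
my $sentence = 'the kinase inhibitor SET6 binds';
my @phrases  = ('kinase inhibitor', 'SET6');

# Token lookup: split on runs of spaces (kept via the capture group),
# then tag any token that is exactly one of the phrases.
my %phrase = map { $_ => 1 } @phrases;
my @tokens = split /( +)/, $sentence;
for (@tokens) { $_ = "#$_#" if $phrase{$_} }
my $by_token = join '', @tokens;

# Regex alternation with a lookahead, as in the first version above.
my $phrases_re = join '|', map { quotemeta } @phrases;
(my $by_regex = $sentence) =~ s/(^| )($phrases_re)(?= |$)/$1#$2#/g;

print "$by_token\n";   # the kinase inhibitor #SET6# binds
print "$by_regex\n";   # the #kinase inhibitor# #SET6# binds
```

The two-word phrase never appears as a single token after the split, so only the regex version tags it.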
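Update: Oh, let's not forget this one either. The negative lookbehind (?<![^ ]) means "not preceded by a non-space", which covers both a preceding space and the start of the string, so nothing needs to be captured and put back.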
$sentence =~ s/(?<![^ ])($phrases_re)(?= |$)/#$1#/g;
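As a minimal check of that double negative, the sketch below runs the same substitution on a hypothetical sentence where the phrase sits at the start of the string, inside parentheses, and at the end.

```perl
use strict;
use warnings;

# Hypothetical input chosen to hit the start and the end of the string.
my $sentence   = 'SET6 binds (SET6) and SET6';
my $phrases_re = quotemeta 'SET6';

# (?<![^ ]) succeeds when the previous character is a space OR when there
# is no previous character at all, so no (^| ) capture is needed.
(my $tagged = $sentence) =~ s/(?<![^ ])($phrases_re)(?= |$)/#$1#/g;

print "$tagged\n";   # #SET6# binds (SET6) and #SET6#
```

The parenthesised occurrence is skipped because it is preceded by '(' and followed by ')'.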