http://qs321.pair.com?node_id=785818

newbio has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

Example:

Input: **Type_1_deiodinase** , **D1** , metabolizes different forms, **A** , **B** of thyroid hormones to control levels of T3 , the active ligand for **thyroid_hormone_receptors** , **TR**

Output: **Type_1_deiodinase_(D1)** metabolizes different forms, **A_(B)** of thyroid hormones to control levels of T3 , the active ligand for **thyroid_hormone_receptors_(TR)**

If I do the following:  $line =~ s/\*\*([^\*]+)\*\*\s\,\s\*\*([^\*]+)\*\*(\s\,)?/**$1_($2)**/g;

it merges all $1 and $2 in the sentence. However, I want to merge terms only if either of $1 or $2 contains '_'.

That is, in the above sentence, **Type_1_deiodinase_(D1)** and **thyroid_hormone_receptors_(TR)** are OK, while **A_(B)** is not. Is there a way to apply an 'if' condition in the substitution expression above so that I merge only those adjacent terms where at least one contains '_'?

Thanks a lot.

Re: conditional statement in substitution expression
by Roy Johnson (Monsignor) on Aug 04, 2009 at 17:16 UTC
    s/\*\*([^\*]+)\*\*\s\,\s\*\*([^\*]+)\*\*(\s\,)?/(grep {index($_,'_')>=0} $1,$2)?"**$1_($2)**":$&/ge;
    This incurs the performance penalty for using $&, so you might prefer
    while(<DATA>) { s/(\*\*([^\*]+)\*\*\s\,\s\*\*([^\*]+)\*\*(\s\,)?)/(grep {index($_,'_')>=0} $2,$3)?"**$2_($3)**":$1/ge; print; }

    Caution: Contents may have been coded under pressure.
Re: conditional statement in substitution expression
by jethro (Monsignor) on Aug 04, 2009 at 16:52 UTC
    You might split the regex into two. In the first regex, change the first ([^\*]+) to ([^\*]*_[^\*]*); in the second regex, change the second ([^\*]+) the same way. Note that this will also substitute if BOTH substrings contain underscores, but you could use ([^\*_]+) to prevent that.
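The two-regex idea above could be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name merge_terms is hypothetical, the optional trailing " ," is matched with a non-capturing group, and ${1} is written with braces purely for readability. Chained terms (three or more adjacent **...** items) are not handled specially and may merge more than once.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the two substitutions one after the other.
sub merge_terms {
    my ($line) = @_;

    # Pass 1: merge only when the FIRST term contains an underscore.
    $line =~ s/\*\*([^*]*_[^*]*)\*\*\s,\s\*\*([^*]+)\*\*(?:\s,)?/**${1}_($2)**/g;

    # Pass 2: merge only when the SECOND term contains an underscore.
    $line =~ s/\*\*([^*]+)\*\*\s,\s\*\*([^*]*_[^*]*)\*\*(?:\s,)?/**${1}_($2)**/g;

    return $line;
}

# Sample sentence from the question.
my $input = '**Type_1_deiodinase** , **D1** , metabolizes different forms, '
          . '**A** , **B** of thyroid hormones to control levels of T3 , '
          . 'the active ligand for **thyroid_hormone_receptors** , **TR**';

print merge_terms($input), "\n";
```

On the sample sentence this should merge the first and last pairs and leave **A** , **B** untouched, since neither of those two terms contains an underscore.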
Re: conditional statement in substitution expression
by jwkrahn (Abbot) on Aug 04, 2009 at 17:04 UTC
    $line =~ s/\Q**\E(([^*]+)\Q**\E\s,\s\Q**\E([^*]+))\Q**\E(\s,)?/ "$2$3" =~ tr!_!! ? "**$2_($3)**" : "**$1**" /eg;