http://qs321.pair.com?node_id=139769

wasii has asked for the wisdom of the Perl Monks concerning the following question:

There must be a long-established idiomatic and elegant perl way to do this -- that I haven't found. I need a statement like:
s/\b$old\b/$new/g;
where $new properly reflects the case of the first letter of $old.

Given $old="my"; $new="no"; this would transmogrify

My perl is my life.
into
No perl is no life.
(I can't include patterns in $new.)

thanks much -
Andy

Replies are listed 'Best First'.
Re: Case-preserving substitutions
by I0 (Priest) on Jan 18, 2002 at 13:21 UTC
    s/\b($old)\b/lc$new^$1^lc$1/gie

    Update:
    In case $new=~/\W/ or length$new<length$old it may be slightly more complicated:
    s/\b($old)\b/lc$new^($1^lc$1)&(lc$new^uc$new)/gie
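For anyone who wants to run it, here is a minimal sketch of the second form above. The helper name `replace_keep_case` is hypothetical, and the `\Q...\E` quoting of `$old` is an addition that the original one-liner does not have; it assumes plain ASCII letters and Perl's default string-wise `^` and `&` (no `use feature 'bitwise'`).

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: wraps the XOR-based substitution.
sub replace_keep_case {
    my ($text, $old, $new) = @_;
    # ($1 ^ lc $1) has bit 5 set wherever the matched text is uppercase.
    # (lc $new ^ uc $new) has bit 5 set wherever $new holds a letter.
    # ANDing the two keeps the mask away from non-letters and trims it
    # to the shorter length; XORing it into lc $new uppercases those letters.
    # \Q...\E (an addition) stops metacharacters in $old acting as a pattern.
    $text =~ s/\b(\Q$old\E)\b/lc($new) ^ (($1 ^ lc $1) & (lc($new) ^ uc($new)))/gie;
    return $text;
}

print replace_keep_case("My perl is my life.", "my", "no"), "\n";
```

With the example from the question this should print "No perl is no life."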

      What did you do, golf mine? :)

      I was unaware of that property of ^, very nice... in fact, I'm not even sure why that works. Time for me to hit perlman:perlop.

      Update:

      OK, it's like this: ^ is XOR. That means a result bit is set if and only if exactly one of the two input bits is set:

      lc $new     # lowercase the replacement
      ^ $1        # XOR with $1 - will increase the values by
                  # the corresponding ascii value of each character
                  # of $1
      ^ lc $1     # XOR with lc $1 - basically, it will subtract the
                  # ascii values of lowercase $1 - if the values were
                  # lowercase to begin with, the resulting sum is 0,
                  # otherwise, the increase is enough to uppercase
                  # the corresponding character
      In short - an incredibly concise way of doing it. ++!

      Another try at explaining this:

      When dealing with 7-bit ASCII, the uppercase letters begin at 65 and the lowercase at 97 -- 32 higher.   Since 32 is a power of two, represented by bit 5 of the character, if this bit is set the letter is lc; if unset, Uc.
      $ perl -lwe'$,=$\;print unpack("B*","A"), unpack("B*","a"), unpack"B*","A"^"a"'
      01000001   <- "A": 64 + 1
      01100001   <- "a": 64 + 32 + 1
      00100000   <- result of XORing
      The bit will be set only if the original was uppercase.   Since XORing something with itself is always 0, that is the only bit which can be set.   The lc of the replacement will have that bit set because that's what makes it lc, with other bits set to determine which letter.  

      So, bit 5 is set in the XORing of the original with its lc self only if the original is Uc (the opposite of the bit's usual meaning!), and it is always set in the lc replacement.   If both are set, XOR clears the bit in the result: hence Uc; if only the replacement has it set, XOR leaves it: lc.
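To watch those bits move, here is a small sketch that prints each intermediate string as binary. The `bits` helper and the variable names are made up for illustration; it assumes ASCII input.

```perl
use strict;
use warnings;

# Hypothetical helper: render each character of a string as 8 binary digits.
sub bits {
    my ($s) = @_;
    return join ' ', map { sprintf('%08b', ord($_)) } split(//, $s);
}

my ($orig, $new) = ('My', 'no');
my $mask = $orig ^ lc($orig);   # bit 5 set only where $orig is uppercase
my $out  = lc($new) ^ $mask;    # clearing bit 5 uppercases that letter

printf "%-8s %s\n", 'orig',   bits($orig);
printf "%-8s %s\n", 'mask',   bits($mask);
printf "%-8s %s\n", 'lc new', bits(lc $new);
printf "%-8s %s (%s)\n", 'result', bits($out), $out;
```

The mask line should show bit 5 set for the first character only, which is why just the "n" gets uppercased.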

      I think at this point I should exclaim "QED" and run.   It seemed clear enough before I started trying to explain it in this little box!

      update:   But note that jryan's answer above will work with any locale !

      reupdate:   I0 points out (and I should've checked) that capitalizing-by-resetting-bit-5 also works for the 8-bit characters in the standard ISO8859-1 ("latin-1") character set.
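A quick way to check the Latin-1 claim on a single character, as a sketch: the byte values are ISO 8859-1 code points written in hex so the example does not depend on the source file's encoding. One caveat, as far as I know: a few positions in that range have no case partner (0xD7/0xF7 are the multiplication and division signs, and 0xDF and 0xFF have no uppercase form within Latin-1), so the trick is only safe on real letter pairs.

```perl
use strict;
use warnings;

# 0xE9 is e-acute and 0xC9 is E-acute in ISO 8859-1.
my $lower = chr(0xE9);
my $upper = $lower ^ chr(0x20);   # flip bit 5, exactly as for ASCII letters

printf "0x%02X -> 0x%02X\n", ord($lower), ord($upper);
```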

Re: Case-preserving substitutions
by jryan (Vicar) on Jan 18, 2002 at 13:03 UTC
    You need the /e modifier so that the right-hand side is evaluated as a Perl expression rather than interpolated as a string:
    s/\b($old)\b/(lcfirst($1) eq $1) ? lcfirst($new) : ucfirst($new)/gei;
    Consult perlman:perlre for more details.

    Update:
    I think I may have misinterpreted the question. The code has since been corrected, and the difference turned out to be slight.
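Here is the same substitution as a runnable sketch against the example from the question. The variable names are illustrative, and the `\Q...\E` around `$old` is an addition to guard against regex metacharacters. Note that it only looks at the first letter, which is all the question asked for.

```perl
use strict;
use warnings;

my ($old, $new) = ('my', 'no');
my $text = 'My perl is my life.';

# /e evaluates the replacement as code, /i matches either case,
# /g replaces every occurrence.
(my $result = $text) =~
    s/\b(\Q$old\E)\b/lcfirst($1) eq $1 ? lcfirst($new) : ucfirst($new)/gei;

print "$result\n";
```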

      Update: by the time I posted this, jryan had already corrected his post :)

      You also need the /i modifier at the end, to match case-insensitively.

      $|=$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g

Re: Case-preserving substitutions
by blakem (Monsignor) on Jan 19, 2002 at 00:34 UTC