
Re: Can't get \n or other character/translation escapes to interpolate if originally read from a data file

by GrandFather (Saint)
on Mar 15, 2021 at 20:46 UTC


in reply to Can't get \n or other character/translation escapes to interpolate if originally read from a data file

By now you've got "why it happens" and "how to work around it". But I'm interested to know "why do you want that?" Maybe we can help you find a better way to achieve your end goal if we know what that is.

Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

Re^2: Can't get \n or other character/translation escapes to interpolate if originally read from a data file
by davebaker (Pilgrim) on Mar 16, 2021 at 02:16 UTC

    Thanks. My end goal was the replacement of several different strings that appear in an unknown number of text files in a particular directory. Some of the strings include a line feed. All of the strings would be replaced by a particular piece of text.

    (Backstory: The text files are newsletters that are archived on the web, the text of which sometimes includes an address that's being picked up by spammers scraping the newsletters over the web. The goal is to scrub the email address, which appears in one of several different standard sentences, and replace it with generic contact information.)

    Reach Holly Smith for help by sending an email
    to hollysmith@nosuchdomain.com.
    

    and

    For more information, contact Holly Smith at
    hollysmith@nosuchdomain.com.
    

    and

    For more information, contact Holly Smith at (800) 555-1212 or
    via email to hollysmith@nosuchdomain.com
    

    In each case, I want to replace those sentences (which have line breaks where you see them, e.g. after "sending an email" in the first one) with "For more information, contact Holly Smith." (without the quotation marks).

    The reason I got balled up with the interpolation issue is that I first tried to use a DATA block in a small script, as in:

    __DATA__
    Reach Holly Smith for help by sending an email\nto hollysmith@nosuchdomain.com.~For more information, contact Holly Smith.
    For more information, contact Holly Smith at\nhollysmith@nosuchdomain.com.~For more information, contact Holly Smith.
    For more information, contact Holly Smith at (800) 555-1212 or\nvia email to hollysmith@nosuchdomain.com~For more information, contact Holly Smith.

    (There are only three lines in the DATA block, even though they almost certainly will wrap on this web page.)

    My script would read each of those lines in the DATA block, load variables $old_string and $new_string by splitting on the "~" character, and then do a

    if ($slurped_file =~ s/\Q$old_string\E/$new_string/g ) {

    kind of thing to make the replacement, ultimately resulting in updated text that would be used to replace the existing file.

    And now you see why it didn't work :-) I had used the \Q in order to escape the domain name's period and the set of parentheses in one targeted string, but of course \Q wants to do what \Q does, so it also escapes the backslash in "\n" in my targeted strings. Hence the "\n" in my DATA block records didn't seem to "work." The substitutions never took place because the files don't have strings that include a literal \ followed by n.
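A minimal sketch of that \Q effect, assuming a shortened version of the first target sentence. The variable names $quoted, $text and $matched_quoted are made up for illustration; quotemeta() is the function form of \Q...\E.

```perl
use strict;
use warnings;

# Single quotes keep the backslash and the "n" as two separate characters,
# just like a record read from a DATA block or a file.
my $old_string = 'an email\nto hollysmith@nosuchdomain.com.';

# quotemeta() escapes every non-word character, including the backslash
# that was meant to start the "\n" escape.
my $quoted = quotemeta $old_string;

# Text as it appears in a newsletter, with a real line feed.
my $text = "an email\nto hollysmith\@nosuchdomain.com.";

# The quoted pattern now looks for a literal backslash followed by "n",
# so it cannot match the real line feed in $text.
my $matched_quoted = $text =~ /\Q$old_string\E/ ? 1 : 0;

print "$quoted\n";
print "matched with \\Q: $matched_quoted\n";
```

The printed pattern shows a doubled backslash in front of the "n", which is why the substitution never fires.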

    Before I realized the \Q issue, though, I was convinced that something about the use of a __DATA__ section for the data had to be preventing the \n from being interpolated, and I thought I needed interpolation so that I could put, on a single line in the DATA block, an expression that would match the line breaks that occur in all three target strings. So I created the little test script shown in my original post in order to make sure \n would be the proper way to represent such a line feed. And I couldn't get the \n to turn into a line feed. I seem to have stumbled onto something that turns out to be unrelated to solving my problem!

    Basically I failed to remember that merely putting a string into a variable doesn’t cause the string to be interpolated. Otherwise there would be trouble every time a graphic file is read from disk, if its content included the character sequence “\n”, for example. (I think.)
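To illustrate the point, here is a small sketch that reads a record through a filehandle and compares it with a double-quoted literal in the source code. The in-memory file is only a stand-in for a real data file, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# An in-memory file standing in for a data file; the single-quoted part
# holds a backslash followed by "n", not a line feed.
my $file_content = 'email\nto' . "\n";
open my $fh, '<', \$file_content or die "Cannot open in-memory file: $!";
my $record = <$fh>;
close $fh;
chomp $record;

# In a double-quoted literal the Perl compiler turns \n into one character.
my $literal = "email\nto";

printf "from file: %d characters\n", length $record;    # 9
printf "literal:   %d characters\n", length $literal;   # 8
```

The escape is processed when Perl compiles a double-quoted literal, not when a string is stored in or read into a variable.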

    So this code is doing what I need it to do, even with data in $old_string that’s coming from a DATA block and includes embedded “\n” strings:

    if ($slurped_file =~ s/$old_string/$new_string/g ) {

    The two-character \n string in $old_string is still the two-character \n string when the regular expression in the substitution function is built, resulting in something like "s/Reach Holly Smith for help by sending an email\nto hollysmith@nosuchdomain.com/For more information, contact Holly Smith./" (without the quotation marks).
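A short sketch of why this works, assuming a made-up newsletter fragment in $slurped_file. When the pattern is compiled, the regex engine itself understands the two-character \n in $old_string as "match a line feed".

```perl
use strict;
use warnings;

# Two-character "\n" and escaped periods, as they would arrive from DATA.
my $old_string = 'Reach Holly Smith for help by sending an email\nto hollysmith@nosuchdomain\.com\.';
my $new_string = 'For more information, contact Holly Smith.';

# Hypothetical file content with a real line feed inside the sentence.
my $slurped_file = "Newsletter intro.\n"
    . "Reach Holly Smith for help by sending an email\n"
    . "to hollysmith\@nosuchdomain.com.\n"
    . "More text.\n";

# Without \Q the regex engine compiles \n as a line-feed escape.
my $count = ($slurped_file =~ s/$old_string/$new_string/g);

print "substitutions: $count\n";
print $slurped_file;
```

The trade-off is that every other regex metacharacter in $old_string is live too, so each one has to be escaped by hand in the data.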

    I did need to revise my DATA block a bit, to escape a couple of parentheses that otherwise would be treated as grouping operators in the regular expression (and I escaped the periods in order to avoid the inefficiency of the substitution operator treating them like wild cards):

    __DATA__
    Reach Holly Smith for help by sending an email\nto hollysmith@nosuchdomain\.com\.~For more information, contact Holly Smith.
    For more information, contact Holly Smith at\nhollysmith@nosuchdomain\.com\.~For more information, contact Holly Smith.
    For more information, contact Holly Smith at \(800\) 555-1212 or\nvia email to hollysmith@nosuchdomain\.com~For more information, contact Holly Smith.
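Putting the pieces together, a rough sketch of the whole loop might look like the following. It is not the original script: a single-quoted here-doc stands in for the DATA block (it keeps \n as two characters, as <DATA> would), and the newsletter text and the $replacements counter are assumptions made for the example.

```perl
use strict;
use warnings;

# Stand-in for the DATA block: one "old~new" record per line.
my $records = <<'END_RECORDS';
Reach Holly Smith for help by sending an email\nto hollysmith@nosuchdomain\.com\.~For more information, contact Holly Smith.
For more information, contact Holly Smith at\nhollysmith@nosuchdomain\.com\.~For more information, contact Holly Smith.
For more information, contact Holly Smith at \(800\) 555-1212 or\nvia email to hollysmith@nosuchdomain\.com~For more information, contact Holly Smith.
END_RECORDS

# Hypothetical newsletter text containing the first and third sentences.
my $slurped_file = "News.\nReach Holly Smith for help by sending an email\n"
    . "to hollysmith\@nosuchdomain.com.\nMore news.\n"
    . "For more information, contact Holly Smith at (800) 555-1212 or\n"
    . "via email to hollysmith\@nosuchdomain.com\n";

my $replacements = 0;
for my $record (split /\n/, $records) {
    # Split each record on the first "~" into pattern and replacement.
    my ($old_string, $new_string) = split /~/, $record, 2;

    # s///g returns the number of substitutions made (false if none).
    my $n = ($slurped_file =~ s/$old_string/$new_string/g);
    $replacements += $n if $n;
}

print "replacements: $replacements\n";
print $slurped_file;
```

In a real run, $slurped_file would come from each newsletter file in turn and be written back only when $replacements is non-zero.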

      A further elaboration is to use \s+ in place of a literal space in your pattern strings (in either an array of strings or in __DATA__ records). So
          'Reach Holly Smith for help ...'
      might look like
          'Reach \s+ Holly \s+ Smith \s+ for \s+ help \s+ ...'
      Because the \s whitespace class includes \n, this has the advantage that a newline or any other run of whitespace may appear between any two words of the target string and will still be matched and replaced. E.g., the target string may be broken over any number of lines in the target text. Important note: the substitution must use the /x modifier so that the spaces surrounding each \s+ (there only for readability) are ignored by the regex engine.
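As a sketch of that idea, assume the first target sentence shows up with two different line-break layouts. The variables $text_a and $text_b are invented for the example.

```perl
use strict;
use warnings;

# Every gap between words is written as \s+; the plain spaces around each
# \s+ are ignored by the regex engine because of the /x modifier below.
my $old_string = 'Reach \s+ Holly \s+ Smith \s+ for \s+ help \s+ by \s+ sending'
    . ' \s+ an \s+ email \s+ to \s+ hollysmith@nosuchdomain\.com\.';
my $new_string = 'For more information, contact Holly Smith.';

# The same sentence broken over lines in two different places.
my $text_a = "Reach Holly Smith for help by sending an email\n"
    . "to hollysmith\@nosuchdomain.com.";
my $text_b = "Reach Holly\nSmith for help by sending\n"
    . "an   email to hollysmith\@nosuchdomain.com.";

for my $text ($text_a, $text_b) {
    # $text is an alias, so the substitution edits $text_a and $text_b.
    $text =~ s/$old_string/$new_string/gx;
}

print "$text_a\n$text_b\n";
```

Both layouts end up as the same replacement sentence. Note that under /x a literal space in the pattern matches nothing, so no gap between words can be left as a bare space.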


      Give a man a fish:  <%-{-{-{-<
