Thanks. My end goal was to replace several different strings that appear in an unknown number of text files in a particular directory. Some of the strings include a line feed, and all of them are to be replaced by the same piece of text.
(Backstory: The text files are newsletters archived on the web, and their text sometimes includes an email address that spammers pick up by scraping the newsletters. The goal is to scrub the email address, which appears in one of several standard sentences, and replace it with generic contact information.)
Reach Holly Smith for help by sending an email
to hollysmith@nosuchdomain.com.
and
For more information, contact Holly Smith at
hollysmith@nosuchdomain.com.
and
For more information, contact Holly Smith at (800) 555-1212 or
via email to hollysmith@nosuchdomain.com
In each case, I want to replace those sentences (which have line breaks where you see them, e.g. after "sending an email" in the first one) with "For more information, contact Holly Smith." (without the quotation marks).
The reason I got balled up with the interpolation issue is that I first tried to use a DATA block in a small script, as in:
__DATA__
Reach Holly Smith for help by sending an email\nto hollysmith@nosuchdomain.com.~For more information, contact Holly Smith.
For more information, contact Holly Smith at\nhollysmith@nosuchdomain.com.~For more information, contact Holly Smith.
For more information, contact Holly Smith at (800) 555-1212 or\nvia email to hollysmith@nosuchdomain.com~For more information, contact Holly Smith.
(There are only three lines in the DATA block, even though they almost certainly will wrap on this web page.)
My script would read each of those lines in the DATA block, load variables $old_string and $new_string by splitting on the "~" character, and then do a
if ($slurped_file =~ s/\Q$old_string\E/$new_string/g ) {
kind of thing to make the replacement, ultimately resulting in updated text that would be used to replace the existing file.
And now you see why it didn't work :-) I had used \Q to escape the domain name's period and the set of parentheses in one target string, but of course \Q does what \Q does: it escapes every non-word character, including the backslash in the "\n" in my target strings. Hence the "\n" in my DATA block records didn't seem to "work." The substitutions never took place because the files don't contain a literal \ followed by n.
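A small check makes the \Q problem visible. This is a sketch that assumes $old_string holds a real backslash followed by n, as it would after reading a DATA record (single quotes reproduce that here). The last part shows one alternative that is not the approach this post settles on: split the record on the literal \n marker, quotemeta each piece, and join the pieces with a real line feed, so the periods and parentheses need no hand-escaping. The helper name build_pattern is hypothetical.

```perl
use strict;
use warnings;

# Single quotes keep the two characters backslash and n,
# just like a record read from the __DATA__ section.
my $old_string = 'an email\nto';
my $text       = "an email\nto";    # real line feed, as in the newsletter files

# \Q...\E applies quotemeta, which backslashes every non-word character,
# including the backslash itself, so \n becomes \\n (a literal backslash, then n).
my $quoted = quotemeta $old_string;
print "$quoted\n";                                        # an\ email\\nto

my $with_q    = ($text =~ /\Q$old_string\E/) ? 1 : 0;    # no match
my $without_q = ($text =~ /$old_string/)     ? 1 : 0;    # match: the regex engine reads \n as a line feed
print "with \\Q: $with_q, without \\Q: $without_q\n";    # with \Q: 0, without \Q: 1

# Hypothetical alternative: escape each piece separately and join the
# pieces with a real line feed instead of a quoted backslash-n.
sub build_pattern {
    my ($plain) = @_;
    # A limit of -1 keeps empty trailing fields if a record ends in \n.
    my @pieces = map { quotemeta($_) } split /\\n/, $plain, -1;
    my $joined = join "\n", @pieces;
    return qr/$joined/;
}

my $pattern = build_pattern('contact Holly Smith at (800) 555-1212 or\nvia email');
my $sample  = "contact Holly Smith at (800) 555-1212 or\nvia email";
my $alt     = ($sample =~ $pattern) ? 1 : 0;
print "alternative: $alt\n";                              # alternative: 1
```

The trade-off is that the records stay readable plain text, at the cost of reserving the two-character \n as a line-break marker.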
Before I realized the \Q issue, though, I was convinced that something about the use of a __DATA__ section for the data had to be preventing the \n from being interpolated, and I thought I needed interpolation so that I could put, on a single line in the DATA block, an expression that would match the line breaks that occur in all three target strings. So I created the little test script shown in my original post in order to make sure \n would be the proper way to represent such a line feed. And I couldn't get the \n to turn into a line feed. I seem to have stumbled onto something that turns out to be unrelated to solving my problem!
Basically, I had forgotten that merely reading a string into a variable (from a file or a __DATA__ section) doesn't cause the escape sequences in it to be interpolated; that only happens to literals in the program source, such as double-quoted strings. Otherwise there would be trouble every time a graphic file was read from disk if its content happened to include the character sequence "\n", for example. (I think.)
So this code does what I need it to do, even though the data in $old_string comes from a DATA block and includes embedded "\n" sequences:
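The point about variables can be checked directly. This sketch reads one record through an in-memory filehandle, an assumption standing in for <DATA> so the example runs on its own, and then looks at what actually arrived:

```perl
use strict;
use warnings;

# The heredoc uses a single-quoted terminator, so Perl does not interpolate
# its contents: the record holds a backslash followed by n.
my $record_text = <<'END';
sending an email\nto hollysmith
END

# Open the string as a filehandle, as a stand-in for reading from DATA.
open my $fh, '<', \$record_text or die "Cannot open in-memory handle: $!";
my $line = <$fh>;
close $fh;
chomp $line;

my $has_backslash_n = index($line, '\n') >= 0 ? 1 : 0;   # the two characters \ and n
my $has_line_feed   = index($line, "\n") >= 0 ? 1 : 0;   # an actual line feed
print "backslash-n: $has_backslash_n, line feed: $has_line_feed\n";
# backslash-n: 1, line feed: 0
```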
if ($slurped_file =~ s/$old_string/$new_string/g ) {
The two-character \n sequence in $old_string is still the two-character \n sequence when the substitution's regular expression is built, resulting in something like "s/Reach Holly Smith for help by sending an email\nto hollysmith@nosuchdomain.com/For more information, contact Holly Smith./" (without the quotation marks). It is the regex engine, not string interpolation, that then treats that \n as a line feed, which is why the match works.
I did need to revise my DATA block a bit: I escaped a couple of parentheses that would otherwise be treated as grouping operators in the regular expression, and I escaped the periods so that they match only a literal period instead of acting as wildcards that match any character:
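To confirm that last step, this sketch builds the pattern from a variable holding the two-character \n (with the periods escaped, as in the revised DATA records) and runs it against a made-up sample containing a real line break:

```perl
use strict;
use warnings;

# Pattern and replacement as they would look after splitting a DATA record
# on "~" (single quotes keep \n as two characters).
my $old_string = 'Reach Holly Smith for help by sending an email\nto hollysmith@nosuchdomain\.com\.';
my $new_string = 'For more information, contact Holly Smith.';

# Made-up sample text with a real line feed after "email".
my $slurped_file = "Reach Holly Smith for help by sending an email\n"
                 . "to hollysmith\@nosuchdomain.com.\n";

# Perl interpolates $old_string into the pattern without reprocessing its
# escapes; the regex engine then reads \n as "match a line feed".
my $count = ($slurped_file =~ s/$old_string/$new_string/g);
print "replacements: $count\n";    # replacements: 1
print $slurped_file;               # For more information, contact Holly Smith.
```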
__DATA__
Reach Holly Smith for help by sending an email\nto hollysmith@nosuchdomain\.com\.~For more information, contact Holly Smith.
For more information, contact Holly Smith at\nhollysmith@nosuchdomain\.com\.~For more information, contact Holly Smith.
For more information, contact Holly Smith at \(800\) 555-1212 or\nvia email to hollysmith@nosuchdomain\.com~For more information, contact Holly Smith.
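Putting the pieces together, here is a minimal sketch of the whole job under a few assumptions: the newsletters are the *.txt files in a directory named on the command line, the rules use the escaped format of the revised DATA block, and the rules are read from a heredoc through an in-memory filehandle so the example is self-contained (the real script could loop over <DATA> instead). The names load_rules, scrub_text and scrub_file are hypothetical.

```perl
use strict;
use warnings;

# Rules in the same "pattern~replacement" format as the revised DATA block.
my $rules_text = <<'RULES';
Reach Holly Smith for help by sending an email\nto hollysmith@nosuchdomain\.com\.~For more information, contact Holly Smith.
For more information, contact Holly Smith at\nhollysmith@nosuchdomain\.com\.~For more information, contact Holly Smith.
For more information, contact Holly Smith at \(800\) 555-1212 or\nvia email to hollysmith@nosuchdomain\.com~For more information, contact Holly Smith.
RULES

# Read "old~new" records. No \Q here, so \n and the escapes stay regex syntax.
sub load_rules {
    my ($fh) = @_;
    my @rules;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my ($old, $new) = split /~/, $line, 2;
        push @rules, [ qr/$old/, $new ];
    }
    return \@rules;
}

# Apply every rule to a string; return the new text and the substitution count.
sub scrub_text {
    my ($text, $rules) = @_;
    my $total = 0;
    for my $rule (@$rules) {
        my ($re, $new) = @$rule;
        my $n = ($text =~ s/$re/$new/g);
        $total += $n || 0;
    }
    return ($text, $total);
}

# Slurp one file, scrub it, and rewrite it only if something changed.
sub scrub_file {
    my ($path, $rules) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $content = do { local $/; <$in> };
    close $in;

    my ($scrubbed, $count) = scrub_text($content, $rules);
    return 0 unless $count;

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $scrubbed;
    close $out or die "Cannot close $path: $!";
    return $count;
}

open my $rules_fh, '<', \$rules_text or die "Cannot open rules: $!";
my $rules = load_rules($rules_fh);
close $rules_fh;

# Assumption: the newsletters are *.txt files in the directory given as the
# first command-line argument; nothing is touched when no argument is given.
if (@ARGV) {
    my $dir = $ARGV[0];
    for my $path (glob "$dir/*.txt") {
        my $count = scrub_file($path, $rules);
        print "$path: $count replacement(s)\n" if $count;
    }
}
```

One thing worth checking before running it on the archive: if the files have Windows CRLF line endings and are read on a Unix system, the \n in the patterns will not match the \r\n pair, and writing \r?\n in the records would cover both cases.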