Re: Removing the carriage return in a Find & Replace?

Per the HTML standard, <CR>, <LF> and <space> are interchangeable separators in an HTML document. Moreover, a run of two or more separators is treated as a single separator.

So, if you want to catch <TD><FONT FACE=arial SIZE=-1> with a regex you should expect 0 or more separators wherever a separator is optional and 1 or more wherever it is required. That said, I think that /<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1>/ should be enough.

Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."


Re^2: Removing the carriage return in a Find & Replace? by bobafifi (Beadle) on Sep 26, 2008 at 12:04 UTC
Thanks for the quick reply, psini! Using your suggestion, I just tried `perl -i -pe 's/<TD>\s<FONT FACE=arial SIZE=-1>/widget/g' test.php`, but unfortunately it didn't work. However, when I remove the carriage return in the HTML and run `perl -i -pe 's/<TD><FONT FACE=arial SIZE=-1>/widget/g' * test.php`, there's no problem. Not sure why, but the `\s*` doesn't seem to be recognized. Thanks again, Bob
Re^3: Removing the carriage return in a Find & Replace? by Fletch (Bishop) on Sep 26, 2008 at 12:13 UTC
Because you've told perl to read the file a line at a time (or rather, you haven't told it to do otherwise, and line-at-a-time is the default), `$_` will contain only `<TD>\n`, and the next line will have `<FONT ....>`. At no point is the entire text you expect to match in `$_` at once and in the right order, so the match never happens and the substitution never triggers. See the documentation for the `-0` switch in perlrun, specifically the part about turning on paragraph mode. The cake is a lie. The cake is a lie. The cake is a lie.
Re^4: Removing the carriage return in a Find & Replace? by mscharrer (Hermit) on Sep 26, 2008 at 13:51 UTC
The `-p` option you use processes the input one line at a time by default. For Perl, `\n` isn't the same as a space, even if it is for HTML. One solution is to undef `$/` in order to enable "slurp mode" (or to use the previously mentioned command line option `-0`): `perl -i -pe 'BEGIN { undef $/ } s/<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1>/widget/g' test.html`
Re^5: Removing the carriage return in a Find & Replace? by bobafifi (Beadle) on Sep 30, 2008 at 16:03 UTC
Re^4: Removing the carriage return in a Find & Replace? by bobafifi (Beadle) on Sep 26, 2008 at 15:54 UTC
I just tried mscharrer's variation on this and it worked! `perl -i -pe 'BEGIN { undef $/ } s/<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1>/widget/g' test.html` Thanks so much! Bob
Re^3: Removing the carriage return in a Find & Replace? by psini (Deacon) on Sep 26, 2008 at 12:17 UTC
Are you sure it is a CR and not some evil non-printable character used by MS? Try editing the file with a text editor (not a word processor!): delete the current newline character, insert a CR, and try again. If it works, the remaining problem is finding out which newline character the file uses.
Re^4: Removing the carriage return in a Find & Replace? by bobafifi (Beadle) on Sep 26, 2008 at 12:24 UTC
I'm on a Mac using TextEdit in text mode (no MS) and Terminal to run Perl. Have you been able to get my example to work on your machine? Thanks, Bob
Re^5: Removing the carriage return in a Find & Replace? by broomduster (Priest) on Sep 26, 2008 at 13:07 UTC

