PerlMonks  

Re: Removing the carriage return in a Find & Replace?

by psini (Deacon)
on Sep 26, 2008 at 11:51 UTC ( [id://713861]=note )


in reply to Removing the carriage return in a Find & Replace?

Per the HTML standard, <CR>, <LF> and <space> are interchangeable separators in an HTML document. Moreover, a run of two or more separators is treated as a single separator.

So, if you want to catch <TD><FONT FACE=arial SIZE=-1> with a regex you should expect 0 or more separators wherever a separator is optional and 1 or more wherever it is required. That said, I think that /<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1>/ should be enough.

Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

Replies are listed 'Best First'.
Re^2: Removing the carriage return in a Find & Replace?
by bobafifi (Beadle) on Sep 26, 2008 at 12:04 UTC
    Thanks for the quick reply psini!
    Using your suggestion, I just tried
    perl -i -pe 's/<TD>\s*<FONT FACE=arial SIZE=-1>/widget/g' * test.php
    unfortunately it didn't work.

    However, when I remove the carriage return in the html and run
    perl -i -pe 's/<TD><FONT FACE=arial SIZE=-1>/widget/g' * test.php
    no problem. Not sure why, but the \s* doesn't seem to be recognized.

    Thanks again,
    Bob

      Because you've told perl to read the file a line at a time (or rather, you haven't told it to do otherwise, and a line is the default), $_ will only ever contain <TD>\n, and the next line will hold <FONT ....>. At no point is the entire text you expect to match in $_ at once and in the right order, so the match never happens and the substitution never triggers.

      See the documentation for the -0 switch in perlrun, specifically the part about turning on paragraph mode.

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

        The -p option you use processes the input one line at a time. For Perl, \n isn't the same as a space, even if it is for HTML. One solution is to undef $/ in order to enable "slurp mode" (or to use the aforementioned command-line option -0, for example -0777, which slurps whole files):

        perl -i -pe 'BEGIN { undef $/ } s/<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1>/widget/g' test.html
        I just tried mscharrer's variation on this and it worked!
        perl -i -pe 'BEGIN { undef $/ } s/<TD>\s*<FONT\s+FACE=arial\s+SIZE=-1>/widget/g' test.html

        Thanks so much!
        Bob

      Are you sure it is a CR and not some evil non-printable character used by MS?

      Try editing the file with a text editor (not a word processor!): delete the current newline character, insert a CR and try again. If it works, the remaining problem is to find out which newline character the file actually uses.

      Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

        I'm on a Mac using TextEdit in text mode (no MS) and Terminal to run Perl.
        Have you been able to get my example to work on your machine? Thanks,
        Bob
