PerlMonks  

search and replace question

by Anonymous Monk
on Mar 11, 2010 at 07:48 UTC ( [id://827970]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am cleaning up a bunch of data and am putting together a series of perl commands to do the job.

The data to work on looks something like this:

fjfjf fjfjf fjfjf <?Pub fontsize="10.00pt"?> ghg gghg fgf gf gjgjg gjgjg gjgjg <?Pub fontsize="9.00pt"?> gggg dfd dfdf dfd <?Pub fontsize= "19.00pt"?> fgf fgfg gfff fgf fggf ghg ghgghhg ghghg

I need to simply remove the '<?Pub XXXX ?>' strings.

I am using a command like this:

find ./MMData -iname '*.dita*' -print|xargs perl -pi -e 's/<\?Pub fontsize\="0.50pt"\?\>/<\!\-\-Rem Pub Dir\-\-\>/g';

I've tried wildcard'ing the search string but cannot get it to work - I always seem to end up greedily taking up a lot of the rest of the file

Also - I cannot get my search to find these "search strings" that are split over a line - I guess if I can get the wildcard working I could put a linefeed in there ?

Also - now that I'm asking (!!!), how do I make this command NOT produce a BAK file ?

Replies are listed 'Best First'.
Re: search and replace question
by almut (Canon) on Mar 11, 2010 at 08:31 UTC

    For one, I think you'd want

    's/<\?Pub fontsize="[^"]*"\?>/<!--Rem Pub Dir-->/g'

    i.e. match anything but a double quote between the double quotes that hold the fontsize. (As the fontsize value is unlikely to contain an escaped double quote itself, this simplified approach should work reasonably well.)
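A minimal sketch of the negated character class in action, run against a sample line modelled on the data in the question (the variable names are only illustrative):

```perl
use strict;
use warnings;

# Sample line modelled on the data in the question.
my $line = 'fjfjf <?Pub fontsize="10.00pt"?> ghg <?Pub fontsize="9.00pt"?> gggg';

# [^"]* matches any run of characters that are not a double quote,
# so each match stops at the closing quote of its own directive.
(my $cleaned = $line) =~ s/<\?Pub fontsize="[^"]*"\?>/<!--Rem Pub Dir-->/g;

print "$cleaned\n";
```

Because the class cannot cross a double quote, the match never runs on into the next directive on the same line.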

    The wrapped lines don't match because you're processing the file line by line (option -p), so you never have more than one line in $_ to match against (and /s won't help).  Solving this in a "-p"-one-liner is rather involved... so you might want to check first whether you could read the entire file into a single string instead.
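The difference between line-by-line and whole-string matching can be sketched as follows. The wrapped sample text is made up for illustration, and \s+ is used after Pub (an assumption, not part of the pattern above) so that the line break can be matched at all:

```perl
use strict;
use warnings;

# A directive that is wrapped across two lines (hypothetical sample).
my $text = "dfd <?Pub\nfontsize=\"19.00pt\"?> fgf\n";

my $re = qr/<\?Pub\s+fontsize="[^"]*"\?>/;

# Line by line, as -p does: neither line contains the whole directive,
# so nothing is replaced.
my $per_line = join '', map {
    my $l = $_;
    $l =~ s/$re/<!--Rem Pub Dir-->/g;
    $l;
} split /^/, $text;

# Whole string at once: \s+ can now match the embedded newline.
(my $slurped = $text) =~ s/$re/<!--Rem Pub Dir-->/g;

print "line by line: $per_line";
print "slurped:      $slurped";
```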

Re: search and replace question
by Hue-Bond (Priest) on Mar 11, 2010 at 08:21 UTC
    I've tried wildcard'ing the search string but cannot get it to work - I always seem to end up greedily taking up a lot of the rest of the file

    Quantifiers are greedy by default, which means that they'll take as much as they can, as you have already seen. Try appending a question mark to the quantifier to make it non-greedy, i.e. use abc.*? instead of abc.*.
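A small sketch of the difference, using a made-up line with two directives (the patterns here are illustrative, not the ones from the original command):

```perl
use strict;
use warnings;

my $line = 'aa <?Pub fontsize="10.00pt"?> bb <?Pub fontsize="9.00pt"?> cc';

# Greedy: .* runs to the LAST ?> on the line, swallowing "bb" as well.
(my $greedy = $line) =~ s/<\?Pub.*\?>//g;

# Non-greedy: .*? stops at the FIRST ?> after each <?Pub.
(my $lazy = $line) =~ s/<\?Pub.*?\?>//g;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

The greedy version removes everything between the first `<?Pub` and the last `?>`, which matches the "taking up a lot of the rest of the file" symptom described in the question.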

    Also - I cannot get my search to find these "search strings" that are split over a line - I guess if I can get the wildcard working I could put a linefeed in there ?

    Use the /s modifier in the regex.

    --
     David Serrano
     (Please treat my english text just like Perl code, i.e. feel free to notify me of any syntax, grammar, style and/or spelling error. Thank you!).
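As a rough illustration of what /s changes: it only lets `.` match a newline, so it helps only when the text being matched already contains the line break, for example after the whole file has been read into one string. The sample text below is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample: the directive is split across a line break.
my $text = "dfd <?Pub fontsize=\n\"19.00pt\"?> fgf";

# Without /s, "." does not match the newline, so nothing is removed.
(my $without_s = $text) =~ s/<\?Pub.*?\?>//g;

# With /s, "." also matches the newline and the directive is removed.
(my $with_s = $text) =~ s/<\?Pub.*?\?>//gs;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```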

Re: search and replace question
by jwkrahn (Abbot) on Mar 11, 2010 at 10:57 UTC

    You probably want something like:

    perl -i -0777 -pe 's/<\?Pub\s+fontsize\s*=\s*"\d+\.\d+pt"\?>/<!--Rem Pub Dir-->/g'
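For anyone who prefers a script to a one-liner, the same idea can be sketched roughly as below. The subroutine name strip_pub_directives is hypothetical, and the sketch simply overwrites each file after reading it whole, which is an assumption about what is acceptable for this data rather than a copy of how -i works internally:

```perl
use strict;
use warnings;

# Hypothetical helper: read a file whole, replace the directives,
# and write the result back to the same path (no backup copy is kept).
sub strip_pub_directives {
    my ($path) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my $content = do { local $/; <$in> };   # undef $/ reads the whole file at once
    close $in;

    # Same pattern as the one-liner above; \s* lets "=" be followed by a line break.
    my $count = ($content =~ s/<\?Pub\s+fontsize\s*=\s*"\d+\.\d+pt"\?>/<!--Rem Pub Dir-->/g);

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $content;
    close $out or die "Cannot close $path: $!";

    return $count || 0;   # number of directives replaced
}

strip_pub_directives($_) for @ARGV;
```

It could be fed file names the same way as the original command, e.g. via find and xargs.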

Node Type: perlquestion [id://827970]
Approved by planetscape