"you would certainly want to use the second approach (with find -exec grep -l foo) to reduce your working file set as much as possible."

You would certainly not, because you will have to open all files anyway - even if just to check. The difference is that grepping for matches first makes you spawn one process per file, and also requires opening the matching files a second time (in Perl) to actually process them. You have a (large) net loss that way.
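
To make the process-count argument concrete, here is a small sketch (the scratch directory and file names are made up for illustration) that counts how often a command gets started by find's -exec ... \; versus xargs. It only echoes a marker instead of running grep, so it shows the spawning behaviour and nothing else:

```shell
# Hypothetical scratch directory with three sample files.
tmp=$(mktemp -d)
touch "$tmp/a.html" "$tmp/b.html" "$tmp/c.html"

# -exec ... \; starts the command once for every file found.
per_file=$(find "$tmp" -name '*.html' -type f -exec sh -c 'echo run' \; | grep -c run)

# xargs packs as many file names as fit onto a single command line.
batched=$(find "$tmp" -name '*.html' -type f -print0 | xargs -0 sh -c 'echo run' sh | grep -c run)

echo "per file: $per_file, batched: $batched"
rm -rf "$tmp"
```

With three files this should report three runs for -exec and a single run for xargs.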

Taking that out, and using the -print0 option to avoid some nasty surprises (but not all, unfortunately, due to the darn magic open), leaves us with the following. Note that I have removed the continue {} block, as it isn't necessary and just costs time. I originally also set the record separator so that the diamond operator read fixed-size blocks (64 KB) rather than scanning for an end-of-line character; per the update below, the code now uses "\n" instead.

find . -name "*.html" -type f -print0 | \
    perl -i -0e \
    'BEGIN{ @ARGV = <STDIN>; chomp @ARGV; $/ = "\n" };
    while (<>) { s/foo/bar/g; print }'

That should be about as efficient as it gets.

If you have a lot of nonmatching files, you might save work by hooking a grep in there - but not with find's -exec. That's what xargs was invented for.

find . -name "*.html" -type f -print0 | \
    xargs -r0 grep -lZ foo | \
    perl -i -0e \
    'BEGIN{ @ARGV = <STDIN>; chomp @ARGV; $/ = "\n" };
    while (<>) { s/foo/bar/g; print }'
Update: s/= \\65536/= "\\n"/; as per runrig's observation (that is, $/ = \65536 became $/ = "\n" in the code above).

Makeshifts last the longest.


In reply to Re^2: Large scale search and replace with perl -i (don't grep(1)) by Aristotle
in thread Large scale search and replace with perl -i by elbie
