Re: Large scale search and replace with perl -i

by antifun (Sexton) on Apr 14, 2003 at 19:23 UTC


in reply to Large scale search and replace with perl -i

First question: how many is a "large number"? If it's on the order of 10^4 or less, you will probably spend more time fiddling with a script than the job would take with a more "brute-force" approach. (Given a reasonably fast computer, yadda yadda yadda.)
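To make that concrete, here is a minimal sketch of such a brute-force pass (not from the original post; the s/foo/bar/ substitution and the *.html glob are assumed). It simply runs one perl per file via find, which is exactly the per-file overhead the rest of this node tries to avoid:

find . -name "*.html" -type f -exec perl -pi -e 's/foo/bar/g' {} \;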

As for the more theoretical question, you would certainly want to use the second approach (with find -exec grep -l foo) to reduce your working file set as much as possible.

Then your next issue is avoiding the overhead of running multiple perls. The -i switch relies on the magic of <>, which reads from the files named in @ARGV if there are command-line arguments, and from STDIN if there are not (paraphrasing slightly). However, what you need to do in this case is use both kinds of magic, so your perl will have to be a little more creative. It's harder to replicate the file-replacement shuffle that -i does than to read the file list from STDIN manually, so here's one way to try it:

find . -name "*.html" -type f -exec grep -l foo {} \; | perl -pi -e 'BEGIN{ @ARGV = <STDIN>; chomp @ARGV }; while (<>) { s/foo/bar/g; } continue { print }'

Notice that you can fiddle with @ARGV before the <> magic takes place. The internals of the script are basically what the -p option does.
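For reference, this is roughly the loop that -p wraps around the -e code (per the perlrun documentation), which is why the explicit while/continue above behaves the same way:

LINE: while (<>) {
    # the code given to -e runs here, with the current line in $_
} continue {
    print or die "-p destination: $!\n";
}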


---
"I hate it when I think myself into a corner."
Matt Mitchell

Replies are listed 'Best First'.
Re^2: Large scale search and replace with perl -i (don't grep(1))
by Aristotle (Chancellor) on Apr 14, 2003 at 20:52 UTC
    you would certainly want to use the second approach (with find -exec grep -l foo) to reduce your working file set as much as possible.

    You would certainly not, because you will have to open all the files anyway - even if just to check. The difference is that grepping for matches first makes you spawn one process per file, as well as open every matching file a second time (in Perl) to actually process it. You have a (large) net loss that way.

    Taking that out, and using the -print0 option to avoid some nasty surprises (though not all, unfortunately, due to the darn magic open), leaves us with the following. Note that I have removed the continue {} block, as it isn't necessary and just costs time. I'm also setting the record separator so that the diamond operator reads fixed-size blocks (64 kbytes in this example), rather than scanning for an end-of-line character.

    find . -name "*.html" -type f -print0 | \
    perl -i -p0e \
      'BEGIN{ @ARGV = <STDIN>; chomp @ARGV; $/ = "\n" };
       while (<>) { s/foo/bar/g; print }'

    That should be about as efficient as it gets.

    If you have a lot of nonmatching files, you might save work by hooking a grep in there - but not with find's -exec. That's what xargs was invented for.

    find . -name "*.html" -type f -print0 | \
    xargs -r0 grep -lZ foo | \
    perl -i -p0e \
      'BEGIN{ @ARGV = <STDIN>; chomp @ARGV; $/ = "\n" };
       while (<>) { s/foo/bar/g; print }'
    Update: changed $/ = \65536 to $/ = "\n" in the code above, as per runrig's observation below.

    Makeshifts last the longest.

      find . -name "*.html" -type f -print0 | perl -i -p0e \
        'BEGIN{ @ARGV = <STDIN>; chomp @ARGV; $/ = \65536 };
         while (<>) { s/foo/bar/g; print }'
      You don't want to do that. If 'foo' spans across one of those read blocks, then you'll miss the substitution.
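      A tiny sketch of that failure mode (not in the original reply; a 4-byte record is used so the effect is easy to see): with a fixed-size $/, the pattern can be split across two reads and never match.

      perl -e '$/ = \4; open my $fh, "<", \"xxfooxx" or die $!;
               while (<$fh>) { print s/foo/bar/g ? "hit: $_\n" : "miss: $_\n" }'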
        Duh.. I can't believe I didn't think of that.

        Makeshifts last the longest.
