PerlMonks  

Given that most HTML files are usually (hopefully) < 1 MB in size, it would make sense to use Aristotle's technique of changing $/, but undefine it so that each file is slurped whole in a single read.

find . -name "*.html" -type f -print0 | \
    perl -i -p -0 -e 'BEGIN { @ARGV = <STDIN>; chomp @ARGV; undef $/ } s/foo/bar/g'

If find produces more files than your command line can handle, couldn't you have find produce a list of directories instead, pass that to perl, and let perl glob them? Something like (NB: completely untested code):

find . -type d -print0 | \
    perl -i -p -0 -e 'BEGIN { @ARGV = <STDIN>; chomp @ARGV;
        # NB: glob splits its pattern on whitespace, so directory names with spaces need quoting
        @ARGV = map { glob "$_/*.html" } @ARGV;
        undef $/ } s/foo/bar/g'

Combining that with Merlyn's trick of backing out the -i effect if nothing is found should save more time.


Examine what is said, not who speaks.
1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.

In reply to Re: Large scale search and replace with perl -i by BrowserUk
in thread Large scale search and replace with perl -i by elbie
