
Re^3: Batch remove "404 Not Found" URLs

by hippo (Chancellor)
on Oct 27, 2017 at 08:41 UTC (#1202129)

in reply to Re^2: Batch remove "404 Not Found" URLs
in thread Batch remove URLs

Assuming that you just want to get the job done and are not pursuing this as an academic exercise, I would abandon the one-liner approach. It can be done that way, but the more you throw into it the messier it gets. Here's one plan:

  1. Store your 300 URLs in a file, one per line (if you haven't already done so). You can then slurp this into an array at the start of your script.
  2. Loop over the files with a simple glob.
  3. Inside that loop, iterate over all the URLs.
  4. Inside the inner loop, call a subroutine with the filename and the URL to remove.
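
The plan above might be sketched roughly as follows. This is only a skeleton under assumptions: the names read_urls, remove_url and process are made up for illustration, as are the example arguments urls.txt and '*.html', and remove_url is a placeholder that just reports what it would do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: slurp the URL list (one per line) into an array.
sub read_urls {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @urls = <$fh>);
    close $fh;
    return grep { length } @urls;    # ignore blank lines
}

# Step 4: placeholder for the real work. It only reports what it would
# do, so nothing on disk is changed while the loops are being checked.
sub remove_url {
    my ($file, $url) = @_;
    print "Would remove $url from $file\n";
}

# Steps 2 and 3: outer loop over the files, inner loop over the URLs.
# Returns the number of (file, URL) pairs handed to remove_url.
sub process {
    my ($url_file, $pattern) = @_;
    my @urls  = read_urls($url_file);
    my $calls = 0;
    for my $file (glob $pattern) {
        for my $url (@urls) {
            remove_url($file, $url);
            $calls++;
        }
    }
    return $calls;
}

# Hypothetical invocation: perl remove404.pl urls.txt '*.html'
process(@ARGV) if @ARGV == 2;
```

Keeping the loops separate from remove_url means the dry run can be checked first, and the real replacement logic swapped in once it has been tested on its own.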

You can now test the inner subroutine in isolation on a test file to your heart's content, getting it exactly right without destroying the original content. Consider quotemeta for the search terms, since URLs contain regex metacharacters such as . and ? which would otherwise not match literally. If you get stuck with that approach, come back with specific questions, ideally as an SSCCE. Good luck.
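
To illustrate the quotemeta point, here is a minimal sketch that works on a string rather than a file, so it can be tried without touching any real data. The name strip_url is hypothetical, and it assumes that "removing" a URL means deleting every literal occurrence of it; adapt it if whole lines or surrounding markup need to go as well.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $text with every literal
# occurrence of $url deleted. The caller's string is not modified.
sub strip_url {
    my ($text, $url) = @_;
    my $pattern = quotemeta $url;    # escape ., ?, / and friends
    $text =~ s/$pattern//g;
    return $text;
}

my $url    = 'http://example.com/a?b=1';
my $sample = "See $url for details\n";
print strip_url($sample, $url);
```

Without quotemeta, the ? in that URL would make the preceding "a" optional and the . would match any character, so the pattern would not even match the URL it was built from.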
