PerlMonks  

Re: Re: Verifying external web links

by Fastolfe (Vicar)
on Dec 06, 2001 at 01:03 UTC ( [id://129755]=note )


in reply to Re: Verifying external web links
in thread Verifying external web links

Whether or not you retry should depend on the nature of the failure. If you get back a 400- or 500-series response, you should generally stop there, since the server has pretty much stated, "No way, no how." Possible exceptions are a 408 (Request Timeout) response and, arguably, a 500, since in those cases the error may be temporary.

If, on the other hand, the request fails due to a connection problem, as you discuss (connection refused, timed out, no route to host), I might wait a bit (hours? days?) and try again.
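
The policy described above could be sketched roughly as follows. This is only an illustration in Python, not code from the thread: the name `should_retry`, the `RETRYABLE_STATUSES` set, and the convention of passing `None` when no HTTP response arrived are all assumptions made for the example.

```python
from typing import Optional

# Assumption for this sketch: the only 4xx/5xx responses worth retrying
# are 408 and (arguably) 500, as suggested in the post.
RETRYABLE_STATUSES = {408, 500}


def should_retry(status: Optional[int]) -> bool:
    """Decide whether a failed link check deserves another attempt.

    `status` is the HTTP status code, or None when the request failed
    before any response arrived (connection refused, timeout, no route).
    """
    if status is None:
        # Connection-level failure: wait a while and try again later.
        return True
    if 400 <= status < 600:
        # The server answered with an error; retry only the temporary ones.
        return status in RETRYABLE_STATUSES
    # Success or redirect: there is nothing to retry.
    return False
```

How long to wait between attempts is left open here, just as in the post (hours? days?).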

Replies are listed 'Best First'.
Re: Re: Re: Verifying external web links
by Masem (Monsignor) on Dec 06, 2001 at 01:24 UTC
    I'd argue that 404s should be rechecked too, though a site that starts off with a 404 is more likely to end up off the list than one returning 408s, 500s, or connection problems. Sometimes, if you've linked 'deep' into a site (anywhere off the front page, or in a user's account), the server's storage might be switched around, so for a short time you might get 404s even though the page is normally accessible. There are other reasons I can think of as well, uncommon but not unlikely, which is why I'd check pages repeatedly regardless of the error.

    That said, it certainly would not be too hard for such a tool to record in a log file why links were removed, allowing the person to chase down those that might be recoverable (commonly 404s), as opposed to those that are probably lost for good (no connection over several attempts).
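
    One way such a log entry might be produced, sketched in Python purely for illustration. The function names, the wording of the reasons, and the tab-separated format are hypothetical; each attempt is assumed to be recorded as an HTTP status code, or `None` when no connection could be made.

```python
from typing import List, Optional


def removal_reason(attempts: List[Optional[int]]) -> str:
    """Summarize why a link was dropped, given one entry per check.

    Each entry is an HTTP status code, or None if no connection was made.
    """
    if not attempts:
        raise ValueError("at least one attempt is required")
    if all(a is None for a in attempts):
        # Never reached the server at all.
        return f"no connection over {len(attempts)} attempts (probably lost for good)"
    # Report the most recent status the server actually returned.
    last = next(a for a in reversed(attempts) if a is not None)
    if last == 404:
        return "404 Not Found (possibly recoverable, page may have moved)"
    return f"HTTP {last} (review manually)"


def log_line(url: str, attempts: List[Optional[int]]) -> str:
    """Format one tab-separated log entry for a removed link."""
    return f"{url}\t{removal_reason(attempts)}"
```

    Someone reviewing the log could then search for the 404 entries first, since those are the ones most likely to be worth chasing down.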

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important
