PerlMonks  

Scanning a html document....

by Anonymous Monk
on Jun 21, 2000 at 00:06 UTC ( [id://19093]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How can I scan an HTML document? I want to create a script that checks for dead links, broken images, etc. Any help would be appreciated. John

Replies are listed 'Best First'.
Re: Scanning a html document....
by btrott (Parson) on Jun 21, 2000 at 00:19 UTC
    And, in addition to plaid's answer, check out HTML::LinkExtor, which sounds like it might fit what you want very well. From the docs:
    HTML::LinkExtor is an HTML parser that extracts links from an HTML document. The HTML::LinkExtor is a subclass of HTML::Parser. This means that the document should be given to the parser by calling the $p->parse() or $p->parse_file() methods.
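
    As a rough illustration of the approach described above, here is a minimal sketch that pulls every link-like attribute (including img src) out of a chunk of HTML with HTML::LinkExtor. The extract_links helper and the sample markup are made up for this example and are not from the module's docs; it assumes the HTML-Parser distribution is installed.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return a list of [tag, attribute, url] triples
# for every link attribute the parser finds in the given HTML string.
sub extract_links {
    my ($html) = @_;
    my @links;
    # The callback receives the tag name followed by attribute/value pairs.
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, map { [ $tag, $_, $attr{$_} ] } sort keys %attr;
    });
    $parser->parse($html);
    $parser->eof;    # signal end of document so buffered text is flushed
    return @links;
}

# Sample document (illustrative only).
my $html = <<'HTML';
<html><body>
<a href="http://www.example.com/page.html">a page</a>
<img src="images/logo.gif">
</body></html>
HTML

for my $link (extract_links($html)) {
    my ($tag, $attr, $url) = @$link;
    print "$tag $attr $url\n";
}
```

    Relative URLs such as images/logo.gif come back as written. Passing a base URL as the second argument to HTML::LinkExtor->new should make the parser hand back absolute URLs instead, which is what a link checker would need.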
Re: Scanning a html document....
by plaid (Chaplain) on Jun 21, 2000 at 00:09 UTC
RE: Scanning a html document....
by mcwee (Pilgrim) on Jun 21, 2000 at 00:50 UTC
    If you want something homemade (but at least ready-made), you can use this little fella of mine: My Little Time-Saving Spyder. Just modify the regex to find links. Admittedly, you'll still have to put something together to check the list of links, but that's pretty easy using LWP.

    The Autonomic Pilot; it's FunkyTown, babe.
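
    For the "check the list of links" step mentioned above, something along these lines might do as a starting point. It is only a sketch: check_link is a made-up helper name, the 10-second timeout is an arbitrary choice, and the URLs are assumed to be absolute and passed on the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: return "ok" if the URL can be fetched,
# otherwise "DEAD (<status line>)".
sub check_link {
    my ($ua, $url) = @_;
    my $res = $ua->head($url);
    # Some servers reject HEAD requests, so retry with GET before giving up.
    $res = $ua->get($url) unless $res->is_success;
    return $res->is_success ? 'ok' : 'DEAD (' . $res->status_line . ')';
}

# Timeout value is an assumption; tune it for the sites being checked.
my $ua = LWP::UserAgent->new(timeout => 10);

# Check every URL given on the command line.
for my $url (@ARGV) {
    print "$url: ", check_link($ua, $url), "\n";
}
```

    HEAD is tried first because it avoids downloading whole images or pages just to learn whether they exist. Run it with one or more URLs as arguments to get one status line per URL.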
