Scanning a html document....

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Scanning a html document.... by btrott (Parson) on Jun 21, 2000 at 00:19 UTC
In addition to plaid's answer, check out HTML::LinkExtor, which sounds like a good fit for what you want. From the docs: `HTML::LinkExtor is an HTML parser that extracts links from an HTML document. The HTML::LinkExtor is a subclass of HTML::Parser. This means that the document should be given to the parser by calling the $p->parse() or $p->parse_file() methods.`
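For anyone who wants to see roughly what that looks like, here is a minimal sketch of pulling the `<a href>` targets out of a string with HTML::LinkExtor. The sub name `extract_links`, the sample HTML, and the `http://example.com/` base URL are made up for illustration; the original question does not say where the document comes from.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return every <a href> target found in $html.
# If $base is given, relative links are resolved against it.
sub extract_links {
    my ($html, $base) = @_;
    my @found;
    my $parser = HTML::LinkExtor->new(
        sub {
            # The callback receives the tag name and its link attributes.
            my ($tag, %attrs) = @_;
            push @found, "$attrs{href}"
                if $tag eq 'a' && defined $attrs{href};
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;    # signal end of document so buffered input is flushed
    return @found;
}

# Sample input, invented for this example.
my $html = '<p><a href="/about.html">About</a> <img src="logo.png"> '
         . '<a href="http://www.perl.org/">Perl</a></p>';
print "$_\n" for extract_links($html, 'http://example.com/');
```

To read from a file instead of a string, `$parser->parse_file($filename)` takes the place of the `parse`/`eof` pair, as the docs quoted above mention.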
Re: Scanning a html document.... by plaid (Chaplain) on Jun 21, 2000 at 00:09 UTC
Check out HTML::Parser and/or HTML::TokeParser.
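As a rough illustration of the HTML::TokeParser route, the sketch below walks the `<a>` tags and collects each href together with its link text. The sub name `links_with_text` and the sample HTML are assumptions for the example, not something from the original question.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return [href, link text] pairs for each <a href>.
sub links_with_text {
    my ($html) = @_;
    # Passing a scalar reference tells the parser to read from the string.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my @pairs;
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};    # $tag->[1] is the attribute hash
        next unless defined $href;     # skip <a name="..."> anchors
        push @pairs, [ $href, $parser->get_trimmed_text('/a') ];
    }
    return @pairs;
}

# Sample input, invented for this example.
my $html = '<a href="a.html"> First  link </a> <a name="top">anchor</a>';
printf "%s (%s)\n", @$_ for links_with_text($html);
```

Compared with HTML::LinkExtor, this is more work per link but gives you the anchor text as well, which can be handy when reporting which links are broken.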
RE: Scanning a html document.... by mcwee (Pilgrim) on Jun 21, 2000 at 00:50 UTC
If you want something homemade (but at least ready-made), you can use this little fella of mine: My Little Time-Saving Spyder. Just mod the regex to find links. Admittedly, you'll still have to put something together to check the list of links, but that's pretty easy using LWP. The Autonomic Pilot; it's FunkyTown, babe.
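The "check the list of links" step could look something like the sketch below, which sends a HEAD request for each URL with LWP::UserAgent. The sub name `check_links`, the 10-second timeout, and taking the URLs from the command line are all assumptions for illustration. Some servers refuse HEAD requests, so a fallback to `get` may be needed in practice.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: map each URL to 'ok' or to the failing status line.
# The user agent is passed in so it can be configured (or swapped) by the caller.
sub check_links {
    my ($ua, @urls) = @_;
    my %status;
    for my $url (@urls) {
        my $res = $ua->head($url);    # HEAD avoids downloading the body
        $status{$url} = $res->is_success ? 'ok' : $res->status_line;
    }
    return %status;
}

# URLs come from the command line here; with no arguments nothing is fetched.
my $ua     = LWP::UserAgent->new(timeout => 10);
my %status = check_links($ua, @ARGV);
print "$_: $status{$_}\n" for sort keys %status;
```

Feed it the list produced by whichever link extractor you settle on.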

