Re^3: Trivial HTML extractor utility


by eserte (Deacon)

on Nov 22, 2007 at 20:53 UTC


in reply to Re^2: Trivial HTML extractor utility
in thread Trivial HTML extractor utility

>> If you used HTML::TreeBuilder::XPath it would be even more powerful.
> Not for me; I don't know how to write an xpath expression.
You should really give it a try; it's one of the few fine things to come out of the XML world. I once wrote a utility called xmlgrep, which uses XPath expressions to extract things from HTML or XML files. To extract links, one would write:

GET http://www.perlmonks.org | xmlgrep -parse-html '//a/@href'

You can also add conditions, for example to extract only absolute links:

GET http://www.perlmonks.org | xmlgrep -parse-html '//a/@href[contains(.,"http://")]'
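
If you'd rather stay inside a Perl script, roughly the same two queries can be run with HTML::TreeBuilder::XPath directly. This is only a minimal sketch and not how xmlgrep itself is implemented: the helper name extract_values and the sample HTML are made up for illustration, and the module has to be installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: return the string values of all nodes
# matching $xpath in the HTML document $html.
sub extract_values {
    my ($html, $xpath) = @_;
    my $tree   = HTML::TreeBuilder::XPath->new_from_content($html);
    my @values = $tree->findvalues($xpath);
    $tree->delete;    # HTML::TreeBuilder trees should be freed explicitly
    return @values;
}

# Made-up sample input: one absolute link and one relative link.
my $html = <<'HTML';
<html><body>
<a href="http://www.perlmonks.org/">absolute</a>
<a href="/index.pl?node=Meditations">relative</a>
</body></html>
HTML

# All link targets.
print "$_\n" for extract_values($html, '//a/@href');

# Only link targets whose value contains "http://".
print "$_\n" for extract_values($html, '//a/@href[contains(.,"http://")]');
```

Note that contains() matches the substring anywhere in the attribute value; if you only want values that begin with the scheme, the XPath function starts-with(.,"http://") is the stricter test.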
