Re: Problem timing out XML::LibXML parsing


by mirod (Canon)

on Feb 03, 2009 at 20:01 UTC

in reply to Problem timing out XML::LibXML parsing

Can HTML::Parser deal with this code? And do you really need to use XML::LibXML? If the answers are yes and no, you can use HTML::TreeBuilder (and HTML::TreeBuilder::XPath for very useful XPath support). Or use XML::Twig, which uses HTML::TreeBuilder to wrestle XML out of the HTML.
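A minimal sketch of the HTML::TreeBuilder::XPath route, assuming the module is installed from CPAN. The sample markup and the variable names are made up for illustration; they are not taken from the original question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical, deliberately sloppy input: neither <p> is closed.
my $html = '<html><head><title>Test page</title></head>'
         . '<body><p>First<p>Second</body></html>';

# Build a tree from the string; the parser repairs the markup as it goes.
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# Query the repaired tree with XPath.
my $title = $tree->findvalue('//title');
my @paras = $tree->findnodes('//p');

print "title: $title\n";
print "paragraphs: ", scalar(@paras), "\n";

# as_XML serializes the tree, which is one way to get XML out of the HTML.
my $xml = $tree->as_XML;

# HTML::TreeBuilder trees should be deleted explicitly when done.
$tree->delete;
```

XML::Twig offers a similar shortcut through its parse_html method, which relies on HTML::TreeBuilder under the hood.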

Otherwise you could use HTML::Tidy, or just plain tidy, to clean up the HTML before parsing it.
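A rough sketch of the clean-up-first approach, assuming HTML::Tidy and XML::LibXML are both installed. The tidy options chosen here (output_xhtml, numeric_entities, doctype, tidy_mark) and the sample markup are my assumptions, not something from the original question, so adjust them to your input.

```perl
use strict;
use warnings;
use HTML::Tidy;
use XML::LibXML;

# Hypothetical sloppy input: neither <p> is closed.
my $html = '<html><head><title>Test page</title></head>'
         . '<body><p>First<p>Second</body></html>';

# Ask tidy for well-formed XHTML, numeric entities only, and no DOCTYPE,
# so the XML parser never needs to look at an external DTD.
my $tidy = HTML::Tidy->new({
    output_xhtml     => 1,
    numeric_entities => 1,
    doctype          => 'omit',
    tidy_mark        => 0,
});
my $xhtml = $tidy->clean($html);

# The cleaned document can now go through the strict XML parser.
my $doc = XML::LibXML->load_xml(string => $xhtml, load_ext_dtd => 0);

# tidy puts the elements in the XHTML namespace, so register a prefix for XPath.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

my $title = $xpc->findvalue('//x:title');
print "title: $title\n";
```

The extra tidy pass costs time on every page, so if speed matters it may be worth running it only on pages that fail the first parse.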

IIRC, when I was looking for a way to convert HTML to XML, HTML::TreeBuilder seemed to be the most robust parser available in Perl.

Re^2: Problem timing out XML::LibXML parsing by samtregar (Abbot) on Feb 03, 2009 at 20:12 UTC
Thanks, but yes, I really want to use XML::LibXML. It's so much faster than HTML::TreeBuilder and speed is critical in my application. So far it's actually been pretty reliable - this problem only occurs in around 1 out of every 100,000 or so pages I've parsed.

-sam
