PerlMonks  

Re: Problem timing out XML::LibXML parsing

by mirod (Canon)
on Feb 03, 2009 at 20:01 UTC ( [id://741109] )


in reply to Problem timing out XML::LibXML parsing

Can HTML::Parser deal with this code? And do you really need to use XML::LibXML? If the answers are yes and no, you can use HTML::TreeBuilder (and HTML::TreeBuilder::XPath for very useful XPath support). Or use XML::Twig, which uses HTML::TreeBuilder to wrestle XML out of the HTML.
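For what it's worth, here is a minimal sketch of the HTML::TreeBuilder::XPath route. The helper name title_and_links and the sample markup are made up for illustration; only the module calls (new, parse, eof, findvalue, findvalues, delete) come from the modules themselves.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper: returns the page title followed by all link targets.
sub title_and_links {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($html);    # tolerant of unclosed tags and unquoted attributes
    $tree->eof;             # tell the parser the input is complete

    my $title = $tree->findvalue('//title');
    my @links = $tree->findvalues('//a/@href');

    $tree->delete;          # free the tree explicitly
    return ($title, @links);
}

# Made-up tag soup: unclosed <p> and <a>, one unquoted attribute.
my $soup = '<html><head><title>Demo</title></head><body>'
         . '<p>First<p>Second <a href="/a">one</a> <a href=/b>two</body>';

my ($title, @links) = title_and_links($soup);
print "$title: @links\n";
```

The point is that the XPath queries look the same as they would with XML::LibXML, so switching parsers for the problem pages should not require rewriting the extraction logic.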

Otherwise you could use HTML::Tidy, or just plain tidy, to clean up the HTML before parsing it.
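A rough, untested-against-your-data sketch of the tidy-first approach, assuming HTML::Tidy passes the output_xhtml, numeric_entities and tidy_mark options through to tidy. The helper name tidy_then_parse is hypothetical, and the XML::LibXML options are there only to keep the parser from fetching the XHTML DTD.

```perl
use strict;
use warnings;
use HTML::Tidy;
use XML::LibXML;

# Hypothetical helper: clean tag soup into XHTML, then parse it as XML.
sub tidy_then_parse {
    my ($html) = @_;

    my $tidy = HTML::Tidy->new({
        output_xhtml     => 1,   # emit well-formed XHTML
        numeric_entities => 1,   # assumption: avoids named entities that need the DTD
        tidy_mark        => 0,   # no generator <meta> tag
    });
    my $xhtml = $tidy->clean($html);

    return XML::LibXML->load_xml(
        string       => $xhtml,
        load_ext_dtd => 0,       # do not load the external XHTML DTD
        no_network   => 1,       # never touch the network while parsing
    );
}

my $doc = tidy_then_parse('<title>Demo</title><p>First<p>Second <a href=/b>two');

# tidy's XHTML output lives in the XHTML namespace, so XPath needs a prefix.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');
print $xpc->findvalue('//x:title'), "\n";
```

This keeps XML::LibXML as the parser, at the cost of an extra pass over each page, so it may only be worth doing for pages that fail the normal parse.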

IIRC, when I was looking for a way to convert HTML to XML, HTML::TreeBuilder seemed to be the most robust parser available in Perl.

Re^2: Problem timing out XML::LibXML parsing
by samtregar (Abbot) on Feb 03, 2009 at 20:12 UTC
    Thanks, but yes, I really want to use XML::LibXML. It's so much faster than HTML::TreeBuilder and speed is critical in my application. So far it's actually been pretty reliable - this problem only occurs in around 1 out of every 100,000 or so pages I've parsed.

    -sam
