PerlMonks
> does anybody know if any LWP or similar implement DOM
LWP::UserAgent does not provide DOM-level access. WWW::Mechanize doesn't either, but it does parse the HTML for you in order to provide methods like links(), which, incidentally, does what you want.
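To make the links() suggestion concrete, here is a minimal sketch. The sample page, the temp file and the example.com URLs are made up for illustration; loading through a file: URL is only there so the snippet runs offline, and in practice you would pass get() the real address.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use WWW::Mechanize;

# Hypothetical sample page, written to a temp file so the sketch runs offline.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} <<'HTML';
<html><body>
<p>See <a href="http://example.com/a">first</a> and
<a href="/b">second</a>.</p>
</body></html>
HTML
close $fh;

my $mech = WWW::Mechanize->new;
$mech->get( URI::file->new_abs($path)->as_string );

# links() returns WWW::Mechanize::Link objects for the page just fetched.
my @links = $mech->links;

# Print the link text and the href exactly as written in the page.
printf "%s => %s\n", $_->text, $_->url for @links;
```

If you need absolute URLs rather than the raw href values, the link objects also have url_abs().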
> I have a hunch that DOM parsing is cleaner than X(HT)ML parsing
"DOM" is not a manner of parsing, but a manner of access. For methods from the DOM to be able to access data from a tree of nodes, some "parser" code still has to build that tree. It's certainly cleaner to access data using DOM (or DOM-like) methods, or selector interfaces like XPath or XQuery. HTML::TreeBuilder::XPath builds an HTML::Tree internally and then provides XPath-like access to that tree.
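A rough sketch of that split between building the tree and querying it, using HTML::TreeBuilder::XPath. The HTML snippet and the class names in it are invented for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical input document.
my $html = <<'HTML';
<html><body>
<div class="nav"><a href="/home">Home</a></div>
<div class="main"><a href="/post/1">Post</a><p>Some text</p></div>
</body></html>
HTML

# Step 1: the parser builds the tree.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Step 2: XPath-style access to the tree that was built.
my @hrefs;
for my $anchor ( $tree->findnodes('//div[@class="main"]//a') ) {
    push @hrefs, $anchor->attr('href');   # plain HTML::Element accessor
}
print "$_\n" for @hrefs;

$tree->delete;   # free the tree explicitly when done
```

The nodes that findnodes() returns are ordinary HTML::Element objects, so the usual HTML::Element methods are available on them as well.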
> (HTML parsing) sometimes not being well formed etc
If you're talking about the robustness of parsing HTML, there are many libraries that parse HTML properly even when given invalid input. That robustness is quite orthogonal to how you access the data once you've parsed the document.
-David
In reply to Re^2: Regex to match first html tag previous to text
by erroneousBollock