PerlMonks  

Re^3: Parsing a large html with perl

by sundialsvc4 (Abbot)
on Jun 05, 2020 at 16:31 UTC


in reply to Re^2: Parsing a large html with perl
in thread Parsing a large html with perl

How interesting – sounds like prescient advice to me, if the HTML actually qualifies as XHTML.

Very obviously, the OP should be using one of the several "HTML Parsers" that are readily available, in order to be handed the particular strings that need further processing, as it were, "on a silver platter." Regular expressions applied against the raw HTML string would be a monumental waste of effort and would not produce results nearly as good.

But, "XPath is even better, if it works," because this strategy is non-procedural. If it works, you do not have to write code that is tied to the structure of the parent document, and which would therefore stop working when that structure changes. If it turns out to apply in this case, XPath can be a spectacular time-and-effort saver.

What you really would like to avoid – and what XPath is very much engineered to let you avoid – is programming that is specific to the exact structure of the XML. Such logic is not only fragile, it also fails silently: it has no way to notice that it is now producing incomplete answers. (Having said that: "XPath, also, is not a panacea.")
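To make the point concrete, here is a minimal sketch, assuming XML::LibXML is installed and that the input is close enough to well-formed for libxml2's HTML parser to recover. The sample markup, the `item` class and the variable names are made up for illustration; they are not from the OP's document. It contrasts an XPath expression that spells out the full path with one that only names what it is looking for. Note that even the second expression still depends on something in the document (here, the class attribute), so it is less tied to structure, not independent of it.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; the real document would come from a file.
my $html = <<'HTML';
<html><body>
  <div id="nav"><a href="/home">Home</a></div>
  <div id="main">
    <ul>
      <li><a class="item" href="/a">A</a></li>
      <li><a class="item" href="/b">B</a></li>
    </ul>
  </div>
</body></html>
HTML

# recover => 2 asks libxml2 to tolerate malformed HTML without warnings.
my $doc = XML::LibXML->load_html( string => $html, recover => 2 );

# Full path from the root: stops matching as soon as a wrapper element
# is added, removed, or reordered.
my @brittle = map { $_->value }
    $doc->findnodes('/html/body/div[2]/ul/li/a/@href');

# Descendant search keyed on an attribute: survives layout changes,
# as long as the links keep their class.
my @loose = map { $_->value }
    $doc->findnodes('//a[@class="item"]/@href');

print "brittle: @brittle\n";
print "loose:   @loose\n";
```

Both queries return the same two links for this input. The difference only shows up later, when the page layout changes: the first quietly returns nothing, while the second keeps working until the class name itself changes.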

Replies are listed 'Best First'.
Re^4: Parsing a large html with perl
by chromatic (Archbishop) on Jul 15, 2020 at 01:26 UTC
    What you really would like to avoid – and what XPath is very much engineered to let you avoid – is programming that is specific to the exact structure of the XML.

    What are you talking about? What do you think XPath is and how do you think it works?

        I have a smaller but similar experience with said person.
