How well will your solution work when the layout of the page changes subtly? I'm not pretending that the solution I've given here is bulletproof, but it's a lot more flexible than yours is.

The point is that HTML parsers understand HTML. It's easier to write a solution when you use the right tool for the job. If you look at my solution, the code is very easy to follow: find all the table rows in the HTML, then find the one whose text starts with the required IP address, then extract all of the text from that row. I didn't need to go into the detail of the HTML in the way that you did.
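
For anyone reading this without the earlier node to hand, those three steps can be sketched roughly as follows. This is a minimal illustration rather than the code from my earlier post: it assumes HTML::TreeBuilder from CPAN, and the sample table, the addresses and the row_for_ip helper are all made up for the example. It also compares the first cell exactly rather than testing whether the row text starts with the address, which is a slightly stricter variant of the same idea.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # CPAN module, assumed to be installed

# Hypothetical sample page; the real markup from the thread is not shown here.
my $html = <<'HTML';
<table>
  <tr><th>Address</th><th>Host</th><th>Status</th></tr>
  <tr><td>10.0.0.1</td><td>alpha</td><td>up</td></tr>
  <tr><td>10.0.0.2</td><td>beta</td><td>down</td></tr>
</table>
HTML

# Hypothetical helper: return the text of every cell in the first
# table row whose first <td> holds exactly the address $ip.
sub row_for_ip {
    my ($html, $ip) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @cells;

    # Step 1: find all the table rows.
    for my $row ($tree->look_down(_tag => 'tr')) {
        my @tds = $row->look_down(_tag => 'td');

        # Step 2: skip rows with no data cells or a different address.
        next unless @tds && $tds[0]->as_trimmed_text eq $ip;

        # Step 3: extract the text from every cell in the matching row.
        @cells = map { $_->as_trimmed_text } @tds;
        last;
    }

    $tree->delete;    # release the parse tree
    return @cells;
}

print join(' | ', row_for_ip($html, '10.0.0.2')), "\n";
```

Nothing in the sketch mentions attributes, whitespace or cell order beyond "the address is in the first cell", so extra attributes on the rows or reformatted markup should not break it.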

Yes, it's possible to extract useful data from HTML using regular expressions (the most excellent book Perl & LWP is full of them), but the result can only ever be a "use once", quick-and-dirty hack.

Oh, and a final comment on your terminology. What we're all doing in this problem is parsing. Data extraction is parsing by any meaningful definition of the term.

--
<http://www.dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg


In reply to Re: Being a heretic and going against the party line. by davorg
in thread Being a heretic and going against the party line. by BrowserUk
