PerlMonks  

Re: Re: Table.pm: Extract text from html tables

by zzspectrez (Hermit)
on Jan 19, 2001 at 05:58 UTC ( [id://52908]=note )


in reply to Re: Table.pm: Extract text from html tables
in thread Table.pm: Extract text from html tables

I did search search.cpan.org first. I agree with you that it is better not to reinvent the wheel if you don't have to: not only do you waste time, but the established code will probably be more efficient, or at least better debugged.

However, I don't think HTML::Table applies well in this situation, because it is for creating tables. I just want to get the data.

I did install HTML::TableExtract before attempting it myself. However, it did not seem to work well for my needs. The author states that it was designed with selecting table data by table headers in mind. In my case, the site I am accessing doesn't use text headers in its tables at all. The module also allows selecting tables by Depth and Count.

From the pod:

Depth and Count are more specific ways to specify tables in relation to one another. Depth represents how deeply a table resides in other tables. The depth of a top-level table in the document is 0. A table within a top-level table has a depth of 1, and so on. Each depth can be thought of as a layer; tables sharing the same depth are on the same layer. Within each of these layers, Count represents the order in which a table was seen at that depth, starting with 0. Providing both a depth and a count will uniquely specify a table within a document.

This seems confusing to me when you have a document such as the one I am accessing, which has multiple top-level tables with many sub-tables beneath them.
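To make the numbering in the pod excerpt concrete, here is a rough sketch that labels each table in a small document with its (depth, count) pair. The `label_tables` helper and the sample HTML are made up for illustration, and the regex scan is only a stand-in for a real parser; it is not how HTML::TableExtract works internally.

```perl
use strict;
use warnings;

# Hypothetical helper: label every <table>, in document order, with the
# (depth, count) pair described in the pod excerpt. Simplified regex scan;
# assumes reasonably well-formed <table>...</table> nesting.
sub label_tables {
    my ($html) = @_;
    my $depth = -1;   # depth of the table we are currently inside
    my %seen;         # number of tables seen so far at each depth
    my @labels;       # [depth, count] for each table, in document order

    while ($html =~ m{<(/?)table\b[^>]*>}gi) {
        if ($1) {                         # closing </table>
            $depth-- if $depth >= 0;
        }
        else {                            # opening <table>
            $depth++;
            my $count = $seen{$depth}++;  # count is per depth, starting at 0
            push @labels, [ $depth, $count ];
        }
    }
    return \@labels;
}

# Two top-level tables, with one and two sub-tables respectively.
my $html = <<'HTML';
<table><tr><td>
  <table><tr><td>a</td></tr></table>
</td></tr></table>
<table><tr><td>
  <table><tr><td>b</td></tr></table>
  <table><tr><td>c</td></tr></table>
</td></tr></table>
HTML

my $labels = label_tables($html);
printf "table %d: depth %d, count %d\n", $_, @{ $labels->[$_] }
    for 0 .. $#$labels;
```

Note that count keeps running across the whole layer: the sub-tables of the second top-level table continue the count started under the first one, which is the part that gets hard to track on a busy page.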

My solution lets me access the table data as a multidimensional array. Just count each <table> tag until you reach the table that contains the data you want, note the row and column within that table, and then access it as $table->[table_number][row][column]. This seems much easier and, in my opinion, a better tool for my particular situation. Of course, HTML::TableExtract is a much more robust way to handle tables, and it is better for situations where you can select the tables using headers instead of hard-coding to the page layout.
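For anyone who wants to see the indexing idea in action, below is a minimal sketch under some assumptions. It is not the Table.pm code from this thread: `extract_tables` is a hypothetical helper that tokenizes with a regex rather than a real HTML parser, numbers tables by the order of their opening <table> tags, and leaves a nested table's text out of the enclosing cell.

```perl
use strict;
use warnings;

# Hypothetical helper (not the Table.pm from this thread): collect cell text
# as $tables->[table_number][row][column], numbering tables by the order in
# which their opening <table> tags appear.
sub extract_tables {
    my ($html) = @_;
    my @tables;   # $tables[table_number][row][column]
    my @open;     # stack of open tables: { n => table number, in_cell => flag }

    # split() with a capturing group keeps the tags as separate tokens.
    for my $tok (split /(<[^>]*>)/, $html) {
        if ($tok =~ m{^<table\b}i) {
            push @tables, [];
            push @open, { n => $#tables, in_cell => 0 };
            next;
        }
        next unless @open;                  # ignore everything outside tables
        my $cur  = $open[-1];
        my $rows = $tables[ $cur->{n} ];

        if ($tok =~ m{^</table\b}i) {
            pop @open;
        }
        elsif ($tok =~ m{^<tr\b}i) {
            push @$rows, [];
            $cur->{in_cell} = 0;
        }
        elsif ($tok =~ m{^<t[dh]\b}i) {
            push @$rows, [] unless @$rows;  # tolerate a cell with no <tr>
            push @{ $rows->[-1] }, '';
            $cur->{in_cell} = 1;
        }
        elsif ($tok =~ m{^</t[dhr]\b}i) {
            $cur->{in_cell} = 0;
        }
        elsif ($tok !~ /^</ && $cur->{in_cell}) {
            $rows->[-1][-1] .= $tok;        # plain text inside the current cell
        }
    }

    # Trim surrounding whitespace in every cell.
    for my $t (@tables) {
        for my $r (@$t) { s/^\s+|\s+$//g for @$r }
    }
    return \@tables;
}

my $html = <<'HTML';
<table>
  <tr><td>Name</td><td>Price</td></tr>
  <tr><td>Widget</td><td>
    <table><tr><td>USD</td><td>9.99</td></tr></table>
  </td></tr>
</table>
<table>
  <tr><td>Total</td><td>1</td></tr>
</table>
HTML

my $table = extract_tables($html);
print $table->[0][1][0], "\n";   # first table, second row, first column
print $table->[1][0][1], "\n";   # the nested table is simply table number 1
print $table->[2][0][0], "\n";   # second top-level table is table number 2
```

The trade-off is the one mentioned above: the indices are tied to the page layout, so an extra table added anywhere earlier in the page shifts every table_number after it.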

If you disagree with this, I would be interested in your reasons why. I respect your opinion as a known Perl wizard!

Thanks!
zzSPECTREz
