Updated: Looking for something like DBD::HTML::Table

talexb has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking into a couple of different solutions for a problem, and one solution involves using an imaginary module called DBD::HTML::Table to load up a web page containing a big table. It would be smart enough to look at the top row for the column names, and the first column for each row's index value.

I've just had a stroll through http://metacpan.org, and I didn't see anything like that. Is it cunningly hidden, or does it not exist at all?

Update: Thanks for all of your thoughtful replies. I had a look at the HTML generated by our internal CGI and found that it was really easy to write a very simple parser. Each opening and closing tr tag was on a line by itself, and each t[dh] element was either on a single line (open tag, content, close tag) or in an easily grabbable format (open tag, content, and close tag each on their own separate lines).
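
A minimal sketch of what such a line-oriented parser might look like, using core Perl only. The sample markup, the column names, and the %table structure are assumptions made up for illustration; the real CGI output was not posted. It only works because of the layout described above and is not a general HTML parser.

```perl
use strict;
use warnings;

# Hypothetical sample in the layout described above: every <tr> and </tr>
# on its own line, cells either on one line or split over three lines.
my $html = <<'HTML';
<table>
<tr>
<th>id</th>
<th>name</th>
</tr>
<tr>
<td>1</td>
<td>
widget
</td>
</tr>
<tr>
<td>2</td>
<td>gadget</td>
</tr>
</table>
HTML

my (@rows, $row, $cell);
for my $line (split /\n/, $html) {
    $line =~ s/^\s+|\s+$//g;                          # trim whitespace
    if    ($line =~ m{^<tr\b[^>]*>$}i) { $row = [] }  # start a new row
    elsif ($line =~ m{^</tr>$}i)       { push @rows, $row if $row; undef $row }
    elsif (!defined $row)              { next }       # outside any row
    elsif ($line =~ m{^<t[dh]\b[^>]*>(.*)</t[dh]>$}i) { push @$row, $1 }   # cell on one line
    elsif ($line =~ m{^<t[dh]\b[^>]*>$}i)             { $cell = '' }       # cell opens
    elsif ($line =~ m{^</t[dh]>$}i) {                                      # cell closes
        push @$row, $cell if defined $cell;
        undef $cell;
    }
    elsif (defined $cell)              { $cell .= $line }                  # cell content
}

# Top row supplies the column names; the first column is each row's index.
my ($header, @data) = @rows;
my %table;
for my $r (@data) {
    my %record;
    @record{@$header} = @$r;
    $table{ $r->[0] } = \%record;
}

print "$table{1}{name}\n";
```

Keying the records on the first column gives the lookup-by-index behaviour that the imaginary DBD::HTML::Table was meant to provide, without going through DBI at all.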

I understand that my initial question was vague -- I was still working out what my solution might look like. I now have a much better idea of what the process is going to look like. Ideally, it's going to be as automated as possible. Sorry if this all sounds vague; it's work related, so I need to be a little circumspect about how I describe the problem. :)

Alex / talexb / Toronto

Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.


Replies are listed 'Best First'.
Re: Looking for something like DBD::HTML::Table by Tux (Canon) on Feb 27, 2021 at 09:12 UTC
AnyData::Format::HTMLtable claims to support DBI directly: `use DBI; my $dbh = DBI->connect ("dbi:AnyData:"); $dbh->func ("table1", "HTMLtable", $filename, "ad_catalog"); my $hits = $dbh->selectall_arrayref ("select name from table1 where bar = 42"); # ... other DBI/SQL operations` I never used it, but it sounds more or less what you are looking for. Enjoy, Have FUN! H.Merijn
Re: Looking for something like DBD::HTML::Table by marto (Cardinal) on Feb 26, 2021 at 21:24 UTC
What I've done is use a Mojolicious backend that renders a template including the JavaScript DataTables plugin, with the data source being JSON delivered by Mojolicious. This both renders and scales well, and has the benefit of letting users search within the results, sort by column, etc. Perhaps this is along the lines of what you had in mind?
Re: Looking for something like DBD::HTML::Table by no longer just digit (Beadle) on Feb 26, 2021 at 23:02 UTC
I've used this in the past (about 12 years ago): HTML::TableExtract
Re^2: Looking for something like DBD::HTML::Table by talexb (Chancellor) on Feb 27, 2021 at 17:17 UTC
Oooh .. this looks like the very thing. Thanks!
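
For reference, a minimal sketch of how HTML::TableExtract can pull rows out of a table by its header names. The sample HTML, the column names (id, name, qty), and the %by_id hash are assumptions for illustration and do not come from the thread; the module has to be installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical page content; the real page was not posted.
my $html = <<'HTML';
<table>
  <tr><th>id</th><th>name</th><th>qty</th></tr>
  <tr><td>1</td><td>widget</td><td>42</td></tr>
  <tr><td>2</td><td>gadget</td><td>7</td></tr>
</table>
HTML

# 'headers' selects tables whose header row matches these names and
# returns the columns in the order given here.
my $te = HTML::TableExtract->new( headers => [qw(id name qty)] );
$te->parse($html);

# Index each data row by its first column.
my %by_id;
for my $ts ($te->tables) {
    for my $row ($ts->rows) {
        my ($id, $name, $qty) = @$row;
        $by_id{$id} = { name => $name, qty => $qty };
    }
}

print "$by_id{$_}{name}: $by_id{$_}{qty}\n" for sort keys %by_id;
```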
Re: Looking for something like DBD::HTML::Table by Fletch (Bishop) on Feb 26, 2021 at 21:41 UTC
Question fuzzy, but random thought: maybe use Mojo::DOM (or whatever HTML parser you know) to scrape the table in question into a CSV format, then pull things out with DBD::CSV behind DBI? The cake is a lie. The cake is a lie. The cake is a lie.
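
A rough sketch of that scrape-to-CSV-then-query pipeline, assuming Mojolicious, Text::CSV, and DBD::CSV are installed. The sample HTML, the items.csv file name, and the column names are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Mojo::DOM;
use Text::CSV;
use DBI;

# Hypothetical page content; the real page was not posted.
my $html = <<'HTML';
<table>
  <tr><th>id</th><th>name</th><th>qty</th></tr>
  <tr><td>1</td><td>widget</td><td>42</td></tr>
  <tr><td>2</td><td>gadget</td><td>7</td></tr>
</table>
HTML

# Step 1: scrape each table row into a list of cell texts.
my @rows = Mojo::DOM->new($html)->find('table tr')->map(
    sub { [ $_->find('th, td')->map('text')->each ] }
)->each;

# Step 2: write the rows out as CSV; the header row becomes the first line.
my $dir = tempdir( CLEANUP => 1 );
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
open my $fh, '>', "$dir/items.csv" or die "Cannot write CSV: $!";
$csv->print( $fh, $_ ) for @rows;
close $fh or die "Cannot close CSV: $!";

# Step 3: query the file through DBI; DBD::CSV maps items.csv to table "items"
# and takes the column names from the first line.
my $dbh = DBI->connect( 'dbi:CSV:', undef, undef,
    { f_dir => $dir, f_ext => '.csv/r', RaiseError => 1 } );
my $names = $dbh->selectcol_arrayref('SELECT name FROM items WHERE qty = 42');

print "$_\n" for @$names;
```

The intermediate CSV file is what lets ordinary SQL run against the scraped table, at the cost of re-scraping whenever the page changes.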
Re: Looking for something like DBD::HTML::Table by erix (Prior) on Feb 27, 2021 at 08:37 UTC
I have no solution really. In my experience HTML is a bit too variable. I can't find anything DBI-ey (I also looked in https://pgxn.org/ - no luck). Of course, to get database access you could slurp either HTML (via curl) or cleaned-up text (via links -dump) into a table, but they'd just be 'raw' lines that you'd still have to select the correct table rows from. Still, for well recognizable/greppable rows it might work. And anyway, it is a reminder that PostgreSQL's COPY knows how to read input from another program's STDOUT. `create table temp_slurps (line text); copy temp_slurps ( line ) from program 'links -dump -width 512 ${url}'; select * from temp_slurps ; -- where ...` As they say, YMMV. I'm sure if you write a Postgres extension (for pgxn.org) to extract-'read' HTML tables from source it will be popular ;)
Re: Looking for something like DBD::HTML::Table by LanX (Saint) on Feb 26, 2021 at 21:31 UTC
It's not clear to me if you just want to extract data from an HTML page or use an HTML file as a database (i.e. with write updates). If the latter is the case, you might be looking for XML-based solutions. Cheers Rolf (addicted to the Perl Programming Language :)

