How do I "reset" HTML::TableExtract?

by Cody Pendant (Prior)
on Sep 18, 2006 at 10:51 UTC ( [id://573522]=perlquestion )

Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

Every time I invoke HTML::TableExtract's parse() method, it doesn't re-initialise the object; it appends the newly parsed tables to the ones already held by the object. See the example below.

I want to use it on a new table every time (iterating through a paged website with a scraper), and it doesn't make sense to keep the previous table.

My workaround is to just re-initialise the object with new(), but that feels wrong. I've read through the POD for TableExtract and I'm baffled. There doesn't seem to be an option to control this behaviour, and there doesn't seem to be a method to re-initialise the object in either TableExtract or HTML::Parser.

use strict;
use warnings;
use diagnostics;
use HTML::TableExtract;

my $table_1 = '
<table><tr><td>foo</td><td>bar</td></tr>
<tr><td>baz</td><td>quux</td></tr></table>';

my $table_2 = '
<table><tr><td>bof</td><td>xyzzy</td></tr>
<tr><td>bat</td><td>gazonk</td></tr></table>';

my $te = HTML::TableExtract->new();

$te->parse($table_1);
foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}

## what goes here if I want to dump table_1 ?

$te->parse($table_2);
foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        print join(',', @$row), "\n";
    }
}


($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print

Replies are listed 'Best First'.
Re: How do I "reset" HTML::TableExtract?
by Thelonius (Priest) on Sep 18, 2006 at 14:05 UTC
    Try $te->eof
Re: How do I "reset" HTML::TableExtract?
by greatshots (Pilgrim) on Sep 18, 2006 at 11:23 UTC
      by design
Re: How do I "reset" HTML::TableExtract?
by mojotoad (Monsignor) on Sep 19, 2006 at 19:40 UTC
    There is a private method, _reset_state(), that does what you want.

    Having said that, creating a new HTML::TableExtract object each time through adds no significant overhead relative to the parsing load.

    Cheers,
    Matt
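
For anyone landing here later, the "new object per page" approach described above might look something like the sketch below. It is only an illustration built on the public new(), parse(), eof(), tables() and rows() calls. The helper name extract_rows and the @pages list are made up for the example and are not part of HTML::TableExtract.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper (not part of the module): parse one page with its
# own extractor object, so tables from earlier pages can never carry over.
sub extract_rows {
    my ($html) = @_;
    my $te = HTML::TableExtract->new();   # fresh state for every page
    $te->parse($html);
    $te->eof;                             # signal end of document to HTML::Parser
    # One array ref per table, each holding that table's rows.
    return map { [ $_->rows ] } $te->tables;
}

# Stand-ins for the pages a scraper would fetch (assumed data).
my @pages = (
    '<table><tr><td>foo</td><td>bar</td></tr><tr><td>baz</td><td>quux</td></tr></table>',
    '<table><tr><td>bof</td><td>xyzzy</td></tr><tr><td>bat</td><td>gazonk</td></tr></table>',
);

for my $html (@pages) {
    for my $table_rows (extract_rows($html)) {
        print join(',', @$_), "\n" for @$table_rows;
    }
    print "--\n";   # separator between pages
}
```

Because the extractor is a lexical inside the helper, it goes out of scope after each page, which sidesteps the need to call the private _reset_state() method at all.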

Re: How do I "reset" HTML::TableExtract?
by jpeg (Chaplain) on Sep 20, 2006 at 00:34 UTC
