PerlMonks  

HTML::TableExtract Memory Usage

by corpx (Acolyte)
on Sep 02, 2010 at 19:47 UTC ( [id://858625]=perlquestion )

corpx has asked for the wisdom of the Perl Monks concerning the following question:

Here's my code:

    #!/usr/bin/perl
    use WWW::Mechanize;
    use HTML::TableExtract qw(tree);
    use strict;

    start();

    sub start {
        my $te = HTML::TableExtract->new();
        my $agent = WWW::Mechanize->new( stack_depth => 0 );
        $agent->agent_alias( 'Windows IE 6' );
        $agent->get('http://www.perlmonks.org/?');
        my $page = $agent->content;
        for (my $c = 0; $c < 500; $c++) {
            $te->parse($page);
            print "c is $c\n";
        }
    }

    exit();

Every time it executes $te->parse($page), the memory used by the program increases. I know HTML::Tree has delete() to clear up the memory after using it, but how can I free up the memory used by TableExtract?

Replies are listed 'Best First'.
Re: HTML::TableExtract Memory Usage
by Corion (Patriarch) on Sep 02, 2010 at 19:51 UTC

    It seems that other people have run into this too, as this bug report on RT shows. No immediate solution is suggested there, but maybe you can work out how the HTML::TreeBuilder deletion could be done better.

      That sucks. I've been poring over it for a while, but no luck yet.
Re: HTML::TableExtract Memory Usage
by Anonymous Monk on Sep 03, 2010 at 00:37 UTC
    Try $te->tree->delete; or

        sub HTML::TableExtract::Table::DESTROY {
            eval {
                $_[0]->tree->delete;
                $_[0]->tree( undef );
            };
            return;
        }

    or

        my $te = HTML::TableExtract->new();
        $te->parse($page);
        $te->eof;
        print "$te c is $c\n";
        eval {
            for ( $te->tables ) {
                $_->tree->delete;
                $_->tree( undef );
            }
            $te->tree->delete;
            $te->tree( undef );
            undef $te;
            1;
        } or warn $@;

    It might work.
      Just tried all 3 of the above scenarios, but the memory keeps ballooning :/
        I was afraid of that. There are too many circular references in that module, e.g.

            sub _reset_state {
                my $self = shift;
                $self->{_cdepth} = -1;
                $self->{_tablestack} = [];
                $self->{_tables} = {};
                $self->{_ts_sequential} = [];
                $self->{_counts} = [];
                $self->{_in_a_table} = 0;
            }
            ...
            grid => [],
            translation => [],
            hrow => [],
            order => [],
            children => [],

        You would have to break all those circular references in DESTROY, recursively.
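
The effect of circular references on Perl's reference counting can be shown without HTML::TableExtract at all. What follows is only a minimal sketch: the `Node` class, its `add_child` and `break_cycles` methods, and the `count_freed` helper are hypothetical names made up for illustration, and they are not how any module in this thread is implemented. It compares strong parent links, manual cycle breaking, and weak parent links via `Scalar::Util::weaken`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical Node class for illustration only; it is NOT part of
# HTML::TableExtract, HTML::ElementTable, or HTML::Element.
package Node;
use Scalar::Util qw(weaken);

our $destroyed = 0;    # number of Node objects freed so far

sub new {
    my ($class) = @_;
    return bless { parent => undef, children => [] }, $class;
}

# Attach $child. The child's parent link completes a reference cycle
# unless the caller asks for a weak back-reference.
sub add_child {
    my ( $self, $child, %opt ) = @_;
    push @{ $self->{children} }, $child;
    $child->{parent} = $self;
    weaken( $child->{parent} ) if $opt{weak};
    return $child;
}

# Break the cycles by hand, children first.
sub break_cycles {
    my ($self) = @_;
    $_->break_cycles for @{ $self->{children} };
    $self->{children} = [];
    $self->{parent}   = undef;
    return;
}

sub DESTROY { $destroyed++ }

package main;

# Build a root with three children, optionally clean up, drop the last
# outside reference, and report how many nodes were actually freed.
sub count_freed {
    my (%opt) = @_;
    $Node::destroyed = 0;
    {
        my $root = Node->new;
        $root->add_child( Node->new, weak => $opt{weak} ) for 1 .. 3;
        $root->break_cycles if $opt{cleanup};
    }
    return $Node::destroyed;
}

printf "strong back-references, no cleanup:   %d freed\n", count_freed();
printf "strong back-references, break_cycles: %d freed\n", count_freed( cleanup => 1 );
printf "weak back-references, no cleanup:     %d freed\n", count_freed( weak => 1 );
```

With strong parent links and no cleanup, none of the four nodes is freed when `$root` goes out of scope; with either manual cycle breaking or weak parent links, all four are. A module that stores cross-references in many places needs one of those two approaches applied to every such link.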
Re: HTML::TableExtract Memory Usage
by Anonymous Monk on Sep 04, 2010 at 14:30 UTC
    This leaks practically nothing on my machine: it starts at 17 MB/14 MB VM, and by the 500th iteration it's at 18 MB/16 MB.

    My guess is this very tiny leak is probably in HTML::Parser.
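
Since the replies in this thread report different results on different machines, it may help to measure the growth rather than eyeball it. The sketch below is one possible way to do that, assuming a Linux system where `/proc/self/status` exposes a `VmRSS` line. The helper names `rss_kb` and `rss_growth_kb` are made up for this example, and the workload is a stand-in rather than the HTML::TableExtract loop from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Resident set size of this process in kB, read from /proc (Linux only).
# Returns undef where /proc/self/status is not available.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return undef;
    while ( my $line = <$fh> ) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return undef;
}

# Call $code->() $times times and return the change in RSS in kB
# (undef if RSS could not be read).
sub rss_growth_kb {
    my ( $code, $times ) = @_;
    my $before = rss_kb();
    $code->() for 1 .. $times;
    my $after = rss_kb();
    return undef unless defined $before && defined $after;
    return $after - $before;
}

# Stand-in workload that deliberately keeps 1 kB alive per call.
my @kept;
my $growth = rss_growth_kb( sub { push @kept, 'x' x 1024 }, 10_000 );
print defined $growth
    ? "RSS grew by $growth kB over 10000 calls\n"
    : "RSS not available on this platform\n";
```

To apply this to the code from the question, the anonymous sub would create the extractor and call `parse` on the fetched page, with 500 as the repeat count. A figure that keeps climbing in proportion to the repeat count points to a real leak, while a small one-off increase is more likely allocator overhead.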

Re: HTML::TableExtract Memory Usage
by Anonymous Monk on Sep 04, 2010 at 07:03 UTC
    It's related to HTML::ElementTable, which leaks memory:
    perl -MHTML::ElementTable -e " for(;;){ my $c = HTML::ElementTable->new; warn $c; undef $c; } "
      This also leaks:
      perl -MHTML::ElementTable -e " for(;;){ my $c = HTML::ElementTable->new; warn $c; $c->delete; undef $c; } "
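
If the leak really sits inside a module and calling `delete` does not help, one common workaround (not suggested by anyone in this thread, so treat it as an assumption) is to do the leaky work in a short-lived child process and pass only the results back. The sketch below assumes a Unix-like system with a real `fork`. `run_in_child` is a hypothetical helper written for this example, and the callback is a stand-in for the actual table parsing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper (not from any module in this thread): run $code in a
# forked child and return the lines it printed to STDOUT. Whatever memory
# the child leaks is returned to the OS when the child exits.
sub run_in_child {
    my ($code) = @_;

    # Two-argument open with '-|' forks; the child's STDOUT is piped to us.
    my $pid = open( my $from_child, '-|' );
    die "Cannot fork: $!" unless defined $pid;

    if ( $pid == 0 ) {
        $| = 1;              # child: flush every print straight into the pipe
        $code->();
        POSIX::_exit(0);     # leave without running END blocks or destructors
    }

    my @lines = <$from_child>;    # parent: read until the child closes the pipe
    close $from_child;            # also waits for the child
    chomp @lines;
    return @lines;
}

# Stand-in for the leaky work: the child prints one result per line.
my @rows = run_in_child( sub { print "row $_\n" for 1 .. 3 } );
print "parent received: @rows\n";
```

In the scenario from the question, the callback would parse the page and print the extracted cells in some line-based format. The parent process then never loads the parsed trees, so its memory use stays flat regardless of what the child leaks.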
