PerlMonks  

Re: WWW::Mechanize memory leak???

by ViceRaid (Chaplain)
on Jan 07, 2004 at 17:41 UTC


in reply to WWW::Mechanize memory leak???

Afternoon

Yeah, I get the same results (perl v5.8.0 built for i386-linux-thread-multi; WWW::Mechanize 0.70). As Roy Johnson suggested above, it's because WWW::Mechanize keeps a list of HTTP responses on a page stack: whenever it starts to fetch a new page, it pushes the last response it received onto an array.

If this is a problem for you (for example, if you've got a long-running process that's getting too fat), you can create a subclass of WWW::Mechanize that caps the size of the page stack, perhaps by redefining the _push_page_stack method:

    package WWW::Mechanize::KeepSlim;
    our @ISA = qw/WWW::Mechanize/;

    sub _push_page_stack {
        my $self = shift;
        if ( $self->{res} ) {
            my $save_stack = $self->{page_stack};
            $self->{page_stack} = [];
            push( @$save_stack, $self->clone );
            # HERE! - stop the stack getting bigger than 10 entries
            if ( @$save_stack > 10 ) {
                shift(@$save_stack);
            }
            $self->{page_stack} = $save_stack;
        }
        return 1;
    }

    package main;
    my $agent = WWW::Mechanize::KeepSlim->new();
    # ....

If you use this class with your example that demonstrates the problem, you should see the memory usage increase arithmetically for the first 10 requests, then stop increasing.

cheers
ViceRaid

Replies are listed 'Best First'.
Re: Re: WWW::Mechanize memory leak???
by pg (Canon) on Jan 07, 2004 at 20:28 UTC

    It might be more useful to remove elements based on their age. The frequency with which you access the web isn't constant throughout the day, and when you're busy surfing you don't want the history cleaned up any faster than when you're mostly idle.

    You could modify the structure of $self->{page_stack} a little so that a timestamp is kept with each entry, and only entries older than a certain age get deleted.

    However, since you are subclassing, it is probably a better idea to keep $self->{page_stack} as it is and add a new array, $self->{page_time_stamp}, whose elements match the page stack one-to-one.

    Deletion performance would be good, since the entries that need deleting always sit together at the beginning of the array.
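    A minimal sketch of that age-based pruning, assuming the parallel-array layout described above. The helper name prune_old_pages and the array names are illustrative only, not part of WWW::Mechanize; in a real subclass, $pages would hold the cloned agents and $stamps the time() each was pushed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop entries older than $max_age seconds.
# Because pushes happen in time order, the oldest entries always
# sit at the front, so we only shift from the start of the arrays.
sub prune_old_pages {
    my ( $pages, $stamps, $max_age, $now ) = @_;
    while ( @$stamps && $now - $stamps->[0] > $max_age ) {
        shift @$pages;
        shift @$stamps;
    }
    return scalar @$pages;    # how many pages survived
}

# Example: keep only entries newer than 300 seconds.
my @pages  = qw( a b c d );
my @stamps = ( 100, 200, 900, 950 );
my $kept   = prune_old_pages( \@pages, \@stamps, 300, 1000 );
print "$kept pages kept: @pages\n";    # prints "2 pages kept: c d"
```

    A subclass would call something like this from _push_page_stack, right after pushing the new clone and its timestamp.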

Re^2: WWW::Mechanize memory leak???
by Anonymous Monk on May 09, 2018 at 07:36 UTC
    For future reference: You can set the max. stack depth with $mech->stack_depth(100) now:
    =head2 $mech->stack_depth( $max_depth )

        Get or set the page stack depth. Use this if you're doing a lot
        of page scraping and running out of memory.

        A value of 0 means "no history at all." By default, the max
        stack depth is humongously large, effectively keeping all
        history.
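    For completeness, a minimal usage sketch of that API (assuming a reasonably modern WWW::Mechanize from CPAN; the commented-out URL is a placeholder):

```perl
use strict;
use warnings;
use WWW::Mechanize;    # non-core; install from CPAN

my $mech = WWW::Mechanize->new();
$mech->stack_depth(10);    # keep at most the 10 most recent pages

# Fetch pages as usual; history beyond 10 entries is discarded,
# so memory use levels off instead of growing without bound.
# $mech->get('http://example.com/');
```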
Re: Re: WWW::Mechanize memory leak???
by Anonymous Monk on Jan 08, 2004 at 15:52 UTC
    Hi, ViceRaid,

    Thank you very much for your explanation and code. It worked. I really appreciate it.

    CW
