PerlMonks  

Strange memory leak question. Please help!

by catsophie (Initiate)
on Sep 21, 2007 at 15:38 UTC ( [id://640389] )

catsophie has asked for the wisdom of the Perl Monks concerning the following question:

In my code, I read information from a MySQL database, grab a web page from the Internet (using LWP), analyze the content, store something in the database, and then grab the next page. This cycle may repeat many times.

But I found that memory was continuously consumed and the program eventually crashed with "out of memory", even though memory should be released after each web page is analyzed.

What I have done is delete all the arrays once they are no longer needed, but the problem is still there.

Could anybody give me some suggestions about the possible causes? The code is long, so I'd rather not post it here. Do I have to release the LWP objects too?

Thanks!!

Replies are listed 'Best First'.
Re: Strange memory leak question. Please help!
by Fletch (Bishop) on Sep 21, 2007 at 16:02 UTC

    Without the code, you're pretty much going to get generalities. Having said that though, the most likely one that comes to mind given the context is that you might be using HTML::TreeBuilder to do the analysis. It uses circular references which won't get correctly garbage collected unless you call the delete method on the instance.
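
    A minimal sketch of that fix, assuming HTML::TreeBuilder is indeed the parser in use (extract_title is an invented helper; the key line is the delete call):

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # from the CPAN HTML-Tree distribution

# Invented helper: parse one page, pull out the title, then free the tree.
sub extract_title {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $node = $tree->look_down(_tag => 'title');
    my $text = $node ? $node->as_text : "";

    # HTML::TreeBuilder elements hold both parent and child links, so the
    # tree is full of circular references; without this call it is never
    # reclaimed by Perl's reference-counting garbage collector.
    $tree->delete;
    return $text;
}
```

    Without the $tree->delete, every page parsed in a loop leaves its whole tree behind for the life of the process.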

Re: Strange memory leak question. Please help!
by moritz (Cardinal) on Sep 21, 2007 at 16:11 UTC
    There could be circular references that the garbage collector can't reclaim. Maybe Devel::Cycle can help you find them.
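
    A small illustration of the kind of cycle involved (the structure here is invented for demonstration; Devel::Cycle is shown commented out since it comes from CPAN, while Scalar::Util is core):

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# A minimal self-referencing structure -- the kind of loop that
# reference-counting garbage collection can never reclaim:
my $node = { name => 'page' };
$node->{self} = $node;            # refcount can no longer drop to zero

# With Devel::Cycle from CPAN, such loops can be reported:
#   use Devel::Cycle;
#   find_cycle($node);            # prints the path of each cycle found

# Once located, break the cycle with a weak reference:
weaken($node->{self});
```
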
Re: Strange memory leak question. Please help!
by artist (Parson) on Sep 21, 2007 at 17:09 UTC
Re: Strange memory leak question. Please help!
by graff (Chancellor) on Sep 22, 2007 at 01:04 UTC
    I ran into a similar, seemingly unavoidable problem with memory consumption when I was facing a huge number of Excel files, and decided to use Spreadsheet::ParseExcel to normalize/condense/combine the data from all of them. For each new Excel file that I opened, read, processed and closed, the module just kept taking up more memory, instead of re-using the space that was allocated for a previous file.

    I decided to do a work-around, whereby I would process files until some reliable event occurred (e.g. changing directory, because there were never too many files in a single folder), write a "checkpoint" file to indicate how far I had gotten in the overall list, and exit. On start-up, the script would read the checkpoint file to figure out which directory to do next. Then it was just a matter of putting the script in a shell loop, running it enough times to cover the whole set.
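
    That checkpoint scheme might be sketched like this (the directory layout, the checkpoint filename, and next_dir are all invented for illustration):

```perl
use strict;
use warnings;

# Invented helper: given the last directory recorded in the checkpoint
# file ("" on the first run) and the sorted list of directories, pick
# the first one that earlier runs have not finished yet.
sub next_dir {
    my ($done, @dirs) = @_;
    for my $dir (@dirs) {
        next if $done ne "" && $dir le $done;   # already processed
        return $dir;
    }
    return;   # nothing left -- the whole set is covered
}

# One run of the script would then look roughly like:
#   my @dirs = sort glob("data/*");
#   my $done = read_checkpoint("checkpoint.txt");    # "" on first run
#   my $dir  = next_dir($done, @dirs) or exit 0;
#   process_dir($dir);                               # the real work
#   write_checkpoint("checkpoint.txt", $dir);
#   exit 0;   # exiting returns ALL memory to the OS; a shell loop restarts us
```
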

    In your case:

    • Does the database provide info that you need in order to decide which web pages to get? If not, segregate the LWP/HTML::Parser part from the MySQL part -- those two parts don't need to be in the same script. The page-fetch script could just output a tab-delimited text file, which could be loaded to the database via LOAD DATA INFILE.

    • If the page fetch does depend on data fetched from the database, you should still split the LWP fetching and HTML parsing into a separate process that handles one page at a time, and run it as a child of the MySQL process on each iteration. In that case, a script that takes a URL as a command-line argument and prints string data suitable for MySQL insertion to its STDOUT could be run via backticks or via  open( PROC, "-|", $script_name, $url );
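
    A hedged sketch of that child-process arrangement (run_fetcher and fetch_page.pl are invented names):

```perl
use strict;
use warnings;

# Run a one-page fetch script as a child process and collect its
# tab-delimited output.  When the child exits, the OS reclaims every
# byte it used, so the parent's memory footprint stays flat.
sub run_fetcher {
    my ($script, $url) = @_;

    # $^X is the perl binary running us; the list form of open avoids
    # the shell entirely, so the URL needs no quoting.
    open(my $proc, "-|", $^X, $script, $url)
        or die "can't start $script: $!";
    my @rows = <$proc>;
    close($proc) or warn "fetcher exited with status $?";
    return @rows;
}

# Usage (fetch_page.pl is hypothetical):
#   my @rows = run_fetcher("fetch_page.pl", "http://example.com/");
```
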

    Either way, most of your trouble comes from trying to do too much in one huge monolithic script. Break it down into simpler components -- that's likely to improve performance in a lot of ways, and will make it easier to maintain; it's a win-win approach.

      Thanks, all, for the quick help. Below is my report on the question.

      talexb, I suspected my Perl program was consuming the memory because I used 'free -m' to watch free memory. When I ran the Perl program, free memory decreased very quickly and was not released after the program stopped.

      Fletch, you got the point. I forgot to delete the tree. Since I called HTML::TreeBuilder many times, that caused a serious memory leak. After I deleted the trees, the leak was almost solved.

      When I say 'almost', I mean there is still a very slow leak, about 1 MB every few minutes. graff is right: the trouble comes from my large script (1305 lines :P). I should break the script into smaller components.

      I didn't try Devel::Cycle or Test::Memory::Cycle, since I don't have complex reference structures.

        That is not a good way to detect a memory leak. You are looking at the total free memory on the system, which could go up or down due to pretty much anything happening on that box. It only worked for you because the leak was so large.

        Far better would be to find the pid of your process and run ps l on it periodically. Look at the VSZ column. If it never changes, then you don't have a leak.
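
        For example, a one-shot sampler along those lines (a plain sh sketch; it defaults to its own pid purely for demonstration):

```shell
#!/bin/sh
# Print a process's VSZ (virtual memory size, in KB).  Run this every
# minute or so -- from cron or a `while sleep 60` loop -- and watch
# whether the number keeps climbing.
pid=${1:-$$}
ps -o vsz= -p "$pid"
```

        The same figure appears in the VSZ column of `ps l`; sampling just that one field keeps the log easy to diff.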

Re: Strange memory leak question. Please help!
by talexb (Chancellor) on Sep 21, 2007 at 17:20 UTC

    What evidence do you have to support the hypothesis that your Perl program is causing the memory leak? Is your program running as a daemon? Can you disable or mockup portions of your program and see if the memory leak persists or goes away?

    I almost never worry about undefing variables to free them up -- I just allow them to fall out of scope, and Perl does the rest.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re: Strange memory leak question. Please help!
by perlfan (Vicar) on Sep 21, 2007 at 19:56 UTC
    The only time that I was bitten by a memory leak in Perl was when constructing a recursive function out of an anonymous sub reference, which is/was addressed by Sub::Recursive.
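
    The leaky pattern is an anonymous sub that recurses through a lexical it closes over: the lexical references the sub, and the sub's closure references the lexical, so reference counting never frees either. Sub::Recursive packages the fix; a hand-rolled version with core Scalar::Util looks roughly like this:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $fact = do {
    my $f;
    $f = sub { my $n = shift; $n <= 1 ? 1 : $n * $f->($n - 1) };
    my $strong = $f;    # keep one strong reference for the caller
    weaken($f);         # the closure now holds only a weak reference
    $strong;            # when this goes out of scope, the sub is freed
};
```
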

    Every other time I ran into something like this, it was a mistake or code I wrote that was worse than usual.

    The best thing to do is to create the smallest, simplest snippet of code that demonstrates the leak. This would serve as your "evidence".

Node Type: perlquestion [id://640389]
Approved by graff