catsophie has asked for the wisdom of the Perl Monks concerning the following question:
In my code, I read information from a MySQL database, grab a web page from the Internet (using LWP), analyze the content, store something in the database, and grab the next page. This operation may repeat many times.
But I found that memory was continuously consumed, and the program finally crashed with "out of memory", even though memory should be released after each web page is analyzed.
What I have done is to delete all the arrays once they are no longer needed, but the problem is still there.
Could anybody give me some suggestions about the possible causes? The code is long, so I'd rather not post it here. Do I have to release the LWP objects too?
Thanks!!
Re: Strange memory leak question. Please help!
by Fletch (Bishop) on Sep 21, 2007 at 16:02 UTC
Without the code, you're pretty much going to get generalities. Having said that though, the most likely one that comes to mind given the context is that you might be using HTML::TreeBuilder to do the analysis. It uses circular references which won't get correctly garbage collected unless you call the delete method on the instance.
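To illustrate Fletch's point, here is a minimal sketch (assuming HTML::TreeBuilder is installed; the HTML strings here stand in for LWP response bodies):

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN module

# Stand-in pages; in the real script these would come from LWP.
my @pages = (
    '<html><body><p>one</p></body></html>',
    '<html><body><p>two</p></body></html>',
);

for my $html (@pages) {
    my $tree = HTML::TreeBuilder->new_from_content($html);
    # ... analyze $tree here ...
    $tree->delete;   # break the parent<->child circular references;
                     # without this, every tree lives until the process exits
}
```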
Re: Strange memory leak question. Please help!
by moritz (Cardinal) on Sep 21, 2007 at 16:11 UTC
There could be circular references that the garbage collector can't reclaim. Maybe Devel::Cycle can help you find them.
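For example (a deliberately circular structure; Devel::Cycle is a CPAN module):

```perl
use strict;
use warnings;
use Devel::Cycle;   # CPAN module; exports find_cycle

my $node = { name => 'a' };
$node->{self} = $node;   # circular reference: the refcount can never reach zero

find_cycle($node);       # reports the cycle path on STDOUT
```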
Re: Strange memory leak question. Please help!
by artist (Parson) on Sep 21, 2007 at 17:09 UTC
Re: Strange memory leak question. Please help!
by graff (Chancellor) on Sep 22, 2007 at 01:04 UTC
I ran into a similar, seemingly unavoidable problem with memory consumption when I was facing a huge number of Excel files, and decided to use Spreadsheet::ParseExcel to normalize/condense/combine the data from all of them. For each new Excel file that I opened, read, processed and closed, the module just kept taking up more memory, instead of re-using the space that was allocated for a previous file.
I decided to do a work-around, whereby I would process files until some reliable event occurred (e.g. changing directory, because there were never too many files in a single folder), write a "checkpoint" file to indicate how far I had gotten in the overall list, and exit. On start-up, the script would read the checkpoint file to figure out which directory to do next. Then it was just a matter of putting the script in a shell loop, running it enough times to cover the whole set.
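A rough sketch of that checkpoint pattern (file and directory names here are made up for illustration; `process_dir` is a hypothetical per-directory worker):

```perl
use strict;
use warnings;

my $checkpoint = 'progress.txt';
my @dirs = sort glob('data/*');   # one "reliable event" per directory

# Resume from the last completed directory, if a checkpoint exists.
my $done = '';
if (open my $fh, '<', $checkpoint) {
    my $line = <$fh>;
    $done = defined $line ? $line : '';
    chomp $done;
}

for my $dir (@dirs) {
    next if $done && $dir le $done;   # already processed on an earlier run
    process_dir($dir);
    open my $out, '>', $checkpoint or die "can't write $checkpoint: $!";
    print {$out} "$dir\n";
    exit 0;   # exit so the OS reclaims everything; an outer loop restarts us
}
print "all directories done\n";

sub process_dir { my ($dir) = @_; print "processing $dir\n" }
```

The outer loop is then just a shell loop that reruns the script until it prints that it is finished.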
In your case:
- Does the database provide info that you need in order to decide which web pages to get? If not, segregate the LWP/HTML::Parser part from the MySQL part -- those two parts don't need to be in the same script. The page-fetch script could just output a tab-delimited text file, which could be loaded to the database via LOAD DATA INFILE.
- If the page fetch does depend on data fetched from the database, you should still move the LWP and HTML parsing into a separate process that handles just one page at a time, and run it as a child of the MySQL process on each iteration. In this case, a script that takes a URL as a command-line argument, and prints string data suitable for MySQL insertion to its STDOUT, could be run via back-ticks or via open( PROC, "-|", $script_name, $url );
Either way, most of your trouble comes from trying to do too much in one huge monolithic script. Break it down into simpler components -- that's likely to improve performance in a lot of ways, and will make it easier to maintain; it's a win-win approach.
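A runnable sketch of that second approach (Unix-only; the child here is an inline one-liner standing in for a hypothetical separate fetch-and-parse script):

```perl
use strict;
use warnings;

my @urls = ('http://example.com/a', 'http://example.com/b');
my @rows;

for my $url (@urls) {
    # In the real script this would exec the separate fetch/parse program,
    # e.g. open my $proc, '-|', 'perl', 'fetch_page.pl', $url;
    open my $proc, '-|', $^X, '-e', 'print "$ARGV[0]\tok\n"', $url
        or die "can't spawn child: $!";
    while (my $row = <$proc>) {
        chomp $row;
        push @rows, $row;   # in the real script: insert into MySQL here
    }
    close $proc or warn "child for $url exited with status $?";
}
print "$_\n" for @rows;
```

Each page's memory belongs to the child and is returned to the OS when the child exits, so the parent's footprint stays flat no matter how many pages are processed.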
Thanks, all, for the quick help. Below is my report on the question.
talexb, I suspected my Perl program was consuming the memory because I used 'free -m' to watch free memory: while the Perl program ran, free memory decreased very fast, and it was not released after the program stopped.
Fletch, you got the point. I forgot to delete the tree. Since I called HTML::TreeBuilder many times, that caused a serious memory leak. After I deleted the tree, the leak was almost solved.
When I say 'almost', I mean there is still a very slow leak, about 1 MB every few minutes. graff is right: the trouble comes from my large script (1305 lines :P). I should break it into smaller components.
I didn't try Devel::Cycle or Test::Memory::Cycle, since I did not have complex reference structures.
Re: Strange memory leak question. Please help!
by talexb (Chancellor) on Sep 21, 2007 at 17:20 UTC
What evidence do you have to support the hypothesis that your Perl program is causing the memory leak? Is your program running as a daemon? Can you disable or mock up portions of your program and see whether the memory leak persists or goes away?
I almost never worry about undefing variables to free them up -- I just allow them to fall out of scope, and Perl does the rest.
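In other words (a trivial illustration of reference counting doing the work):

```perl
use strict;
use warnings;

my $iterations = 0;
for my $i (1 .. 3) {
    my @big = (1) x 100_000;   # allocated fresh each pass
    $iterations++;
    # ... use @big ...
}   # @big's refcount drops to zero here, no undef needed
print "ran $iterations iterations\n";
```

One caveat: perl typically keeps freed space in its own allocator for reuse rather than handing it back to the OS, so tools like 'free -m' may not show a drop; the point is that the footprint stops growing.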
Alex / talexb / Toronto
"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds
Re: Strange memory leak question. Please help!
by perlfan (Vicar) on Sep 21, 2007 at 19:56 UTC
The only time that I was bitten by a memory leak in Perl was when constructing a recursive function out of an anonymous sub reference, which is/was addressed by Sub::Recursive.
Every other time I ran into something like this, it was a mistake -- code I wrote that was worse than usual.
The best thing to do is to create the smallest, simplest snippet of code that demonstrates the leak. This would serve as your "evidence".
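That leak comes from an anonymous sub that calls itself through a lexical it closes over -- a circular reference. A sketch of the problem and one core-Perl fix using Scalar::Util::weaken (Sub::Recursive packages the same idea):

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Leaky version: $fact and the closure reference each other, so neither
# is freed even after the caller drops the returned reference.
sub make_fact_leaky {
    my $fact;
    $fact = sub { my $n = shift; $n <= 1 ? 1 : $n * $fact->($n - 1) };
    return $fact;
}

# Fixed version: return one strong reference, and weaken the copy
# captured by the closure so the cycle can collapse.
sub make_fact {
    my $fact;
    my $strong = $fact = sub { my $n = shift; $n <= 1 ? 1 : $n * $fact->($n - 1) };
    weaken($fact);
    return $strong;
}

print make_fact()->(5), "\n";   # 120
```

On Perl 5.16 and later, `use feature 'current_sub'` and calling `__SUB__->()` avoid the captured lexical entirely.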