Extensive memory usage

by TheMarty (Acolyte)
on Oct 04, 2004 at 08:17 UTC

TheMarty has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've got a huge program. It's used to process even larger amounts of ASCII data. The program exists in two versions - as a script for UNIX, and as an executable ( ActiveState PDK ;-) ) for Windows. Everything runs well, except memory usage, which gets bigger and bigger. I've got 2GB RAM on my PC - so it works, somehow. On a UNIX workstation the run ends with an "out of memory" error.

The program consists of modules which open a file, process it and save it back. This happens 10 to 20 times (on average) for a single run. Each file is also written twice (a requirement of the data processing). Example: two files, each 10MB. My program runs, and at the end of the run the Windows Task Manager shows 1.7GB(sic!) of memory usage.

I use "my" where possible - however some data structures must be global. The question is: what can I do? How can I track down the things which cause the problem?

Greetings TheMarty

Replies are listed 'Best First'.
Re: Extensive memory usage
by BrowserUk (Patriarch) on Oct 04, 2004 at 10:01 UTC

    You could take a look at Devel::Size, Devel::Size::Report, Devel::Peek, Devel::FindGlobals and, if you can build your own copy of Perl for the target platforms, Devel::LeakTrace.
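
    For example, Devel::Size will tell you how much memory a suspect structure really occupies. A minimal sketch, using a made-up %lookup hash:

        use Devel::Size qw(size total_size);

        my %lookup = map { $_ => [ 1 .. 100 ] } 1 .. 1_000;

        # size() measures the top-level structure only; total_size()
        # follows references and counts everything it ultimately points at.
        printf "hash alone:         %d bytes\n", size( \%lookup );
        printf "hash plus contents: %d bytes\n", total_size( \%lookup );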

    That said, generating a 1.7GB footprint from a script that processes a couple of 10 MB files requires that you are building a lot of duplicate data structures (hashes or arrays). This is quite easy to do with several of the Graph::* and Set::* type modules, many of which are quite profligate with memory.

    This is especially true if you're building graphs that result in self-referencing trees and the like, and don't have code to explicitly break the circular references, as these will prevent garbage collection.

    The other way to chew up large volumes of RAM unnecessarily is to pass big lists around between subroutines rather than array references. The classic example is something like:

    This consumes 390MB to process 10 MB of data:
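
    A minimal sketch of that pattern - slurp the whole file into a list and pass it by value through a (made-up) process() sub:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @lines   = <>;                   # the whole file held in memory
        my @results = process( @lines );    # the list is copied onto the stack

        print "$_\n" for @results;

        sub process {
            my @data = @_;                  # ...and copied again into @data
            my @out;
            for my $line ( @data ) {
                my ( $value ) = $line =~ m[\s*(\S+)\s*] or next;
                push @out, $value * 2;      # a third full-size list
            }
            return @out;                    # copied once more on return
        }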

    Whereas this does the same processing in 170 MB, by using references and side effects to avoid duplicating large lists:
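
    A corresponding sketch, passing a reference and modifying the array in place:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @lines = <>;                 # still one full copy of the file
        process( \@lines );             # but only a reference is passed around

        print "$_\n" for @lines;

        sub process {
            my ( $data ) = @_;
            for my $line ( @$data ) {   # aliases the caller's elements
                my ( $value ) = $line =~ m[\s*(\S+)\s*] or next;
                $line = $value * 2;     # modified in place, nothing copied back
            }
            return;
        }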

    And this does the same using less than 2 MB by avoiding building large lists in the first place:

    perl -nlwe" m[\s*(\S+)\s*] and print $1*2" data\1millionlines.dat

    It's a contrived example, but it illustrates the points.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: Extensive memory usage
by Corion (Patriarch) on Oct 04, 2004 at 08:30 UTC

    Using my to tell Perl when you don't need a variable anymore is a good start, but Perl uses reference counting for memory management, so you might run into circular references like the following:

    my $bar = {};
    my $foo = { bar => \$bar };
    $bar->{foo} = \$foo;

    Here, $bar points to $foo and vice versa, so Perl will never release the two variables until the process exits.

    There are a number of solutions to circumvent the problem. The easiest way is to split up your program into several programs, which all process a small amount of the data and then exit, thus cleaning up the memory.

    There is also weaken in Scalar::Util, which creates a weak reference that will not prevent its target from being collected. Knowing which references to weaken is not easy, though.
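
    A minimal sketch of weaken applied to the cycle above:

        use Scalar::Util qw(weaken);

        my $bar = {};
        my $foo = { bar => \$bar };
        $bar->{foo} = \$foo;

        # Weaken one side of the cycle: the weak reference no longer keeps
        # its target alive, so both structures can be freed when $foo and
        # $bar go out of scope.
        weaken( $bar->{foo} );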

    You can also try to analyze your code, find out where the circular references are, and break them manually - in the example above, by setting $foo->{bar} = undef.

    A second possibility is that your program is simply building overly complex in-memory structures - for example, I can well imagine a 10 MB XML file eating up lots of memory once parsed into a DOM tree - so you might want to consider changing your data structures, or switching from a DOM XML parser to an event-based parser. Such restructuring is not easy, though.
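
    As one possible shape for the event-based approach, a minimal sketch using XML::Twig and a made-up <record> element:

        use XML::Twig;

        # Each <record> is handled as soon as it has been parsed and then
        # discarded, so the whole document never sits in memory at once.
        my $twig = XML::Twig->new(
            twig_handlers => {
                record => sub {
                    my ( $twig, $record ) = @_;
                    # ... process one record here ...
                    $twig->purge;    # free everything parsed so far
                },
            },
        );
        $twig->parsefile( 'data.xml' );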

Re: Extensive memory usage
by Sandy (Curate) on Oct 04, 2004 at 14:04 UTC
    I also had some very weird memory-leaking behaviour on a UNIX box that was never successfully solved: Change in memory consumption when using debugger

    One especially weird aspect was that if I ran my program in the debugger, the memory leaks disappeared.

    But it only happened on Solaris 2.7, not 2.8. It seems the system memory management in Solaris was flawed, and it was fixed in 2.8. Sorry, I don't remember the details.

    Maybe this is the problem??

    Good Luck

    Sandy

    UPDATE: Another related node Managing System Memory Resources

Re: Extensive memory usage
by foss_city (Novice) on Oct 04, 2004 at 09:00 UTC
    How much disk space do you have available to you? Would it be possible to split each file into a number of temporary files, then process each of those files in turn, then concatenate them when all of the temp files have been processed?
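
    A minimal sketch of that idea, assuming line-oriented data and made-up file names:

        use strict;
        use warnings;
        use File::Temp qw(tempfile);

        my $chunk_lines = 100_000;      # tune so one chunk fits comfortably in memory
        my @parts;

        # 1. Split the big input file into temporary chunk files.
        open my $in, '<', 'big_input.dat' or die "open big_input.dat: $!";
        until ( eof $in ) {
            my ( $fh, $name ) = tempfile( UNLINK => 0 );
            push @parts, $name;
            for ( 1 .. $chunk_lines ) {
                last if eof $in;
                print {$fh} scalar <$in>;
            }
            close $fh or die "close $name: $!";
        }
        close $in;

        # 2. Process each chunk in turn and concatenate the results.
        open my $out, '>', 'big_output.dat' or die "open big_output.dat: $!";
        for my $part ( @parts ) {
            open my $fh, '<', $part or die "open $part: $!";
            while ( my $line = <$fh> ) {
                # ... real per-line processing would go here ...
                print {$out} $line;
            }
            close $fh;
            unlink $part;               # done with this chunk
        }
        close $out;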
