I was doing some refactoring. The original script combined a parser module, which built a big data structure, with a bunch of backend modules that dumped that data out in various formats. I restructured this into multiple scripts: the "parser" script parses the input files and then dumps the data structure to disk using Storable::nstore_fd, and the backend scripts read it back (using fd_retrieve) and then do their backend work. This new architecture was generally considered to be an improvement.
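For reference, here is a minimal sketch of the parser/backend split described above, assuming a single dump file on disk. The helper names dump_data and load_data and the sample data are hypothetical, not taken from the original scripts. The binmode calls are a defensive choice: Storable output is binary, and a handle that applies line-ending translation could alter the bytes. The post does not say whether the original scripts do this.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nstore_fd fd_retrieve);
use File::Temp qw(tempfile);

# Parser side (hypothetical helper): write the structure to a file.
# nstore_fd writes in network byte order, so the dump is portable.
sub dump_data {
    my ($data, $path) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    binmode $out;    # defensive: keep any text-mode layer off the bytes
    nstore_fd($data, $out) or die "nstore_fd failed for $path";
    close $out or die "Cannot close $path: $!";
}

# Backend side (hypothetical helper): read the structure back.
sub load_data {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    binmode $in;
    # fd_retrieve returns undef on some errors and dies on others.
    my $data = fd_retrieve($in);
    defined $data or die "fd_retrieve failed for $path";
    close $in;
    return $data;
}

# Hypothetical stand-in for the real parser output.
my $parsed = { files => [ 'a.v', 'b.v' ], nets => { clk => 1, rst => 0 } };

my (undef, $path) = tempfile(UNLINK => 1);
dump_data($parsed, $path);
my $loaded = load_data($path);
print "files: @{ $loaded->{files} }\n";
```

In the real setup dump_data would run in the parser script and load_data in each backend script; they share one file here only so the sketch is self-contained.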
But then I started getting bug reports from Cygwin users. The backend scripts were dying with "Out of memory during ridiculously large request at ../../lib/Storable.pm ..." errors. These errors appear even when I set HKEY_CURRENT_USER\Software\Cygnus Solutions\Cygwin\heap_chunk_in_mb=1024 to tell Cygwin to allow the process to use a full gigabyte of memory.
I did some profiling (a binary search on "limit vmemoryuse ..."). On Linux I determined the memory consumption to be 43 MB. Hardly excessive (the file dumped by Storable is 10 MB). Replacing Storable with the original parser code reduced it to 42 MB ... and this fixed the Cygwin issues. With a different dataset, usage goes from 39 MB to 38 MB, which argues against the suggestion that the problem is that we are right on the edge of some limit.
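The binary search over the memory limit can be sketched as below. It uses the sh builtin "ulimit -v" (accepted by bash and dash, though not required by POSIX) in place of csh's "limit vmemoryuse", and it assumes the command's success is monotonic in the limit: it fails below some threshold and succeeds above it. The names runs_under_limit and min_passing_limit are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Does @cmd exit successfully when its virtual memory is capped at $kb KB?
# Assumes /bin/sh supports "ulimit -v" (bash and dash do).
sub runs_under_limit {
    my ($kb, @cmd) = @_;
    # In the sh script: $1 is the limit; after "shift", "$@" is the command.
    my $status = system('/bin/sh', '-c',
        'ulimit -v "$1" && shift && exec "$@"', 'sh', $kb, @cmd);
    return $status == 0;
}

# Smallest limit in [$lo, $hi] for which $ok->($limit) is true, assuming
# $ok is monotonic. Returns undef if even $hi is not enough.
sub min_passing_limit {
    my ($lo, $hi, $ok) = @_;
    return undef unless $ok->($hi);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($ok->($mid)) { $hi = $mid } else { $lo = $mid + 1 }
    }
    return $lo;    # invariant: $ok->($hi) is true, and $lo == $hi here
}

# Only search when a command is given on the command line.
if (@ARGV) {
    my $kb = min_passing_limit(1024, 4 * 1024 * 1024,
        sub { runs_under_limit($_[0], @ARGV) });
    print defined $kb ? "needs about $kb KB\n"
                      : "fails even at the upper bound\n";
}
```

Usage would be something like "perl find_limit.pl perl backend.pl dump.sto" (hypothetical script and file names). With 1 KB resolution and a 4 GB upper bound, the search takes about 22 runs of the command.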
So my question is: are there any known issues with the way Storable works on Cygwin that would cause this excessive memory use? I'm considering switching to SQLite for the intermediate file, but I'd like to understand the problem before doing the work.
Another question: are there any good memory profiling tools for Perl? I use -d:DProf for speed profiling, but for this kind of issue I need to know where my memory is being used.
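This is not a profiler, but a crude way to see process-level memory around a single step (such as the fd_retrieve call) is to read the Vm* fields from /proc/self/status. This minimal sketch assumes a Linux-style /proc. It reports process totals only and cannot attribute memory to individual data structures. The helper vm_usage_kb and the array allocation are hypothetical stand-ins.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the VmPeak/VmSize/VmRSS lines (in kB) from /proc/self/status.
# Returns an empty hash where that file cannot be opened.
sub vm_usage_kb {
    my %vm;
    open my $fh, '<', '/proc/self/status' or return %vm;
    while (my $line = <$fh>) {
        $vm{$1} = $2 if $line =~ /^(VmPeak|VmSize|VmRSS):\s+(\d+)\s+kB/;
    }
    close $fh;
    return %vm;
}

my %before = vm_usage_kb();
my @big = (1) x 1_000_000;    # hypothetical stand-in for an fd_retrieve call
my %after = vm_usage_kb();

for my $key (sort keys %after) {
    printf "%-6s %8d kB -> %8d kB\n", $key, $before{$key} // 0, $after{$key};
}
```

VmPeak is usually the more telling number, since perl often does not hand freed memory back to the operating system.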
Opinions my own; statements of fact may be in error.