http://qs321.pair.com?node_id=660371

wilsonch has asked for the wisdom of the Perl Monks concerning the following question:

I am currently working on a project which requires reading some 30 million records from MS SQL by calling a stored proc and processing them. The following script works for smaller runs, but dies without any error or warning from Perl at about 2.5 million records.

The stored proc has been tested from the MS SQL client and finishes successfully (it takes about 6 hours). I am just wondering if there is something wrong with the following code, or whether there are any parameters in DBI that I should set to make Perl wait for the whole result set indefinitely?

my $cdr_list = {};
eval {
    my $query = "EXEC prc_getUsage $run_id, 'CDR', '$start_account', '$end_account'";
    print "\t\tpreparing query [$query]\n";
    my $sth = $utils->{DBH}->prepare($query);
    print "\t\texecuting query\n";
    $sth->execute();
    my $i = 1;
    print "\t\tfetching records\n";
    while (my $record = $sth->fetchrow_hashref()) {
        my $msisdn = $record->{MSISDN};
        #print "\t\tfetchrow [$msisdn] [$i]\n";
        $i++;
        unless (defined $cdr_list->{$msisdn}) {
            $cdr_list->{$msisdn} = [];
        }
        push @{$cdr_list->{$msisdn}}, $record;
    }
    $sth->finish();
};
print "\t\tCompleted selecting CDRs\n";
if ($@) {
    # Try to check for connection error
    if ($@ =~ /(SQL\-28000)|(SQL\-08S01)/) {
        $utils->log_error("$@\n", $MAXIS_ERROR_DB_CONNECT);
        $utils->reconnect();
        goto RETRY;
    }
    else {
        $utils->log_error("$@\n", $MAXIS_ERROR_SQL);
        print "Unhandled Exception [$@]\n";
        return undef;
    }
}

Replies are listed 'Best First'.
Re: Script die without warning while getting result from DBI
by mpeppler (Vicar) on Jan 04, 2008 at 08:32 UTC
    I'm with perrin on this one - loading 30 million rows into memory seems guaranteed to fail, even if you have 8GB of RAM (after all, 8GB means about 266 bytes for each row if you have 30 million rows, and that's discounting all other RAM usage on the machine!)

    I would strongly suggest that you review your algorithm to avoid fetching the whole result set in one go.

    Michael

Re: Script die without warning while getting result from DBI
by perrin (Chancellor) on Jan 04, 2008 at 05:38 UTC
    When you say it dies, what do you mean exactly? My guess is that you're running out of memory because you're trying to load the entire result set into RAM.
      When I say die, I mean the script just terminates and returns to the command prompt as if nothing had happened. I have put debug output after that piece of code and it never gets there.

      I have had memory issues with this piece of software before, but they were usually accompanied by a Windows error message saying perl.exe encountered a problem or something similar.

      PS: the server has 8GB of RAM.

        I'd suggest you monitor how much memory it uses while running to see if that is the problem.
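        A rough, Windows-specific sketch of one way to do that from inside the script: call the standard tasklist command every so often and log the memory usage it reports for the current perl.exe process. The helper name and the every-100,000-rows cadence are just illustrative choices, not anything from the original code.

        # Hypothetical helper: log the perl.exe memory usage reported by tasklist.
        sub log_memory_usage {
            my ($rows) = @_;
            my ($line) = grep { /^perl\.exe/ } `tasklist /FI "PID eq $$"`;
            print "\t\trows fetched [$rows], $line" if defined $line;
        }

        # Inside the fetch loop:
        #     log_memory_usage($i) if $i % 100_000 == 0;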
Re: Script die without warning while getting result from DBI
by cdarke (Prior) on Jan 04, 2008 at 09:41 UTC
    As others said, don't store the result set in memory.

    More RAM is not necessarily going to help. Windows is a virtual memory operating system: it uses pagefile.sys as backing store when RAM is over-subscribed. On a 32-bit machine Windows only allows ~2GB of virtual address space to a user process (there are hacks around that, but don't go there).

    Take a look in the Event Log (both the System and Application logs); there may be something in there. However, it will almost certainly be some sort of out-of-memory error.
Re: Script die without warning while getting result from DBI
by graff (Chancellor) on Jan 05, 2008 at 02:09 UTC
    Slightly off-topic, but: where are $run_id, $start_account and $end_account coming from, and should you perhaps be using placeholders in the query statement, and passing those variables to the $sth->execute call?
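    A minimal sketch of the placeholder idea (whether an EXEC statement accepts "?" placeholders depends on your DBD driver; DBD::ODBC, for example, also supports the ODBC {call ...} syntax, so treat this as an illustration rather than a drop-in fix):

    my $sth = $utils->{DBH}->prepare("EXEC prc_getUsage ?, 'CDR', ?, ?");
    $sth->execute($run_id, $start_account, $end_account);

    Besides avoiding quoting headaches, this keeps the variable values out of the SQL string entirely.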

    More on-topic: the question becomes "what do you need to do with these millions of rows?" Wouldn't it be possible to handle each row as it comes and be done with it before fetching the next one? Rather than pushing all the rows into a single massive HoA structure, do what needs to be done with each row and forget about keeping the row data in a structure.
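    A sketch of what that restructuring might look like; process_cdr() here is a hypothetical stand-in for whatever actually needs to happen with each record:

    while (my $record = $sth->fetchrow_hashref()) {
        process_cdr($record->{MSISDN}, $record);   # do the real work per row
        # nothing is pushed onto $cdr_list, so each $record can be freed
        # before the next fetch, and memory use stays flat
    }
    $sth->finish();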

    If you think there's some reason why all the rows need to be in a single structure, there's bound to be a way to restructure the process so that you only have to deal with a limited set of rows in memory at any one time. Apart from that, your other choice is to use one of the DBM modules to tie your hash structure to a disk file. This may seem kind of weird, because you end up more or less replicating your MSSQL data in a DBM file. But if it gets the job done... (see AnyDBM_File and the various flavors of DBM modules cited there).
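    A hedged sketch of the tie-to-disk idea: because the values here are array refs, a plain DBM tie won't serialize them, so the usual approach is MLDBM layered over a DBM module plus Storable (DB_File is assumed below; swap in whatever DBM your Perl build provides). Note that MLDBM requires a fetch-modify-store cycle: you cannot push onto the nested array ref in place.

    use Fcntl;
    use MLDBM qw(DB_File Storable);

    tie my %cdr_list, 'MLDBM', 'cdr_list.db', O_CREAT | O_RDWR, 0640
        or die "cannot tie cdr_list.db: $!";

    while (my $record = $sth->fetchrow_hashref()) {
        my $rows = $cdr_list{ $record->{MSISDN} } || [];
        push @$rows, $record;
        $cdr_list{ $record->{MSISDN} } = $rows;   # re-store so MLDBM serializes it
    }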

    Finally, a slightly irrelevant nit-pick: you don't need to do this:

    unless (defined $cdr_list->{$msisdn}) { $cdr_list->{$msisdn} = []; }
    It turns out that your next line of code knows how to take care of the details for creating an array-ref as the hash value when necessary, without further ado:
    push @{$cdr_list->{$msisdn}}, $record; # autovivifies $cdr_list->{$msisdn} as an array ref for each new key
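    A tiny demonstration of that autovivification (names chosen just for illustration):

    my %h;
    push @{ $h{foo} }, 1;   # $h{foo} springs into existence as [1]
    push @{ $h{foo} }, 2;   # and now holds [1, 2]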