Perl cleanup takes a long time

by ChrisR (Hermit)
on Jun 13, 2007 at 14:51 UTC ( [id://620988] )

ChrisR has asked for the wisdom of the Perl Monks concerning the following question:

I have a very small program, around 130 lines, that is really memory intensive. It takes only a few minutes to run but takes over 6 minutes to exit.

Here's the rub: The program creates a few minor hashes and arrays and 3 major hashes from MySQL data.
Hash #1: about 64,000 keys, each having 3 subkeys (total: ~192,000)
Hash #2: 2.2 million keys, each having 4 subkeys (total: ~8.8 million)
Hash #3: 2 keys x 2 subkeys x 2 subkeys x ~10 subkeys x ~15 subkeys x ~40 subkeys at the deepest level (total: ~96,000)

The data in each value is a small integer or a short string. I know that the structures are rather large, but I am compiling a lot of historical data for use in a detailed yield analysis of a production facility.

Again, it takes only a few minutes to run but over 6 minutes to exit. I am running it on a Xeon 2.8 GHz (dual core) with 1 GB RAM. Is the slow exit to be expected, or is there something I am doing wrong or could do better?
#!/usr/bin/perl
use strict;
use DBI;

if($ARGV[0] != int($ARGV[0]) || $ARGV[0] eq ''
    || $ARGV[1] != int($ARGV[1]) || $ARGV[1] eq '')
{
    print "Usage: ./bo_prod_vifan.pl year month \n\n";
    exit;
}
my $year      = $ARGV[0];
my $month     = $ARGV[1];
my $startdate = "$year-$month-01";

my %groupings = ();
$groupings{CDN}{G}{code}  = 'TN';
$groupings{CDN}{G}{descr} = 'PRIMARY SLITTING';
$groupings{CDN}{G}{order} = 20;
$groupings{USA}{P}{code}  = 'TN';
$groupings{USA}{P}{descr} = 'PRIMARY SLITTING';
$groupings{USA}{P}{order} = 20;

my %site_ext = ();
$site_ext{USA} = "EXT";
$site_ext{CDN} = "BR";

my $dbh = DBI->connect('DBI:mysql:vifan','webuser')
    or die "Couldn't open database: " . DBI->errstr . "\n";

# Build %Cdata: family and site for every master roll ever consumed.
my $statementC = "SELECT TRIM(masterroll), qty, UCASE(site), family FROM consume";
my $sthC = $dbh->prepare($statementC);
my $rcC  = $sthC->execute();
my $refC = $sthC->fetchall_arrayref;
my %Cdata = ();
for my $x (0..$#{$refC})
{
    $Cdata{$refC->[$x][0]}{family} = $refC->[$x][3];
    $Cdata{$refC->[$x][0]}{site}   = uc $refC->[$x][2]; # redundant: UCASE() already applied in the query
}

# Add the quantities consumed in the requested month.
$statementC = "SELECT TRIM(masterroll), qty, UCASE(site), family FROM consume"
            . " WHERE YEAR(transdate)=? AND MONTH(transdate)=?";
$sthC = $dbh->prepare($statementC);
$rcC  = $sthC->execute($year,$month);
$refC = $sthC->fetchall_arrayref;
for my $x (0..$#{$refC})
{
    $Cdata{$refC->[$x][0]}{qty} += $refC->[$x][1];
    $Cdata{familyqty}{$refC->[$x][2]}{$refC->[$x][3]} += $refC->[$x][1];
}

# Build %label: output work center for every roll label.
my %label = ();
my $statementR1 = "SELECT label, wc, master_1, slit_1 FROM rolls";
my $sthR1 = $dbh->prepare($statementR1);
my $rcR1  = $sthR1->execute();
my $refR1 = $sthR1->fetchall_arrayref;
for my $x (0..$#{$refR1})
{
    $label{$refR1->[$x][0]}{wcout} = $refR1->[$x][1];
}

my $statementR = "SELECT net_wgt, grade, wc, TRIM(master_1), master_2, master_3,"
               . " master_4, master_5, UCASE(prod_site), YEAR(prod_date),"
               . " MONTH(prod_date), TRIM(label), slit_1 FROM rolls"
               . " WHERE YEAR(prod_date)=? AND MONTH(prod_date)=? ORDER BY label";
my $sthR = $dbh->prepare($statementR);
my $rcR  = $sthR->execute($year,$month);
my $refR = $sthR->fetchall_arrayref;
for my $x (0..$#{$refR})
{
    $label{$refR->[$x][11]}{mr} = $refR->[$x][3];
    if($refR->[$x][12] == 0)
    {
        $label{$refR->[$x][11]}{wcin} = $site_ext{$Cdata{$refR->[$x][3]}{site}};
    }
    else
    {
        $label{$refR->[$x][11]}{wcin} = $label{$refR->[$x][12]}{wcout};
    }
}

# Accumulate net weight per choice into the deep %Rdata structure.
my %Rdata = ();
for my $x (0..$#{$refR})
{
    my $choice = $refR->[$x][1];
    if($choice == 4)    { $choice = "scrap";  }
    elsif($choice == 1) { $choice = "first";  }
    else                { $choice = "second"; }
    # keys: prodsite, extsite, year, month, wcin, wcout, family, choice (from grade)
    $Rdata{$refR->[$x][8]}
          {$site_ext{$Cdata{$refR->[$x][3]}{site}}}
          {$refR->[$x][9]}
          {$refR->[$x][10]}
          {$label{$refR->[$x][11]}{wcin}}
          {$label{$refR->[$x][11]}{wcout}}
          {$Cdata{$refR->[$x][3]}{family}}
          {$choice} += $refR->[$x][0];
}
$dbh->disconnect;

open(FILE, ">/home/web/vibacgroup_info/data_it/bo_prod_vifan.txt")
    or die "Couldn't open output file: $!\n";
print FILE "Division;Grouping code;Year;Month;Site (Plant);Extrusion Work Center;Input Work Center;Output Work Center;Product code;Budget 1st Choice Hour Capacity;Budget Input Qty;Actual Input Qty;Prev Year/Month Input Qty;Budget Input Repro Qty;Actual Input Repro Qty;Prev Year/Month Input Repro Qty;Budget 1st choice Qty;Actual 1st choice Qty;Prev Year/Month 1st choice Qty;Budget 2nd choice Qty;Actual 2nd choice Qty;Prev Year/Month 2nd choice Qty;Budget Input Qty Compared;Actual Input Qty Compared;Prev Year/Month Input Qty Compared;Budget 1st choice Qty Compared;Actual 1st choice Qty Compared;Prev Year/Month 1st choice Qty Compared;Budget 2nd choice Qty Compared;Actual 2nd choice Qty Compared;Prev Year/Month 2nd choice Qty Compared;File creation date;1st Choice compared product hour capacity;Compared product code;Product type;Grouping order;Grouping description;Budget Dispersion Qty;Actual Dispersion Qty;Prev Year/Month Dispersion Qty;IND_CIG_POLO;IND_CIG_TOT;IND_CIG_ESTR\r\n";

my $date = DateStamp();
for my $site (keys %Rdata) {
  for my $extsite (keys %{$Rdata{$site}}) {
    for my $year (keys %{$Rdata{$site}{$extsite}}) {
      for my $month (keys %{$Rdata{$site}{$extsite}{$year}}) {
        for my $wcin (keys %{$Rdata{$site}{$extsite}{$year}{$month}}) {
          for my $wcout (keys %{$Rdata{$site}{$extsite}{$year}{$month}{$wcin}}) {
            for my $family (keys %{$Rdata{$site}{$extsite}{$year}{$month}{$wcin}{$wcout}}) {
              my $leaf = $Rdata{$site}{$extsite}{$year}{$month}{$wcin}{$wcout}{$family};
              my $disp = $Cdata{familyqty}{$site}{$family}
                       - $leaf->{first} - $leaf->{second} - $leaf->{scrap};
              print FILE "1;$groupings{$site}{$wcout}{code};$year;$month;$site;$extsite;$wcin;$wcout;$family;;;$Cdata{familyqty}{$site}{$family};;;;;;$leaf->{first};;;$leaf->{second};;;;;;;;;;;$date;;;BF;$groupings{$site}{$wcout}{order};$groupings{$site}{$wcout}{descr};;$disp;;;;\r\n";
            }
          }
        }
      }
    }
  }
}
close(FILE);
exit;

sub DateStamp
{
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime;
    $year += 1900;
    $mon++;
    $mon  = "0$mon"  if $mon  < 10;
    $mday = "0$mday" if $mday < 10;
    $hour = "0$hour" if $hour < 10;
    $min  = "0$min"  if $min  < 10;
    $sec  = "0$sec"  if $sec  < 10;
    my $timestamp = "$year-$mon-$mday";
    return $timestamp;
}

Thanks,
Chris Rogers
www.pcewebs.com

Replies are listed 'Best First'.
Re: Perl cleanup takes a long time
by Joost (Canon) on Jun 13, 2007 at 15:08 UTC
    AFAIK the reason cleanup is slow is that the garbage collector is pretty slow when you have lots (over a couple of million) of objects.

    (I'm talking about the collector that's only run at exit time, not the reference counting collector)

    If you want to skip the exit-collector entirely, take a look at POSIX's _exit() function and be careful.

      Thanks Joost. I had read that in the Camel book but was a little scared to do it. It just seemed dangerous. I am not using any threads in this program, so if I use POSIX's _exit(), will the memory be released and made available to the system? Or have I just created another problem?
      Chris Rogers
      www.pcewebs.com
        All the resources that normally get taken back should still be taken back by the system. The issues with _exit() AFAIK are in user-space: exit-handling code that doesn't get run. From the Perl view, END {} blocks will not be run, file handles won't be flushed to the kernel, DESTROY methods won't be called, etc.

        You'll probably be fine if you make sure to close all file & database handles before calling _exit().
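
        Roughly like this (an untested sketch, reusing the FILE and $dbh handles from your script):

            use POSIX qw(_exit);

            # ... after the report has been written ...
            close(FILE) or warn "close failed: $!"; # flush the output yourself
            $dbh->disconnect;                       # shut DBI down cleanly
            _exit(0); # the process ends here; Perl's global destruction never runs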

        update: see also the _exit(2) manpage.

Re: Perl cleanup takes a long time
by perrin (Chancellor) on Jun 13, 2007 at 16:19 UTC
    I wonder if it takes so long to close because you're swapping. Are you out of memory?
      Perrin brings up a very good point.

      I quickly ran a test that creates 3 hashes of the type and depth that you mentioned. Here's the ps command output:

        PID TTY   STAT TIME MAJFL TRS    DRS    RSS %MEM COMMAND
      22842 pts/1 R+   0:00     0  10   6941   1680  0.1 perl xxx.pl
      create all the hashes
        PID TTY   STAT TIME MAJFL TRS    DRS    RSS %MEM COMMAND
      22842 pts/1 S+   0:23     2  10 932137 883884 85.5 perl xxx.pl
      clean out hashes
        PID TTY   STAT TIME MAJFL TRS    DRS    RSS %MEM COMMAND
      22842 pts/1 R+   0:30    31  10 932137 885632 85.6 perl xxx.pl
      we're done

      The memory requirements of your program are likely to be at least 933 MBytes, and that is without DBI etc.!! Since you mentioned that your computer has 1 GByte of internal memory on board, the sheer size of the hashes will definitely force the operating system to swap memory pages.
      Perl's garbage collection is not the cause of this problem, even though it does require some time; it is the operating system that takes time to swap the memory pages back into shape.
      So you may want to add a bit more internal memory to your system to run an application like this one ;-)

      Test-program source:
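
      (The original source is not reproduced here; what follows is a minimal reconstruction, assuming the hash shapes described above. On Linux procps, `ps v $$` prints exactly the columns shown.)

      #!/usr/bin/perl
      use strict;
      use warnings;

      $| = 1; # unbuffer STDOUT so labels interleave correctly with ps output

      # BSD-style "ps v" prints: PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
      sub snapshot { system "ps v $$"; }

      snapshot();
      print "create all the hashes\n";

      my %h1; # ~64,000 keys x 3 subkeys
      for my $k (1 .. 64_000)    { $h1{$k}{$_} = 1 for 1 .. 3; }

      my %h2; # 2.2 million keys x 4 subkeys
      for my $k (1 .. 2_200_000) { $h2{$k}{$_} = 1 for 1 .. 4; }

      my %h3; # 2 x 2 x 2 x ~10 x ~15 x ~40 nested keys
      for my $i (1 .. 2) {
          for my $j (1 .. 2) {
              for my $k (1 .. 2) {
                  for my $m (1 .. 10) {
                      for my $n (1 .. 15) {
                          $h3{$i}{$j}{$k}{$m}{$n}{$_} = 1 for 1 .. 40;
                      }
                  }
              }
          }
      }

      snapshot();
      print "clean out hashes\n";

      %h1 = (); %h2 = (); %h3 = ();

      snapshot();
      print "we're done\n";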

Re: Perl cleanup takes a long time
by TOD (Friar) on Jun 13, 2007 at 15:10 UTC
    can you specify what you mean by "exit"? maybe a silly question, but the reason i'm asking is that slurping the data in from the database and grouping it really won't take much time. but the ... 7 nested for loops by which you write the data to the text file will probably not be the fastest approach.

    besides, doesn't SQL offer lots of means for the task you are performing? and with MySQL, you could tee (\T) your results to an outfile...?
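
    for instance, a rough sketch that lets MySQL aggregate and write the file itself (simplified: it sums by site/month/grade only and skips the work-center and family lookups your Perl code derives from other tables; the output path is made up):

        # SELECT ... INTO OUTFILE runs entirely server-side; no rows cross DBI
        $dbh->do(q{
            SELECT UCASE(prod_site), YEAR(prod_date), MONTH(prod_date), grade,
                   SUM(net_wgt)
              INTO OUTFILE '/tmp/bo_prod_sums.txt'
              FIELDS TERMINATED BY ';' LINES TERMINATED BY '\r\n'
              FROM rolls
             WHERE YEAR(prod_date)=? AND MONTH(prod_date)=?
             GROUP BY UCASE(prod_site), YEAR(prod_date), MONTH(prod_date), grade
        }, undef, 2007, 5);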

    --------------------------------
    masses are the opiate for religion.
      By "exit", I mean the exit command (the last line executed in the script). Slurping the data and looping for the file output is actually very fast (given that it may not be the fastest) considering the numbers of records I am looking at. And yes, you are correct in that MySQL has many ways of completing this task however it is not nearly as fast as perl.
      Chris Rogers
      www.pcewebs.com
Re: Perl cleanup takes a long time
by benno (Novice) on Jun 16, 2007 at 15:36 UTC
    I've built applications with similar memory requirements. Generally there is no reason to store all this in RAM unless you are performing complex matrix operations from top to bottom. I would refactor the code so that it processes one chunk/one line/one customer (whatever) at a time and writes to a file. Better still, if you can generalise the transformation into code units, you can then string them together with pipes and let the OS worry about disk and RAM management.
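
    A minimal sketch of that row-at-a-time shape (the query is cut down for illustration and the output path is made up; the point is that fetchrow_arrayref streams rows one at a time, where fetchall_arrayref slurps them all into memory):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('DBI:mysql:vifan', 'webuser')
            or die "connect failed: " . DBI->errstr;

        # MySQL delivers the rows already ordered, so related rows arrive
        # together and each one can be handled and forgotten as it streams in.
        my $sth = $dbh->prepare(
            "SELECT TRIM(label), grade, net_wgt FROM rolls
              WHERE YEAR(prod_date)=? AND MONTH(prod_date)=? ORDER BY label"
        );
        $sth->execute(2007, 5);

        open my $out, '>', '/tmp/bo_prod_stream.txt' or die "open: $!";
        while (my $row = $sth->fetchrow_arrayref) {
            my ($label, $grade, $net_wgt) = @$row;
            # ... keep only the running totals needed for the current label ...
            print $out "$label;$grade;$net_wgt\r\n";
        }
        close $out or die "close: $!";
        $dbh->disconnect;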
