Out of memory and While replacements with excel XLSX application

Weinstar has asked for the wisdom of the Perl Monks concerning the following question:

Hello. For the following I feel I should lead with a quick disclaimer. The code is my first attempt at a Perl script and a non-html script in general, so it might read like the first attempt at a script - however the script technically works.

Furthermore I don't deal with coding in my daily life and studies and therefore do not know if the code conforms to all existing standards - I ran it through perl::tidy, used strict and warnings (no return) but that is about it - please advice if it is unreadable in some way.

Anyhow, The problem I am facing is one of memory - the script returns "out of memory". I believe it is related to http://learn.perl.org/faq/perlfaq3.html#How-can-I-make-my-Perl-program-take-less-memory- but cannot figure out how to implement without changing the results of the run.

The case is as follows: I need to go over 20 seperate excel files (.xls) and grap values based on certain patterns and then copy them into new .XLSX sheets. The total amount of rows is close to 1 million.

The script worked fine when I tried to write to .xls but it hit the limit of available rows. So i changed it to ::XLSX and now it runs out of memory. The strange thing is that it does not seem to be in the writing of excel part that the memory expires, but around the time it is done with grabing values, or maybe when I copy one of the returned array at @righ = @right (done because the original is destroyed in a splice). The system I work on has 8GB of ram, the error happens at around a total memory use of 4GB of ram, but I can see from test on a single sheet, that the final part (when all values are in the arrays) takes up a lot of memory. I have a feeling that this memory is excessive and can be done in better way. The script is on around 2GB of memory at the time of error. four of the eight gb were installed after the original run to see if this helped.

My question is - how do I solve this? I understand that there may be room for loads of improvements and I will gladly listen to every kind of response, but my main concern goes to the apparent memory shortage. I have a feeling I am copying the data into memory several times, but I do not know how to avoid this.

If I have overlooked an already existing answer a reference would be appreciated and an apology returned, but please understand that any deviation from my case will quickly complicate things for me

 
#!C:\strawberry\perl\bin\perl

####Load Packages
use strict;
use warnings;
use Win32::OLE qw(in with);
use Win32::OLE::Const 'Microsoft Excel';
use Win32::OLE::Variant;
use Win32::OLE::NLS qw(:LOCALE :DATE);
use List::MoreUtils qw(natatime);
use Excel::Writer::XLSX;

############################################################

####Initialize Excel Object and Count Files to Process
$Win32::OLE::Warn = 3;    #die on errors...

#get already active Excel application or open new
my $Excel = Win32::OLE->GetActiveObject('Excel.Application')
  || Win32::OLE->new( 'Excel.Application', 'Quit' );

#Get the file names of the production files and prints a status of the
+ results
my $ProdDirect = 'C:/Users/McLovin/Documents/Thesis/Data/';
opendir DH, $ProdDirect or die "Cannot open $ProdDirect: $!";
my @files = grep { !-d } readdir DH;
print "@files\n";
closedir DH;
my $NumberofFiles = @files;
print "Number of files is: $NumberofFiles\n";

######################################################################
+##########

#################################################################
################Start Processing the Data########################
#################################################################
my $counti;  #used to count the excelfiles
my $usefile; #The file that is being used
my @recID;   #an array of ID's for the worksheets
my $ID;      #The actual ID being pushed to the array
my $val;     #A test value that is used to test whether a row should b
+e included
my @right
  ; #The first of the arrays that will be filled during the process an
+d will be further disected later
my @prod;     #The second array, same as above
my @legal;    #The third

####declare a lot of variables used below when getting information fro
+m the excel sheets
my $c1;
my $c4;
my $c5;
my $c6;
my $c7;
my $c8;
my $c9;
my $c10;
my $c11;
my $c12;
my $c13;
my $c14;
my $c15;
my $c17;
my $c18;
my $c20;
my $c21;
my $c22;
my $c23;
my $c24;
my $c25;
my $c26;
my $c27;
my $c32;
my $c33;
my $c34;
my $c35;
my $c36;
my $c37;
my $c38;
my $c39;

#####Loop over all the files
for ( $counti = 0 ; $counti < $NumberofFiles ; $counti++ ) {

##### open Excel file -- This is the input data files
    print "Working on file $counti of $NumberofFiles\n";
    $usefile = $files[$counti];
###This points to the folder where the excel files are placed
    my $workfile = "C:/Users/McLovin/Documents/Thesis/Datatal/" . "$us
+efile";
    print "$workfile\n";
    my $Book = $Excel->Workbooks->Open($workfile);

#####Work on each sheet in the workbook
#####This is the number of worksheets in the workbook
    my $sheetcnt = $Book->Worksheets->Count();
    foreach my $r (1) {
        my $Sheet = $Book->Worksheets($r);
        $ID = $Sheet->{Name};
        push @recID, $ID;
        print "Worksheet name is $Sheet->{Name}\n";
        ##work on each row and column
        my $Tot_Rows = $Sheet->UsedRange->Rows->{'Count'};
        my $Tot_Cols = $Sheet->UsedRange->Columns->{'Count'};

#Extract the necessary information from the rows and columns and place
+s them in three arrays @right @legal and @prod
        foreach my $row ( 1 .. $Tot_Rows ) {
            foreach my $col (1) {

                # skip empty cells
                next unless defined $Sheet->Cells( $row, $col )->{'Val
+ue'};
                $val = $Sheet->Cells( $row, $col )->{'Value'};

                if ( $val =~ /^\d{10}$/ ) {

                    #Gets column one information
                    foreach my $col1 (1) {
                        next
                          unless
                          defined $Sheet->Cells( $row, $col1 )->{'Valu
+e'};
                        $c1 = $Sheet->Cells( $row, $col1 )->{'Value'};
                        push @right, $c1;
                        push @legal, $c1;
                        push @prod,  $c1;

                        #Gets region from column 2
                        foreach my $col4 (4) {
                            next
                              unless
                              defined $Sheet->Cells( $row, $col4 )->{'
+Value'};
                            $c4 = $Sheet->Cells( $row, $col4 )->{'Valu
+e'};
                            push @right, $c4;
                            push @legal, $c4;
                            push @prod,  $c4;
                        }

                        #Gets column 3 information
                        foreach my $col5 (5) {
                            next
                              unless
                              defined $Sheet->Cells( $row, $col5 )->{'
+Value'};
                            $c5 = $Sheet->Cells( $row, $col5 )->{'Valu
+e'};
                            push @right, $c5;
                            push @legal, $c5;
                            push @prod,  $c5;
                        }

                        #Get the description that needs to be processe
+d later
                        foreach my $col6 (6) {
                            next
                              unless
                              defined $Sheet->Cells( $row, $col6 )->{'
+Value'};
                            $c6 = $Sheet->Cells( $row, $col6 )->{'Valu
+e'};
                            push @legal, $c6;
                        }

                        #Gets area
                        foreach my $col7 (7) {
                            next
                              unless
                              defined $Sheet->Cells( $row, $col7 )->{'
+Value'};
                            $c7 = $Sheet->Cells( $row, $col7 )->{'Valu
+e'};
                            push @right, $c7;

                        }

                        #Gets column 8 information
                        foreach my $col8 (8) {
                            next
                              unless
                              defined $Sheet->Cells( $row, $col8 )->{'
+Value'};
                            $c8 = $Sheet->Cells( $row, $col8 )->{'Valu
+e'};
                            push @right, $c8;
                        }

                        #Gets column 9
                        foreach my $col9 (9) {
                            next
                              unless
                              defined $Sheet->Cells( $row, $col9 )->{'
+Value'};
                            $c9 = $Sheet->Cells( $row, $col9 )->{'Valu
+e'};
                            push @right, $c9;
                        }

                        #and so on....
                        foreach my $col10 (10) {
                            next
                              unless
                              defined $Sheet->Cells( $row, $col10 )->{
+'Value'};
                            $c10 = $Sheet->Cells( $row, $col10 )->{'Va
+lue'};
                            push @right, $c10;
                            push @legal, $c10;
                        }

                        #same
                        foreach my $col11 (11) {
                            next
                              unless
                              defined $Sheet->Cells( $row, $col11 )->{
+'Value'};
                            $c11 = $Sheet->Cells( $row, $col11 )->{'Va
+lue'};
                            push @right, $c11;
                            push @legal, $c11;
                        }

                        #same
                        foreach my $col12 (12) {
                            $c12 = $Sheet->Cells( $row, $col12 )->{'Va
+lue'};
                            push @right, $c12;
                            push @prod,  $c12;
                        }

                        #same
                        foreach my $col13 (13) {
                            $c13 = $Sheet->Cells( $row, $col13 )->{'Va
+lue'};
                            push @prod, $c13;
                        }

                        #same
                        foreach my $col14 (14) {
                            $c14 = $Sheet->Cells( $row, $col14 )->{'Va
+lue'};
                            push @prod, $c14;
                        }

                        #same
                        foreach my $col15 (15) {
                            $c15 = $Sheet->Cells( $row, $col15 )->{'Va
+lue'};
                            push @prod, $c15;
                        }

                        #same
                        foreach my $col17 (17) {
                            $c17 = $Sheet->Cells( $row, $col17 )->{'Va
+lue'};
                            push @prod, $c17;
                        }

                        #same
                        foreach my $col18 (18) {
                            $c18 = $Sheet->Cells( $row, $col18 )->{'Va
+lue'};
                            push @prod, $c18;
                        }

                        #same
                        foreach my $col20 (20) {
                            $c20 = $Sheet->Cells( $row, $col20 )->{'Va
+lue'};
                            push @prod, $c20;
                        }

                        #same
                        foreach my $col21 (21) {
                            $c21 = $Sheet->Cells( $row, $col21 )->{'Va
+lue'};
                            push @prod, $c21;
                        }

                        #same
                        foreach my $col22 (22) {
                            $c22 = $Sheet->Cells( $row, $col22 )->{'Va
+lue'};
                            push @prod, $c22;
                        }

                        #same
                        foreach my $col23 (23) {
                            $c23 = $Sheet->Cells( $row, $col23 )->{'Va
+lue'};
                            push @prod, $c23;
                        }

                        #same
                        foreach my $col24 (24) {
                            $c24 = $Sheet->Cells( $row, $col24 )->{'Va
+lue'};
                            push @prod, $c24;
                        }

                        #same
                        foreach my $col25 (25) {
                            $c25 = $Sheet->Cells( $row, $col25 )->{'Va
+lue'};
                            push @prod, $c25;
                        }

                        #same
                        foreach my $col26 (26) {
                            $c26 = $Sheet->Cells( $row, $col26 )->{'Va
+lue'};
                            push @prod, $c26;
                        }

                        #same
                        foreach my $col27 (27) {
                            $c27 = $Sheet->Cells( $row, $col27 )->{'Va
+lue'};
                            push @prod, $c27;
                        }

                        #32-39 production data
                        foreach my $col32 (32) {
                            $c32 = $Sheet->Cells( $row, $col32 )->{'Va
+lue'};
                            push @prod, $c32;
                        }
                        foreach my $col33 (33) {
                            $c33 = $Sheet->Cells( $row, $col33 )->{'Va
+lue'};
                            push @prod, $c33;
                        }
                        foreach my $col34 (34) {
                            $c34 = $Sheet->Cells( $row, $col34 )->{'Va
+lue'};
                            push @prod, $c34;
                        }
                        foreach my $col35 (35) {
                            $c35 = $Sheet->Cells( $row, $col35 )->{'Va
+lue'};
                            push @prod, $c35;
                        }
                        foreach my $col36 (36) {
                            $c36 = $Sheet->Cells( $row, $col36 )->{'Va
+lue'};
                            push @prod, $c36;
                        }
                        foreach my $col37 (37) {
                            $c37 = $Sheet->Cells( $row, $col37 )->{'Va
+lue'};
                            push @prod, $c37;
                        }
                        foreach my $col38 (38) {
                            $c38 = $Sheet->Cells( $row, $col38 )->{'Va
+lue'};
                            push @prod, $c38;
                        }
                        foreach my $col39 (39) {
                            $c39 = $Sheet->Cells( $row, $col39 )->{'Va
+lue'};
                            push @prod, $c39;
                        }
                    }
                }
            }
        }
    }
    $Book->Close;
}

######################################################################
+##########
#Prints of the created arrays
#print "@right\n";
#print "@legal\n";
#print "@prod\n";
######################################################################
+##########

######################################################################
+##############
#the process that eliminates duplicates in the @right array based on c
+riteria

#makes a copy of the array as it is destroyed in the next while statem
+ent, This is maybe one of the problems
my @rig = @right;  #copy of array
my @righ;          # a new array that is the result of the while state
+ment below
my %seen;          # a hash that stores agreementnumbers for unique en
+tries

#The array is in exact sets of 9 strings and i want it spliced a inter
+vals of exactly those intervals
while ( my ( $m, $n, $o, $p, $q, $r, $s, $t, $y ) = splice( @right, 0,
+ 9 ) ) {
    last if $m !~ /^\d{10}$/;
    next if $seen{$m}++;
    if ( $n =~ /Specific_region/ ) {
        if ( $o =~ /NG/ ) {
            push @righ, $m, $o, $p, $q, $r, $s, $t;
        }
    }

}

# print "@righ\n";

my @leg;     #an array that holds the result for the next while statem
+ent
my @spli;    #used as a container for certain entries in the while sta
+tement
my @joi;     #another middle of equation array for picking up results
%seen = ()
  ; #emties the previous hash as the uniqueness of entries is also imp
+ortant here
my @tes;    #yet another array for picking up results

######################################################################
+##########
#the process that insures that each returned value is printed in the c
+orrect
#form and coupled with the ten digit number 
#again the array is organized in 6 values in a row that need to be sep
+erated out into rows.

while ( my ( $h, $aa, $rr, $j, $k, $l ) = splice( @legal, 0, 6 ) ) {
    last if $h !~ /^\d{10}$/;
    if ( $aa =~ /Specific_region/ ) {
        if ( $rr =~ /NG/ ) {
            if ( $j =~ /\n/ ) {
                next if $seen{$h}++;
                my @spli = split( /\n/, $j );
                foreach my $n (@spli) {
                    if ( $n =~ /LSD/ ) {
                        my @tes = split( /LSD/, $n );
                        foreach my $lon (@tes) {
                            if ( $lon =~ /SEC/ ) {
                                my @joi = split( /-|W|:|\s|,|\(/, $lon
+ );
                                my $chans = @joi;
                                my $eleg =
                                  join( "", @joi[ 0, 2, 1, $chans - 1 
+] );
                                push @leg, $eleg, $h, $k, $l;
                            }
                        }
                    }
                    elsif ( $n =~ /\(/ ) {
                        my @joi   = split( /-|W|:|\s|\(/, $n );
                        my $chans = @joi;
                        my $eleg  = join( "", @joi[ 0, 2, 1, $chans - 
+2 ] );
                        push @leg, $eleg, $h, $k, $l;
                    }
                    else {
                        my @joi   = split( /-|W|:|\s/, $n );
                        my $chans = @joi;
                        my $eleg  = join( "", @joi[ 0, 2, 1, $chans - 
+1 ] );
                        push @leg, $eleg, $h, $k, $l;
                    }
                }
            }
            else {
                next if $seen{$h}++;
                my @joi   = split( /-|W|:|\s/, $j );
                my $chans = @joi;
                my $eleg  = join( "", @joi[ 0, 2, 1, $chans - 1 ] );
                push @leg, $eleg, $h, $k, $l;
            }
        }
    }

}

#print "@leg\n";
my @peg;

foreach my $loma (@leg) {
    if ( $loma =~ /^\d{7}$/ ) {
        substr( $loma, 6, 0, 0 );
        push @peg, $loma;
    }
    else {
        push @peg, $loma;
    }
}

######################################################################
+##############
#the process that creates the production array for the entries
my @produ;

while (
    my (
        $cp, $aaa, $rrr, $dp, $ep, $fp, $gp, $hp, $ip,
        $jp, $kp,  $lp,  $mp, $np, $op, $pp, $qp, $rp,
        $sp, $tp,  $up,  $vp, $wp, $yp, $xp
    )
    = splice( @prod, 0, 25 )
  )
{
    last if $cp !~ /^\d{10}$/;
    if ( $aaa =~ /Specific_region/ ) {
        if ( $rrr =~ /NG/ ) {
            unless (
                $dp =~ /a specific repeated text for all relevant entr
+ies/ )
            {
#eliminate this if statement for option two, where entries with no act
+ual production is included
                if ( defined($qp) && $qp =~ /\d\d-\d\d-\d{4}/ ) {
                    push @produ, $dp, $ep, $cp, $fp, $gp, $hp, $ip, $j
+p, $kp,
                      $lp, $mp, $np, $op, $pp, $qp, $rp, $sp, $tp, $up
+, $vp,
                      $wp, $yp, $xp;
                }
            }
        }
    }
}

# print "@produ\n";

my @nwells;    #an array that collects the results

#creates the 4.2 entries "agreements with no wells
while ( my ( $mn, $nn, $on, $pn, $qn, $rn, $sn, $tn, $yn ) =
    splice( @rig, 0, 9 ) )
{
    last if $mn !~ /^\d{10}$/;
    if ( $nn =~ /Specific_region/ ) {
        if ( $on =~ /NG/ ) {
            if ( $yn =~ /a specific repeated text for all relevant ent
+ries/ ) {
                push @nwells, $mn, $on, $pn, $qn, $rn, $sn, $tn, $yn;
            }
        }
    }
}

# print "@nwells\n";

#Places results into arrays of arrays for easy computation in excel. u
+ses natatime again the arrays
# are of a specific size. per row. 
my @AAR;
{
    my $iter = natatime 7, @righ;
    while ( my @tmp = $iter->() ) {
        push @AAR, \@tmp;
    }
}

my @BAR;
{
    my $iter = natatime 4, @peg;
    while ( my @tmp = $iter->() ) {
        push @BAR, \@tmp;
    }
}

my @CAR;
{
    my $iter = natatime 23, @produ;
    while ( my @tmp = $iter->() ) {
        push @CAR, \@tmp;
    }
}

my @DAR;
{
    my $iter = natatime 8, @nwells;
    while ( my @tmp = $iter->() ) {
        push @DAR, \@tmp;
    }
}

#####The new excel sheets that should contain the results
my $workbooknew  = Excel::Writer::XLSX->new('re1.xlsx');
my $worksheetnew = $workbooknew->add_worksheet();

$worksheetnew->keep_leading_zeros();
$worksheetnew->set_column( 'A:G', 30 );
$worksheetnew->write_col( 'A2', \@AAR );

####This is the data on the legalnumber - introduce keep_leading zeroe
+s for correct legal form
my $workbooknew1  = Excel::Writer::XLSX->new('re2.xlsx');
my $worksheetnew1 = $workbooknew1->add_worksheet();

$worksheetnew1->keep_leading_zeros();
$worksheetnew1->set_column( 'A:D', 15 );
$worksheetnew1->write_col( 'A2', \@BAR );

# # ####This is the data on the production of the wells - introduce ke
+ep_leading zeroes for correct legal form
my $workbooknew2  = Excel::Writer::XLSX->new('re3.xlsx');
my $worksheetnew2 = $workbooknew2->add_worksheet();
my $worksheetnew3 = $workbooknew2->add_worksheet();

$worksheetnew2->set_column( 'A:W', 50 );
$worksheetnew2->write_col( 'A2', \@CAR );
$worksheetnew3->set_column( 'A:H', 30 );
$worksheetnew3->write_col( 'A2', \@DAR );
[download]

Comment on Out of memory and While replacements with excel XLSX application Download Code

Replies are listed 'Best First'.
Re: Out of memory and While replacements with excel XLSX application by jmcnamara (Monsignor) on Oct 08, 2012 at 10:02 UTC
The "out of memory" issue is probably caused by Excel::Writer::XLSX. Especially if you have 1 million rows. You can reduce its memory usage (almost completely) by setting the workbook `set_optimization()` method. `... my $workbook = Excel::Writer::XLSX->new( file.xlsx' ); $workbook->set_optimization(); my $worksheet = $workbook->add_worksheet(); ...` [download] See the Speed and Memory Usage section of the Excel::Writer::XLSX docs for a full explanation. -- John.	[reply] [d/l]
Re^2: Out of memory and While replacements with excel XLSX application by Weinstar (Initiate) on Oct 09, 2012 at 05:09 UTC
This answer was the first one I succefully implemeted. It was a solution to the original question of how to avoid a "out of memory" return, while still writing to .XLSX files. Thanks to the poster. The total system memory use never exceeded 4GB as before and the total runtime seemed unchanged. I will test some the other solutions to see if the runtime can be optimized through better coding on my part.	[reply]
Re: Out of memory and While replacements with excel XLSX application by aufflick (Deacon) on Oct 08, 2012 at 04:37 UTC
Welcome to coding! I haven't used XLSX myself and this might be a naive suggestion but could your situation be satisfied with CSV files as your output (which you could then convert to XLSX files in Excel if you want to send XLSX to others instead of CSV)? Especially if you have huge amounts of data you can simply print to a raw text file, but you can also use very streamlined modules like Text::CSV.	[reply]
Re: Out of memory and While replacements with excel XLSX application by marquezc329 (Scribe) on Oct 08, 2012 at 09:00 UTC
As anonymous before me stated, use subroutines to break up your program into smaller more manageable chunks. Then try to optimize each segment. Not only will this get you closer to solving your problems and more than likely eliminate alot of the memory usage along the way, but it will allow you to post a more concentrated chunk of code that is more likely to attract the attention of the monks. From perlfaq3: "When it comes to time-space tradeoffs, Perl nearly always prefers to throw memory at a problem. Scalars in Perl use more memory than strings in C, arrays take more than that, and hashes use even more." You have created an enormous amount of temporary scalars that are all named very similarly (i.e. `$c1..$c39`) and used in very repetitive code. To start off, maybe try creating a subroutine to handle those more efficiently. `sub extract { my @arrays = (\@right, \@legal, \@prod); foreach 1..39 { foreach my $col ($_) { next unless defined $Sheet->Cells($row, $col)->{'Value'}; my $c = $Sheet->Cells($row, $col)->{'Value'}; map {push @$_, $c} @arrays; } } }` [download] This is just an example, but I'm sure if you go through your program you'll be able to find multiple situations where things can be cut down and made faster, simpler, and easier to maintain/read. It might do you well to clean up your comments, and include a sample of the data you plan on working with before making a post like this. It'll increase the likelihood that you'll get help from the guys that know most. You might try reading How (Not) To Ask A Question. Good Luck!	[reply] [d/l] [select]
Re: Out of memory and While replacements with excel XLSX application by Anonymous Monk on Oct 08, 2012 at 08:00 UTC
I have a feeling I am copying the data into memory several times, but I do not know how to avoid this. Rewrite your program into subroutines, use lots of references, and watch the size of your program at each stage There is a lot of copy/paste duplication in your program that should be rewritten There are a lot of short variable names, way too many to keep track of, way too many to help you substantially rewrite what you posted Also you say The total amount of rows is close to 1 million. and that is close to the MAXIMUM LIMIT for Excel of 1,048,576 rows by 16,384 columns You might switch to DBD::SQLite, see Loading 283600 records (WordNet), Re: Is there a way(maybe tool?) to know the amount of memory an array takes?, dynamic bulk insert in sqlite	[reply]
Re^2: Out of memory and While replacements with excel XLSX application by Anonymous Monk on Oct 08, 2012 at 09:29 UTC
Here is a start at refactoring Read more... (40 kB) I end up with fudge10 <Reveal this spoiler>	[reply] [d/l] [select]
Re^3: Out of memory and While replacements with excel XLSX application by Anonymous Monk on Oct 08, 2012 at 09:41 UTC
ValueTargetCols should loop over rows first then cols, but it probably makes no difference in the results in this case	[reply]
Re: Out of memory and While replacements with excel XLSX application by afoken (Chancellor) on Oct 08, 2012 at 17:30 UTC
I need to go over 20 seperate excel files (.xls) and grap values based on certain patterns and then copy them into new .XLSX sheets. The total amount of rows is close to 1 million. I think an Excel sheet with one million lines is a clear case for using a relational database (plus a report generator) instead of Excel. If you have no better idea, use MS Access as a front-end to some stripped-down MS SQL Server (last time I used it the stripped-down version was called MSDE) or PostgreSQL. Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)	[reply]
Re: Out of memory and While replacements with excel XLSX application by Weinstar (Initiate) on Oct 08, 2012 at 22:57 UTC
Hello. Thank you to all of you for your reponses. There seems to be a lot to go on and a general sentiment, that this is doable, and with a lot of optimization, so that is great. It will proberbly take me some time to work through it all, but I will post the final update, as soon as i am done, for future reference. Again I thank you for taking your time to look over my problem.	[reply]
Re^2: Out of memory and While replacements with excel XLSX application by Anonymous Monk on Jul 10, 2014 at 11:08 UTC
Hi, I'm using Spreadsheet::XLSX to parse the xlsx file. for around 20000 rows , the memory it takes around 800 MB, though size of file is 5 MB. Can anybody helps me to optimize it? following is the code: my ($self, $in_file, $out_file) = @_; $count = @_; $count -= 1; if(2 != $count) { util->logit("[$PARENT_PROC_NAME]: xlsx_to_csv:invalid args [@_ +] [$count]"); return $FAILURE; } my $excel = Spreadsheet::XLSX -> new ($in_file); my $line; my $out_csv_name = $out_file; my $orig_out_file = $out_csv_name; #binmode CSV; binmode CSV, ":utf8"; my $Csv = Text::CSV_XS->new({ 'quote_char' => '"', 'escape_char' => '"', 'sep_char' => ',', 'binary' => 1, }); my $i = 1; foreach my $sheet (@{$excel -> {Worksheet}}) { $i++; $out_csv_name = "$orig_out_file$underscore$sheet->{Name}.csv"; #$out_csv_name =~ s/\s+//g; @ret_array[$i] = $out_csv_name; open CSV, "> $out_csv_name" \|\| die "Cannot create csv file: $! +" ; $sheet -> {MaxRow} \|\|= $sheet -> {MinRow}; foreach my $row ($sheet -> {MinRow} .. $sheet -> {MaxRow}) { my @Row; $sheet -> {MaxCol} \|\|= $sheet -> {MinCol}; foreach my $col ($sheet -> {MinCol} .. $sheet -> {MaxCol} +) { my $cell = $sheet -> {Cells} [$row] [$col]; my $Value = ""; if ($cell) { #$Value = $cell->Value; #rounds off #$Value = $cell->{_Value}; #rounds off #$Value = $cell->Value(); #rounds off #$Value = $cell->{Val}; $Value = $cell->unformatted(); #my $avl = $sheet->{Cells}[$row][$col]{Val}; #print "avl:[$avl]\n"; #$Value = $cell->value(); if ($Value eq 'GENERAL') { $Value = $cell->{Val}; } } last if $col == $sheet->{MinCol} and !$Value; push(@Row, $Value); } next unless @Row; my $Status = $Csv->combine(@Row); if (!defined $Status) { my $Error = $Csv->error_input(); warn "ERROR FOUND!: $Error"; } if (defined $Status) { my $Line = $Csv->string(); print CSV "$Line\n"; } [download] I don't know how to do it...I'm new to perl. Kindly guide me.	[reply] [d/l]
Re^3: Out of memory and While replacements with excel XLSX application by AppleFritter (Vicar) on Jul 10, 2014 at 11:51 UTC
Can't help you right now (sorry), but as a quick suggestion -- it's best to post your question as a new top-level node in Seekers of Perl Wisdom.	[reply]


Syntactic Confectionery Delight
	PerlMonks