Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Out of memory using Spreadsheet::ParseExcel

by talexb (Chancellor)
on Feb 25, 2002 at 17:45 UTC ( [id://147349]=perlquestion: print w/replies, xml ) Need Help??

talexb has asked for the wisdom of the Perl Monks concerning the following question:

Has anyone experience handling 'Out of memory!' problems while parsing a large (about 1.3M) Excel spreadsheet through Spreadsheet::ParseExcel?

I am running this from the command line on a Linux box running FreeBSD 4.1.1-STABLE and Perl 5.6.0. I suppose one options is to start writing C: I'm quite capable of doing that, but I don't really have time for this project.

--t. alex

"There was supposed to be an earth-shattering kaboom!" --Marvin the Martian

Replies are listed 'Best First'.
Re: Out of memory using Spreadsheet::ParseExcel
by dragonchild (Archbishop) on Feb 25, 2002 at 19:08 UTC
    Yes. Spreadsheet::ParseExcel uses a rather inefficient object model. What you can do is edit the Cell class and remove unneeded stuff, like Font and Color. That will reduce the memory needed from hundreds of Gb to dozens of Gb. :-)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.


      Could you give some additional information about what is required. It might be helpful to other people faced with the same problem. I've seen questions like this on comp.lang.perl.* as well.

      --
      John.

        In Spreadsheet/ParseExcel.pm, there's a function called _NewCell(). What I've done is make it look like this:
        sub _NewCell($$$%) { my($oBook, $iR, $iC, %rhKey)=@_; my($sWk, $iLen); return undef unless(defined $oBook->{_CurSheet}); my $oCell = Spreadsheet::ParseExcel::Cell->new( Val => $rhKey{Val}, FormatNo=> $rhKey{FormatNo}, Format => $rhKey{Format}, # Code => $rhKey{Code}, # Type => $oBook->{FmtClass}->ChkType( # $rhKey{Numeric}, # $rhKey{Format}->{FmtIdx}), ); # $oCell->{_Kind} = $rhKey{Kind}; $oCell->{_Value} = $oBook->{FmtClass}->ValFmt($oCell, $oBook); # if($rhKey{Rich}) { # my @aRich = (); # my $sRich = $rhKey{Rich}; # for(my $iWk=0;$iWk<length($sRich); $iWk+=4) { # my($iPos, $iFnt) = unpack('v2', substr($sRich, $iWk)); # push @aRich, [$iPos, $oBook->{Font}[$iFnt]]; # } # $oCell->{Rich} = \@aRich; # } if(defined $_CellHandler) { if(defined $_Object){ no strict; ref($_CellHandler) eq "CODE" ? $_CellHandler->($_Object, $oBook, $oBook->{_CurShe +et}, $iR, $iC, $oCell) : $_CellHandler->callback($_Object, $oBook, $oBook-> +{_CurSheet}, $iR, $iC, $oCell); } else{ $_CellHandler->($oBook, $oBook->{_CurSheet}, $iR, $iC, $oC +ell); } } unless($_NotSetCell) { $oBook->{Worksheet}[$oBook->{_CurSheet}]->{Cells}[$iR][$iC] = $oCell; } return $oCell; }
        Personally, I think that Takanori Kawai didn't fully understand the language when he wrote it.
        1. You don't need Hungarian notation when writing Perl. It obfuscates more than it helps.
        2. The use of prototypes on method calls doesn't do anything but confuse the reader.
        3. Comments that don't really say much. ("Skip Font4" is a direct quote. Why?!?)
        In his defence, it does say that this is an alpha release. I just wish the beta would come out. I could change it, but I don't feel comfortable as I don't know the formatting he's working with.

        ------
        We are the carpenters and bricklayers of the Information Age.

        Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Out of memory using Spreadsheet::ParseExcel
by jmcnamara (Monsignor) on Feb 25, 2002 at 22:42 UTC

    You could also try xlHtml which is an Excel to HTML converter written in C.

    You could then parse the information that you need out of the HTML using HTML::TableExtract or some other tool.

    --
    John.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://147349]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (2)
As of 2024-04-26 06:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found