Out of memory error while parsing a specific excel file

by pankaj_it09 (Scribe)
on Feb 16, 2009 at 07:48 UTC (id://743996, perlquestion)

pankaj_it09 has asked for the wisdom of the Perl Monks concerning the following question:

The perl code is as below :
    use strict;
    use Spreadsheet::ParseExcel;

    my $parser = Spreadsheet::ParseExcel->new(
        CellHandler => \&cell_handler,
        NotSetCell  => 1
    );

    my $workbook = $parser->Parse('testfile.xls');

    sub cell_handler {
        my $workbook    = $_[0];
        my $sheet_index = $_[1];
        my $row         = $_[2];
        my $col         = $_[3];
        my $cell        = $_[4];

        print $cell->unformatted(), "\n";
    }
The system output(error) is as below:
    D:\Perl\bin\search tool>perl testa.pl
    UTF-16 surrogate 0xdb79 at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xdbb1 at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xd83e at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xdff8 at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xdbff at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xdd98 at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xd9bf at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xdcd7 at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xdde6 at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xdabe at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xdb71 at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xd912 at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xdab0 at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    Unicode character 0xfdde is illegal at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm
    UTF-16 surrogate 0xdc77 at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    substr outside of string at D:/Perl/site/lib/Spreadsheet/ParseExcel.pm line 1015.
    Use of uninitialized value in length at D:/Perl/site/lib/Spreadsheet/ParseExcel.pm line 1951.
    Use of uninitialized value $sTxt in unpack at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefa
    substr outside of string at D:/Perl/site/lib/Spreadsheet/ParseExcel.pm line 1020.
    UTF-16 surrogate 0xdeec at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    UTF-16 surrogate 0xdd7e at D:/Perl/site/lib/Spreadsheet/ParseExcel/FmtDefault.pm line 81.
    substr outside of string at D:/Perl/site/lib/Spreadsheet/ParseExcel.pm line 1196.
    Use of uninitialized value in unpack at D:/Perl/site/lib/Spreadsheet/ParseExcel.pm line 1196.
    substr outside of string at D:/Perl/site/lib/Spreadsheet/ParseExcel.pm line 1196.
    Use of uninitialized value in unpack at D:/Perl/site/lib/Spreadsheet/ParseExcel.pm line 1196.
    Out of memory!
The system information is as below :
    Perl version : 5.010000
    OS name      : MSWin32
    Module versions: (not all are required)
        Spreadsheet::ParseExcel  0.49
        Scalar::Util             1.19
        Unicode::Map             (not installed)
        Spreadsheet::WriteExcel  (not installed)
        Parse::RecDescent        (not installed)
        File::Temp               0.18
        OLE::Storage_Lite        0.18
        IO::Stringy              2.110

Replies are listed 'Best First'.
Re: Out of memory error while parsing a specific excel file
by Anonymous Monk on Feb 16, 2009 at 08:00 UTC
      If you want I can give you the excel file.
Re: Out of memory error while parsing a specific excel file
by gone2015 (Deacon) on Feb 16, 2009 at 12:47 UTC

    The trick is to remember that the source for these things is not secret...

    ...looking at Spreadsheet::ParseExcel::FmtDefault I see:

        78: sub TextFmt {
        79:     my ( $oThis, $sTxt, $sCode ) = @_;
        80:     return $sTxt if ( ( !defined($sCode) ) || ( $sCode eq '_native_' ) );
        81:     return pack( 'U*', unpack( 'n*', $sTxt ) );
        82: }

    which gives us a clue as to what's going on.

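    Since this is an Excel file, we may expect it to contain UTF-16. Line 81 clearly expects $sTxt to contain big-endian 16-bit Unicode character values (the 'n' template is network byte order), and converts each value to one character of a Perl "utf8" string. It throws warnings at you when it sees Unicode surrogate values. It's not a good way of decoding UTF-16, because it treats every 16-bit unit as a complete character instead of combining surrogate pairs.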
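
To make the difference concrete, here is a minimal sketch comparing the per-unit unpack used on line 81 with a real UTF-16 decode. The sample bytes are my own illustration (the letter "A" followed by U+1F600), not data from the problem file, and Encode is only one way to do the decoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative input (an assumption, not taken from the spreadsheet):
# "A" followed by U+1F600, which UTF-16BE stores as the pair D83D DE00.
my $bytes = "\x00\x41\xD8\x3D\xDE\x00";

# What line 81 sees: each 16-bit unit is treated as a separate value,
# so the two halves of the surrogate pair stay apart.
my @units = unpack 'n*', $bytes;

# A real UTF-16 decoder combines the pair into a single code point.
# A copy is passed so that $bytes is left untouched.
my $copy    = $bytes;
my $decoded = decode( 'UTF-16BE', $copy );
my @points  = map { ord } split //, $decoded;

printf "units:  %s\n", join ' ', map { sprintf '%04X', $_ } @units;
printf "points: %s\n", join ' ', map { sprintf '%04X', $_ } @points;
```

The first list has three entries and the second has two, which is why pack 'U*' complains about surrogates where a proper decoder would not.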

    The values complained of (for example: 0xdb79, 0xdbb1, 0xd83e, ...) don't look plausible, because they correspond to Unicode code blocks which are currently undefined. So either the data is broken, or some part of the parsing is broken.
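
One quick way to check this is to classify the reported values by range. The sketch below uses the standard Unicode ranges for surrogates and noncharacters; classify_unit is a hypothetical helper name of mine, and the list is a sample of values copied from the warnings in the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: say which part of the 16-bit range a value falls in.
sub classify_unit {
    my ($u) = @_;
    return 'high surrogate' if $u >= 0xD800 && $u <= 0xDBFF;
    return 'low surrogate'  if $u >= 0xDC00 && $u <= 0xDFFF;
    # U+FDD0..U+FDEF, U+FFFE and U+FFFF are permanently reserved noncharacters.
    return 'noncharacter'
        if ( $u >= 0xFDD0 && $u <= 0xFDEF ) || ( $u & 0xFFFE ) == 0xFFFE;
    return 'ordinary';
}

# A sample of the values named in the warnings.
my @seen = ( 0xdb79, 0xdbb1, 0xd83e, 0xdff8, 0xdbff, 0xdd98, 0xfdde );

printf "0x%04x  %s\n", $_, classify_unit($_) for @seen;
```

In well-formed UTF-16 a high surrogate is always followed immediately by a low surrogate. The sample starts with three high surrogates in a row, which looks more like arbitrary bytes being read as text than like real character data.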

    ...this doesn't tell you the cause of the problem. But hopefully it suggests where to look...

    I'd try cutting down the input file, looking for the minimum size file that goes wrong. Given something small enough, you could attack the problem with the debugger...

      Do you think that the program is terminating because of the UTF encoding errors ?
      If they are just the warnings then we can ignore them as of now.

      Currently we don't want the program to terminate the execution.

        I cannot tell how or whether the UTF complaints are connected to the final out of memory!

        They do, however, suggest that the problem spreadsheet contains stuff which is throwing Spreadsheet::ParseExcel off track -- possibly stuff which it doesn't know about, or stuff which is invalid, or stuff which tickles a bug...

        However, when faced with an obscure symptom, I find it valuable to address the visible ones... at worst that clears away the clutter, at best one of the visible symptoms will turn out to be related to the obscure one.

        In any event, I would try cutting down the problem spreadsheet to the smallest possible file that provokes the symptoms, and see where that gets me.
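
As one way of clearing away the clutter, the warnings can be tallied instead of printed one by one. This is only a sketch: with_tallied_warnings is a name I made up, and the three warn calls stand in for the real call to $parser->Parse('testfile.xls').

```perl
use strict;
use warnings;

my %tally;

# Run a code block with warnings counted rather than printed.
# Hex values are folded together so that repeats of the same
# kind of message end up under a single key.
sub with_tallied_warnings {
    my ($code) = @_;
    local $SIG{__WARN__} = sub {
        my ($msg) = @_;
        chomp $msg;
        $msg =~ s/0x[0-9a-fA-F]+/0xNNNN/g;
        $tally{$msg}++;
    };
    return $code->();
}

# Stand-in for the real parse; these messages imitate the ones in the post.
with_tallied_warnings( sub {
    warn "UTF-16 surrogate 0xdb79 at FmtDefault.pm line 81.\n";
    warn "UTF-16 surrogate 0xdbb1 at FmtDefault.pm line 81.\n";
    warn "substr outside of string at ParseExcel.pm line 1196.\n";
} );

printf "%3d x %s\n", $tally{$_}, $_ for sort keys %tally;
```

This only tidies the output. "Out of memory!" is a fatal error that a warning handler cannot trap, so the program would still terminate at the same point.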

      I think the program termination is not due to the UTF encoding errors. It's probably due to the substr errors.
Re: Out of memory error while parsing a specific excel file
by jmcnamara (Monsignor) on Feb 18, 2009 at 13:32 UTC

    Answered on the Spreadsheet::ParseExcel Google Group here.

    The problem is that the particular file is encrypted.

    --
    John.
