PerlMonks  

Re: Out of memory error while parsing a specific excel file

by gone2015 (Deacon)
on Feb 16, 2009 at 12:47 UTC


in reply to Out of memory error while parsing a specific excel file

The trick is to remember that the source for these things is not secret...

...looking at Spreadsheet::ParseExcel::FmtDefault I see:

    78: sub TextFmt {
    79:     my ( $oThis, $sTxt, $sCode ) = @_;
    80:     return $sTxt if ( ( !defined($sCode) ) || ( $sCode eq '_native_' ) );
    81:     return pack( 'U*', unpack( 'n*', $sTxt ) );
    82: }

which gives us a clue as to what's going on.

This being an Excel file, we may expect it to contain UTF-16. Line 81 clearly expects $sTxt to contain big-endian 16-bit Unicode character values (that is what the 'n' template unpacks), and turns each one into a character of a Perl "utf8" string. This is throwing warnings at you when it sees Unicode surrogate values, which only mean something as high/low pairs. It's not a good way of decoding UTF-16.
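To make the difference concrete, here is a minimal sketch that compares the line 81 approach with the core Encode module. The sample string (an "A" followed by U+1F600 as a surrogate pair) is made up for illustration and has nothing to do with the problem spreadsheet.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Illustrative input: "A" followed by U+1F600, stored as UTF-16BE.
# U+1F600 needs a surrogate pair: high 0xD83D, low 0xDE00.
my $raw = pack( 'n*', 0x0041, 0xD83D, 0xDE00 );

# The approach on line 81: every 16-bit unit becomes its own character,
# so the pair ends up as two separate surrogate "characters".
my $naive;
{
    no warnings;    # some perls complain about the surrogates here
    $naive = pack( 'U*', unpack( 'n*', $raw ) );
}

# Encode's UTF-16BE decoder combines the pair into a single code point.
my $decoded = decode( 'UTF-16BE', $raw );

printf "naive:   %d characters\n", length $naive;
printf "decoded: %d characters, last is U+%04X\n",
    length $decoded, ord substr( $decoded, -1 );
```

Whether Spreadsheet::ParseExcel could simply switch to this is a separate question; the sketch only shows why unpaired or misread surrogates make the pack/unpack version noisy.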

The values complained of (for example: 0xdb79, 0xdbb1, 0xd83e, ...) don't look plausible, because they correspond to Unicode code blocks which are currently undefined. So, either the data is broken, or some part of the parsing is broken.
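One way to check a suspect string by hand is to classify its 16-bit units and list any surrogate that is not part of a well-formed high+low pair. The helper names below (surrogate_kind, unpaired_surrogates) are invented for this sketch and are not part of Spreadsheet::ParseExcel; the ranges are the standard UTF-16 surrogate ranges.

```perl
use strict;
use warnings;

# Classify one 16-bit value as a 'high' surrogate, a 'low' surrogate,
# or an ordinary 'bmp' value.
sub surrogate_kind {
    my ($unit) = @_;
    return 'high' if $unit >= 0xD800 && $unit <= 0xDBFF;
    return 'low'  if $unit >= 0xDC00 && $unit <= 0xDFFF;
    return 'bmp';
}

# Return the unit offsets of surrogates that are not part of a valid
# high+low pair. Input is assumed to be big-endian 16-bit data.
sub unpaired_surrogates {
    my ($raw) = @_;
    my @units = unpack( 'n*', $raw );
    my @bad;
    my $i = 0;
    while ( $i < @units ) {
        my $kind = surrogate_kind( $units[$i] );
        if (   $kind eq 'high'
            && $i + 1 < @units
            && surrogate_kind( $units[ $i + 1 ] ) eq 'low' )
        {
            $i += 2;    # well-formed pair, skip both halves
            next;
        }
        push @bad, $i if $kind ne 'bmp';
        $i++;
    }
    return @bad;
}

# The values from the warnings all fall in the high surrogate range.
printf "0x%04x is a %s surrogate\n", $_, surrogate_kind($_)
    for 0xdb79, 0xdbb1, 0xd83e;
```

If a string from the spreadsheet yields a long list of unpaired offsets, that points towards the parser reading something that is not text at all, rather than towards genuinely exotic characters.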

...this doesn't tell you the cause of the problem. But hopefully it suggests where to look...

I'd try cutting down the input file, looking for the minimum size file that goes wrong. Given something small enough, you could attack the problem with the debugger...
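Short of the debugger, a cheap way to see where a warning comes from is a __WARN__ handler that records a stack trace with each message. The following is only a sketch: convert_text and parse_cell are hypothetical stand-ins for the real parsing code, and the warning is raised by hand so the example runs without the module installed.

```perl
use strict;
use warnings;
use Carp qw(longmess);

my @traces;

# Collect every warning together with the call stack that led to it.
$SIG{__WARN__} = sub {
    my ($message) = @_;
    push @traces, longmess($message);
};

# Hypothetical stand-in for the text conversion step: it complains
# about any surrogate value it finds in big-endian 16-bit input.
sub convert_text {
    my ($raw) = @_;
    for my $unit ( unpack 'n*', $raw ) {
        warn sprintf( "surrogate 0x%04x in input\n", $unit )
            if $unit >= 0xD800 && $unit <= 0xDFFF;
    }
    return $raw;
}

# Hypothetical caller, here only to give the trace some depth.
sub parse_cell { return convert_text(@_) }

parse_cell( pack( 'n*', 0x0041, 0xdb79 ) );

print $_ for @traces;
```

With the real parser in place of the stand-ins, the recorded traces show which routine was handling the data when each complaint was raised, which narrows down where to set a breakpoint.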

Re^2: Out of memory error while parsing a specific excel file
by pankaj_it09 (Scribe) on Feb 17, 2009 at 06:01 UTC
    Do you think that the program is terminating because of the UTF encoding errors?
    If they are just warnings then we can ignore them for now.

    Currently we don't want the program to terminate the execution.

      I cannot tell how or whether the UTF complaints are connected to the final out of memory!

      They do, however, suggest that the problem spreadsheet contains stuff which is throwing Spreadsheet::ParseExcel off track -- possibly stuff which it doesn't know about, or stuff which is invalid, or stuff which tickles a bug...

      However, when faced with an obscure symptom, I find it valuable to address the visible ones... at worst that clears away the clutter, at best one of the visible symptoms will turn out to be related to the obscure one.

      In any event, I would try cutting down the problem spreadsheet to the smallest possible file that provokes the symptoms, and see where that gets me.

Re^2: Out of memory error while parsing a specific excel file
by pankaj_it09 (Scribe) on Feb 17, 2009 at 06:07 UTC
    I think the program termination is not due to UTF encoding errors. It's probably due to substr errors.
