PerlMonks  

The trick is to remember that the source for these things is not secret...

...looking at Spreadsheet::ParseExcel::FmtDefault I see:

78: sub TextFmt {
79:     my ( $oThis, $sTxt, $sCode ) = @_;
80:     return $sTxt if ( ( !defined($sCode) ) || ( $sCode eq '_native_' ) );
81:     return pack( 'U*', unpack( 'n*', $sTxt ) );
82: }
which gives us a clue as to what's going on.

Being an Excel file, we may expect it to contain UTF-16. Line 81 clearly expects $sTxt to contain big-endian 16-bit Unicode character values (the 'n' template is network byte order), and is decoding those to a Perl "utf8" string. This is throwing warnings at you when it sees Unicode surrogate values. It's not a good way of decoding UTF-16, because it turns each 16-bit unit into a character of its own instead of combining surrogate pairs.
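To make the difference concrete, here is a minimal sketch comparing the pack/unpack approach from line 81 with the core Encode module. The helper names (naive_decode, utf16be_decode, code_points) and the sample bytes are made up for illustration; they are not part of Spreadsheet::ParseExcel.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same technique as line 81: one character per 16-bit big-endian unit.
sub naive_decode {
    my ($bytes) = @_;
    no warnings 'utf8';    # surrogate code points may warn on some perls
    return pack( 'U*', unpack( 'n*', $bytes ) );
}

# UTF-16-aware decoding: Encode combines surrogate pairs into one character.
sub utf16be_decode {
    my ($bytes) = @_;
    return decode( 'UTF-16BE', $bytes );
}

# Format the code points of a string for display.
sub code_points {
    my ($str) = @_;
    return map { sprintf 'U+%04X', ord($_) } split //, $str;
}

# Sample input (an assumption): "A" followed by U+1F600, which UTF-16
# encodes as the surrogate pair D83D DE00.
my $bytes = pack( 'n*', 0x0041, 0xD83D, 0xDE00 );

my $naive  = naive_decode($bytes);
my $proper = utf16be_decode($bytes);

printf "naive:  %d chars (%s)\n", length($naive),  join ' ', code_points($naive);
printf "Encode: %d chars (%s)\n", length($proper), join ' ', code_points($proper);
```

The naive version leaves the two surrogates in the string as separate characters, which is the kind of value the warnings complain about. Whether switching the decoder would help with this particular file is a separate question, since the input may simply be corrupt.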

The values complained of (for example: 0xdb79, 0xdbb1, 0xd83e, ...) don't look plausible, because they correspond to Unicode code blocks which are currently undefined. So, either the data is broken, or some part of the parsing is broken.
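One way to check a reported value by hand is to work out which code points it could begin if it were the first half of a surrogate pair. A small sketch, using the standard UTF-16 arithmetic; the helper names surrogate_kind and high_surrogate_range are hypothetical:

```perl
use strict;
use warnings;

# Classify a 16-bit value as a UTF-16 high surrogate, low surrogate, or neither.
sub surrogate_kind {
    my ($unit) = @_;
    return 'high' if $unit >= 0xD800 && $unit <= 0xDBFF;
    return 'low'  if $unit >= 0xDC00 && $unit <= 0xDFFF;
    return 'none';
}

# First and last code point that a given high surrogate can begin.
# Callers are expected to pass a high surrogate only.
sub high_surrogate_range {
    my ($high) = @_;
    my $first = 0x10000 + ( ( $high - 0xD800 ) << 10 );
    return ( $first, $first + 0x3FF );
}

# The values quoted from the warnings.
for my $unit ( 0xdb79, 0xdbb1, 0xd83e ) {
    my ( $first, $last ) = high_surrogate_range($unit);
    printf "0x%04x: %s surrogate, would begin U+%05X..U+%05X\n",
        $unit, surrogate_kind($unit), $first, $last;
}
```

The resulting ranges can then be compared against the Unicode code charts to judge whether the text could plausibly contain such characters.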

...this doesn't tell you the cause of the problem. But hopefully it suggests where to look...

I'd try cutting down the input file, looking for the minimum size file that goes wrong. Given something small enough, you could attack the problem with the debugger...


In reply to Re: Out of memory error while parsing a specific excel file by gone2015
in thread Out of memory error while parsing a specific excel file by pankaj_it09
