Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re^3: Looking for assistance for proper fix for Spreadsheet::XLSX bug(?)

by marto (Cardinal)
on May 29, 2020 at 12:50 UTC ( [id://11117463]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Looking for assistance for proper fix for Spreadsheet::XLSX bug(?)
in thread Looking for assistance for proper fix for Spreadsheet::XLSX bug(?)

"...I found it gave a lot of warnings for empty cells and it did not even handle the utf-8 euro symbol at all"

Emphasis mine. From rt://40061:

'The Euro symbol isn't stored in Excel as a single character. It is stored as a UTF-16 character and is returned by Spreadsheet::ParseExcel as a Perl utf8 character.'

'This is what you are seeing and in terms of the way Spreadsheet::ParseExcel handles Unicode it is the correct behaviour.'

'How you should handle the utf8 string from there depends on what you want to do with it.'

further on..

"I've come to the conclusion the there is no great solution for parsing Excel files with Perl. That makes me sad. "

If you are saddened by what is available, extend what's there or write your own. Alternatively Spreadsheet::Read:

#!/usr/bin/perl use strict; use warnings; use feature 'say'; use Spreadsheet::Read; use open ':std', ':encoding(UTF-8)'; my $book = ReadData('euro.xlsx'); my $cell = $book->[1]{A1}; say $cell;

Output:

€40

Replies are listed 'Best First'.
Re^4: Looking for assistance for proper fix for Spreadsheet::XLSX bug(?)
by nysus (Parson) on May 29, 2020 at 13:07 UTC

    Spreadsheet::Read also uses Spreadsheet::XLSX under the hood. Buggy or not, I found that Spreadsheet::XLSX handled the euro symbol without issue via Spreasheet::BasicRead.

    I think the heart of the problem, as I've discovered, is that the modules are trying to applying a custom format improperly and there is really no reason for these modules to try to apply the custom formatting because all it does is insert some padding into the cell. To address this, I'm just going to strip out the weird formatting command in the format:  _€, and not worry about this anymore.

    All the XLSX modules, with the exception of Spreadsheet::Read, haven't been touched in about 4 years and have long issue queues. But Spreadsheet::Read relies on modules that are not maintained.

    $PM = "Perl Monk's";
    $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
    $nysus = $PM . ' ' . $MCF;
    Click here if you love Perl Monks

      Spreadsheet::Read just uses what is available to parse the source given, and prefers Spreadsheet::ParseXLSX over Spreadsheet::XLSX unless specifically told to use the buggy module, which is supported because it can be supported (and because the better module did not yet exist when the buggy module was readily available when I started Spreadsheet::Read. It so only relies on (bad) modules if there are no better alternatives available.

      If you find bugs, real bugs, in the parser, make a ticket. If you find bugs in the wrapper as it does not deal with the parser correctly, let me know.

      I am working with Corion to get his new parsers for ODS supported too, but that is currently too unstable.

      As I use XLSX for quite a number of (big) jobs, I am fairly certain it behaves well. I must admit that I never had to deal with buggy custom formats, so I never hit any trouble.


      Enjoy, Have FUN! H.Merijn

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11117463]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (7)
As of 2024-04-18 13:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found