Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Looking for assistance for proper fix for Spreadsheet::XLSX bug(?)

by nysus (Parson)
on May 28, 2020 at 21:06 UTC ( #11117438=perlquestion: print w/replies, xml ) Need Help??

nysus has asked for the wisdom of the Perl Monks concerning the following question:

I've gotten the ball to the 4 yard line on a problem I've been working on wrt to the Spreadsheet::XLSX module. Looking for help banging this problem into the end zone. I don't know enough about XLSX parsing to properly fix this.

The problem:

I'm parsing a .xlsx file generated by quickbooks. The spreadsheet has some custom format codes in it. Here's the appropriate xml from the styles.xml file embedded in the xlsx archive:

<numFmt numFmtId="165" formatCode="#,##0.00\ _Ä"/><numFmt numFmtId="16 +6" formatCode="&quot;$&quot;* #,##0.00\ _Ä"/>

When extracting data from the cell with this custom format code, the value looked like this:

'20.00  Ä'

Note the euro symbol tacked on to the end. This symbol does not show up when viewing the spreadsheet.

The hack:

To get this working the way I wanted, I hacked the Spreadsheet::XLSX module. I made two changes. One to the __load_styles function:

sub __load_styles { my ($zip) = @_; my $member_styles = $zip->memberNamed('xl/styles.xml'); my @styles = (); my %style_info = (); if ($member_styles) { my $formatter = Spreadsheet::XLSX::Fmt2007->new(); foreach my $t ($member_styles->contents =~ /xf\ numFmtId="([^" +]*)"(?!.*\/cellStyleXfs)/gsm) { #" push @styles, $t; } my $default = $1 || ''; foreach my $t1 (@styles) { $member_styles->contents =~ /numFmtId="$t1" formatCode="([ +^"]*)/; my $formatCode = $1 || ''; if ($formatCode eq $default || not($formatCode)) { if ($t1 == 9 || $t1 == 10) { $formatCode = '0.00000%'; } elsif ($t1 == 14) { $formatCode = 'yyyy-mm-dd'; } else { $formatCode = ''; } } $formatCode = $formatter->FmtStringDef($t1); $style_info{$t1} = $formatCode; $default = $1 || ''; } } return (\@styles, \%style_info); }

The other change I made was to the Spreadsheet::XLSX::Fmt2007.pm module where I added in the hex values for "165" and "166" along with the format I wanted without the euro sign:

0xA5 => '#,##0.00', 0xA6 => '&quot;$&quot;* #,##0.00',

Obviously this is a bad hack. What is the proper way to fix these modules so I can have custom fomatters that work as intended when parsing an xlsx file?

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;
Click here if you love Perl Monks

Replies are listed 'Best First'.
Re: Looking for assistance for proper fix for Spreadsheet::XLSX bug(?)
by Tux (Canon) on May 29, 2020 at 08:41 UTC

    I say it once more: DO NOT USE Spreadsheet::XLSX ! It is dead, buggy and not maintained anymore!

    Chasing bugs in this old crap is a waste of time, as the bugs won't get fixed anyway. The module has served its goal and was useful, very useful, when it was written (and maintained), but please people, stop using it.

    Install and use Spreadsheet::ParseXLSX, rewrite you script to match its API and then check if you still have problems.


    Enjoy, Have FUN! H.Merijn

      I did try Spreadsheet::ParseXLSX (stumbled on it while trying to see what to do about this problem) but I found it gave a lot of warnings for empty cells and it did not even handle the utf-8 euro symbol at all. I've come to the conclusion the there is no great solution for parsing Excel files with Perl. That makes me sad.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
      $nysus = $PM . ' ' . $MCF;
      Click here if you love Perl Monks

        "...I found it gave a lot of warnings for empty cells and it did not even handle the utf-8 euro symbol at all"

        Emphasis mine. From rt://40061:

        'The Euro symbol isn't stored in Excel as a single character. It is stored as a UTF-16 character and is returned by Spreadsheet::ParseExcel as a Perl utf8 character.'

        'This is what you are seeing and in terms of the way Spreadsheet::ParseExcel handles Unicode it is the correct behaviour.'

        'How you should handle the utf8 string from there depends on what you want to do with it.'

        further on..

        "I've come to the conclusion the there is no great solution for parsing Excel files with Perl. That makes me sad. "

        If you are saddened by what is available, extend what's there or write your own. Alternatively Spreadsheet::Read:

        #!/usr/bin/perl use strict; use warnings; use feature 'say'; use Spreadsheet::Read; use open ':std', ':encoding(UTF-8)'; my $book = ReadData('euro.xlsx'); my $cell = $book->[1]{A1}; say $cell;

        Output:

        Ä40
        Maybe fix those problems instead then? Post an SSCCE

      And based on the issue queue which looks to be largely ignored, I'd say this ParseXLSX is abandonware.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
      $nysus = $PM . ' ' . $MCF;
      Click here if you love Perl Monks

        IMHO it shows two things:

        1. People use it
        2. It needs more people to help fixing the issues

        Don't moan, help!


        Enjoy, Have FUN! H.Merijn
Re: Looking for assistance for proper fix for Spreadsheet::XLSX bug(?)
by swl (Vicar) on May 28, 2020 at 22:10 UTC

    The first thing is to report this as a bug, but there appear to have been no responses to rt bugs since the last release.

    If it's just local patching then you could try superclassing Spreadsheet::XLSX. For example (untested and incomplete):

    package Spreadsheet::XLSX::Fourecks; use parent qw /Spreadsheet::XLSX/; sub __load_styles { ... } 1;

      Thanks. But how is the parsing software supposed to handle those custom format codes? Iím not sure how it is supposed to work. I donít want to have to add a new format each time something similar pops up.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
      $nysus = $PM . ' ' . $MCF;
      Click here if you love Perl Monks

        Good point. My suggestion only patches in your own method/s. It does not generalise the parsing.

        I know nothing about excel custom format codes, so unfortunately cannot help on that front.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11117438]
Approved by davies
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (5)
As of 2022-05-24 18:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Do you prefer to work remotely?



    Results (84 votes). Check out past polls.

    Notices?