PerlMonks  

Re: Convert XLSX to TSV and remove CRLF in cells

by jandrew (Chaplain)
on Jun 17, 2015 at 18:10 UTC ( [id://1130853]=note )


in reply to Convert XLSX to TSV and remove CRLF in cells

As the author of Spreadsheet::XLSX::Reader::LibXML, I can confirm that the package was not written to be the fastest (skip to the last paragraph for faster packages). I would, however, like to clarify one element of your initial question. You list Text::Iconv as one of the modules that you use. Since that is an option for Spreadsheet::XLSX and not for Spreadsheet::XLSX::Reader::LibXML, which spreadsheet parser are you using?

MidLifeXis++ (xkcd++) already gave you the simple optimization: set group_return_type to 'unformatted' (fastest) or 'value' (a little less fast).

You mentioned a desire to get a whole row (#3). Use the 'fetchrow_arrayref' command if you wish. The 'fetchrow_array' and 'fetchrow_hashref' commands are also documented in the Worksheet pod.
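
For the "remove CRLF in cells" half of the question, here is a minimal sketch of what you might do with each row once you have it. It assumes the row arrives as an array reference of plain cell values (which is what the name fetchrow_arrayref suggests) and deliberately makes no parser calls, so it does not depend on any particular module. The helper name row_to_tsv is hypothetical, and replacing line breaks and tabs with a single space is just one possible choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any spreadsheet module): turn one row,
# given as an array reference of cell values, into a single TSV line.
sub row_to_tsv {
    my ($row) = @_;
    my @cells = map {
        my $v = defined $_ ? $_ : '';   # treat empty (undef) cells as ''
        $v =~ s/\r\n|\r|\n/ /g;         # CRLF, bare CR or bare LF -> one space
        $v =~ s/\t/ /g;                 # a literal tab would split the column
        $v;
    } @$row;
    return join "\t", @cells;
}

# Example row with an embedded CRLF and an empty cell.
print row_to_tsv([ "line one\r\nline two", undef, 42 ]), "\n";
```

Copying each value into $v before substituting keeps the caller's row unmodified, which matters if the same array reference is reused elsewhere.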

Even with those changes this parser may still not be as fast as you want. I am always interested in improving my parser. My preference for improvement requests is for you to open an issue in my GitHub repo so you can attach any files you are allowed or willing to share for testing. I would be happy to see if there are speed optimizations available.

With all that said, Spreadsheet::XLSX and Spreadsheet::ParseXLSX are both faster XLSX parsers by design.

Update: s/ParseExcel/ParseXLSX/

Replies are listed 'Best First'.
Re^2: Convert XLSX to TSV and remove CRLF in cells
by Tux (Canon) on Jun 17, 2015 at 19:15 UTC

    Please do not promote Spreadsheet::XLSX (I deliberately omit the link so it is harder to click). It is deprecated and buggy. DO use Spreadsheet::ParseXLSX!


    Enjoy, Have FUN! H.Merijn

      Tux, I agree that Spreadsheet::XLSX is buggy. However, it is my experience that XML::Twig segfaults due to a perl bug in Windows perls prior to 5.15. Since Spreadsheet::ParseXLSX is built on XML::Twig, that makes both of these packages buggy for a certain population of users. (Which is partly why I wrote my package on XML::LibXML.)

      On the other hand, I have run into a lot of implementations of both of these packages where people are quite happy with them. Additionally, for small spreadsheets where you are only extracting data and not formats, Spreadsheet::XLSX tends to be faster.

      Update: I think Spreadsheet::ParseXLSX also fails to open Excel files that contain dedicated chartsheets (not worksheets). The XML::Twig RT queue is a bit daunting, and the current release on CPAN testers has open fails. Otherwise I agree that Spreadsheet::ParseXLSX is a really great module.
