PerlMonks  

Unicode path with Spreadsheet::XLSX

by Anonymous Monk
on Mar 17, 2018 at 18:20 UTC ( [id://1211157]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I want to read an XLSX file on Windows 10 that is saved in a folder whose name contains non-Latin characters. I have tried different approaches and modules, but it always fails. In my example the folder and file name contain Russian characters. With Latin characters everything works fine. The problem seems to be related to the module Archive::Zip. Any suggestions?

use Spreadsheet::XLSX;
use Win32::LongPath;
use Win32;
use utf8;

my $path="Ршзефф\\faoü.xlsx";
#my $path="Ршзефф.xlsx";
#my $path="faoü.xlsx";

#my $MYPath = Win32::GetANSIPathName ($path);
my $MYPath = shortpathL ($path);
print "$MYPath \n\n";

my $excel = Spreadsheet::XLSX -> new ($MYPath);
my $counter=0;
foreach my $sheet (@{$excel -> {Worksheet}}) {
    if ($counter == 0){ # work only on first sheet now
        $sheet -> {MaxRow} ||= $sheet -> {MinRow};
        foreach my $row ($sheet -> {MinRow} .. $sheet -> {MaxRow}) {
            $sheet -> {MaxCol} ||= $sheet -> {MinCol};
            foreach my $col ($sheet -> {MinCol} .. $sheet -> {MaxCol}) {
                my $cell = $sheet -> {Cells} [$row] [$col];
                if ($cell) {
                    my $cell_value = ($cell -> {Val});
                    $cell_value =~ s/\n//g;
                    print $cell_value;
                }
                print "\t";
            }
            print "\n";
        }
        $counter=1;
    }
}

Replies are listed 'Best First'.
Re: Unicode path with Spreadsheet::XLSX
by huck (Prior) on Mar 18, 2018 at 02:18 UTC

    At http://cpansearch.perl.org/src/MIKEB/Spreadsheet-XLSX-0.15/lib/Spreadsheet/XLSX.pm I find

    sub __load_zip {
        my ($filename) = @_;
        my $zip = Archive::Zip->new();
        if (ref $filename) {
            $zip->readFromFileHandle($filename) == Archive::Zip::AZ_OK or die("Cannot open data as Zip archive");
        } else {
            $zip->read($filename) == Archive::Zip::AZ_OK or die("Cannot open $filename as Zip archive");
        }
        return $zip;
    }
    which seems to suggest that you can pass an open filehandle instead of a path to new. You may want to open the file yourself and pass the filehandle instead of the path. Something like
    open (my $MYfh, '<', $MYPath) or die "can't open $MYPath: $!";
    my $excel = Spreadsheet::XLSX -> new ($MYfh);
    Or including info from http://search.cpan.org/~rboisvert/Win32-LongPath-1.03/lib/Win32/LongPath.pm
    my $MYfh;
    # an XLSX file is a binary zip archive, so open it without a text encoding layer
    openL (\$MYfh, '<', $MYPath) or die ("unable to open $MYPath ($^E)");
    binmode $MYfh;
    my $excel = Spreadsheet::XLSX -> new ($MYfh);
    Just an UNTESTED thought
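
Before handing a filehandle to Spreadsheet::XLSX, it may help to confirm that the handle is really delivering the raw bytes of a zip archive. The following is only a sketch of such a check: looks_like_zip is a hypothetical helper name, not part of any module mentioned here, and it relies only on the fact that a non-empty zip archive (which every XLSX file is) starts with the local file header signature "PK\x03\x04".

```perl
use strict;
use warnings;

# Hypothetical helper (not from Spreadsheet::XLSX or Archive::Zip):
# returns 1 if the handle starts with the zip local file header
# signature "PK\x03\x04", otherwise 0. The handle is rewound afterwards.
sub looks_like_zip {
    my ($fh) = @_;
    binmode $fh;                        # read raw bytes, no text layers
    seek($fh, 0, 0) or return 0;        # the handle must be seekable
    my $n = read($fh, my $magic, 4);    # first four bytes of the file
    seek($fh, 0, 0);                    # rewind for the real reader
    return (defined $n && $n == 4 && $magic eq "PK\x03\x04") ? 1 : 0;
}
```

If the check fails on a handle that was opened successfully, the likely suspects are a text or encoding layer on the handle or a wrong file, rather than the Unicode path itself.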

Re: Unicode path with Spreadsheet::XLSX
by afoken (Chancellor) on Mar 18, 2018 at 10:05 UTC

    Files on Windows generally have a "short" name that is compatible with legacy software (i.e. DOS, Windows 3.x Applications). It is limited to the old DOS "8.3" schema and contains no spaces, no tricky characters, just a subset of printable ASCII.

    Demo:

    C:\>mkdir temp

    C:\>cd temp

    C:\temp>echo hi > "foo bar.batz"

    C:\temp>echo hello > "metäl ümläüts äre silly.öpiniön"

    C:\temp>dir /x
     Volume in drive C is System
     Volume Serial Number is D4F2-C5FF

     Directory of C:\temp

    18.03.2018  10:56    <DIR>                       .
    18.03.2018  10:56    <DIR>                       ..
    18.03.2018  10:54                 5 FOOBAR~1.BAT foo bar.batz
    18.03.2018  10:56                 8 METLML~1.PIN metäl ümläüts äre silly.öpiniön
                   2 File(s)             13 bytes
                   2 Dir(s)  56.432.050.176 bytes free

    C:\temp>

    Those names are not limited to legacy applications. Any application can use them, including perl:

    C:\temp>perl -pe 1 foobar~1.bat metlml~1.pin
    hi
    hello

    C:\temp>
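
To check whether a name returned by a short-path call really has the plain "8.3" shape described above, something like the following could be used. It is a rough sketch: looks_like_8dot3 is a hypothetical helper, and the set of allowed punctuation characters is an assumption based on the usual FAT short-name rules, not something stated in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $name looks like a DOS 8.3 short name
# (1 to 8 characters, optionally a dot and 1 to 3 more), otherwise 0.
# ASSUMPTION: allowed characters are ASCII letters, digits and
# ! # $ % & ' ( ) - @ ^ _ ` { } ~ (no spaces, nothing outside ASCII).
sub looks_like_8dot3 {
    my ($name) = @_;
    my $chars = qr/[A-Za-z0-9!#\$%&'()\-\@^_`{}~]/;
    return $name =~ /\A(?:$chars){1,8}(?:\.(?:$chars){1,3})?\z/ ? 1 : 0;
}

print looks_like_8dot3('FOOBAR~1.BAT') ? "short\n" : "long\n";
print looks_like_8dot3('foo bar.batz') ? "short\n" : "long\n";
```

The check works on a single file or directory name, not on a whole path; a full path would have to be split on the backslash first.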

    Another, unrelated benefit of those legacy aliases is that they tend to be shorter than the original name. This can be useful to handle files with ridiculously long names, longer than Windows' limits usually allow. See deepcopy for details.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Unicode path with Spreadsheet::XLSX
by Anonymous Monk on Mar 18, 2018 at 12:30 UTC

    Thank you for your suggestions.

    In the end, my original script works with either of these:

    my $MYPath = Win32::GetANSIPathName ($path);
    # or
    my $MYPath = shortpathL ($path);

    However, the file that $MYPath points to cannot be in the same folder as the script itself! Moving it somewhere else, for example to the Desktop, and reading it with:

    my $PathDesktop = File::HomeDir->my_desktop;
    my $MYPath = $PathDesktop . "\\Ршзефф.xlsx";

    allows the script to work fine.

    The idea of copying the file to a 'simple' file name is worth using, as it is simple and effective. Reading directly from a file handle did not work for me.

      Some more observations. The same script, using the short path provided by either of the two modules mentioned above, seems to fail if the file is not on the same hard drive as the Windows installation. I do not know whether this makes sense, but when the same file/folder with Unicode characters is on another drive, it just fails.

Re: Unicode path with Spreadsheet::XLSX
by Anonymous Monk on Mar 18, 2018 at 00:51 UTC

    Make a temporary copy with simple name, don't fight someone else's problems.
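
A minimal sketch of the temporary-copy idea, using only core modules. The helper name ascii_temp_copy and the optional $copier argument are made up for illustration. By default it uses File::Copy, which is unlikely to handle a Unicode source path on Windows; there one would presumably pass a Unicode-aware copy routine such as copyL from Win32::LongPath as $copier (an assumption, untested here). The sketch also assumes the system temp directory itself has a plain ASCII path.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempfile);

# Hypothetical helper: copy $source to a temp file with a plain ASCII
# name and return the new path. $copier is an optional code ref taking
# (from, to) and returning true on success; it defaults to File::Copy::copy.
sub ascii_temp_copy {
    my ($source, $copier) = @_;
    $copier ||= \&copy;
    # tempfile() builds the name from the ASCII template below;
    # UNLINK => 1 removes the copy again when the program exits.
    my ($fh, $tmp) = tempfile('xlsx_XXXXXX', SUFFIX => '.xlsx',
                              TMPDIR => 1, UNLINK => 1);
    close $fh;
    $copier->($source, $tmp) or die "Cannot copy to $tmp: $!";
    return $tmp;
}
```

The returned path can then be passed to Spreadsheet::XLSX->new like any other ASCII file name, so the module never sees the original path.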
