PerlMonks  

Cross-platform accented character file names sorting

by perlimpinpin (Initiate)
on May 19, 2015 at 15:49 UTC ([id://1127147])

perlimpinpin has asked for the wisdom of the Perl Monks concerning the following question:

Most Reverent Monks,

The script below reads a directory whose file names contain Latin-1 accented characters and displays a correctly sorted list on both Linux and Windows, but it needs a few changes for each OS:

- Linux: uncomment 'use utf8::all', save the script with the default UTF-8 encoding, and run it.

- Windows: comment out 'use utf8::all' (marked 'Comment out for Windows' in the script), save the script with the default ISO-8859-1 (ANSI) encoding, run chcp 1252 on the command line, and then run the script.

To test accented characters, create a subdirectory named 'test' containing several files whose names start with ordinary uppercase and lowercase ASCII characters and with Latin-1 (Western European) accented characters (for example: Drives, eval1, Eval2, éval3, Éval4, files, Übermensch, utilities). That list is in the sort order you get from ls (Linux), dir (Windows), or any graphical file manager.

```perl
use strict;
use warnings;
use utf8::all;    # Comment out for Windows
use Unicode::Collate;

# No argument: current directory; the command line accepts a directory name.
my $dir = @ARGV ? shift : '.';
opendir(my $dh, $dir) or die "\n\tCannot open directory '$dir': $!\n";
my @list = grep { !/^\.\.?\z/ } readdir $dh;    # skips '.' and '..'
closedir $dh;

print "$_\n" for @list;
print "\tEnd unsorted\n\n";

# level => 1: compare base letters only, ignoring case and accents.
my $collator = Unicode::Collate->new(level => 1);
my @entries  = $collator->sort(@list);
print "$_\n" for @entries;
print "\tEnd sorted\n\n";
```
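To check the collation on its own, without creating any files, here is a minimal sketch (not part of the original script) that sorts the example names from the question in memory, once with Perl's plain sort (code-point order) and once with Unicode::Collate at level 1. It assumes the file is saved as UTF-8 (hence use utf8) and a UTF-8 terminal.

```perl
use strict;
use warnings;
use utf8;                # string literals below are written in UTF-8
use Unicode::Collate;

# Example names from the question, deliberately shuffled.
my @names = ('utilities', 'Übermensch', 'Éval4', 'files',
             'éval3', 'Eval2', 'eval1', 'Drives');

# Plain sort compares code points: 'E' < 'e' < 'É' < 'Ü' < 'é'.
my @naive = sort @names;

# Level 1 compares base letters only, ignoring case and accents.
my $collator = Unicode::Collate->new(level => 1);
my @sorted   = $collator->sort(@names);

binmode STDOUT, ':encoding(UTF-8)';    # assumption: UTF-8 terminal
print "plain sort: @naive\n";
print "collated:   @sorted\n";
```

The collated list comes out in the order given in the question, while plain sort puts the uppercase ASCII names before the lowercase ones and all accented names at the end.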

Looking for a simpler way, I added the following snippet, which doesn't work:

```perl
[...]
use Config;
use utf8::all if $Config{osname} eq 'Linux';   # perl adamantly ignores the condition
[...]
```

Further, Perl cannot set the code page of a Windows terminal (what chcp does) by itself.

My question: is it possible to write a 'universal' script that automatically detects the OS and acts accordingly?

-0 - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0 - 0

Thank you so much, Monks!

With $^O, I get 'MSWin32' on my Windows 8 (64-bit) machine, so I just added the following two lines to my script:

```perl
use if $^O ne 'MSWin32', 'utf8::all';
system('chcp 1252') if $^O eq 'MSWin32';
```

Kludgy, but it does the job on both Linux and Windows, and possibly on Unix and Mac too. If the user still sees garbled characters, the script file has to be saved manually with the correct encoding: ISO-8859-1 (ANSI) on Windows, or UTF-8 on most other OSes (untested). This is apparently the only thing Perl cannot do for the unwary user!
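One way to stop depending on how the editor saves the file is to keep the source pure ASCII: write each accented literal as a \x{...} escape and pick the output encoding from $^O. This is a minimal sketch of that alternative, not the poster's method; the helper console_layer is hypothetical, and it assumes a cp1252 console on Windows (as set with chcp 1252) and a UTF-8 terminal elsewhere. It only covers literals in the source; names read with readdir still arrive as bytes.

```perl
use strict;
use warnings;

# \x{...} escapes keep this file pure ASCII, so it is read identically
# whether the editor saved it as ISO-8859-1 or as UTF-8.
my @names = ("\x{E9}val3", "\x{C9}val4", "\x{DC}bermensch");

# Hypothetical helper. Assumption: the Windows console uses code page 1252
# (e.g. after 'chcp 1252'); other systems use a UTF-8 terminal.
sub console_layer {
    my ($os) = @_;
    return $os eq 'MSWin32' ? ':encoding(cp1252)' : ':encoding(UTF-8)';
}

# Encode character strings into the console's bytes on output.
binmode STDOUT, console_layer($^O);
print "$_\n" for @names;
```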

'Confundant omnes , ultimus alienat'

Re: Cross-platform accented character file names sorting
by Athanasius (Archbishop) on May 19, 2015 at 16:34 UTC

    Hello perlimpinpin, and welcome to the Monastery!

    ```perl
    use Config;
    use utf8::all if $Config{osname} eq 'Linux';   # perl adamantly ignores the condition
    ```

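    You can’t use Perl’s usual statement-modifier if here: use takes effect at compile time, so it does not respect the ordinary flow control of the code being compiled. You must use the if pragma instead, which has a different syntax: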

    ```perl
    use Config;
    # $Config{osname} (like $^O) is lowercase 'linux' on Linux
    use if $Config{osname} eq 'linux', 'utf8::all';
    ```
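A minimal sketch of the if pragma using a core module, so it runs without utf8::all installed: the core Win32 module is loaded only on Windows. The BEGIN block in the comment is a rough hand-written equivalent, and win32_pm_loaded is a hypothetical helper that just inspects %INC.

```perl
use strict;
use warnings;

# use if CONDITION, MODULE => IMPORT_LIST;
# The condition is evaluated at compile time; when it is false the module
# is neither loaded nor imported.
use if $^O eq 'MSWin32', 'Win32';

# Roughly equivalent, written out by hand:
#   BEGIN { if ($^O eq 'MSWin32') { require Win32; Win32->import } }

# Hypothetical helper: %INC records every file loaded with require/use.
sub win32_pm_loaded { return exists $INC{'Win32.pm'} ? 1 : 0 }

printf "OS: %s, Win32.pm loaded: %d\n", $^O, win32_pm_loaded();
```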

    Note also that the Config module isn’t needed for this test. Either $ENV{OS} or $^O will give you the information you need.

    Update 1: Struck out $ENV{OS}, thanks to afoken, below.

    Update 2: Changed utf8::all to 'utf8::all' (i.e., added quotes) to avoid a syntax error when the condition fails.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Note also that the Config module isn’t needed for this test. Either $ENV{OS} or $^O will give you the information you need.

      $ENV{OS} is specific to Windows (NT and later, I think; I never saw it on 3.x or 9x), and it is not set on Linux. Also, a malicious user could set it to any nonsense value. Try to avoid it.

      $^O is reliable: it returns MSWin32 for every Windows version (except perhaps Windows CE / Mobile), including 64-bit variants. After checking $^O eq 'MSWin32', the type (NT-based or DOS-based) and the exact version can be checked using Win32::IsWinNT(), Win32::IsWin95(), Win32::GetOSVersion(), Win32::GetOSName(), and Win32::GetOSDisplayName(). All of these functions are documented in Win32; those marked [CORE] are built into the perl executable and are available without loading the Win32 module.
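A small sketch of that advice: test $^O first and only then call the Win32 functions listed above. describe_os is a hypothetical helper; loading Win32 explicitly is a precaution so that functions not marked [CORE] are available too.

```perl
use strict;
use warnings;

# Hypothetical helper: rely on $^O rather than $ENV{OS}, and only touch
# Win32 functions once we know we are on Windows.
sub describe_os {
    return "not Windows (\$^O is $^O)" unless $^O eq 'MSWin32';
    require Win32;    # makes the non-[CORE] functions available too
    my $kind = Win32::IsWinNT() ? 'NT-based' : 'DOS-based';
    return sprintf 'Windows, %s: %s', $kind, Win32::GetOSDisplayName();
}

print describe_os(), "\n";
```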

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Cross-platform accented character file names sorting
by Anonymous Monk on May 20, 2015 at 00:58 UTC
Re: Cross-platform accented character file names sorting
by Anonymous Monk on May 20, 2015 at 07:00 UTC
    Try Encode::Locale to decode file names from bytes to Perl's internal Unicode representation.
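Encode::Locale works out the file-system encoding from the current locale. The sketch below shows the same idea (decode the readdir bytes into character strings before collating) using only the core Encode module, with a hard-coded assumption in place of locale detection: cp1252 on Windows, UTF-8 elsewhere, and a console using the same encoding. decode_names is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Collate;

# Assumption (Encode::Locale would detect this from the locale instead):
# readdir returns cp1252 bytes on a Western-European Windows system and
# UTF-8 bytes elsewhere. The console is assumed to use the same encoding.
my $fs_encoding = $^O eq 'MSWin32' ? 'cp1252' : 'UTF-8';

# Hypothetical helper: decode raw byte strings into Perl character strings.
sub decode_names {
    my ($encoding, @raw) = @_;
    return map { decode($encoding, $_) } @raw;
}

my $dir = @ARGV ? shift @ARGV : '.';
opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!\n";
my @raw = grep { !/^\.\.?\z/ } readdir $dh;    # skip '.' and '..'
closedir $dh;

my $collator = Unicode::Collate->new(level => 1);
binmode STDOUT, ":encoding($fs_encoding)";
print "$_\n" for $collator->sort(decode_names($fs_encoding, @raw));
```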

Front-paged by GotToBTru