PerlMonks  

Re^2: UTF-8 lexicographic string sort

by rdiez (Acolyte)
on Apr 23, 2020 at 12:02 UTC (id://11115941)


in reply to Re: UTF-8 lexicographic string sort
in thread UTF-8 lexicographic string sort

I am not sure that your code is correct.

Let us look at this snippet you suggested:

  decode('UTF-8', $File::Find::name)

Let us look at the documentation for Encode::decode:

This function returns the string that results from decoding the scalar value OCTETS, assumed to be a sequence of octets in ENCODING, into Perl's internal form.

Your code is therefore assuming that $File::Find::name is in UTF-8, but this may not be correct.
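One way to avoid silently assuming UTF-8 is to decode strictly and react when the bytes turn out not to be valid UTF-8. This is a minimal sketch, assuming the name arrives as a byte string. `decode_filename` is a hypothetical helper, and the Latin-1 fallback is an arbitrary choice for illustration, not a statement about any particular filesystem:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode a raw file name, trying strict UTF-8 first.
# The fallback encoding is an assumption; pick one that matches your system.
sub decode_filename {
    my ($bytes, $fallback) = @_;
    $fallback //= 'ISO-8859-1';

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC stops it
    # from modifying $bytes in place.
    my $chars = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return ($chars, 'UTF-8') if defined $chars;

    # Not valid UTF-8: fall back to the caller-supplied guess.
    return (decode($fallback, $bytes), $fallback);
}

my ($name1, $enc1) = decode_filename("caf\xc3\xa9");  # valid UTF-8 octets
my ($name2, $enc2) = decode_filename("caf\xe9");      # lone Latin-1 byte, invalid UTF-8
printf "%s -> %d chars\n", $enc1, length $name1;
printf "%s -> %d chars\n", $enc2, length $name2;
```

Both calls yield the same four-character string; only the encoding that was actually applied differs. Note that a Latin-1 fallback never fails, so it can hide a wrong guess.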

Re^3: UTF-8 lexicographic string sort
by Corion (Patriarch) on Apr 23, 2020 at 12:08 UTC

    Finding the correct encoding for the filesystem is up to you.

    I'm not aware of any good way to determine the encoding of the names in a filesystem, so you will have to apply your own knowledge there.
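Since only the caller can supply that knowledge, one option is to make the encoding an explicit parameter. This is a minimal sketch using the core modules File::Find, Encode and File::Temp; `sorted_names` is a hypothetical helper. On POSIX systems the locale codeset from `I18N::Langinfo`'s `langinfo(CODESET)` is one possible starting guess, but nothing guarantees that existing names were written in it:

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Find qw(find);
use File::Temp qw(tempdir);

# Hypothetical helper: collect file names under $dir, decoded from the
# encoding the caller says the filesystem uses, sorted by code point.
sub sorted_names {
    my ($dir, $fs_encoding) = @_;
    my @names;
    find({
        no_chdir => 1,
        wanted   => sub {
            return unless -f $File::Find::name;
            # $File::Find::name holds raw bytes; decode them explicitly.
            # With the default CHECK, malformed bytes become U+FFFD.
            push @names, decode($fs_encoding, $File::Find::name);
        },
    }, $dir);
    return sort @names;    # plain sort compares strings by code point
}

# Demo with ASCII-only names so the result does not depend on the platform.
my $dir = tempdir(CLEANUP => 1);
for my $n (qw(pear apple banana)) {
    open my $fh, '>', "$dir/$n" or die "cannot create $dir/$n: $!";
    close $fh;
}
my @sorted = sorted_names($dir, 'UTF-8');
print "$_\n" for @sorted;
```

Plain `sort` orders by code point, which is not always the order a human expects; for a language-aware order, the core module Unicode::Collate is one option.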

      I do not understand why the encoding used by the filesystem is relevant. I assume that File::Find, and ultimately the Perl runtime, will abstract all that knowledge and give me a Perl string that my script can safely work with. It does not matter if the filesystem underneath is Windows NTFS and its encoding is UCS-2. The Perl string with the filename will certainly never have UCS-2 encoding.

        I think you misunderstand. Even though the API of File::Find returns "strings", these will not compare properly with other strings because you haven't told Perl what encoding the strings from the filesystem are in.

        This encoding is not known to Perl, and it is also not always known to the OS.

        Your assumption that Perl encapsulates the filesystem API's string types is wrong.


        See the links in my node here.
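The difference between a byte string and a character string can be seen without touching the disk. This is a minimal sketch that simulates what a Unix-like system storing names as UTF-8 would hand back (raw octets) by encoding a literal; the name `café` is arbitrary:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string: "café" with U+00E9, 4 characters.
my $chars = "caf\x{e9}";

# The same name as raw UTF-8 octets, as readdir() would return it on such
# a system: 5 bytes, stored by Perl as an ordinary byte string.
my $bytes = encode('UTF-8', $chars);

# Comparing without decoding fails; comparing after decoding succeeds.
my $raw_cmp     = $chars eq $bytes                  ? 'eq' : 'ne';
my $decoded_cmp = $chars eq decode('UTF-8', $bytes) ? 'eq' : 'ne';

printf "chars: %d, bytes: %d\n", length($chars), length($bytes);  # chars: 4, bytes: 5
print "raw: $raw_cmp, decoded: $decoded_cmp\n";                   # raw: ne, decoded: eq
```

Perl does not know that `$bytes` was meant to be UTF-8, so it compares the two strings element by element and finds them different.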
