PerlMonks  

Re: UTF-8 and readdir, etc.

by Anonymous Monk
on Sep 12, 2019 at 22:24 UTC


in reply to UTF-8 and readdir, etc.

The comments here are appallingly ignorant, and sadly the perl implementation on Windows follows suit. NTFS filenames are stored as UTF-16, and perl *could* handle that correctly, but it doesn't. So you have to use something like Win32::Unicode, or, if you're using cygwin (as I am), you have to call decode_utf8 on the names you get back when reading directories. Note that File::Find doesn't do this decoding for you, so it's not usable on Windows.
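A minimal sketch of the cygwin approach described above, assuming readdir hands back file names as raw UTF-8 bytes. The helper name read_dir_decoded is hypothetical; decode_utf8 comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: list a directory and decode each entry name
# from raw bytes (assumed to be UTF-8) into a Perl character string.
sub read_dir_decoded {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @raw = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    # Copy each name before decoding so the original bytes stay untouched.
    return map { my $bytes = $_; decode_utf8($bytes) } @raw;
}
```

The decoded names are character strings, so they need to be encoded again (for example with encode_utf8) before being passed back to open or other file functions under the same assumption.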

Re^2: UTF-8 and readdir, etc.
by Your Mother (Archbishop) on Sep 12, 2019 at 22:52 UTC

    Ignorance and terrible design abound:

    "NTFS stores file names in Unicode." (The Horse’s Mouth) :(

    –and–

    "NTFS allows any sequence of 16-bit values for name encoding (file names, stream names, index names, etc.) except 0x0000. This means (case insensitive) UTF-16 code units are supported, but the file system does not check whether a sequence is valid UTF-16 (it allows any sequence of short values, not restricted to those in the Unicode standard)." (Wackypardia)
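To make the second quote concrete, here is a rough sketch of what "valid UTF-16" means for a list of 16-bit values: surrogates must come in high/low pairs. The function name is_well_formed_utf16 is made up for illustration, and nothing here touches NTFS itself; it only shows the kind of check the quote says the file system does not perform.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the given 16-bit code units form
# well-formed UTF-16 (every surrogate is part of a high/low pair).
sub is_well_formed_utf16 {
    my @units = @_;
    my $i = 0;
    while ($i < @units) {
        my $u = $units[$i];
        if ($u >= 0xD800 && $u <= 0xDBFF) {
            # A high surrogate must be followed by a low surrogate.
            my $next = $units[$i + 1];
            return 0 unless defined $next && $next >= 0xDC00 && $next <= 0xDFFF;
            $i += 2;
        }
        elsif ($u >= 0xDC00 && $u <= 0xDFFF) {
            # A low surrogate with no preceding high surrogate.
            return 0;
        }
        else {
            $i++;
        }
    }
    return 1;
}
```

A name made of the units (0x0041, 0xD800) would pass the "anything except 0x0000" rule quoted above but fails this check, which is why decoding such names can go wrong.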
