in reply to UTF-8 and readdir, etc.
The comments here are appallingly ignorant, and sadly the Perl implementation on Windows follows suit. NTFS filenames are encoded in UTF-16, and Perl *could* handle that correctly, but it doesn't. So you have to use something like Win32::Unicode, or, if you're using Cygwin (as I am), you have to call decode_utf8 (from Encode) on the names you get back when reading directories. Note that File::Find doesn't know this, so it's not usable on Windows when names contain non-ASCII characters.
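For anyone who wants to see the Cygwin workaround spelled out, here is a minimal sketch. It assumes readdir hands back UTF-8 encoded bytes, which is what I see under Cygwin; it will not help with native Windows perls, where Win32::Unicode or similar is still needed. The helper name read_dir_decoded is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Hypothetical helper: list a directory and return the entries as Perl
# character strings instead of raw bytes.
# Assumption: the platform returns UTF-8 encoded bytes from readdir.
sub read_dir_decoded {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names;
    while (defined(my $entry = readdir($dh))) {
        next if $entry eq '.' or $entry eq '..';
        my $bytes = $entry;                 # work on a copy of the raw bytes
        push @names, decode_utf8($bytes);   # bytes -> characters
    }
    closedir($dh);
    return @names;
}
```

When you need to open one of those files again, go the other way with encode_utf8($name) before passing it to open or stat, since the filesystem calls still expect bytes.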
Re^2: UTF-8 and readdir, etc.
by Your Mother (Archbishop) on Sep 12, 2019 at 22:52 UTC
Ignorance, and terrible design, abounds–
NTFS stores file names in Unicode. –The Horse’s Mouth :(
–and–
NTFS allows any sequence of 16-bit values for name encoding (file names, stream names, index names, etc.) except 0x0000. This means (case insensitive) UTF-16 code units are supported, but the file system does not check whether a sequence is valid UTF-16 (it allows any sequence of short values, not restricted to those in the Unicode standard) –Wackypardia